Re: System management

Dan Razzell (razzell@cs.ubc.ca)
Thu, 29 Aug 1996 14:14:44 UTC-0700

[References to comments made by Juergen in his reply.]

Interesting ideas, though probably an order of magnitude more ambitious
than what I'm attempting. I absolutely have to concentrate on pragmatism
over formalism here, as my workload will get heavier rather than lighter as
long as I have the additional distraction of developing system management
tools.

I find it easy to agree with Juergen on the value of small systems,
incremental development, and the converse dangers of excess toward
grand solutions. As a general observation, I'd say that although there may
be good scientific reasons driving the search for a Grand Unified Theory
of a given physical principle, it doesn't follow that in Computer Science
we must develop a Grand Unified Theory of every computational artifact
we've ever constructed, much less attempt to build an engine that embodies
that theory. There must be some value to having modest ambitions, and
then reaching them.

That said, let's move on to practical things.

> You say "I'm developing". How far is your work? Are you still thinking,
> like me? Or do you have something implemented?

I had to fight fairly hard to get a month off to work on this exclusively,
the month being August 1996. Here's what I've done, just to give a sense
of realism to the project:

1) The first week was mostly spent getting people to actually leave me
alone as they had committed to do. Sound familiar?

2) The second week was spent catching up with developments in Tcl over the
year since I last had a chance to look at doing this project, getting
the latest versions of Tcl, Tk, and Scotty/Tkined compiled and installed
in a form that would be easy to customize, looking at the Scotty/Tkined
code and starting to play with it so as to understand the flow of control.
It is very nice that Tcl now has sockets and events built in, so these no
longer have to be added by hand as I've done in the past. And as I mentioned earlier,
Scotty/Tkined provides a suitable engine and many of the basic pieces
for distributed system management already.

3) In the third week I added the features I reported earlier, primarily to
prove to myself that Tkined would extend without too much pain if that
should ever prove necessary, and also of course to add some behaviors
I thought necessary for further development.

4) Now, in the fourth week, I have a working prototype of my system management
tool. All the necessary elements are in place, although mostly in a
crude form that simply shows proof of concept. Even so, I believe this
proof of concept is critical to motivating further support for this project. The fancier
stuff can be added incrementally since we can now demonstrate that the
basic framework is adequate. At this point, I could do dangerous things
by remote control, a very satisfying development as well as a boost to
the primal male ego. But at this point, so could anyone else who
happened to know of that particular TCP port.

5) Yesterday I put in a first attempt at an encryption model, along with
a choice of several simple algorithms, including private-key DES.
I spent several hours debugging what turned out to be a trivial problem.
On the server end, I was encrypting the listening Tcl server channel rather
than the per-connection channels it hands out for each client, so of course
nothing was getting through properly.
Just a typical development day, as we all know. Anyway it now works.

6) Today I plan to redo the encryption model in a way that ought to be less
invasive of the Tcl channel code. I'll know more when it's done. The
real test of the model will come with the inclusion of public key
algorithms, since the key exchange has to fit cleanly into the way
the socket connections are set up between server and client.
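The server-side bug in item 5 is easy to reproduce in any socket API. In Tcl, a `socket -server` command gives you a listening channel, and the callback receives a new channel for each accepted connection; only those connection channels ever carry data. The sketch below shows the same distinction in Python, assuming a hypothetical `EncryptedChannel` wrapper. The cipher is a toy SHA-256 keystream XOR standing in for DES, it is not secure, and the shared key is assumed to be agreed out of band.

```python
import hashlib
import socket


class XorStreamCipher:
    """Toy stream cipher (illustration only, NOT secure; stands in for DES)."""

    def __init__(self, key: bytes):
        self._key = key
        self._counter = 0
        self._buffer = b""

    def _keystream(self, n: int) -> bytes:
        # Extend the keystream with SHA-256(key || counter) blocks as needed.
        while len(self._buffer) < n:
            block = hashlib.sha256(self._key + self._counter.to_bytes(8, "big")).digest()
            self._buffer += block
            self._counter += 1
        out, self._buffer = self._buffer[:n], self._buffer[n:]
        return out

    def apply(self, data: bytes) -> bytes:
        # XOR is its own inverse, so the same call encrypts and decrypts.
        return bytes(a ^ b for a, b in zip(data, self._keystream(len(data))))


class EncryptedChannel:
    """Hypothetical wrapper around ONE connected socket (never the listener)."""

    def __init__(self, sock: socket.socket, key: bytes):
        self.sock = sock
        self._send_cipher = XorStreamCipher(key)  # pairs with the peer's receive side
        self._recv_cipher = XorStreamCipher(key)  # pairs with the peer's send side

    def send(self, data: bytes) -> None:
        self.sock.sendall(self._send_cipher.apply(data))

    def recv_exact(self, n: int) -> bytes:
        ciphertext = b""
        while len(ciphertext) < n:
            part = self.sock.recv(n - len(ciphertext))
            if not part:
                raise ConnectionError("peer closed the connection")
            ciphertext += part
        return self._recv_cipher.apply(ciphertext)


def demo(key: bytes = b"shared-secret") -> bytes:
    listener = socket.create_server(("127.0.0.1", 0))  # listening channel: left alone
    port = listener.getsockname()[1]
    client_sock = socket.create_connection(("127.0.0.1", port))
    conn, _addr = listener.accept()  # per-connection channel: this is what gets wrapped
    try:
        client = EncryptedChannel(client_sock, key)
        server = EncryptedChannel(conn, key)
        client.send(b"uptime")
        request = server.recv_exact(6)
        server.send(b"ok:" + request)
        return client.recv_exact(9)
    finally:
        client_sock.close()
        conn.close()
        listener.close()
```

Wrapping `listener` instead of `conn` reproduces the symptom described above: the listening socket never carries application data, so the traffic on the accepted connection goes through unencrypted on one end and garbled on the other.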

There is a lot more work to do, and my month is almost at an end. But as
it stands, I have software that, in a limited way, I can actually use to
perform arbitrary system management on arbitrary hosts. It performs much
faster and with less intervention than our existing methods, so with a
little more refinement and some cleanup before I forget what I've done,
I expect to use it for daily system management. I'm also quite
encouraged to pursue the project further.

> I would prefer RPC like communications as provided by CORBA, not socket
> stuff.

Think carefully about the tradeoffs here. The higher the layer of network
services that you depend on, the lower the chances that you will be able
to manage a host that has developed network service problems. The same
reasoning explains why SNMP normally runs over a connectionless transport, UDP [1].
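To make the tradeoff concrete, here is a minimal sketch of a connectionless probe: one datagram out, one datagram back, with a timeout and a few retries. It is not SNMP (there is no ASN.1/BER encoding); `udp_probe`, `answer_once`, the payloads, and the loopback responder are all hypothetical, chosen only to show that no connection state has to be established before the managed host can answer.

```python
import socket
import threading
from typing import Optional


def udp_probe(host: str, port: int, payload: bytes,
              timeout: float = 0.5, retries: int = 3) -> Optional[bytes]:
    """Send one datagram and wait for a reply, retrying a few times.

    Each attempt is a self-contained request: there is no handshake to
    complete and no connection to tear down afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _attempt in range(retries):
            sock.sendto(payload, (host, port))
            try:
                reply, _addr = sock.recvfrom(1500)
                return reply
            except socket.timeout:
                continue  # lost request or lost reply: just try again
    return None  # no answer at all; the caller decides what that means


def answer_once(sock: socket.socket) -> None:
    """Hypothetical responder: answer a single request with a prefix."""
    data, addr = sock.recvfrom(1500)
    sock.sendto(b"alive:" + data, addr)


def demo() -> Optional[bytes]:
    responder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    responder.bind(("127.0.0.1", 0))
    responder.settimeout(2.0)  # keep the helper thread from hanging forever
    worker = threading.Thread(target=answer_once, args=(responder,))
    worker.start()
    try:
        return udp_probe("127.0.0.1", responder.getsockname()[1], b"ping")
    finally:
        worker.join()
        responder.close()
```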

On the other hand, the requirements of distributed system management are quite
open-ended, and call for a connection-oriented protocol. One could do
it over message-oriented protocols, but as Juergen [2] points out, only
by effectively reimplementing a connection-oriented layer in the
application. For the project at hand, I find TCP sockets to be sufficient,
widely supported, and, now that Tcl supports them directly, easy to use.
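As an illustration of what "reimplementing a connection-oriented layer in the application" involves, the sketch below runs the classic alternating-bit (stop-and-wait) protocol over a simulated lossy datagram link. Even this minimal version needs sequence numbers, acknowledgements, retransmission, and duplicate suppression, all of which TCP already provides. `LossyLink` and the fixed drop patterns are hypothetical stand-ins for a real unreliable network, and a real implementation would retransmit on a timer rather than immediately.

```python
class LossyLink:
    """In-memory datagram link that drops packets per a fixed pattern."""

    def __init__(self, drop_pattern):
        self._drops = list(drop_pattern)  # True: drop the nth datagram sent
        self._sent = 0
        self._queue = []

    def send(self, datagram):
        drop = self._sent < len(self._drops) and self._drops[self._sent]
        self._sent += 1
        if not drop:
            self._queue.append(datagram)

    def recv(self):
        return self._queue.pop(0) if self._queue else None


class AlternatingBitReceiver:
    def __init__(self, ack_link):
        self.expected = 0
        self.delivered = []
        self.ack_link = ack_link

    def on_datagram(self, datagram):
        seq, payload = datagram
        if seq == self.expected:  # new message: deliver it exactly once
            self.delivered.append(payload)
            self.expected ^= 1
        # Always acknowledge what arrived, so a lost ACK gets repaired
        # when the sender retransmits (the duplicate is not re-delivered).
        self.ack_link.send(seq)


def send_reliably(messages, data_link, ack_link, receiver, max_tries=10):
    seq = 0
    for payload in messages:
        for _ in range(max_tries):
            data_link.send((seq, payload))
            datagram = data_link.recv()
            if datagram is not None:
                receiver.on_datagram(datagram)
            if ack_link.recv() == seq:
                break  # acknowledged; move on to the next message
            # No ACK: in a real system a timer would expire; here, retransmit.
        else:
            raise TimeoutError(f"no acknowledgement for {payload!r}")
        seq ^= 1


data_link = LossyLink([True, False, False, False])  # first data packet lost
ack_link = LossyLink([False, True, False, False])   # second ACK lost
receiver = AlternatingBitReceiver(ack_link)
send_reliably(["a", "b", "c"], data_link, ack_link, receiver)
```

With the losses above, message "a" is retransmitted once and message "b" arrives twice, yet `receiver.delivered` ends up as `["a", "b", "c"]`.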

> Management information is kept in the map in form of attributes of
> the objects. Since the map is stored as a plain file it's hard to
> maintain these information or to use it for other purposes or to fetch
> them from other repositories such as databases or simple files.
> ...
> I think, there should be a general repository interface not only to
> NIS but covering a wide range of sources from files and DNS to
> relational or object oriented databases.

In this project, I have made no assumption about where the information
is held, though in practice it would make little sense to maintain its
definitive value inside one specific management tool, especially when in
many cases it has been generated elsewhere.
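One way to keep the definitive values outside the tool is to have the tool read everything through a small lookup interface, with one adapter per source. The sketch below is hypothetical throughout: `Repository`, the `get(host, attribute)` signature, and the three adapters are illustrative names, not part of Scotty/Tkined or of any existing repository API. `DictRepository` merely stands in for a database or NIS map.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class Repository(ABC):
    """Hypothetical read-only interface to one source of management data."""

    @abstractmethod
    def get(self, host: str, attribute: str) -> Optional[str]:
        """Return the value, or None if this source does not know it."""


class HostsFileRepository(Repository):
    """Answers 'address' queries from text in /etc/hosts format."""

    def __init__(self, text: str):
        self._addresses: Dict[str, str] = {}
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            address, *names = line.split()
            for name in names:
                self._addresses.setdefault(name, address)  # first entry wins

    def get(self, host, attribute):
        return self._addresses.get(host) if attribute == "address" else None


class DictRepository(Repository):
    """Stand-in for a database or other key/value store."""

    def __init__(self, table: Dict[str, Dict[str, str]]):
        self._table = table

    def get(self, host, attribute):
        return self._table.get(host, {}).get(attribute)


class ChainedRepository(Repository):
    """Queries each source in order and returns the first answer found."""

    def __init__(self, sources: Iterable[Repository]):
        self._sources = list(sources)

    def get(self, host, attribute):
        for source in self._sources:
            value = source.get(host, attribute)
            if value is not None:
                return value
        return None


hosts = HostsFileRepository("10.0.0.5  alpha alpha.example  # primary\n\n10.0.0.6 beta\n")
db = DictRepository({"beta": {"owner": "ops"}, "gamma": {"address": "10.0.0.7"}})
repo = ChainedRepository([hosts, db])
```

Here `repo.get("alpha", "address")` comes from the hosts-format text, while `repo.get("gamma", "address")` and `repo.get("beta", "owner")` fall through to the second source. New sources such as DNS can be added without the tool itself storing any of the values.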

One certainty is that for system management to work, sites have to be
modelled somewhere other than simply in the existential properties of their
components. In practical terms, the model is usually not found all in one
form, and parts of it are often represented redundantly. So a good system
management tool is not only able to look for differences between the model
and the real world, but also to detect and perhaps to repair internal
inconsistencies in various parts of the model.
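A first step toward both kinds of checking can be quite small. The sketch below assumes the model's redundant copies of one attribute (host to address) have already been fetched into plain dictionaries, one per source, alongside what was actually observed on the network. The function name, source names, and message wording are hypothetical; the point is only to separate disagreement inside the model from disagreement between the model and the real world.

```python
from typing import Dict, List, Tuple


def find_inconsistencies(
    sources: Dict[str, Dict[str, str]],
    observed: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Cross-check redundant copies of host -> address, and check them
    against what was observed. Returns (host, problem) pairs by host."""
    problems: List[Tuple[str, str]] = []
    hosts = set(observed)
    for table in sources.values():
        hosts.update(table)
    for host in sorted(hosts):
        claims = {name: table[host] for name, table in sources.items() if host in table}
        values = set(claims.values())
        # Internal inconsistency: two parts of the model disagree.
        if len(values) > 1:
            detail = ", ".join(f"{name}={value}" for name, value in sorted(claims.items()))
            problems.append((host, f"model disagrees with itself: {detail}"))
        if claims:
            # Incomplete redundancy: some parts of the model omit the host.
            missing = sorted(name for name in sources if name not in claims)
            if missing:
                problems.append((host, f"missing from: {', '.join(missing)}"))
            if host not in observed:
                problems.append((host, "in model but not observed"))
        # Model versus real world.
        if host in observed and observed[host] not in values:
            problems.append((host, f"observed {observed[host]} not in model"))
    return problems


report = find_inconsistencies(
    sources={
        "hosts": {"alpha": "10.0.0.5", "beta": "10.0.0.6"},
        "dns": {"alpha": "10.0.0.5", "beta": "10.0.0.9"},
    },
    observed={"alpha": "10.0.0.5", "beta": "10.0.0.6", "gamma": "10.0.0.7"},
)
for host, problem in report:
    print(host, problem)
```

Repair is deliberately left out: deciding which copy is authoritative is a site policy question, not something the check itself can settle.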

Ideally, the model would be presented and checked in some unified way.
For the moment though, it is sufficient for a management tool just to
be able to get at the various pieces and check them on an ad hoc basis.
That would already be an improvement over the status quo, which at most
sites is checking for consistency by hand.

This illustrates the point at which we began. There has to be an immediate
benefit from this project if it is to prove itself worth pursuing. That's
why I've concentrated on building a rough prototype rather than a
perfect abstraction and no useful code. It's also necessary that the
thing be capable of refinement and extension as far as we want to take it.
I think I've convinced myself of that over the past few weeks, but time
will surely tell.

Anyway, I hope these comments have been useful, or if not useful, at least
entertaining. Send my kind regards to Wolfgang Bibel if you happen to see
him. I believe he's still at Darmstadt?

References:

[1] Marshall T. Rose,
"The Simple Book"
ISBN 0-13-451659-1

[2] J. Schönwälder, H. Langendörfer,
"Tcl Extensions for Network Management Applications"
<ftp://ftp.ibr.cs.tu-bs.de/pub/local/papers/tcltk-95.ps.gz>