Re: System management

Gerd Aschemann (ascheman@informatik.th-darmstadt.de)
Thu, 29 Aug 1996 12:00:42 +0200 (MET DST)

(I'm replying to the original posting, although I have read Jürgen's
answer; I do so only to address some of the original questions that
were not quoted in Jürgen's posting.)

Dan Razzell wrote:
>
>First of all, congratulations are in order for Juergen and various
>contributors for writing great code. It's really a joy to work with
>software that shows such care and craftsmanship. I've been having a
>lot of fun just walking through the code and trying out different ways
>of using it and extending it for what I hope will be generally interesting
>ends.
>

I agree very much with this point, but I also have some criticisms of
scotty/tkined, which I will point out later. Please do not
misunderstand this: systems won't improve without critical
examination from time to time.

I have been looking into tkined recently for two reasons:
1) I want to improve systems management here (which scotty/tkined
can support very well).
2) I want to do my PhD thesis in this field. I prefer the term
"management of distributed systems" since it covers not only network
and systems management but also distributed application management,
enterprise management etc., though these terms are mostly not well
defined and are subject to change as research continues. But even
in the field of "simple" systems management a lot of work remains to
be done.

My main interest lies in the second point, so we may see some things
differently but will hopefully come to agreement. My ideas in this
area are not worked out enough for a "coming out", but some points
may be sufficient for this discussion. I mostly think about
configuration management, with my experience and daily work as an
administrator of a Unix network as background. This could be a good
environment to test my ideas, but to fulfill scientific requirements
they should not be limited to this scenario. I want to specify a
distributed system and its services. Managing such a system will
also require specifying constraints or policies. Policies will be
applied not only to single hosts but especially to domains. Domains
could be seen as "sets of hosts", but they additionally cover other
aspects.

Well, in this sense configuration management does not necessarily
need a graphical user interface or a runtime environment for applying
configurations in real time. But configuration is not done once, at
the implementation time of a well-planned network; in fact it is part
of the daily work of an administrator. He should be able to describe
(specify) and change the "interesting" portions of the system in real
time. Also, specifications should be (partially) derived from
knowledge within other repositories, such as inventories, telephone
lists, etc. Some examples should make these points clearer:

1) For each of the Unix boxes, network printers and X terminals one
or more persons are responsible. This can be recorded in the
"internet.mgmt.mib-2.system.sysContact" SNMP variable of each system.
It can be maintained via SNMP or via the agent's SNMP configuration
file. But where should it be maintained, and how is it kept
consistent? If I also specify the room number and telephone number of
the person, they must be changed consistently everywhere when the
person moves to another office.
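One way to keep such entries consistent is to store each person exactly once and derive every device's sysContact string from that record. The following is a minimal Python sketch under that assumption; the names (Person, people, responsible, sys_contact) and the sample data are hypothetical, and actually pushing the strings to the agents (SNMP set or the agent's configuration file) is left out.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    room: str
    phone: str


# Hypothetical repositories; in practice an inventory database,
# a telephone list, an NIS map, ...
people = {
    "pat": Person("Pat Example", "Room 101", "555-0101"),
}
# Devices only reference person IDs, never copies of the details.
responsible = {
    "printer1": ["pat"],
    "xterm7": ["pat"],
}


def sys_contact(device: str) -> str:
    """Build the sysContact string for one device from the person records."""
    entries = []
    for pid in responsible[device]:
        p = people[pid]
        entries.append(f"{p.name}, {p.room}, {p.phone}")
    return "; ".join(entries)


def all_sys_contacts() -> dict:
    """Regenerate sysContact for every device after any change."""
    return {dev: sys_contact(dev) for dev in responsible}
```

An office move then becomes a single update (e.g. `people["pat"].room = "Room 202"`), and regenerating with all_sys_contacts() yields the new value for every device Pat is responsible for.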

2) A diskless client requires a "boot server" and an NFS server for
its root and swap partitions. At most sites a single system is
configured for these services (having a large disk, exporting parts
of it to the client with root access, running rarpd, running
bootparamd, running tftpd, using a lot of files or NIS maps:
/etc/hosts, /etc/bootparams, /etc/ethers, /etc/{exports,dfs/dfstab},
...). Some of these services could be distributed, centralized or
centrally managed (with NIS or other services). Only RARP requires
that a server be reachable by Ethernet broadcast.
If you want to add a new client or migrate one to another server,
it's a lot of work. Sure, for standard cases the vendor provides some
installation procedures. But, e.g. for SunOS, these procedures do not
work together with NIS; they maintain local files etc. Also they do
not conform to your concept of a "distributed" boot server. Migration
of a diskless client must be done manually; there is no shell script
from the UNIX vendor for this task. So there is a lot of work to be
done manually and a lot of mistakes can be made.
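If one record per client were kept in a management database, the scattered entries could be generated rather than typed, and migration would reduce to changing one field and regenerating. The following minimal Python sketch assumes a single server providing all boot services and the common /export/root and /export/swap layout; DisklessClient, config_entries and migrate are hypothetical names. The exports lines follow SunOS-4-style exports(5) syntax and should be checked against the local manual page. Copying the client's root and swap data is not covered.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DisklessClient:
    name: str
    ip: str
    ether: str
    server: str  # assumption: one host runs rarpd, bootparamd, tftpd and NFS


def config_entries(c: DisklessClient) -> dict:
    """Return the configuration lines one client needs, keyed by file."""
    root = f"/export/root/{c.name}"
    swap = f"/export/swap/{c.name}"
    return {
        "/etc/hosts": f"{c.ip}\t{c.name}",
        "/etc/ethers": f"{c.ether}\t{c.name}",
        "/etc/bootparams": f"{c.name}\troot={c.server}:{root} swap={c.server}:{swap}",
        # Lines for the serving host's exports file (SunOS-4-style syntax).
        f"{c.server}:/etc/exports": [
            f"{root}\t-access={c.name},root={c.name}",
            f"{swap}\t-access={c.name},root={c.name}",
        ],
    }


def migrate(c: DisklessClient, new_server: str) -> tuple:
    """Return (old_entries, new_entries) for moving a client to another server."""
    moved = replace(c, server=new_server)
    return config_entries(c), config_entries(moved)
```

Comparing the old and new entries shows exactly which lines must be removed and added on which host, which is the error-prone part when done by hand.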

3) If you have a lot of file servers, you do not want to maintain the
NFS export lists manually. You want to generate them from a database
and distribute them. This overlaps with the maintenance of diskless
clients in point 2.
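Generating the export lists mostly means grouping database rows by server and exported path. Below is a minimal Python sketch, assuming a hypothetical table of (server, path, client, root access) rows and SunOS-4-style exports(5) syntax with colon-separated host lists; distributing the resulting files to the servers is not shown.

```python
from collections import defaultdict

# Hypothetical database rows: (server, exported path, client host, root access?)
EXPORTS_DB = [
    ("fs1", "/home", "ws1", False),
    ("fs1", "/home", "ws2", False),
    ("fs1", "/export/root/ws1", "ws1", True),
    ("fs2", "/usr/local", "ws1", False),
]


def generate_exports(rows) -> dict:
    """Build one exports file (as a list of lines) per server."""
    grouped = defaultdict(lambda: {"access": set(), "root": set()})
    for server, path, client, root in rows:
        entry = grouped[(server, path)]
        entry["access"].add(client)
        if root:
            entry["root"].add(client)

    files = defaultdict(list)
    # Sorting gives stable output, so regenerated files can be diffed.
    for (server, path), entry in sorted(grouped.items()):
        opts = ["access=" + ":".join(sorted(entry["access"]))]
        if entry["root"]:
            opts.append("root=" + ":".join(sorted(entry["root"])))
        files[server].append(f"{path}\t-{','.join(opts)}")
    return dict(files)
```

Because the diskless root exports are just more rows in the same table, the overlap with point 2 is handled in one place instead of two.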

What is the relationship of all this to tkined? Well, I would dream
of selecting a server in a map, popping up its services menu,
selecting one of them, and executing the "migrate to another server"
action, even for aggregated services such as the above-mentioned boot
service (rarpd, bootparamd, tftpd). Of course this would require
writing a scotty/tkined application. But a lot of the work within
this application could be supported by the management system. It
could provide a management database and help maintain it.

The current implementation of tkined/scotty is not sufficient for this
task. I see two main problems:

1) Management information is kept in the map in the form of object
attributes. Since the map is stored as a plain file, it's hard to
maintain this information, to use it for other purposes, or to fetch
it from other repositories such as databases or simple files. E.g.
the IP address of an object need not be stored in the map; I will
find it in /etc/hosts, NIS or DNS. These repositories are maintained
by the system administrator, and tkined should look there.
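A minimal Python sketch of looking an address up when it is needed instead of storing it in the map: it first consults text in /etc/hosts format and otherwise falls back to the system resolver via socket.gethostbyname, which may in turn consult /etc/hosts, NIS or DNS depending on the system's configuration. The function names are hypothetical.

```python
import socket


def parse_hosts(text: str) -> dict:
    """Map every hostname and alias in /etc/hosts-format text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name, addr)  # keep the first match for a name
    return table


def address_of(name: str, hosts_text: str = "") -> str:
    """Resolve a name at the moment it is needed rather than caching it in the map."""
    table = parse_hosts(hosts_text)
    if name in table:
        return table[name]
    # Fall back to the system resolver (files, NIS or DNS, as configured).
    return socket.gethostbyname(name)
```

Typical use would be `address_of("fs1", open("/etc/hosts").read())`; the map then only needs to carry the name.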

2) Scripts are bound to the map and to "running" tkineds. E.g.
currently every administrator here must run tkined with the
reachability test. If one device goes down and tkined starts
blinking, several people start checking the problem. This should be
coordinated by a management system. Also, if we find a hard error on
a device, it must be removed from the reachability-check list by
stopping the script, selecting all devices except the broken one and
restarting the script. Additionally, the map has to be saved and
reloaded by the other administrators, or every operator has to do
this work himself. When the device comes up again, the procedure must
be done in reverse.
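The coordination problem could be addressed by keeping the check list and the "who is handling this failure" information in one shared place. The following in-process Python sketch stands in for such a server; ReachabilityRegistry and its methods are hypothetical, and in a real system the front ends would reach it through RPC or a platform such as CORBA rather than by direct calls.

```python
import threading


class ReachabilityRegistry:
    """Hypothetical central registry shared by all administrators' front ends."""

    def __init__(self, devices):
        self._lock = threading.Lock()
        self._devices = set(devices)
        self._disabled = set()  # devices with a known hard error
        self._claims = {}       # device -> administrator handling the failure

    def devices_to_check(self):
        """The list every reachability test should use right now."""
        with self._lock:
            return sorted(self._devices - self._disabled)

    def disable(self, device):
        """Take a broken device out of checking for everyone at once."""
        with self._lock:
            self._disabled.add(device)

    def enable(self, device):
        """Put a repaired device back and release any claim on it."""
        with self._lock:
            self._disabled.discard(device)
            self._claims.pop(device, None)

    def claim(self, device, admin):
        """Return True if admin now handles the failure, False if someone else does."""
        with self._lock:
            owner = self._claims.setdefault(device, admin)
            return owner == admin
```

Disabling a device is then one call instead of stopping, reselecting and restarting the script on every administrator's tkined.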

A solution to these problems should be a "real client server
application", as mentioned by Jürgen. In the long term, a solution
should offer additional features like distributed management by
geographical (subdomains) or functional (database, monitoring,
transactions, ...) decomposition. Fault tolerance, such as replicated
servers, should also become possible. This seems to be a large task,
and up to now it has been too large for me, since I also want to
concentrate on some aspects such as the repository problem in order
to finish my thesis in the next three years ;-)

From my point of view an implementation of such a system should be
based on middleware or a platform like DCE or CORBA (maybe ONC+, but
it's more or less "proprietary" and not widely available), since they
provide standardized and easy-to-use communication facilities,
security services (currently only DCE), object orientation (CORBA,
but DCE extensions are also available) and other services needed in a
distributed system (DCE: time services, etc.). I would prefer CORBA
for several reasons (it currently has no security services, but these
should come soon, or we could set up our own security services):

a) It's object oriented and allows implementation in different
languages. It should not be hard to include Tcl, if need be via the C
interfaces of both CORBA and Tcl. Is there a language binding for
Tcl? Is one in preparation?

b) CORBA or mostly CORBA-conformant implementations are available for
free (ILU, Electra), free for academic use
(Visigenic/PostModernComputing) or cheaply (I heard that Iona's
implementation is cheap).

c) CORBA is being considered for a future management architecture;
see the Joint Inter-Domain Management (JIDM) work by X/Open and NMF.

d) CORBA is emerging as a platform for agents, especially since there
are Java bindings.

>I would like to work on this project in a way that ends up being of benefit
>to everyone. So I have a couple of questions for the developers on this
>list. First though, it would probably help to provide a bit of background.
>Basically what I'm developing is a framework for managing Unix hosts
>or more generally, machines with similar capabilities. This consists
>of:

You say "I'm developing". How far along is your work? Are you still
thinking it through, like me? Or do you have something implemented?
How much work would it be to change this implementation for CORBA (or
another platform, if you agree with my thoughts)? The same question
goes to Jürgen and others. (Jürgen said in his answer that he has
already implemented some client server stuff.)

>1) A simple, extensible, interpreter on each target host which knows how to
> perform various generic tests and operations (for example, check if any
> users are logged in, attempt to remount filesystems, and so on).
> I'd already settled on Tcl for this, because it fits well with the
> existing command languages used by most systems. It's so great that
> the socket stuff was finally moved into Tcl!

I would prefer RPC-like communication as provided by CORBA, not
socket stuff.

>2) A user interface running at some management station, which knows how
> to (a) form sets of hosts based on various criteria, (b) transmit
> generic commands to such hosts and collect responses, and (c) manage
> errors, timeouts, and other variability in how the hosts respond.
> Scotty is a wonderful fit for the purpose, and I was delighted to
> find that Tkined has most of the necessary user interface already
> in place in a very clean form.

I agree with that. But Tkined would be "reduced" to a graphical front
end and would need a lot of changes to handle the client server API.

>3) A secure protocol connecting the management station and managed hosts,
> since the commands being sent out are extremely powerful and the data
> itself often needs to be secure.

This would be essential! As long as CORBA does not offer such
services, we could use other mechanisms like Kerberos.

>With that preamble I can now ask you folks about the following questions:
>
>1) So far I've convinced myself that in order to represent the various
> tools and other entities that are necessary for this kind of management,
> the Tkined user interface has to be extended to allow Interpreters as
> graphical objects. It was either that or introduce an entirely new ined
> type that would be a graphical object capable of performing arbitrary
> computations, and it seemed to me that the Interpreter type already had
> everything but a graphical appearance.

I haven't thought much about that and haven't looked closely into that
part of Tkined.

>2) The netdb stuff in Scotty is useful but not a complete interface to
> the NIS database. For system administration, it can be necessary for
> some sites to use various standard maps such as netgroup (getnetgrent),
> group (getgrent), passwd (getpwent), shadow (getspent) for which a
> programmatic interface exists. Some sites may maintain additional NIS
> maps for which the only access may be commands such as ypcat and ypmatch.
>
> Would it be stylistically appropriate to extend the netdb code to include
> these maps? Would it be better, or additionally useful, to supply
> an interface to the yp commands?

I think there should be a general repository interface, not only to
NIS but covering a wide range of sources from files and DNS to
relational or object-oriented databases.
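Such an interface could be as small as one lookup operation that every source implements, plus a combinator that asks several sources in order. Below is a minimal Python sketch under that assumption; Repository, DictRepository and ChainedRepository are hypothetical names, and DictRepository merely stands in for real file, NIS, DNS or database back ends.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Repository(ABC):
    """Hypothetical common interface over files, NIS, DNS, databases, ..."""

    @abstractmethod
    def lookup(self, table: str, key: str) -> Optional[str]:
        """Return the value for key in the given table, or None if unknown."""


class DictRepository(Repository):
    """In-memory stand-in for a file- or database-backed source."""

    def __init__(self, tables: dict):
        self._tables = tables

    def lookup(self, table: str, key: str) -> Optional[str]:
        return self._tables.get(table, {}).get(key)


class ChainedRepository(Repository):
    """Ask each source in order and return the first answer found."""

    def __init__(self, *sources: Repository):
        self._sources = sources

    def lookup(self, table: str, key: str) -> Optional[str]:
        for src in self._sources:
            value = src.lookup(table, key)
            if value is not None:
                return value
        return None
```

A management application would then only depend on Repository.lookup, and a site could decide, for example, that local files override site-wide database entries simply by the order of the sources.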

>3) Any preferences or comments concerning encryption?

See above.

>4) Finally, as general guidance, how much do network administrators care
> about system maintenance? If Tkined was capable of both, would you
> rather try to keep them functionally separate or unified? Right now,
> I have no idea what this would concretely mean, but I'm sure it will
> have an influence on how I approach the design.

I am mostly a system administrator, but as mentioned above I am also
interested in other "layers", such as networks, distributed
applications etc.

Some background on our situation here: Since 1988 I have built up part
of the CS Department's UNIX network. Starting with one machine, it now
has about 60 Unix boxes and a lot of PCs, peripherals and X terminals.
Having been alone in the first years, I now share my work with some
other staff members and research assistants, but I feel most
responsible for the whole system. It is distributed over three
buildings; network management (routers etc.) is done by another group
(from the university's "central computing center") working together
with us. There is also a close relationship with other parts of the
CS department, like the group responsible for student computing
services (~100 Unix systems).

-- 
Gerd Aschemann --- aschemann@Informatik.TH-Darmstadt.de
Publishing means changing (Carmen Thomas)