Re: Stress (?) fault in straps

Cameron Laird (claird@Starbase.NeoSoft.COM)
Sun, 15 Jun 1997 09:38:52 -0500 (CDT)

From owner-tkined@ibr.cs.tu-bs.de Fri Apr 25 12:58:52 1997
Date: Fri, 25 Apr 1997 11:08:11 -0500 (CDT)
From: Cameron Laird <claird@Starbase.NeoSoft.COM>
.
.
.
I've got a mildly reproducible fault in straps I'm probably
going to work to isolate next week. This is the most
clear-cut manifestation: with Scotty 2.1.5 under Solaris 2.5,
straps sometimes seems to "seize" after heavy trap loads. By
that I mean a couple of different symptoms: a request from a
new snmp session to bind a trap handler never returns; and
existing bindings superficially are OK, and seem to be in
communication with straps, but in fact they never receive
packets.

I think I can make it happen under other OSs. I'm sure it's
real; I've been observing it on and off for months. Ideally,
some reader would report that a diagnosis and fix have
already been identified. I haven't noticed any mention
of this since the release of 2.1.5, though, so I assume I'll
have to do it myself. The purpose of this notice in that
case would be just to alert others to what's coming.
.
.
.
I haven't solved it yet. I'm closer, though.

Here's the latest. Under Solaris, but probably not BSD,
HP-UX, SunOS, ..., I can start up a simple Scotty session,
and begin receiving TRAPs. I direct a burst of a thousand
TRAPs, over a short interval, at my Solaris host (a fairly
well-endowed SPARCstation-20). The straps process becomes
very quiet--it uses little CPU, and doesn't seem to be doing
any I/O. When I start a new Scotty session, and attempt to
bind a trap-handler (that is, evaluate Tnm_SnmpTrapOpen()),
the connect call hangs. It never (well, not for thirty
minutes) returns. So, one not-specific-to-Scotty question is
this: if I have an AF_INET SOCK_STREAM socket that appears
to be in good shape, is it even permissible for a connect()
on it not to return? (Juergen, why the funny sockaddr_un
manipulations, when you could write the routine in terms of
sockaddr? Is this an attempt to streamline OS-dependencies?
I recommend that you comment such points as this. I realize
that all this may go away with the next release, but the
general principle holds.)
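
On the connect() question: a blocking connect() on a SOCK_STREAM
socket waits for the TCP handshake to complete, so it can stall for
a long time if the peer never answers. One possible cause (an
assumption, not something confirmed for straps) is a listener that
has stopped calling accept() and whose backlog has filled. A rough
sketch of a client-side guard follows: a non-blocking connect()
bounded by select(). The names connect_with_timeout() and
probe_tcp() are hypothetical helpers invented for illustration;
they are not part of Scotty or tnmSnmpNet.c. The sketch also uses
socklen_t and inet_pton(), which assume a reasonably current POSIX
system.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from Scotty): start a non-blocking
 * connect() and wait at most timeout_sec seconds for it to finish.
 * Returns 0 on success, or -1 with errno set (ETIMEDOUT if the
 * peer did not answer within the timeout).
 */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_sec)
{
    int flags, rc, err = 0;
    socklen_t errlen = sizeof(err);
    fd_set wfds;
    struct timeval tv;

    assert(addr != NULL);

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    rc = connect(fd, addr, addrlen);
    if (rc < 0 && errno == EINPROGRESS) {
        /* Handshake pending: wait until writable or timed out. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc == 0)
            err = ETIMEDOUT;
        else if (rc < 0)
            err = errno;
        else if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
            err = errno;
    } else if (rc < 0) {
        err = errno;
    }

    /* Restore the original file status flags before returning. */
    fcntl(fd, F_SETFL, flags);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Hypothetical probe: try a TCP connect to ip:port, then close. */
int probe_tcp(const char *ip, unsigned short port, int timeout_sec)
{
    struct sockaddr_in sa;
    int fd, rc, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    rc = connect_with_timeout(fd, (struct sockaddr *)&sa,
                              sizeof(sa), timeout_sec);
    saved = errno;          /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return rc;
}
```

Calling probe_tcp("127.0.0.1", port, 5), with port set to whatever
TCP port the local straps listens on, would separate "refused"
(ECONNREFUSED, nothing listening) from "silent" (ETIMEDOUT, the
listener exists but is not completing connections). It diagnoses
the hang from the client side only; it does not explain why straps
stops servicing its sockets.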

I'd sure appreciate help from those more socket-oriented
than I. This has been plaguing me for months, and I sure
want to get to the bottom of it. The source in question, by
the way, appears in tnm/snmp/tnmSnmpNet.c.

Next up for me: I'm going inside straps more deeply to see if
I can figure out what it's doing.

Cameron Laird http://starbase.neosoft.com/~bodi/nesi.html
Network Engineered Solutions +1 713 763 8366
claird@NeoSoft.com +1 281 996 8546 FAX
Houston WWW Business Guide: http://starbase.neosoft.com/~bodi/HouGuide.html

--
!! This message is brought to you via the `tkined & scotty' mailing list.
!! Please do not reply to this message to unsubscribe. To subscribe or
!! unsubscribe, send a mail message to <tkined-request@ibr.cs.tu-bs.de>.
!! See http://wwwsnmp.cs.utwente.nl/~schoenw/scotty/ for more information.