Re: Stress (?) fault in straps

Michael I Schwartz (mschwart@du.edu)
Sun, 15 Jun 1997 13:28:22 -0600

I'm offering a suggestion here based on behavior of the syslog daemon itself
that began with or before Solaris 2.3.

Sometime before Solaris 2.3, the Solaris syslog daemon stopped behaving like
the BSD one. That is, the BSD syslog daemon never loses messages; the
Solaris syslog daemon does lose them under heavy load. For example, a burst
of 10000 distinct syslog messages--say, 1000 each from 10 processes over an
8 second period--will cause only about 300 to be logged under the Solaris
syslog daemon, and thus 9000+ to be lost. (A burst of 100 messages doesn't
lose any, BTW.)
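
For anyone who wants to try a burst like the one described above, here is a
minimal sketch in Python. It is not the test we actually ran; the function
names, the "burst" tag, and the message format are all made up for
illustration. It forks a number of workers that each hand distinct messages
to syslog(), and provides a helper that counts how many distinct messages
show up in a set of log lines.

```python
import os
import re
import syslog


def burst_messages(worker, count, tag="burst"):
    """Return `count` distinct message strings for one worker (hypothetical format)."""
    return [f"{tag} worker={worker} seq={seq}" for seq in range(count)]


def send_burst(workers=10, per_worker=1000, tag="burst"):
    """Fork `workers` children that each log `per_worker` distinct messages.

    Returns the total number of messages handed to syslog().
    """
    pids = []
    for worker in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: log this worker's messages, then exit without
            # running any of the parent's cleanup code.
            syslog.openlog(ident=tag, facility=syslog.LOG_USER)
            for message in burst_messages(worker, per_worker, tag):
                syslog.syslog(syslog.LOG_INFO, message)
            syslog.closelog()
            os._exit(0)
        pids.append(pid)
    # Parent: wait for every child to finish before returning.
    for pid in pids:
        os.waitpid(pid, 0)
    return workers * per_worker


def count_logged(log_lines, tag="burst"):
    """Count distinct (worker, seq) pairs that appear in the given log lines."""
    pattern = re.compile(re.escape(tag) + r" worker=(\d+) seq=(\d+)")
    seen = set()
    for line in log_lines:
        match = pattern.search(line)
        if match:
            seen.add((int(match.group(1)), int(match.group(2))))
    return len(seen)
```

To use it, call send_burst(), then read whichever file your syslog.conf
routes user.info messages to and pass its lines to count_logged(). The
number sent minus the number logged is the number lost.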

Sun has a reason for this behavior, which we discovered after porting the
BSD syslog daemon to Solaris (an easy job, by the way). In versions 2.3 and
2.4 of Solaris, the BSD syslog daemon exploits a flaw in the streams system
that can panic the kernel in the presence of multiple CPUs. Under 2.5.1 the
behavior is changed somewhat--instead of panicking the kernel, it seems to
"freeze" on a CPU, which sounds similar to your symptom.

If straps is built over the BSD syslog mechanism, you may be exploiting the
same flaw in the streams system.
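
While isolating the hang, it may help to tell a connect that never returns
apart from one that is refused. The following is only a sketch: it assumes
an AF_INET SOCK_STREAM socket, as in the quoted message below, and the
function name, host, and port are placeholders rather than anything taken
from Scotty or straps.

```python
import socket


def probe_connect(host, port, timeout=5.0):
    """Try a TCP connect and report 'connected', 'timeout', or 'refused'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Bound the wait so a frozen listener cannot block the caller forever.
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    finally:
        sock.close()
```

A "timeout" result from a listener that used to accept connections would be
consistent with the freeze described above, though it does not prove the
cause.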

Michael

At 09:38 AM 6/15/97 -0500, you wrote:
> From owner-tkined@ibr.cs.tu-bs.de Fri Apr 25 12:58:52 1997
> Date: Fri, 25 Apr 1997 11:08:11 -0500 (CDT)
> From: Cameron Laird <claird@Starbase.NeoSoft.COM>
> .
> .
> .
> I've got a mildly reproducible fault in straps I'm probably
> going to work to isolate next week. This is the most clear-
> cut manifestation: Scotty2.1.5 under Solaris 2.5: sometimes
> after heavy trap loads, straps seems to "seize". I mean by
> that a couple of different symptoms: a new snmp session
> requested to bind a trap handler will never return; and ex-
> isting bindings superficially are OK, and seem to be in
> communication with straps, but in fact they never receive
> packets.
>
> I think I can make it happen under other OSs. I'm sure it's
> real; I've been observing it on and off for months. Ideal
> would be for some reader to report that a diagnosis and fix
> have already been identified. I haven't noticed any mention
> of this since the release of 2.1.5, though, so I assume I'll
> have to do it myself. The purpose of this notice in that
> case would be just to alert others to what's coming.
> .
> .
> .
>I haven't solved it yet. I'm closer, though.
>
>Here's the latest. Under Solaris, but probably not BSD,
>HP-UX, SunOS, ..., I can start up a simple Scotty session,
>and begin receiving TRAPs. I direct a burst of a thousand
>TRAPs, over a short interval, at my Solaris host (a fairly
>well-endowed SPARCstation-20). The straps process becomes
>very quiet--it uses little CPU, and doesn't seem to be doing
>any I/O. When I start a new Scotty session, and attempt to
>bind a trap-handler (that is, evaluate Tnm_SnmpTrapOpen()),
>the connect call hangs. It never (well, not for thirty min-
>utes) returns. So, one not-specific-to-Scotty question is
>this: if I have an AF_INET SOCK_STREAM socket that appears
>to be in good shape, is it even permissible for a connect()
>on it not to return? (Juergen, why the funny sockaddr_un
>manipulations, when you could write the routine in terms of
>sockaddr? Is this an attempt to streamline OS-dependencies?
>I recommend that you comment such points as this. I realize
>that all this may go away with the next release, but the
>general principle holds.)
>
>I'd sure appreciate help from those more socket-oriented
>than I. This has been plaguing me for months, and I sure
>want to get to the bottom of it. The source in question, by
>the way, appears in tnm/snmp/tnmSnmpNet.c.
>
>Next up for me: I'm going inside straps more deeply to see if
>I can figure out what it's doing.
>
>Cameron Laird http://starbase.neosoft.com/~bodi/nesi.html
>Network Engineered Solutions +1 713 763 8366
>claird@NeoSoft.com +1 281 996 8546 FAX
>Houston WWW Business Guide: http://starbase.neosoft.com/~bodi/HouGuide.html
>--
>!! This message is brought to you via the `tkined & scotty' mailing list.
>!! Please do not reply to this message to unsubscribe. To subscribe or
>!! unsubscribe, send a mail message to <tkined-request@ibr.cs.tu-bs.de>.
>!! See http://wwwsnmp.cs.utwente.nl/~schoenw/scotty/ for more information.
>
>
Michael I. Schwartz "Be very quiet...for it goes
mschwart@du.edu without saying"
The Phantom Tollbooth
