Re: scotty agent: processing getbulk PDUs

Juergen Schoenwaelder (schoenw@ibr.cs.tu-bs.de)
Fri, 9 Feb 1996 19:39:47 +0100

Hi!

Robert Premuz <rpremuz@srce.hr> said:

Robert> My understanding of the C code in the snmp/snmpAgent.c file in the
Robert> distribution, and the GetRequest function especially, tells me that
Robert> the getbulk PDUs are processed in the same way as the getnext PDUs.
Robert> It means that the agent does not take into account the number of
Robert> non-repeater variables and the max. number of repetitions for
Robert> repeating variables.

Right. This is still missing. Would you like to volunteer and write
the missing code? :-)
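
For whoever picks this up, here is a rough sketch of the arithmetic
involved. It is not code from the distribution. It assumes the SNMPv2
getbulk rules: the first non-repeaters varbinds are processed like a
single getnext, and each remaining varbind is repeated up to
max-repetitions times. The function name BulkMaxVarBinds is made up
for illustration:

```c
/*
 * Hypothetical helper (not part of scotty): upper bound on the number
 * of varbinds in a getbulk response for a request with oidc varbinds.
 */
int
BulkMaxVarBinds(int oidc, int nonRepeaters, int maxRepetitions)
{
    int n, m, r;

    if (oidc < 0) oidc = 0;

    /* N: varbinds handled like a plain getnext, clamped to [0, oidc]. */
    n = (nonRepeaters < 0) ? 0 : nonRepeaters;
    if (n > oidc) n = oidc;

    /* M: repetitions for each repeating varbind, never negative. */
    m = (maxRepetitions < 0) ? 0 : maxRepetitions;

    /* R: number of repeating varbinds. */
    r = oidc - n;

    return n + m * r;
}
```

With 3 varbinds, 1 non-repeater and 10 repetitions this gives
1 + 10 * 2 = 21 varbinds at most. The agent may return fewer if the
response would not fit into a single message.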

Robert> I have one more wish regarding this issue. It's about how the
Robert> snmp# walk command is implemented.

Robert> As I can see in the snmp/snmpTcl.c file in the distribution, the
Robert> SNMPWalk function implements the command and the following piece of
Robert> code explains the secret:

Robert> pdu->type = SNMPv2_GETBULK;
Robert> pdu->request_id = ++session->reqid;

Robert> /*
Robert> * Set the non-repeaters and the max-repetitions for the getbulk
Robert> * operation. I do not know if 16 is a good constant. Perhaps we
Robert> * should start with a small value and increase in every loop?
Robert> */

Robert> pdu->error_status = 0;
Robert> pdu->error_index = (16 / oidc > 0) ? 16 / oidc : 1;

Robert> So, if my understanding is right, non-repeaters is set to 0, while
Robert> max-repetitions is set to (16 / oidc > 0) ? 16 / oidc : 1 , where
Robert> oidc is the number of OIDs in varbind list given to the command to
Robert> walk through. In that way, the max. number of varbinds returned by
Robert> the agent in its response to the getbulk request is limited to 16.

Yes, that's right.

Robert> Now, I suggest some changes in the function:

Robert> 1) Define the magic constant 16 by a #define statement in some .h
Robert> file. I wonder why such a small value is used. Was the intention to
Robert> reduce the size of the returned packet?

This would be easy.
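
Something along these lines would do. This is only a sketch: the
constant name SNMP_WALK_MAX_VARBINDS and the helper WalkMaxRepetitions
are assumptions for illustration and do not exist in the distribution.
The expression is the one quoted above, with a guard added for
oidc < 1:

```c
/* Hypothetical name; the current source uses the bare constant 16. */
#define SNMP_WALK_MAX_VARBINDS 16

/*
 * Compute max-repetitions so that a getbulk response holds roughly
 * SNMP_WALK_MAX_VARBINDS varbinds, but at least one repetition per
 * OID being walked.
 */
int
WalkMaxRepetitions(int oidc)
{
    int reps;

    if (oidc < 1) oidc = 1;     /* guard against division by zero */
    reps = SNMP_WALK_MAX_VARBINDS / oidc;
    return (reps > 0) ? reps : 1;
}
```

Walking 3 OIDs gives 5 repetitions (15 varbinds); walking more than
16 OIDs still gives 1 repetition each, so the limit of 16 can be
exceeded in that case.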

Robert> 2) It would be nice if this magic constant could be set to some other
Robert> value by an option, e.g. -maxoids n, when invoking the command. If
Robert> the option is not used, then the default value of 16 can be used.

I don't think that this option really makes much sense at the Tcl
level. I really don't want to be bothered with how the walk is done
internally. And setting parameters is not an easy thing to get right
(see below).

Robert> 3) The above comment says: "Perhaps we should start with a small
Robert> value and increase in every loop?" This would be really great
Robert> although it needs more programmer's time for implementation.

It is easy to implement. The problem is how you increase the number
of repetitions and the performance effects you will get. I played
around with some agents (ISODE and CMU) and discovered that getbulk
processing really slows down some agents, because they start seeking
in kernel memory etc., which is really expensive. If you request
e.g. 64 variables but the walk ends after e.g. 4 variables, you have
lost a lot of time. Here is a small test script to measure the plain
getbulk performance of your agent:

proc measure {alias} {
    set s [snmp session -alias $alias]
    puts "alias = $alias"
    puts "agent = [lindex [lindex [$s get sysDescr.0] 0] 2]"
    foreach n "8 16 24 32 40 48 56 64" {
        set vbl [$s getbulk 0 $n system]
        set t [time "$s getbulk 0 $n system" 100]
        puts "n = $n\tlength = [llength $vbl]\ttime = $t"
    }
    $s destroy
}

Feel free to play with it. Overall I had the impression that an
`oversized' getbulk for a small walk could even degrade performance.
For example, I just changed the walk command to increase the number of
repetitions from 8 to 48 in steps of 8. A walk from an SGI machine to
one of our Suns over the whole mib-2 ended in a disaster: The manager
started to send retries because the agent was not able to get the
information out of the kernel in time. Unfortunately, the agent did
not cache the results, so every retry was read from the kernel again,
which is very slow. Yes, you could adjust the timeout parameter, but
you won't be able to do this fine tuning for every agent on your
network.

I have now changed the code to use the values 8, 16, 24, which yields
reasonable results in our environment. But the sad result is that you
should not expect too much from using getbulk requests. A performance
gain of a factor of 2 is usually possible, but it is IMHO difficult to
adjust all parameters to get more speed out of SNMP. You can easily
degrade performance by using bad parameter combinations for a given
agent and network setup.
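
To illustrate the 8, 16, 24 scheme, here is a rough sketch of how the
repetition count could be chosen per request. It is not the actual
change: the names are made up, and staying at 24 for all later
requests is an assumption.

```c
/* Hypothetical constants: start at 8, grow by 8, stop growing at 24. */
#define WALK_REPS_START 8
#define WALK_REPS_STEP  8
#define WALK_REPS_LIMIT 24

/*
 * Return the max-repetitions value for the given getbulk request of a
 * walk, where round 0 is the first request. Assumes the value stays
 * at WALK_REPS_LIMIT once it has been reached.
 */
int
WalkRepetitions(int round)
{
    if (round < 0) round = 0;

    /* Compare before multiplying so large round values cannot overflow. */
    if (round >= (WALK_REPS_LIMIT - WALK_REPS_START) / WALK_REPS_STEP) {
        return WALK_REPS_LIMIT;
    }
    return WALK_REPS_START + round * WALK_REPS_STEP;
}
```

A short walk that ends after the first request only ever asks for 8
repetitions, so an agent that is slow on getbulk wastes less work.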

I think that we will gain the biggest performance benefits by
extracting index variables out of instance identifiers. This reduces
the total number of variables retrieved which is the most important
factor. However, this needs more internal support in the scotty
implementation and is left to be done.
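
To make the idea concrete: the instance identifier ifDescr.7 already
tells you that ifIndex is 7, so the ifIndex column need not be
retrieved at all. A minimal sketch of splitting off the index part
follows. It assumes OIDs are held as arrays of unsigned sub-identifiers;
the function name OidIndexPart is hypothetical and not part of scotty.

```c
#include <stddef.h>

/*
 * Hypothetical helper: if instance lies below the column OID, point
 * *index at the trailing sub-identifiers (the index part) and return
 * how many there are. Returns -1 if instance is not below column.
 */
int
OidIndexPart(const unsigned *column, size_t columnLen,
             const unsigned *instance, size_t instanceLen,
             const unsigned **index)
{
    size_t i;

    /* An instance must be strictly longer than its column prefix. */
    if (instanceLen <= columnLen) return -1;

    for (i = 0; i < columnLen; i++) {
        if (column[i] != instance[i]) return -1;
    }

    *index = instance + columnLen;
    return (int) (instanceLen - columnLen);
}
```

For ifDescr (1.3.6.1.2.1.2.2.1.2) and the instance
1.3.6.1.2.1.2.2.1.2.7 this yields one index sub-identifier, 7.
Decoding multi-part indexes into typed values would need the MIB
definitions and is not shown here.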

Robert> That's all for now. I hope my thoughts said in all those
Robert> English words were understandable to you.

I hope the same for my reply. :-)

Juergen