Minutes of the 22nd NMRG meeting
IETF 68, Prague, Czech Republic
23 March 2007
Minutes: Vladislav Marinov, Juergen Schoenwaelder

Participants:

The meeting was attended by about 40 people. The names were recorded
on the roster, which went to the IETF secretariat.

Agenda:

09:00 Agenda bashing and administrivia
      Juergen Schoenwaelder (Chair)
09:10 Report from the NMRG workshop in Utrecht
      Aiko Pras
09:30 Last call review draft-irtf-nmrg-snmp-measure-01.txt
      Juergen Schoenwaelder
10:00 SNMP Traffic Analysis Update
      Juergen Schoenwaelder
10:30 Distributed monitoring algorithms
      Alberto Gonzalez Prieto
11:00 Discussion
      Everybody
11:20 Wrapup
      Juergen Schoenwaelder (Chair)

Abbreviations:

JS: Juergen Schoenwaelder
AP: Aiko Pras
BW: Bert Wijnen 
DP: David Perkins
RB: Randy Bush
LD: Luca Deri
DH: David Harrington
SL: Simon Leinen
DR: Dan Romascanu
AG: Alberto Gonzalez Prieto
AB: Andy Bierman
PD: Petre Dini

1. Introduction to NMRG (JS)

   The agenda was updated since the NETCONF RBAC people from France
   could not make it to the Prague meeting. The relevant I-D is
   draft-cridlig-netconf-rbac-00.txt.

2. EMANICS / NMRG Workshop Report (AP)

   AP reviewed the EMANICS/NMRG workshop in October 2006 on future
   research directions. See his slides for details. About 25% of the
   attendees were at the IAB plenary the night before the NMRG
   meeting; so AP decided to go over all slides (and not just the ones
   not presented at the plenary).

   C: Your bullet that "Network management is risk management" was
      very interesting. There is always risk in uncertainty, we are
      not sure what happens in the network.
 
   Q: About the economic slides - is there work in the area? 

   A: Yes, one of the EMANICS work packages is economic management;
      but work is still focusing on SLAs and legal issues. I want to
      see answers to this question myself in the future.

3. Last call review draft-irtf-nmrg-snmp-measure-01.txt (JS)

   This document was last called and 15 reviews were submitted. Bert
   Wijnen acts as the shepherd for this document. JS went through all
   reviews, discussing those issues he felt worth bringing up (i.e.,
   the edits may not be fully understood); see the issues list in the
   proceedings. Below are the actions agreed on during the meeting:

   - Change "inform" to "inform-request" but leave "trap2"

   - JS to contact Frank - he wants to see a figure but it is unclear
     what the figure should show

   - Change the requirement to keep trace sources to a should keep
     rather than a must keep. Explain why it is good to keep the
     original data sets but also explain issues arising from this
     material being sensitive

   - The anonymization section should provide sufficient pointers for
     people interested in anonymization but it is not the goal of the
     document to provide an extensive discussion of this topic; the
     document is intended to make the operator communities aware that
     we want trace data; making the document too long will make it
     harder to use it for its intended purpose. Decided to add a
     summary of what the state of the art in the area is

   - The main reason to use names for protocol operations is that they
     are frequently read by humans for debugging and analysis while
     the error-status value is usually of secondary importance (as is
     the version number).

     [Ed: the more I think about it, it may actually be useful to use
     strings rather than numbers]

   - Add a statement on how the RELAX NG compact notation schema can be
     converted into other XML schema languages

   - Add a section with warnings about generalizing conclusions from
     biased data sets; conclusions derived from a biased small data
     set might not be universally true

   - Change the base type mapping so as to not lose information (that
     is, report Counter32 and friends (including Opaque) correctly; do
     not use the SMIng base type model).

   - Clarify that the IPv6 address type is just there to capture src
     and dst addresses (do we have to add MAC addresses?)
 
   - State that it was a design decision to keep conversion tools MIB
     agnostic. Hence, fancy OID formats can't be done. (JS explained
     that he wrote a separate tool called smixlate to handle this.)

   - Document that packets containing mal-formed SNMP messages or
     messages which are neither SNMPv1/SNMPv2c/SNMPv3 can be dropped
     and that implementations should report the number of such dropped
     messages

   - Add examples for CSV and XML formatted packets

   - Add rough numbers about trace sizes, e.g., a flow polling N
     interfaces (inOctets, outOctets) every 5 minutes causes traces of
     size xxx pcap, yyy csv, zzz xml. It would be nice to extract such
     a flow to get the numbers...

   - Keep XML format since it retains all information; the CSV format
     only retains a subset of the information.

   - Rewrite section 2.5 so that it stresses the point that some trace
     providers may prefer to not give away trace data but are willing
     to execute analysis scripts if the scripts can be checked to do
     no harm; this requires the use of high-level languages typically
     understood by operators (e.g. Perl - make clear Perl is just an
     example).

   - Keep section 3 as it was generally considered to be useful,
     especially for explaining how traces may be used and to give an
     idea which questions will be answered with this research

   - Explain that pcap is a bad format since it requires complex
     analysis code (IP reassembly, BER decoding) and thus makes it
     difficult to verify an analysis script does not do harm

3. Trace Analysis Update (JS)

   JS presented an update of the trace analysis work done at Jacobs
   University and the University of Twente. See the slides for the
   details of the presentation. Below are some questions that came up
   during the presentation.

   Q: You did observe SNMPv3 messages?

   A: Yes, but very sporadic; might have been an attempt to test if an
      agent supports SNMPv3

   Q: Were there responses to the SNMPv3 messages?

   A: I do not recall precisely, but I assume so

   Q: There were only very few informs in traces that use SNMPv2c

   A: Yes, it seems that GetBulk is a stronger incentive to switch to
      SNMPv2c in the traces we have collected so far.

   Q: Can we extrapolate these results to a larger scale?

   A: No, the data set is way too small for generalizations

   Q: Which applications were used?

   A: We didn't ask for this information, don't even know if we can expect
      operators to know

   Q: Operators should provide us with information what happened on the
      network during the trace collection period. It will be useful to
      capture what applications were used

   A: We want to identify communication patterns, without knowing
      which applications generated them. Note that we already try to
      track meta data but we do not necessarily expect it to be
      complete

   Q: Is the flow definition consistent with the IPFIX definition?
 
   A: No, because IPFIX flows are keyed on port numbers, whereas many
      management scripts kick off short-lived SNMP retrieval processes,
      so dynamic port numbers are assigned to the manager processes

   C: There was also one long trace, huge in size. It turned out to be
      someone walking into a timefilter table and never getting out,
      staying there for days.

   Q: At some point it was mentioned that there are some dumb
      applications that simply retrieve read-only data

   A: Probably applications should look at the semantics of the data
      and then caching is not that hard; of course, a generic MIB
      browser or data poller will not do this

   C: Applications do not support caching anyway

   C: Probably caching is not worth the cost of optimization: routers
      sometimes get rebooted and reconfigured, so the configuration
      might change suddenly, and optimizing applications w.r.t.
      retrieving read-only data might not be worth the risk
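
   [Ed: The flow definition question above can be illustrated with a
   minimal Python sketch. It assumes, for illustration only, that a
   flow is identified by the (manager address, agent address) pair and
   that agents listen on UDP port 161; the manager's dynamically
   assigned port is ignored. The function names and the packet tuple
   layout are hypothetical and not taken from the draft. Notifications
   (port 162) are left out for brevity.]

```python
from collections import defaultdict

AGENT_PORT = 161  # assumed well-known SNMP agent port


def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Return a (manager, agent) key that ignores the manager's port.

    Returns None for packets that do not involve the agent port.
    """
    if dst_port == AGENT_PORT:
        return (src_ip, dst_ip)  # request: manager -> agent
    if src_port == AGENT_PORT:
        return (dst_ip, src_ip)  # response: agent -> manager
    return None


def group_flows(packets):
    """Group (src_ip, src_port, dst_ip, dst_port) tuples into flows."""
    flows = defaultdict(list)
    for pkt in packets:
        key = flow_key(*pkt)
        if key is not None:
            flows[key].append(pkt)
    return dict(flows)


packets = [
    ("10.0.0.1", 40001, "10.0.0.2", 161),  # request from first script run
    ("10.0.0.2", 161, "10.0.0.1", 40001),  # matching response
    ("10.0.0.1", 40002, "10.0.0.2", 161),  # later run, new dynamic port
    ("10.0.0.3", 5000, "10.0.0.4", 80),    # unrelated traffic, skipped
]
flows = group_flows(packets)
```

   With a port-based flow key, the two script runs above would end up
   in separate flows; with the address-pair key they form a single
   flow between manager 10.0.0.1 and agent 10.0.0.2.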

4. Distributed Monitoring and Aggregation (KTH)

   AG presented the work done at KTH on distributed monitoring and
   aggregation based on a live demo. The slides are recorded in the
   proceedings.

Questions:

   Q: SL, as an operator, is this useful for your environment?

   A: Our network is not so big, no need to optimize queries, you can
      optimize the number of samples in order to improve the mean
      error objective.

   Q: I would be interested in short term samples. Your algorithm is
      adaptive; if the network is running stable, you adapt so that
      the network is queried every second hour. What about short time
      changes? What will happen if there is a failure after several
      minutes?

   A: We are not polling. The protocol follows the changes: when the
      difference from the previous value is significant, a message
      will be sent immediately.

   Q: What about polling overhead? Routers should do useful work and
      polling introduces overhead.

   A: Aggregation is not performed in the router. None of the routers
      have to process more than 1 message per second.

   Q: You are using routers as points of observation. Is there any
      research on which observation points are better for different
      applications?

   A: I am not aware of this.
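
   [Ed: The push-on-change behaviour described in the answers above
   (no polling; a message is sent as soon as the value differs
   significantly from the previous one) can be sketched as follows.
   This is a minimal illustration under stated assumptions, not the
   protocol presented by AG: the class name, the fixed threshold and
   the choice to compare against the last value sent are all
   hypothetical.]

```python
class ThresholdReporter:
    """Report a local value only when it has changed significantly."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed fixed; a real system may adapt it
        self.last_sent = None       # last value pushed to the parent node

    def observe(self, value):
        """Return the value if an update should be sent, else None."""
        if self.last_sent is None or abs(value - self.last_sent) > self.threshold:
            self.last_sent = value
            return value
        return None


reporter = ThresholdReporter(threshold=5)
samples = [100, 102, 104, 110, 111, 90]
updates = [reporter.observe(v) for v in samples]
# updates == [100, None, None, 110, None, 90]
```

   Small fluctuations (102, 104, 111) generate no messages, while the
   sudden drop to 90 is reported immediately, regardless of how long
   the value had been stable before.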

5. Open Discussion

  Q: [AB] Discussion during the IETF ops-area meetings concentrated on
     NETCONF, XSD, data model contexts, concerns about document
     control, life cycle management. Is that something the IRTF can
     help with or should the IETF do that? Is XSD good enough so we
     can move on?

  C: [DP] Work needs to be done. It is a serious issue when people
     need to do work on modeling and have no language. Work on finding
     a good language needs to be done quickly. Otherwise, every
     company will have its own way of doing things in different
     languages.

  Q: [AP] Do you want standard or research?

  A: [DP] IETF => standard

  Q: XML schema is an optimal language. What is the problem?

  A: [DP] Why it is not sufficient will be explained in standardization.

  C: Cannot wait for research activities. Think it is IETF activity.

  C: [DR] Focus should be on what NMRG should do, what specific
     research questions we should focus on, what data modeling
     techniques and languages are available. Suggest a research
     discussion on NETCONF data modeling. NMRG should prepare a state
     of the art report on what happens in research.

  C: [JS] The SMING lesson is that being protocol independent gets
     difficult; perhaps people can write up what the pitfalls are.
     Agreeing on syntax is part of engineering, not research.

  C: Get more and more data from companies, research how much meta-data
     is used and how much it is read by humans, how much they read
     descriptive data.

  C: Perhaps we can produce a lessons learned document from the SMING
     experiment before Chicago

  C: [PD] I think a good topic for research should be manageability
     and in particular self-manageability, self, self... I didn't see
     guidelines how to build autonomic systems and components, how to
     build and manage them. This will be a good topic for
     research. Suggest forming a sub-group to come up with a draft
     document. Some work in the IETF will be implemented as system
     entities which sooner or later support self manageability. Some
     documents in IETF have management related aspects. I would be
     interested in small subsection where entities have SELF inside?

  C: [JS] Do you mean interaction between control loops?

  C: Self-management networks are interesting as there you say how you
     can predict unexpected things.

  C: Still use aggregation, traffic aggregation, what properties were
     aggregated, combined. When you have 2 entities that interact some
     of their features disappear.

  C: Behavior can be harmful for the network

  C: [JS] Document what should be considered when autonomic work is
     done, such as control loop interactions

  C: Where is the authoritative control coming from: box or centralized
     entity?

  C: Acceptance of autonomic management correlates with acceptance of
     the policy based networking approach. Should we research policy
     based networking?