25th NMRG Meeting, October 30, 2008
			 LRZ Munich, Germany

Organization:

  The meeting was organized by Ramin Sadre and Aiko Pras, both from
  the University of Twente. The meeting was hosted by the Leibniz
  Computing Center (LRZ Munich) and the local organizer was Helmut
  Reiser. The meeting minutes were produced by Juergen Schoenwaelder.
  The meeting was attended by about 40 people. The names were recorded
  on the NMRG meeting web page [1].

Agenda:

  09:30	Keynote
  	Benoit Claise, Cisco Systems

  10:40 Design of an IP Flow Record Query Language
  	Vladislav Marinov, Jacobs University
  11:00 Using SQL databases for flow processing
  	Anna Sperotto, University of Twente
  11:30 A Distributed Architecture for IP Traffic Analysis
  	Cristian Morariu, University of Zurich
  12:00 Towards 10G NetFlow Monitoring using Commodity Hardware
  	Luca Deri, ntop.org

  13:30 Hardware Acceleration of NetFlow Monitoring
  	Jiri Novotny, Masaryk University
  14:10 Hierarchical Flow Aggregation - Problems and Open Questions
        Christoph Sommer, University Erlangen
  14:30 Flow-based TCP Connection State Detection
        Tobias Limmer, University Erlangen
  14:55 Self-management of Hybrid Networks: Can we Trust NetFlow Data?
        Tiago Fioreze, University of Twente

  15:50 Flow Dependency Discovery and Analysis
        Olivier Festor, INRIA
  16:15 SIPFIX: Using IPFIX for VoIP Monitoring
        Sven Anderson, University of Goettingen / NECLAB
  16:50 Discussion

Presentations:

  The presentation slides are available from the meeting web page [1].

Discussions:

* Keynote
  (Benoit Claise, Cisco Systems)

  Q: Are separate templates a good idea? Would you keep templates in a
     future version of IPFIX or have template information more inline?
  A: Yes, I believe templates are useful.

* Design of an IP Flow Record Query Language
  (Vladislav Marinov, Jacobs University)

  Q: Does 'flags = S' also match 'flags = SA'? A victim will respond,
     so the flow records to the victim will also contain ACK packets.
  A: This is not a problem since different branches are used to match
     scanning activities and attack activities.
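
  Note: NetFlow-style records usually carry the bitwise OR of the TCP
  flags seen on all packets of a flow, so whether 'flags = S' also
  matches 'SA' depends on whether the comparison is an exact match or
  a subset test. The following Python sketch illustrates the
  difference. The function names and both semantics are assumptions
  made for illustration; they are not taken from the query language
  that was presented.

```python
# TCP flag bits as laid out in the TCP header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def flags_equal(record_flags: int, wanted: int) -> bool:
    """Exact match: the record carries exactly the wanted flags."""
    return record_flags == wanted


def flags_contain(record_flags: int, wanted: int) -> bool:
    """Subset match: all wanted flags are set; others may be set too."""
    return (record_flags & wanted) == wanted


syn_only = SYN        # e.g. an unanswered scan probe
syn_ack = SYN | ACK   # e.g. a flow that also saw ACK packets

print(flags_equal(syn_only, SYN), flags_equal(syn_ack, SYN))      # True False
print(flags_contain(syn_only, SYN), flags_contain(syn_ack, SYN))  # True True
```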

  Q: You claim that this language is simple to use. How do you
     evaluate that this is really the case?
  A: We need to finish an implementation first and will then evaluate
     it using more examples.

  Q: Can we rely on TCP flags? Is usage of TCP flags reliable?
  A: Depends on the concrete platform; some platforms are known
     to have limitations in this space.

  Q: Is the language for online or offline analysis?
  A: At the moment the target is offline analysis.

* Using SQL databases for flow processing
  (Anna Sperotto, University of Twente)

  Q: Did you try to partition the data in time?
  A: The database version we used (MySQL) did not support partitioning
     in time, although newer versions seem to support this.

  Q: Why did you index srcip/dstip? How many unique srcip/dstip pairs
     did you have in your database?
  A: Probably about 20 million, and yes, indexing on such a large
     value set is not useful.

  Q: If you drop the indexes, you lose all the advantages of SQL.
     We split data into one hour intervals and use a single index.
  A: We work with a static snapshot and for us the indexes were
     killing the database performance.
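
  Note: Splitting the data into one-hour intervals with a single
  index, as mentioned in the question above, could look roughly like
  the following sketch. It uses Python's built-in sqlite3 module
  instead of MySQL, and the table naming scheme, the column set, and
  the helper names are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def table_for(ts: int) -> str:
    """Name of the hypothetical per-hour table for Unix time ts (UTC)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("flows_%Y%m%d_%H")


def insert_flow(db, ts, srcip, dstip, nbytes):
    """Store one flow record in its hourly table, creating it on demand."""
    table = table_for(ts)
    db.execute(f"CREATE TABLE IF NOT EXISTS {table} "
               "(ts INTEGER, srcip TEXT, dstip TEXT, bytes INTEGER)")
    # The single index per table: the flow start time.
    db.execute(f"CREATE INDEX IF NOT EXISTS {table}_ts ON {table} (ts)")
    db.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
               (ts, srcip, dstip, nbytes))


def bytes_between(db, start, end):
    """Sum bytes of flows with start <= ts < end, touching only the
    hourly tables that overlap the requested interval."""
    total = 0
    hour = start - start % 3600
    while hour < end:
        table = table_for(hour)
        exists = db.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
            (table,)).fetchone()
        if exists:
            row = db.execute(
                f"SELECT COALESCE(SUM(bytes), 0) FROM {table} "
                "WHERE ts >= ? AND ts < ?", (start, end)).fetchone()
            total += row[0]
        hour += 3600
    return total


db = sqlite3.connect(":memory:")
base = 1225324800  # 2008-10-30 00:00:00 UTC
insert_flow(db, base + 10, "192.0.2.1", "198.51.100.7", 100)
insert_flow(db, base + 3500, "192.0.2.1", "198.51.100.8", 200)
insert_flow(db, base + 3700, "192.0.2.2", "198.51.100.7", 400)
print(bytes_between(db, base, base + 3600))  # 300
```

  A query restricted to a time window then only opens the hourly
  tables that overlap that window.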

  Q: Why do you use SQL in the first place and not use nfsen data
     and tools directly?
  A: We do offline analysis on a fixed data set and we need to run
     many different queries because students work on very different
     tasks.
  A: When we started, we wanted to use something standard that
     students are familiar with (and we were not aware of nfsen at
     that time), so we picked SQL.

  C: We use 4 GB/h in a testbed and SQL does not work for us with
     200 million records per day.

* A Distributed Architecture for IP Traffic Analysis
  (Cristian Morariu, University of Zurich)

  Q: How do you identify different flows that belong to the same session?
  A: We use the 5-tuple and some of the payload. DIPStorage is
     targeted at a specific application and we are now trying to
     generalize the approach to make it less application specific.

  Q: Do you plan to make changes in the measurement process?

  Q: Can you use payload as part of the keys?
  A: Yes, but be careful not to shoot yourself in the foot.

  Q: Can routers not calculate hashes that are more compact?
  A: Technically yes, but there are legal constraints with this.

  Q: Can you solve the problem of multiple flows with NetFlow 5,
     that is two routers seeing the same flow?
  A: We solve it by routing records to the same analysis node, which
     is in charge of correlating the flow records.
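
  Note: One way to make records for the same flow, exported by
  different routers, end up on the same analysis node is to hash the
  flow key and derive the node from the hash. The Python sketch below
  assumes a plain 5-tuple key, SHA-256, and modulo placement; these
  choices and the function names are illustrative assumptions and do
  not describe how DIPStorage actually routes records.

```python
import hashlib


def flow_key(src_ip, dst_ip, src_port, dst_port, proto) -> str:
    """Canonical string form of a 5-tuple (one flow direction)."""
    return f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}"


def analysis_node(key: str, num_nodes: int) -> int:
    """Map a flow key to one of num_nodes analysis nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


# The same flow as exported by two different routers.
seen_by_router_a = ("192.0.2.1", "198.51.100.7", 40000, 80, 6)
seen_by_router_b = ("192.0.2.1", "198.51.100.7", 40000, 80, 6)

node_a = analysis_node(flow_key(*seen_by_router_a), 8)
node_b = analysis_node(flow_key(*seen_by_router_b), 8)
print(node_a == node_b)  # True: both records reach the same node
```

  With modulo placement, adding or removing a node remaps most keys;
  a scheme such as consistent hashing would probably be preferable
  when nodes join and leave.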

  Q: What happens if you do host pair analysis?
  A: No, such a query would go to the overlay indexed by source IP
     port and then be processed in the overlay.

  Q: The question is - how many different partition keys do you need?
  A: We don't know yet.

  Q: Where will you be one year from now?
  A: We will have evaluation results. We have a testbed with 5-6
     routers and two traffic generators and we plan to do experiments
     in a lab setting. It will be nice to work with real networks...

  Q: How do you plan to deal with anonymization and privacy?
  A: Outside of the scope of this work; anonymization happens before
     data hits our system.

  Q: Isn't it generally the better approach to think about the query
     first and then partition the data instead of partitioning the
     data blindly in the hope that the partitioning will later turn
     out to be useful?
  A: We want to be query agnostic and try to find a distributed
     approach to it and we started from specific scenarios.

  Q: What happens if you add new nodes? Does node churn not introduce
     major overhead? Can an attacker try to exploit this?
  A: Only a problem as long as there is an ongoing query during the
     join/leave operation.

  Q: What is the cost in terms of increased storage and network
     bandwidth used and how does this scale?
  A: The bandwidth for the query is not the bottleneck.

* Towards 10G NetFlow Monitoring using Commodity Hardware
  (Luca Deri, ntop.org)

  Q: How is the functionality used in practice?
  A: There is an API that allows pushing a filter from user space
     into kernel space.

  Q: How many packets can the system handle at 10G?
  A: 6 million per second, but not into user space. The latest NICs
     can handle large volumes, but I do not have the hardware to do
     my own measurements.

* Hardware Acceleration of NetFlow Monitoring
  (Jiri Novotny, Masaryk University)

  Q: What is the price tag for this hardware solution?
  A: Starts with 10k Euro.

  Q: Where do probes go - routers or separate boxes?
  A: There are pros and cons - separate boxes are good for router
     stability, but lots of separate boxes can increase operational
     costs.

  Q: Where should anonymization be done? Is this a collector issue
     or a meter feature?
  A: The FlowMon probe does it on the meter because it is cheap to
     do and does not require much processing power.

* Hierarchical Flow Aggregation - Problems and Open Questions
  (Christoph Sommer, University Erlangen)

  Q: Can you give an example of the problem to be solved?
  A: I need to convey which operations have been applied
     by mediators to flow records.

  C: Should mediators communicate in-band the processing applied to
     the flow record stream, or is it simpler to keep everything
     out-of-band in the configuration plane?

  C: We need a single common way of specifying processing chains.

  Q: Why should a collector know that some mediator aggregated data?
  A: Aggregation is not the issue, filtering is the issue since
     filtering means the flow records are less complete.

  Q: Is it possible to implement a generic mediator that can filter on
     arbitrary IPFIX attributes?
  A: Filtering using a direct match is not the difficult part as long
     as you have equality matches. Aggregation is more difficult.

* Flow-based TCP Connection State Detection
  (Tobias Limmer, University Erlangen)

  Q: What is the definition of a "failed TCP connection"?
  A: A failed TCP connection is a connection that did not
     transfer payload.

  Q: Can you figure out which end did the passive open and which end
     did the active open?
  A: Look at the timing of the packets or flows - but this of course
     requires timestamps with adequate resolution (in the microsecond
     range).

  Q: What is the correlation of idle times and packet gaps?
  A: Good question to look at in the future.

* Self-management of Hybrid Networks: Can we Trust NetFlow Data?
  (Tiago Fioreze, University of Twente)

  Q: To what extent are you surprised by the results?
  A: I am surprised by the impact on the duration.

  C: We did the same study and we arrived at the same results.

  Q: George Varghese has an algorithm to sample only the elephant
     flows. Can you compare your work with this?
  A: I am not aware of this particular work but I am happy to look
     at it. Thanks for the pointer.

  Q: Some probes try to do adaptive sampling giving preference to
     packets belonging to new flows - how does the sampling strategy
     impact the results?
  A: We have not investigated this yet.

  Q: What was the sampling rate for the result with the duration?
  A: SURFnet 1:100, Geant 1:1000

  Q: To what did you compare the sampled data to come up with the
     figures?
  A: We had access to non-sampled flow records in addition to the
     sampled flow records.
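
  Note: One plausible reason why packet sampling affects the reported
  flow duration, which the minutes do not state, is that the first
  and last packets of a flow are usually not among the sampled ones.
  The Python sketch below illustrates this with deterministic 1-in-N
  sampling on a made-up flow; the function name, the sampling scheme,
  and the packet timings are assumptions for illustration and are not
  data from the study.

```python
def sampled_duration(timestamps, n, offset=0):
    """Duration observed when only every n-th packet is sampled,
    starting at index offset. Returns None if no packet is sampled."""
    sampled = timestamps[offset::n]
    if not sampled:
        return None  # the flow is not seen at all
    return sampled[-1] - sampled[0]


# Hypothetical flow: 1000 packets, one per millisecond.
packets_ms = list(range(1000))
true_duration = packets_ms[-1] - packets_ms[0]

print(true_duration)                                   # 999
print(sampled_duration(packets_ms, 100, offset=37))    # 900
print(sampled_duration(packets_ms, 1000, offset=37))   # 0
```

  In this example, 1:100 sampling shortens the observed duration
  from 999 ms to 900 ms, and 1:1000 sampling leaves a single packet
  and therefore a duration of zero.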

* Flow Dependency Discovery and Analysis
  (Olivier Festor, INRIA)

  Q: What is the exact application you have in mind?
  A: Planning of renumbering activities, security monitoring (changes
     in the dependency graph)

* SIPFIX: Using IPFIX for VoIP Monitoring
  (Sven Anderson, University of Goettingen / NECLAB)

  Q: Why is IPFIX the right tool for the problems you are solving?
     Are there not technologies like for example RTCP XR metrics
     [RFC3611] that solve some of your problems more accurately and
     more easily? (Examples II)
  A: I want to do this in a distributed way and not rely on end
     systems to help me.

  Q: How do distributed meters calculate the same media flow
     descriptors?
  A: The SIP meter exports the media flow descriptor which has
     sufficient information so that the media flows can be correlated.

  Q: How do the media gateways measure the delay?
  A: I assume the media gateways can measure the delay somehow. Pings
     are not reliable since they might take different paths.

* Annotated Flow Traces Discussion
  (Aiko Pras, University of Twente)

  The question was raised whether it would be useful to have some
  labelled flow traces. There was considerable interest in such
  traces.

  C: It would help the community to settle on a specific format, e.g.,
     the nfdump format.

  C: We have done such an analysis on a short trace (30 minutes) and
     it was a lot of work. We can't share the results.

  Q: Does anonymization solve this problem, e.g., if we replace IP
     addresses with hashes?
  A: You lose information and for some analysis tasks this information
     is important (e.g., prefix relationships or sequential ordering).

  C: Anonymization is application specific and not good enough to get
     commercial network operators to provide traces. We have
     experience that traffic reaching Google is very different from
     what we observe on research networks or university networks.

Actions:

  - Aiko Pras will write a workshop report for JNSM [2]
  - Juergen Quittek takes the lead to organize a special issue
    in some journal (e.g., JNSM [2])
  - Perhaps some EU project ideas can be developed...

References:

  [1] http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2008/munich/
  [2] http://www.csee.umkc.edu/jnsm/