25th NMRG Meeting, October 30, 2008
LRZ Munich, Germany

Organization:

  The meeting was organized by Ramin Sadre and Aiko Pras, both from the
  University of Twente. The meeting was hosted by the Leibniz Computing
  Center (LRZ Munich) and the local organizer was Helmut Reiser. The
  meeting minutes were produced by Juergen Schoenwaelder.

  The meeting was attended by about 40 people. The names were recorded
  on the NMRG meeting web page [1].

Agenda:

  09:30 Keynote
        Benoit Claise, Cisco Systems
  10:40 Design of an IP Flow Record Query Language
        Vladislav Marinov, Jacobs University
  11:00 Using SQL databases for flow processing
        Anna Sperotto, University of Twente
  11:30 A Distributed Architecture for IP Traffic Analysis
        Cristian Morariu, University of Zurich
  12:00 Towards 10G NetFlow Monitoring using Commodity Hardware
        Luca Deri, ntop.org
  13:30 Hardware Acceleration of NetFlow Monitoring
        Jiri Novotny, Masaryk University
  14:10 Hierarchical Flow Aggregation - Problems and Open Questions
        Christoph Sommer, University Erlangen
  14:30 Flow-based TCP Connection State Detection
        Tobias Limmer, University Erlangen
  14:55 Self-management of Hybrid Networks: Can we Trust NetFlow Data?
        Tiago Fioreze, University of Twente
  15:50 Flow Dependency Discovery and Analysis
        Olivier Festor, INRIA
  16:15 SIPFIX: Using IPFIX for VoIP Monitoring
        Sven Anderson, University of Goettingen / NECLAB
  16:50 Discussion

Presentations:

  The presentation slides are available from the meeting web page [1].

Discussions:

* Keynote (Benoit Claise, Cisco Systems)

  Q: Are separate templates a good idea? Would you keep templates in a
     future version of IPFIX or have template information more inline?
  A: Yes, I believe templates are useful.

* Design of an IP Flow Record Query Language
  (Vladislav Marinov, Jacobs University)

  Q: Does 'flags = S' also match 'flags = SA'? A victim will respond,
     so the flow records to the victim will also contain Ack packets?
  A: This is not a problem since different branches are used to match
     scanning activities and attack activities.
  Q: Your claim is that this language is simple to use? How do you
     evaluate that this is really the case?
  A: We need to get an implementation finished and then we will
     evaluate using more examples.
  Q: Can we rely on TCP flags? Is usage of TCP flags reliable?
  A: Depends on the concrete platform; some platforms are known to have
     limitations in this space.
  Q: Is the language for online or offline analysis?
  A: At the moment the target is offline analysis.

* Using SQL databases for flow processing
  (Anna Sperotto, University of Twente)

  Q: Did you try to partition the data in time?
  A: The database version we used (MySQL) did not support partitioning
     in time, although newer versions seem to support this.
  Q: Why did you index srcip/dstip? How many unique srcip/dstip pairs
     did you have in your database?
  A: Probably about 20 million, and yes, indexing such a large value
     set is not useful.
  Q: If you drop the indexes, you lose all the advantages of SQL. We
     split data into one hour intervals and use a single index.
  A: We work with a static snapshot and for us the indexes were killing
     the database performance.
  Q: Why do you use SQL in the first place and not use nfsen data and
     tools directly?
  A: We do offline analysis on a fixed data set and we need to run many
     different queries because students work on very different tasks.
  A: When we started, we wanted to use something standard that students
     are familiar with (and we were not aware of nfsen at that time)
     and so we picked SQL.
  C: We use 4 GB/h in a testbed and SQL does not work for us with 200
     million records per day.

* A Distributed Architecture for IP Traffic Analysis
  (Cristian Morariu, University of Zurich)

  Q: How do you identify different flows that belong to the same
     session?
  A: We use the 5 tuple and some of the payload.
     The DIPStorage is targeted at a specific application and we are
     now trying to generalize the approach to make it less application
     specific.
  Q: Do you plan to make changes in the measurement process?
  Q: Can you use payload as part of the keys?
  A: Yes, but be careful about shooting yourself in the foot.
  Q: Can routers not calculate hashes that are more compact?
  A: Technically yes, but there are legal constraints with this.
  Q: Can you solve the problem of multiple flows with NetFlow 5, that
     is two routers seeing the same flow?
  A: We solve it by routing records to the same analysis node, which
     is in charge of correlating the flow records.
  Q: What happens if you do host pair analysis?
  A: No, such a query would go to the overlay indexed by source IP port
     and then be processed in the overlay.
  Q: The question is - how many different partition keys do you need?
  A: We don't know yet.
  Q: Where will you be in one year from now?
  A: We will have evaluation results. We have a testbed with 5-6
     routers and two traffic generators and we plan to do experiments
     in a lab setting. It will be nice to work with real networks...
  Q: How do you plan to deal with anonymization and privacy?
  A: Outside of the scope of this work; anonymization happens before
     data hits our system.
  Q: Isn't it generally the better approach to think about the query
     first and then partition the data instead of partitioning the data
     blindly in the hope that the partitioning will later turn out to
     be useful?
  A: We want to be query agnostic and try to find a distributed
     approach to it and we started from specific scenarios.
  Q: What happens if you add new nodes? Does node churn not introduce
     major overhead? Can an attacker try to exploit this?
  A: Only a problem as long as there is an ongoing query during the
     join/leave operation.
  Q: What is the cost in terms of increased storage and network
     bandwidth used and how does this scale?
  A: The bandwidth for the query is not the bottleneck.
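
  The answer above about routing records of one flow to the same
  analysis node can be illustrated with a minimal sketch. This is not
  the DIPStorage implementation; the record layout (FlowRecord), the
  key format, and the use of SHA-256 with a simple modulo are all
  assumptions made for illustration only.

```python
import hashlib
from collections import namedtuple

# Hypothetical flow record: the 5 tuple plus the exporting router.
FlowRecord = namedtuple(
    "FlowRecord", "src_ip dst_ip src_port dst_port proto exporter"
)


def flow_key(rec):
    """Build the partition key from the 5 tuple only.

    The exporter is deliberately left out so that two routers
    reporting the same flow produce the same key.
    """
    return f"{rec.src_ip}|{rec.dst_ip}|{rec.src_port}|{rec.dst_port}|{rec.proto}"


def analysis_node(rec, num_nodes):
    """Map a record to one of num_nodes analysis nodes (0-based index)."""
    digest = hashlib.sha256(flow_key(rec).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_nodes


# Two routers export a record for the same flow.
seen_by_a = FlowRecord("10.0.0.1", "10.0.0.2", 1234, 80, 6, "router-a")
seen_by_b = seen_by_a._replace(exporter="router-b")

# Both records land on the same node, which can then correlate them.
same_node = analysis_node(seen_by_a, 8) == analysis_node(seen_by_b, 8)
```

  With a plain modulo, changing the number of nodes remaps most keys,
  which relates to the node churn question above; a consistent hashing
  scheme would be one way to limit that, though the minutes do not say
  which scheme the authors use.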
* Towards 10G NetFlow Monitoring using Commodity Hardware
  (Luca Deri, ntop.org)

  Q: How is the functionality used in practice?
  A: There is an API which allows a filter to be pushed from user space
     into kernel space.
  Q: How many packets can the system handle on a 10G link?
  A: 6 million per second, but not into user space. The latest NICs can
     handle large volumes, but I do not have the hardware to do my own
     measurements.

* Hardware Acceleration of NetFlow Monitoring
  (Jiri Novotny, Masaryk University)

  Q: What is the price tag for this hardware solution?
  A: Starts with 10k Euro.
  Q: Where do probes go - routers or separate boxes?
  A: There are pros and cons - separate boxes are good for router
     stability, but lots of separate boxes can increase operational
     costs.
  Q: Where should anonymization be done? Is this a collector issue or a
     meter feature?
  A: The FlowMon probe does it on the meter because it is cheap to do
     and does not require much processing power.

* Hierarchical Flow Aggregation - Problems and Open Questions
  (Christoph Sommer, University Erlangen)

  Q: Can you give an example of the problem to be solved?
  A: I need to convey which operations mediators have applied to flow
     records.
  C: Should mediators communicate in-band the processing applied to the
     flow record stream or is it simpler to keep everything out-of-band
     in the configuration plane?
  C: We need a single common way of specifying processing chains.
  Q: Why should a collector know that some mediator did aggregate data?
  A: Aggregation is not the issue, filtering is the issue since
     filtering means the flow records are less complete.
  Q: Is it possible to implement a generic mediator that can filter on
     arbitrary IPFIX attributes?
  A: Filtering using a direct match is not the difficult part as long
     as you have equality matches. Aggregation is more difficult.

* Flow-based TCP Connection State Detection
  (Tobias Limmer, University Erlangen)

  Q: What is the definition of a "failed TCP connection"?
  A: A failed TCP connection is a connection that did not transfer
     payload.
  Q: Can you figure out which end did the passive open and which end
     did the active open?
  A: Look at the timing of the packets or flows - but this of course
     requires timestamps with adequate resolution (in the microsecond
     range).
  Q: What is the correlation of idle times and packet gaps?
  A: Good question to look at in the future.

* Self-management of Hybrid Networks: Can we Trust NetFlow Data?
  (Tiago Fioreze, University of Twente)

  Q: To what extent are you surprised by the results?
  A: I am surprised by the impact on the duration.
  C: We did the same study and we arrived at the same results.
  Q: George Varghese has an algorithm to only sample the Elephant
     flows. Can you compare your work with this?
  A: I am not aware of this particular work but I am happy to look at
     it. Thanks for the pointer.
  Q: Some probes try to do adaptive sampling giving preference to
     packets belonging to new flows - how does the sampling strategy
     impact the results?
  A: We have not investigated this yet.
  Q: What was the sampling rate for the result with the duration?
  A: SURFnet 1:100, Geant 1:1000
  Q: To what did you compare the sampled data to come up with the
     figures?
  A: We had access to non-sampled flow records in addition to the
     sampled flow records.

* Flow Dependency Discovery and Analysis (Olivier Festor, INRIA)

  Q: What is the exact application you have in mind?
  A: Planning of renumbering activities, security monitoring (changes
     in the dependency graph)

* SIPFIX: Using IPFIX for VoIP Monitoring
  (Sven Anderson, University of Goettingen / NECLAB)

  Q: Why is IPFIX the right tool for the problems you are solving? Are
     there not technologies like for example RTCP XR metrics [RFC3611]
     that solve some of your problems more accurately and more easily?
     (Examples II)
  A: I want to do this in a distributed way and not rely on end systems
     to help me.
  Q: How do distributed meters calculate the same media flow
     descriptors?
  A: The SIP meter exports the media flow descriptor which has
     sufficient information so that the media flows can be correlated.
  Q: How do the media gateways measure the delay?
  A: I assume the media gateways can measure the delay somehow. Pings
     are not reliable since they might take different paths.

* Annotated Flow Traces Discussion (Aiko Pras, University of Twente)

  The question was raised whether it would be useful to have some
  labelled flow traces. There was quite some interest in such traces.

  C: It would help the community to settle on a specific format, e.g.,
     the nfdump format.
  C: We have done such an analysis on a short trace (30 minutes) and it
     was a lot of work. We can't share the results.
  Q: Does anonymization solve this problem, e.g., if we replace IP
     addresses with hashes?
  A: You lose information and for some analysis tasks this information
     is important (e.g., prefix relationships or sequential ordering).
  C: Anonymization is application specific and not good enough to get
     commercial network operators to provide traces. We have experience
     that traffic reaching Google is very different from what we
     observe on research networks or university networks.

Actions:

  - Aiko Pras will write a workshop report for JNSM [2]
  - Juergen Quittek takes the lead to organize a special issue in some
    journal (e.g., JNSM [2])
  - Perhaps some EU project ideas can be developed...

References:

  [1] http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2008/munich/
  [2] http://www.csee.umkc.edu/jnsm/