15th NMRG-Meeting (January 8th, 2004, IU Bremen)
================================================

Participants:
-------------

# Marcus Brunner (NEC Europe, Germany)
# Luca Deri (ntop.org, Italy)
# Olivier Festor (LORIA-INRIA, France)
# Torsten Klie (TU Braunschweig, Germany)
# Aad van Moorsel (?)
# George Pavlou (University of Surrey, England)
# Aiko Pras (University of Twente, The Netherlands)
# Juergen Quittek (NEC Europe, Germany)
# Juergen Schoenwaelder (International University Bremen, Germany)
# Radu State (LORIA-INRIA, France)
# Frank Strauss (TU Braunschweig, Germany)

Minutes: Torsten

1. Path-coupled signaling for traffic measurement (Marcus)
----------------------------------------------------------

Passive and active measurement technologies are available for measuring hop-by-hop properties of traffic along its path through the Internet. Passive technologies can measure these properties accurately, but configuring them for the measurement of a particular traffic flow at all hops requires significant overhead for measurement configuration. This problem does not apply to active measurements, such as traceroute, because probing packets automatically follow the same path as the traffic flow to be measured. However, active techniques measure properties/conditions of the injected traffic, which may differ from those of the traffic of interest.

Marcus showed an approach [1] that tries to combine the two ways of measurement. It uses signaling for configuring passive hop-by-hop measurements along the path of a traffic flow of interest. Implementations use a pre-standard IETF NSIS protocol.

Discussion:

AP: What do you consider a high speed link?
MB: A link with more than 1 Gbit/s.
AP: Why is using signaling for measurement configuration better than using traceroute for example?
MB: Traceroute gives you the location of the current flow which may change. With signaling, you measure only what you really are interested in.
AP: Are these route changes a real issue or are they just theoretically a problem?
MB: They are a real problem but not much looked at.
JQ: It is simpler because you do not have to implement the transport or the basic security level. Another reason is that it must be simpler because it is meant to be an end user technology and not for someone who is monitoring the network anyway. So it is important not to require any knowledge of the topology.
LD: What are your probes?
MB: There are 3 implementations, one on Linux (kernel space), one on Linux (user space) and another one on an IXP node.
LD: Is it possible to apply this to a real network? It looks more like a configuration for routers. So why yet another protocol instead of a configuration in a MIB?
JQ: We are only interested in having it on routers. Dedicated probes are not our focus. The basic idea of the NSIS protocol is to have signaling support on routers.
JQ: There are problems with load balancing (data and signaling may have different paths). The same with MPLS. There are scalability problems as well (like in RSVP).
MB: In order to save memory the number of managed flows can be restricted.
AP: How do you do time-stamping?
MB: There is GPS in the routers.
AP: What about routers "without daylight"?
MB: The problem has not been solved. Maybe using NTP can help. This is definitely an issue.
AP: What about the accuracy without GPS?
JS: It depends on the accuracy of the kernel mechanisms (more information on the NTP web-site [2]).
JS: You have an implementation of the NSIS signaling protocol?
MB: It is just a pre-standard prototype implementation.
JS: Does the protocol specify how you detect route changes? Or is it an implementation issue?
JQ: It works on a refresh basis. You have to refresh your routing configuration regularly and when the routing changes the refresh is also rerouted.
MB: There is no agreement so far. It depends on the situation (for example mobile vs. more stable environment).
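
The refresh-based behaviour that JQ describes can be illustrated with a minimal soft-state sketch. It is an illustration only: the class, its method names and the 30-second lifetime are hypothetical and are not taken from the NSIS drafts or from the prototype that was presented. A router on the old path simply stops receiving refreshes after a route change, so its state times out, while a router on the new path installs state when the first rerouted refresh arrives.

```python
import time


class SoftStateTable:
    """Per-flow measurement state that expires unless it is refreshed."""

    def __init__(self, lifetime_s, now=time.monotonic):
        self.lifetime_s = lifetime_s
        self.now = now        # injectable clock, which eases testing
        self._expiry = {}     # flow id -> absolute expiry time

    def refresh(self, flow_id):
        """Install state for a flow or extend the lifetime of existing state."""
        self._expiry[flow_id] = self.now() + self.lifetime_s

    def purge(self):
        """Remove flows whose refresh did not arrive in time and return them."""
        t = self.now()
        expired = sorted(f for f, exp in self._expiry.items() if exp <= t)
        for f in expired:
            del self._expiry[f]
        return expired

    def active(self):
        """Return the flows that currently have measurement state."""
        return sorted(self._expiry)


if __name__ == "__main__":
    clock = [0.0]                      # simulated time in seconds
    table = SoftStateTable(lifetime_s=30, now=lambda: clock[0])
    table.refresh("flow-a")
    table.refresh("flow-b")
    clock[0] = 20.0
    table.refresh("flow-a")            # flow-b is no longer refreshed here
    clock[0] = 35.0
    print(table.purge(), table.active())
```

How long the lifetime should be is exactly the open point MB mentions: a mobile environment would presumably want short lifetimes, a stable one longer lifetimes.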
JS: Are there other fundamental differences between NSIS and RSVP?
MB: The most fundamental change was the split into the generic part and application-oriented part. RSVP is targeted to QoS which has been given up. Multicast is not included.
JS: How about security?
MB: Security is another hot topic in NSIS and SIP. It is tied to the business model and the trust relationships.
OF: If it is not possible to do measurements on all hops in the path, is it possible to go back (like in RSVP) and make a proposal to do measurements every two hops, for example?
MB: You could implement that. It is part of the signaling/measurement application. The base NSIS protocol allows you to do that.
OF: Can collection be done with signaling as well?
MB: Yes.
MB: In the future we want to look at what kinds of measurements can be done this way and what the benefits are.
AP: The most attractive idea is that you can give these kinds of services to your end users without the need of giving them access to the internals of your network.
JS: I think the end user model has scalability problems. I think it could be used as a service for neighboring ISPs because you do not have to give them insight into your network topology.
MB: It can be the end user. If it is the service, the user pays for it, so the price will solve the scalability problem.
AP: A disadvantage of the approach is that the measurement has to be done within the router. Is there not a problem with the computing power? Signaling may increase the load. This may be an obstacle for deployment.
JQ: The routers have an "overload brake" (so that the router stops signaling and measurement when a certain load threshold has been reached). There may be a problem with availability, but the service is not meant to be available for all time.

2. Improving Passive Packet Capture: Beyond Device Polling (Luca)
-----------------------------------------------------------------

Passive packet capture is necessary for many activities including network debugging and monitoring. With the advent of fast gigabit networks, packet capture is becoming a problem even on PCs due to the poor performance of popular OSs. The introduction of device polling has improved the capture process quite a bit but not really solved the problem. The problem with polling is that the time-stamps may not be accurate if polling is too slow. If polling is fast, very little CPU is left for user space applications.

Luca showed an approach [3] to passive packet capture that, combined with device polling, further improves capture and allows, on fast machines, packets to be captured at (almost) wire speed. He proposed a packet ring data structure in the network driver into which incoming packets are copied. The packets are not queued into kernel data structures. It is important to know where packet loss occurs (driver, kernel, or user-space).

Discussion:

AP: I have obtained different measurement results with a 1 Gbit/s card. Why are the differences in packet loss between a 1 Gbit/s card and a 100 Mbit/s card so high?
LD: The logic of the faster cards is much more efficient so interrupts are raised less often. 10 Gbit/s cards are even more efficient.
JQ: The point is that heavy packet loss only occurs with a large sequence of small packets, so it does not happen that often. Thus, libpcap works in many cases but does not always capture all packets.
AP: That was our observation: when you have a DNS attack, then you lose data. However, we do not care about counting packets if we get a DNS attack.
LD: There is another issue that you should consider. When you lose packets you should also know where you lose packets (kernel level, driver level or user space level).
If tcpdump, for example, tells you that you do not have packet loss it does not mean that you do not lose packets.
JS: What are the differences between the implementations of the different OSes?
LD: Linux uses soft interrupts which lead to low performance. Windows uses optimized network drivers and deferred interrupts to achieve better performance.
MB: Are there differences in the implementation between Linux 2.4 and 2.6?
LD: Not much.
JS: How much work is it to adapt the device drivers?
LD: Very little. You just have to modify the driver to call my function instead of netif_rx and to disable the transmission.
MB: Why do you not implement the function at the beginning of the chain in the kernel before the packet is filled into all the structures? Then it would be generally available to each network card.
LD: Such a generic approach should be doable. However, it is much easier to just override the system call in the driver.

3. Solving the middle-box problem (Juergen Q.)
---------------------------------------------

Firewalls and NATs are middle-boxes and integral components of the Internet infrastructure but they are also obstacles for many communication services including IP telephony, video conferencing, etc. Several alternative approaches for overcoming this problem are currently under investigation. Juergen showed [4] three of them: (1) controlling middle-boxes by more or less central entities like 'call agents', as investigated by the IETF MIDCOM WG [5], (2) path-coupled signaling between terminals and middle-boxes, as investigated by the NSIS WG [6], and (3) smart middle-boxes configuring themselves based on observed signaling messages. A comparison of advantages and disadvantages of the approaches shows that in different scenarios, different approaches are preferable.

The first approach is a telco-style solution. It is widely understood because it is close to gateway controllers. The MIDCOM WG was chartered to select an existing protocol.
They selected SNMPv3 as the appropriate protocol for configuring firewalls and NATs. The reason was that SNMPv3 is a full standard. However, some people claim that SNMP was not designed for that purpose.

The second approach is path-coupled signaling, as described above in Marcus' presentation. The terminals are enabled to open pin holes. The topic is addressed by the NSIS WG. If there were a transport protocol for signaling, it would progress more quickly. This approach is the only one that will work if the SIP signaling path is different from the data path.

The third approach is the smart middle-box approach. A smart middle-box is a device that is smart enough to handle everything. There is a small-office/home-office firewall available from Cisco. However, Juergen did not test it, because there are no specifications available. A main problem here is that the firewall must be able to support new signaling protocols. Another issue is that the signaling must be path coupled with the session that shall be established. There should be some policy control. Juergen implemented a modular firewall on NetBSD with loadable kernel modules which extend ipfilter by new rules.

The current conclusion is that all three approaches are useful and needed in different environments. A telecommunication operator will like to have the call-agent approach. At home, users definitely want to have smart middle-boxes. In some other scenarios, path-coupled signaling is the best solution. However, probably not all three approaches will survive.

Discussion:

AP: What happens when you forget to close the pin holes?
JQ: For approaches 1 and 2, there are timeouts. I would call it "middle-box control" rather than "middle-box management" because you do not configure permanent state.
AP: Is the smart middle-box approach not the best solution in theory?
JQ: All approaches have advantages and disadvantages. The problem is that networks are usually too complicated.
For example, the signaling and the voice data stream do not necessarily take the same path. I also consider it the best solution, except for this disadvantage. Another problem is that it needs quite a lot of computing power on the firewall or NAT because it has to be aware of all the protocols involved.
AP: Is the functionality not included in iptables? So implementing it should be quite simple.
JQ: Yes, but you also need to implement the protocol parser.
AP: [related to approach 1] What happens if the gateway is owned by the end user (for example at home networks)? The end user will have to trust his operator. Furthermore, you need authentication etc.
JS: I completely disagree, the firewall does not want to trust the IP phone.
AP: I was thinking about the firewall at my home. There will be more home user firewalls than firewalls of companies. At home, I trust my IP phone.
JQ: It is possible to have your own SIP server which controls the personal firewall. The call-agent will only control firewalls which are somewhere else (SIP proxy chaining).
AP: Then I will need a SIP proxy at home, but I do not want another box there.
JQ: It is possible to run the SIP proxy as a process on your PC (such as your personal firewall) or in a box that already provides firewall functionality.
AP: If you put the functionality into the same box, where is the difference from a smart middle-box?
JQ: In that case, the two approaches are almost merged. However, if you look at companies where you have 50 users but only a single SIP server in one firewall it is fine.
AP: People who have ADSL at home will start using IP telephony, so this issue will become important.
JQ: Firewalls must be extended with SIP proxy servers (or opened).
JS: What happens to your personal firewall if you get an incoming call that wants to go through your firewall?
JQ: You will listen on the SIP port and then you will have to open your firewall.
OF: Today, in France, if you use the current offers on VoIP over ADSL, you have the phone access directly on the box that you have received from your provider. You cannot put a firewall between your VoIP signaling.
AP: ADSL is often offered by other companies than your telephone company, who are competitors, so they try to give you locked solutions so that you cannot switch the operator.
OF: There are applications which configure your firewall using UPnP.
JQ: UPnP is also one of the protocols here.
MB: Our approach was not targeted to the home user with one firewall and one phone, but UPnP is.
JQ: UPnP is not well suited if you have a larger office.
LD: What about security?
JQ: If a remote phone says "open your firewall for me" the incoming call will have to be acknowledged. This can be done via signaling. In order to be more secure the protocol could be extended to restrict the opened UDP traffic to the source address of the calling phone.
LD: With IPv6, you will not have NAT but you will have firewalls so your approach will still be useful.
JQ: NAT will still be used with IPv6.
GP & LD: NAT will disappear.
JQ: NAT will still be used because huge companies such as IBM, Sun, HP use NAT although they have enough IP addresses. However, it will not be used as intensively as today.
AP: If you have a lot of machines at home - Morris said yesterday that there might be 100000 machines at home - your network is easier to manage using NAT.
LD: You can connect to all machines using the same address.
AP: Why does the MIDCOM WG not use NETCONF instead of SNMPv3?
JQ: It is a fast and dynamic configuration issue which you do on a call-by-call basis. I am not sure if NETCONF is the right tool here.
AP: No, it is not.
JS: Another reason is that SNMPv3 is a full standard. Formally, it is better to use something that already exists than something that might exist sometime.
JQ: SNMP was not designed to do these kinds of things.
AP: SNMP was designed to do that but it is not happening.
JQ: If it is not happening, there is probably something wrong with the design.
JS: It is completely contradictory that you do a protocol evaluation which concludes that SNMP is the right choice and afterwards without really knowing the details you find out that SNMP was not really designed for the purpose.
JQ: It can serve the purpose but that does not mean that it has to be specifically designed to serve the purpose. The other protocols were also not designed to serve the purpose (COPS, for example).
JS: What was the technical argument behind the statement "SNMP was not designed for that purpose"?
JQ: Transactions are possible with SNMP but not really convenient. It is possible to do everything in a single set operation, but only if it fits.
AP: How much data do you usually need? Maybe it will fit.
JS: It might fit but it is a problem of the problem. With the security enabled, the space in a set operation will be quite limited.
AP: What about scalability? If the security features of SNMPv3 are used, key management will become very complicated.
JQ: Key management is a problem anyway, also with the other protocols that were considered.
AP: What about the future of the WG?
JQ: The WG probably will be closed soon after a MIB has been released.
AP: Is the constraint of not defining a new protocol a good argument for smart middle-boxes? If you are not allowed to define a protocol, do not define and use a protocol.
JQ: We defined a simple protocol as an Internet draft. However, the area director said we first have to define a MIB and then we are allowed to publish the new protocol as an informational RFC.
OF: [related to approach 3] The intelligence of the firewall must be configured somehow. Is this possible?
AP: The firmware of a firewall can be updated in the same way as usual operating systems.
AP: What protocols can be used?
JQ: FTP, H.323, SIP, RTCP
AP: Which ones have been done?
JQ: None, but ftp should be quite simple to do.
AP: ftp has.
MB: No, iptables does not support ftp.
JQ: Well, ftp is simple to implement and we used ipfilter and not iptables.
AP: Was UPnP not specifically designed to solve the mentioned problems?
JQ: UPnP is the best choice for the single computer home environment. Therefore, it is also discussed in the MIDCOM WG. But there are scenarios where UPnP is not sufficient.

4. Using Distributed Object Technologies for Network Management (George)
------------------------------------------------------------------------

The use of distributed object technologies (DOT) for network management was intensively researched in the mid- to late 1990s. The X/Open-NMF JIDM produced guidelines for translating SNMP, SMI, and OSI-SM GDMO models to CORBA IDL and using CORBA as the access mechanism. This approach though was never adopted per se, but variations of it have been used mostly in telecommunication environments. It has recently become evident that a semantic rather than syntactic approach for converting SNMP SMI and GDMO models to distributed object interface specifications is the way forward. George's presentation [7] reviews the state of the art in using distributed object technologies for network management and proposes a framework that circumvents their usual problems, potentially making it possible to adopt distributed objects for Internet management.

For all cases other than table retrieval, a plain Get request is sufficient. However, this is not true for SNMPv1 because of the lack of proper error handling. A disadvantage of DOT is that the default is to have one get method per object attribute. In case of large object populations, this leads to sub-optimal information retrieval. George therefore proposes a different mapping. For all attributes (i.e. properties) use one method per object. Dynamic counters and probably time attributes should be grouped.

Discussion:

JS: Some of my code does not use Get at all.
GP: Why do you use GetNext to retrieve single instance objects?
JS: The reason is that when you ask for a list of objects and one object is not present then your whole get operation fails.
AP: This is only true for SNMPv1.
JS: Well, I am talking to real agents. Anyway, it is possible to generate stubs out of MIBs.
GP: Yes, that is what WS people do. BTW, is there any SNMP API with stubs?
JS: Yes, I have written one (smidump -f scli). The SNMP protocol is very simple and it requires some engineering on top of it to make it usable. The interesting thing is that in all the years with SNMP people have not done this.
GP: I completely agree.
AP: We as researchers keep making the same mistakes. When we talk about simple, we talk about simple design. But if you want to have something simple it should be simple to use. The main advantage of WS is that it is easy to use because you get it for free everywhere.
GP: If you take WS with a plain SOAP API it is very difficult to use.
AP: Why should there not be a Get operation in the WSDL file that retrieves all the data in a large XML file on which XPath and XQuery can be used to select more specific data?
GP: The proposal is not only applicable to WS. It is transparent to all DOTs.
AP: There is another advantage. This is easy to parse with existing software. With MS Excel, for example, you can write it down in four lines. If you have an XML document you will have to do the XML handling yourself. For users, this is far more difficult.
AvM: HP decided to use a grid service approach because they think there will not be consensus on the grouping of objects. That is why they prefer to deal with large XML documents. In general, the XML based approach seems to be more useful in a large scale operator environment where experts do the management.
GP: Network management is not that different from other distributed systems topics.
AP: Management is not different from anything else so we should use the same technologies. However, we may have our specialized WSDL files.
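
AP's suggestion of a single coarse-grained Get whose XML reply is then filtered on the manager side can be sketched as follows. This is only an illustration under stated assumptions: the XML encoding of the interface table and the function name select_in_octets are hypothetical (only the object names ifTable, ifEntry, ifDescr, ifOperStatus and ifInOctets come from the standard interfaces MIB), and the selection uses the limited XPath subset of Python's standard library rather than full XPath or XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply of a "get everything" operation on an interface table.
REPLY = """
<ifTable>
  <ifEntry>
    <ifIndex>1</ifIndex><ifDescr>lo</ifDescr>
    <ifOperStatus>up</ifOperStatus><ifInOctets>1200</ifInOctets>
  </ifEntry>
  <ifEntry>
    <ifIndex>2</ifIndex><ifDescr>eth0</ifDescr>
    <ifOperStatus>up</ifOperStatus><ifInOctets>987654</ifInOctets>
  </ifEntry>
  <ifEntry>
    <ifIndex>3</ifIndex><ifDescr>eth1</ifDescr>
    <ifOperStatus>down</ifOperStatus><ifInOctets>0</ifInOctets>
  </ifEntry>
</ifTable>
"""


def select_in_octets(xml_text, status):
    """Return {ifDescr: ifInOctets} for all rows with the given ifOperStatus."""
    root = ET.fromstring(xml_text.strip())
    # The predicate [child='text'] keeps rows whose child element has that text.
    rows = root.findall(f"./ifEntry[ifOperStatus='{status}']")
    return {row.findtext("ifDescr"): int(row.findtext("ifInOctets")) for row in rows}


if __name__ == "__main__":
    print(select_in_octets(REPLY, "up"))
```

The whole table is shipped and the manager picks the rows it needs, which matches the trade-off discussed here: convenient for skilled operators, but the caller has to do the XML handling.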
AvM: The question is what should be standardized.
AP: The question is what will be used. I think that the users will decide for what they find easy to use. George's approach is simpler to use than shipping large XML documents. However, if you manage sophisticated machines within an operator environment where you have skilled people, with the XML approach you are more powerful. But if we have 100000 computers per human being we cannot manage them by programming with XML documents. It does not scale anymore.
AvM: Why do you consider WS to have strong typing?
GP: On the SOAP level, it is loose typing. But if you use stubs, you will get strong typing.
AP: Is it better to ship entire tables or is it better to retrieve single entries?
GP: If you ship entire tables it is possible to have faster agents. With the retrieval of single entries it is possible to save bandwidth. However, bandwidth is not an issue these days. If you need to get single objects you can implement it. However, then you will need naming etc. and it will get complicated.
AP: Would it not be useful to have a possibility to retrieve an entire table and specify arguments that select rows which have certain properties?
GP: I was trying to keep it simple. It shall be done at the manager's side.
GP: Why did MIB designers invent linked replies? Why did they not include all the data in a single big reply?
AP: Because the reply could consume MBs of data.
JS: Because the model allows asking multiple agents in one request. That makes it impossible to put everything into one reply.
OF: If you have an application level routing on distinguished names, the request can be forwarded to different agents depending on the prefix.
JS: [shows his TCP-MIB API, which has been generated by a compiler] I agree with the statement that for real management applications the API is the key. In my API, there is a stub function to which you pass a mask and which then retrieves the desired data from the MIB.
I allow the application to choose the data. For read/write stuff, such as the tcp table, for example, you get another stub function that retrieves the whole table. You can mask some columns if you want to. Another stub function is "get one entry". With this API you can write management applications without knowing anything about SNMP. The compiler was written in 2000. However, people are still programming with Get and GetNext API calls.
AP: I think vendors will include WS into their devices like they have put web servers for manual configuration. Probably, these WS will not be standardized.

5. Performance Evaluation of Web Services as Management Technology (George)
---------------------------------------------------------------------------

Web Services has been recently emerging as an XML-based technology for distributed access to Internet services. A careful examination reveals that Web Services is a technology with many similarities to distributed objects, so it could also be used for network management. This could possibly be done through the framework presented in the previous talk, which avoids potential scalability problems. In this presentation, George first identifies the similarities of WS and distributed object technologies. He then examines the usability and suitability of WS for network management and presents a performance evaluation of selected scenarios in comparison to SNMP and CORBA. Aiko did similar measurements but got different results.

Discussion:

GP: We used WASP and not gSOAP, because gSOAP is highly optimized software and there is no highly optimized software for CORBA. To ensure a fair comparison, we used a "lighter" API.
AP: This is an important point. With WS you can get highly optimized software for free.
AvM: WS are not OO technology, at least not at the moment (WSDL 1.1) because inheritance and statefulness are missing. Maybe those will be added in WSDL 1.2 or 2.0. It would be better to compare CORBA with Grid Services (OGSI and OGSA) because WS is service oriented technology whereas GS is OO technology.
AP: It is an interesting observation that sometimes it takes longer to retrieve values from kernel space than to do the entire protocol handling. So protocol handling often is only a minor issue. If you use NET-SNMP, for example, which does not do any caching, it will retrieve value after value out of the interface table even when you look at the whole table. Therefore, if you make a Get on 100 values from the interface table you will make 100 single kernel polls.
GP: We started our implementations in Java. Later, we recoded them in C++ due to the overhead of Java.
AP: We did some implementations in Java because we wanted to run the applications on mobile phones. The problem with Java is that it is very difficult to get low-level information about the amount of memory that you use.
AP: I have different measurement results that lead me to different conclusions. One reason is that George used hard-wired data with the TCP implementation (e.g. counters). So, in my figures, SNMP is much worse than anything else.
JS: This is due to a bad behavior of the SNMP implementation which may lead to a large number of get requests. There are different results with non-hard-wired data.
AvM: SNMP is bad in your cases.
JS: No, the implementation is bad. The quality of the SNMP implementations after 12 years of SNMP is totally disappointing, so we should stop.
AP: WS without compression leads to a very large amount of data that has to be shipped, much more than SNMP or CORBA use. Compressed WS are very good with respect to network usage.
AP: When using compression with WS, the efficiency increases if more data is retrieved. My conclusion is that it is impossible to conclude which technology is better in general because it depends on the use case and the used software packages. If you just want to retrieve sysUpTime from a router, do it via SNMP.
If you want to retrieve the entire interface table for 500 customers that are connected to your ADSL multiplexer, forget about SNMP, because compressed WS are far more efficient. My measurements also show that the time for the protocol is negligible compared to the time of the data retrieval.
JS: It is not valid to make comparisons with bad implementations.
AP: We want to compare what is practically available and we do not care about theoretical comparisons.

References
----------

[1] http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2004/bremen/brunner.pdf
[2] http://www.ntp.org
[3] http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2004/bremen/deri.pdf
[4] http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2004/bremen/quittek.pdf
[5] http://www.ietf.org/html.charters/midcom-charter.html
[6] http://www.ietf.org/html.charters/nsis-charter.html
[7] http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2004/bremen/pavlou1.pdf