To manage or not to manage: Addressing the benefit overhead tradeoff in network management
Danny Raz, Technion

The increased complexity of networking infrastructure and protocols, together with the desire to provide high-quality services at the lowest possible cost, drives many organizations to deploy more network and system management tools in their networks. It is often argued that, due to the high complexity of management, a much more cost-effective way to assure performance is simply to acquire more resources. This is particularly true for performance management of Information Technology (IT), where the goal is to coordinate networked resources so that business-level objectives are met at all times, at the lowest possible cost, and with optimal capacity. As Service-Oriented Architecture (SOA) spreads as a popular way of organizing and providing distributed capabilities to solve business problems, cost-effective performance management becomes essential.

Practicing IT administrators know well that committing more resources to management improves the overall quality of service only up to a certain point, after which management costs start dominating the total cost of ownership and management offsets its own advantages. Thus, although it is naturally desirable that the network perform at the highest possible level, this may not be the best solution due to the associated cost. The time is now ripe for the research community to address this fundamental tradeoff in a rigorous way, by showing exactly how much effort should be invested in management to gain the maximal benefit. In order to do that, one must accurately define both the cost associated with the management process and the expected benefit. Of course, considering the overall benefit of general management systems, and all aspects of the associated overhead, may be impossible due to the variety of aspects involved and the diversity of network conditions.
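The "up to a certain point" behavior described above can be sketched with a toy model. Everything here is an illustrative assumption, not part of the article: service-quality benefit is taken to grow with management effort m under diminishing returns, while management cost grows linearly, so the net benefit peaks at a finite effort level.

```python
import math

def benefit(m):
    # Hypothetical diminishing-returns benefit of management effort m.
    return 10 * (1 - math.exp(-m))

def cost(m):
    # Hypothetical management overhead, growing linearly with effort.
    return 2 * m

def net(m):
    # Net benefit: what the tradeoff asks us to maximize.
    return benefit(m) - cost(m)

# Crude scan for the optimal working point on a grid of effort levels.
best_m = max((i / 100 for i in range(0, 501)), key=net)
```

For this particular (assumed) shape, calculus gives the optimum where the marginal benefit equals the marginal cost, i.e., 10·exp(-m) = 2, or m = ln 5 ≈ 1.61; the scan above recovers the same point. Beyond it, every additional unit of management effort costs more than it returns.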
However, when this question is applied to specific tasks within the network management domain, one can rigorously define the tradeoff and then provide a general tool for finding optimal working points for such systems. Consider, for example, a service provided by a set of servers over the network. The goal of the service provider is to deliver the best service (say, minimize the service time) given the amount of available resources (e.g., the number of servers). The provider can add a load sharing system (for example, as suggested in RFC 2391) and improve the service time. However, the same resources (budget) could instead be used to add servers to the system and thus provide better service to end customers. The dilemma here is between adding more computational power and adding management capabilities, where the goal is to achieve the best improvement in overall system performance. Note that in order to be effective, the load sharing system needs updated load information from the servers. Handling such load information requests requires small but nonzero resources (e.g., CPU) from each server. Thus, it is not easy to predict the actual improvement to be expected from a specific configuration. Yet, for this concrete example, one can formalize the cost and the expected benefit and define an optimal working point. As this example indicates, it is important to identify just the right amount of resources to allocate to management tasks (such as monitoring) in order to maximize overall system performance. In addition to being an important and interesting research direction, this approach can yield practical tools that help provide cost-effective services to the community.
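The servers-versus-load-balancer dilemma can be made concrete with a small queueing sketch. All numbers, function names, and modeling choices below are illustrative assumptions, not taken from the article: blind traffic splitting is modeled as k independent M/M/1 servers, an ideal load sharing system as a pooled M/M/k queue (Erlang C), and the load-information overhead as a fraction eps of each server's capacity lost to monitoring.

```python
from math import factorial

def t_random_split(lam, mu, k):
    """Mean response time with k independent M/M/1 servers,
    total arrival rate lam split evenly (no load sharing)."""
    per_server = lam / k
    assert per_server < mu, "each server must be stable"
    return 1.0 / (mu - per_server)

def erlang_c(k, a):
    """Erlang C formula: probability an arrival waits in an
    M/M/k queue with offered load a = lam / mu."""
    s = sum(a**n / factorial(n) for n in range(k))
    top = a**k / factorial(k) * (k / (k - a))
    return top / (s + top)

def t_load_balanced(lam, mu, k, eps):
    """Mean response time with an ideal load balancer (M/M/k pooling),
    where monitoring consumes a fraction eps of each server's capacity."""
    mu_eff = mu * (1 - eps)          # capacity left after answering load queries
    a = lam / mu_eff
    assert a < k, "system must be stable"
    return erlang_c(k, a) / (k * mu_eff - lam) + 1.0 / mu_eff
```

With toy numbers (lam = 8 jobs/s, mu = 1 job/s per server, k = 10 servers), a load balancer with a 2% monitoring overhead yields a much lower mean response time than spending the same budget on an eleventh unmanaged server; but if the overhead grows to 19% of server capacity, the extra server wins. The optimal working point thus depends on the management overhead, which is exactly the tradeoff the article argues should be formalized.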