Internet Traffic Characterization


George C. Polyzos
Computer Systems Laboratory, Department of Computer Science and Engineering,
University of California, San Diego, La Jolla, CA 92093-0114
Email: polyzos@cs.ucsd.edu; http://www-cse.ucsd.edu/~users/polyzo

We have recently concluded a three-year study (1992-95) of Internet traffic. This talk will focus on some of the observations we made, as well as on the architecture for data collection for the NSFNET, the backbone of the Internet during that period, and its limitations, particularly for network forecasting and planning and for network performance modeling and analysis. Our statistics reflect operational collection of traffic and network registration data, both initially designed to support short-term engineering and planning needs.

Even though the basic architecture of the NSFNET was similar to (and strongly influenced by) the ARPANET, significant changes have been made over time. Particularly important from our perspective is the shift in mission from that of an experimental network, which the ARPANET was, to an operational network. It is therefore not surprising that there has been relatively little work reported on the performance and traffic composition and profile of the Internet or the NSFNET as a whole (rather than specific end-to-end experiments through the Internet).

We reported on a measurement study of the T1 NSFNET in [1]. We first presented the measurement environment and the data collection approach, and then discussed the limitations of the collected statistics for meaningfully characterizing network performance, and in particular for associating instantaneous and average performance with the corresponding traffic levels and profiles. One of the surprising findings of the study is the degree of favoritism that existed on the NSFNET: for example, 0.28% of network pairs accounted for 46.9% of the traffic on the backbone (for the month of May 1992). We also found very high link utilizations, typically over 50% over 15-minute averaging intervals during peak hours, and in extreme cases over 80%. Such observations and increasing performance concerns led to the replacement of the T1 network with the T3 backbone, an approximately 30-fold increase in link bandwidth.
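The concentration statistic above can be illustrated with a small sketch. The per-packet records and network-pair names below are hypothetical, not NSFNET data; the idea is simply to aggregate bytes by network pair and report the share carried by the busiest pairs:

```python
from collections import Counter

# Hypothetical per-packet records: (source network, destination network, bytes).
# Identifiers and sizes are illustrative only.
packets = [
    ("net-A", "net-B", 1500), ("net-A", "net-B", 1500), ("net-A", "net-B", 576),
    ("net-C", "net-D", 576), ("net-E", "net-F", 64),
]

# Aggregate traffic volume per network pair.
volume = Counter()
for src, dst, nbytes in packets:
    volume[(src, dst)] += nbytes

# Report the share of total bytes carried by the busiest pair, mirroring
# the "0.28% of pairs carry 46.9% of traffic" style of statistic.
total = sum(volume.values())
top_pair, top_bytes = volume.most_common(1)[0]
share = top_bytes / total
print(f"{top_pair} carries {share:.1%} of backbone bytes")
```

The same aggregation extends directly to other granularities (host pairs, application ports) by changing the key used in the counter.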

In [6] we presented the data collection architecture for the T3 NSFNET, compared it with that for the T1 NSFNET, and concentrated on difficulties in using the collected statistics for long-term traffic forecasting. We reported on long-term traffic growth and on growth in the number of attached networks; for both, the best fit was quadratic. We also pointed out the ever-increasing diversity of application profiles and the lack of tools and coordination for keeping track of the growth. In addition, we made some observations about international aspects of the traffic; e.g., in some cases significant amounts of traffic sent from a country back to itself were routed through the NSFNET backbone. We have also developed ARIMA time-series models of long-term NSFNET traffic volume with good forecasting capabilities [5].
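A quick diagnostic for why a quadratic trend fits such growth series well: an exactly quadratic series has constant second differences. The monthly traffic values below are synthetic, chosen only to illustrate the check; they are not NSFNET measurements:

```python
# Synthetic monthly traffic counts growing quadratically (illustrative values,
# in arbitrary units): 5 + 2t + 0.5*t^2 for month index t.
traffic = [5.0 + 2.0 * t + 0.5 * t * t for t in range(12)]

# First differences capture the month-to-month growth; second differences
# capture the growth of the growth. A constant second difference is the
# signature of a quadratic trend, before committing to a polynomial fit.
first = [b - a for a, b in zip(traffic, traffic[1:])]
second = [b - a for a, b in zip(first, first[1:])]
print(second)  # constant second differences indicate quadratic growth
```

On noisy real data the second differences would only be approximately constant, and a least-squares fit (as we used) would compare residuals of linear versus quadratic models instead.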

An observation of discrepancies in traffic measurements obtained through two different mechanisms, reported in [6], prompted interest in sampling for traffic capture, measurement, and reporting. In [4] we presented a methodology for evaluating various sampling approaches and applied it to two specific targets: the packet size and interarrival time distributions. One of the applications that motivated our study of sampling is network accounting. In [2] we discussed two other aspects of accounting and pricing in the context of a "free market Internet": decentralized attribution of network resource consumption and the provision of multiple service qualities based on multiple priorities supported by the IP precedence field. We proposed an interim solution that can be implemented with minor modifications to the existing infrastructure.
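As a rough illustration of comparing sampling approaches against a known target (here, the mean packet size), the sketch below contrasts systematic 1-in-10 sampling with simple random sampling on a hypothetical bimodal packet-size trace. It is not the evaluation methodology of [4], only a toy version of the comparison:

```python
import random

random.seed(7)

# Hypothetical packet-size trace (bytes): many small packets and some large
# ones, loosely mimicking a bimodal size distribution (illustrative only).
trace = [64] * 800 + [1500] * 200
random.shuffle(trace)
true_mean = sum(trace) / len(trace)

# Systematic 1-in-10 sampling: take every 10th packet of the trace.
systematic = trace[::10]
# Simple random sampling of the same sample volume.
simple = random.sample(trace, len(systematic))

# Compare each sample's estimate of the mean against the full-trace value.
for name, sample in (("systematic", systematic), ("random", simple)):
    est = sum(sample) / len(sample)
    print(f"{name}: estimated mean {est:.1f} vs true {true_mean:.1f}")
```

A fuller evaluation would compare entire sampled distributions against the parent distribution (e.g., with a goodness-of-fit statistic), not just a single moment.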

Even though it is rather obvious that unidirectional latencies between two end systems are not necessarily symmetrical, the only widely available measurement tools, ping and similar utilities, provide only round-trip times; furthermore, it was widely believed in the network operations community that delays are practically symmetrical. In [3] we dispelled this myth through measurements and pointed out that new interactive continuous-media applications that demand performance guarantees need to establish them in a unidirectional sense.
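The pitfall of halving the round-trip time can be made concrete with a few lines of arithmetic; the delay values below are invented for illustration and stand for a path whose forward and reverse directions traverse different routes:

```python
# Illustrative one-way delays (ms) for an asymmetric path (hypothetical values):
forward_ms = 35.0   # e.g., a longer or more congested forward route
reverse_ms = 15.0   # a lightly loaded reverse route

# A ping-style measurement observes only the sum of the two directions.
rtt_ms = forward_ms + reverse_ms
naive_one_way = rtt_ms / 2  # the common, and here misleading, RTT/2 estimate

print(f"RTT = {rtt_ms} ms; RTT/2 = {naive_one_way} ms")
print(f"actual forward delay = {forward_ms} ms "
      f"(RTT/2 is off by {forward_ms - naive_one_way} ms)")
```

Measuring each direction separately requires cooperating, clock-synchronized end points, which is precisely what round-trip tools were designed to avoid.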

We have developed a parameterizable methodology for profiling Internet traffic flows at a variety of granularities. For example, end-point granularities include destination network, network pair, destination host, host pair, host and port quadruple, etc. Our methodology differs from that used by most other studies, which have concentrated on end-point definitions of flows in terms of state derived from observing the explicit opening and closing of TCP connections. Instead, our model defines flows based on traffic satisfying various temporal and spatial locality conditions, as observed at internal points of the network. This approach to flow characterization bridges the gap between connectionless IP and connection-oriented communications and helps address some central problems in networking based on the Internet model, among them route caching optimizations for IP datagram forwarding, resource reservation at multiple service levels, usage-based accounting, and the integration of IP traffic over an ATM fabric.
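A minimal sketch of such a timeout-based flow definition (not the implementation used in our studies) might look as follows; the 64-second timeout and the host-pair granularity are chosen arbitrarily for illustration, and the granularity is controlled entirely by the key function:

```python
TIMEOUT = 64.0  # seconds; an assumed flow-timeout parameter (illustrative)

def count_flows(packets, key=lambda p: (p[1], p[2]), timeout=TIMEOUT):
    """Count flows in packets, an iterable of (timestamp, src, dst) tuples
    in timestamp order. Packets with the same key belong to one flow as long
    as the inter-packet gap stays below the timeout; a longer silence ends
    the flow, and the next packet starts a new one."""
    last_seen = {}  # flow key -> timestamp of that flow's most recent packet
    flows = 0
    for ts, *rest in packets:
        k = key((ts, *rest))
        if k not in last_seen or ts - last_seen[k] > timeout:
            flows += 1  # new flow: first packet seen, or gap exceeded timeout
        last_seen[k] = ts
    return flows

# Illustrative trace: two bursts between A and B separated by a long silence
# count as two flows; the C->D packets form a third.
trace = [(0.0, "A", "B"), (1.0, "A", "B"), (200.0, "A", "B"),
         (5.0, "C", "D"), (6.0, "C", "D")]
trace.sort()
print(count_flows(trace))  # → 3
```

Coarser or finer granularities (destination network only, full host-and-port quadruples) fall out of the same loop by swapping the key function.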

In [7] we first define the parameter space and then concentrate on metrics characterizing both individual flows and the aggregate flow profile. We present measurements based on case studies we undertook, which yield significant insights into several aspects of Internet traffic, demonstrating (i) the brevity of a significant fraction of IP flows at a variety of traffic aggregation granularities, (ii) that the number of host-pair IP flows is not significantly larger than the number of destination-network flows, and (iii) that schemes for caching traffic information could significantly benefit from using application information. Note that observation (ii) is important for various recently proposed resource reservation schemes and service disciplines that guarantee quality of service and require servicing traffic based on connections or on network or host pairs.


[1] K.C. Claffy, G.C. Polyzos, and H.-W. Braun, "Traffic Characteristics of
    the T1 NSFNET," Proc. Joint Conference of the IEEE Computer and
    Communications Societies (INFOCOM'93), San Francisco, CA, pp. 885-892,
    March-April 1993.

[2] H.-W. Braun, K.C. Claffy, and G.C. Polyzos, "A Framework for Internet
    Accounting," Proc. SICON'93, Singapore, pp. 847-851, September 1993.

[3] K.C. Claffy, G.C. Polyzos, and H.-W. Braun, "Measurement Considerations
    for Assessing Unidirectional Latencies," Internetworking: Research and
    Experience, vol. 4, no. 3, pp. 121-132, September 1993.

[4] K.C. Claffy, G.C. Polyzos, and H.-W. Braun, "Application of Sampling
    Methodologies to Network Traffic Characterization," ACM Computer
    Communications Review, vol. 23, no. 4, pp. 194-203, October 1993.

[5] N.K. Groschwitz and G.C. Polyzos, "A Time-Series Model of Long-term
    Traffic on the NSFNET Backbone," Proc. IEEE International Conference
    on Communications (ICC'94), New Orleans, LA, pp. 1400-1404, May 1994.

[6] K.C. Claffy, H.-W. Braun, and G.C. Polyzos, "Tracking Long-term Growth
    of the NSFNET," Communications of the ACM, vol. 37, no. 8, pp. 34-45,
    August 1994. (Special Issue on Internet Technology.)

[7] K.C. Claffy, H.-W. Braun, and G.C. Polyzos, "A Parameterizable
    Methodology for Internet Traffic Flow Profiling," IEEE Journal on
    Selected Areas in Communications, vol. 13, no. 8, pp. 1481-1494,
    October 1995.