Pipeline Publishing, Volume 3, Issue 4

	This Month's Issue:
	New Frontiers

Skype:
The Future of Traffic Detection and Classification

other P2P protocols). In some cases, telecom marketing departments are highly interested in what percentage of their customers are using Skype, so that they can decide whether or not to launch their own commercial VoIP service. In other cases, unpredictable bandwidth consumption and security issues concern enterprise IT managers, who are the customers of the telecom carrier. Many of these enterprise IT managers are responding by requiring that the carrier block Skype traffic before it reaches their private networks.

Challenge: Detection of Skype Traffic

In general, effective Internet traffic detection and classification requires three key elements:

1. Accuracy: the technique should have a low false-positive rate (that is, it should rarely identify other protocols as the targeted protocol X);

2. Scalability: the technique must be able to process large traffic volumes, on the order of several hundred thousand to several million connections at a time, with good accuracy and without being computationally expensive;

3. Robustness: traffic measurement in the middle of the network has to deal with the effects of asymmetric routing (the two directions of a connection follow different paths), packet loss and reordering.
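
The accuracy criterion above can be made concrete with a small calculation. The following is only a sketch: the function name, the flow labels and the sample data are hypothetical and do not come from any measurement in this article. It computes the fraction of non-target flows that a classifier wrongly reports as the targeted protocol.

```python
def false_positive_rate(labels, predictions, target):
    """Fraction of non-target flows that were wrongly reported as `target`."""
    # Keep only the predictions for flows whose true label is NOT the target.
    negatives = [p for l, p in zip(labels, predictions) if l != target]
    if not negatives:
        return 0.0
    return sum(1 for p in negatives if p == target) / len(negatives)


# Hypothetical ground truth and classifier output for six flows.
truth = ["skype", "http", "http", "gnutella", "skype", "dns"]
guess = ["skype", "skype", "http", "gnutella", "http", "dns"]

# Four flows are truly non-Skype; one of them is reported as Skype.
fpr = false_positive_rate(truth, guess, "skype")
print(fpr)
```

In this made-up sample, one of the four non-Skype flows is misreported, so a lower value would indicate a more accurate detector in the sense of item 1.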

There are usually tradeoffs among the levels of accuracy, scalability and robustness that can be achieved in detecting any given protocol or service.

One current classification practice is port-based application identification, which uses well-known TCP/UDP port numbers to identify traffic flows. This method is highly scalable, since only the TCP/UDP port numbers must be recorded to identify a particular application. It is also highly robust, since a single packet is sufficient to make a successful identification. Unfortunately, port-based identification is increasingly inaccurate, primarily because P2P networks tend to intentionally disguise their traffic in order to circumvent filtering firewalls (as well as legal issues associated with organizations like the Recording Industry Association of America). Most P2P networks now operate on top of custom-designed proprietary protocols, and their clients can easily operate on any port number, even HTTP’s port 80, which makes port-based detection schemes incapable of accurate and robust classification of Internet protocols.

To overcome the issues with port-based detection, a new technique has emerged based on payload-signature methods, in which packet payloads are searched for patterns or signatures that uniquely identify a given protocol. One challenge facing payload-signature techniques on telecom networks is the high speed at which such pattern-matching algorithms must run, e.g. 2.5 Gbps (OC-48) and above. It is therefore critical to design algorithms that can perform pattern matching efficiently while dealing with memory and CPU limitations. Another key challenge is the lack of openly available, reliable protocol specifications. This is partially due to developmental history and partially a result of the proprietary nature of many protocols. For example, most P2P protocols are both proprietary and constantly evolving. Some of these (Gnutella, for instance) provide some documentation, but it is often incomplete or out of date. To make matters worse, there are various implementations of Gnutella clients, some of which do not comply with the documented specifications (raising potential interoperability issues). For application detection and classification to be accurate, it is important to identify signatures that span all the variants (or at least the dominant ones). It is also increasingly common to see new applications (such as Skype or GCN) employing 128-bit or 256-bit encryption to protect the privacy of the information exchanged between their users. As a consequence, the payload-signature method fails when traffic is encrypted, because the signatures in the packet payload are scrambled by the encryption.

Skype offers a combination of challenges that make it notoriously difficult to detect with scalable, accurate algorithms:

• The Skype agent does not run on any standard source port. Skype randomly selects a source port for the agent to run on, then communicates via TCP, UDP, or both. The choice of protocol depends on whether the agent is behind a proxy/NAT or has a public IP address. The destination IP addresses are not the same every time Skype runs, and the destination port numbers are also not standard.

• All communication via Skype is encrypted, including the phone numbers called (SkypeOut) and other data. In many cases, there is no direct communication between end users in Skype. The traffic instead passes through intermediate nodes, and these nodes may be different for every call.

• Skype is a peer-to-peer protocol, which means that a Skype agent connects to many peers (IP addresses). The network is also very dynamic, so these peers (and thus their IP addresses) keep changing.

• Skype provides voice, chat, file transfer and video services. It appears that all of these services are carried together, making it difficult to separate voice from chat, video and the rest.
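
The two conventional techniques discussed in this article, port lookup and payload signatures, can be sketched side by side to see why the properties listed above defeat them. This is a minimal illustration under stated assumptions, not a production classifier: the lookup tables, function names, port numbers and the sample byte string standing in for encrypted traffic are all hypothetical.

```python
# Hypothetical lookup tables; real deployments use far larger ones.
WELL_KNOWN_PORTS = {25: "smtp", 53: "dns", 80: "http", 443: "https"}
PAYLOAD_SIGNATURES = {
    b"GNUTELLA CONNECT/": "gnutella",  # start of a Gnutella handshake
    b"GET ": "http",                   # start of an HTTP GET request
}


def classify_by_port(src_port, dst_port):
    """Port-based identification: one packet's port numbers are enough."""
    return (WELL_KNOWN_PORTS.get(dst_port)
            or WELL_KNOWN_PORTS.get(src_port)
            or "unknown")


def classify_by_payload(payload):
    """Payload-signature identification: match known leading byte patterns."""
    for signature, protocol in PAYLOAD_SIGNATURES.items():
        if payload.startswith(signature):
            return protocol
    return "unknown"


# A P2P client hiding on port 80: the port lookup is fooled,
# while the payload signature still recognizes the handshake.
p2p_port_guess = classify_by_port(51000, 80)
p2p_payload_guess = classify_by_payload(b"GNUTELLA CONNECT/0.6\r\n")

# A Skype-like flow: non-standard ports on both ends and an encrypted
# payload (these fixed bytes are only a stand-in for ciphertext).
SKYPE_LIKE_PAYLOAD = bytes.fromhex("9f3ac41e7b02d85566e0")
skype_port_guess = classify_by_port(34567, 41234)
skype_payload_guess = classify_by_payload(SKYPE_LIKE_PAYLOAD)

print(p2p_port_guess, p2p_payload_guess)
print(skype_port_guess, skype_payload_guess)
```

In this sketch the disguised P2P flow is mislabeled by the port lookup but caught by its signature, whereas the Skype-like flow matches neither table, which is the gap a different kind of method has to fill.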

To accurately detect and classify these unfriendly applications, a systematic methodology is needed that does not depend on well-known port numbers or payload signatures. Instead, any new methodology should analyze flows at the transport layer (Layer 4), extracting and profiling key features from the packet streams it processes. Such a method could be referred to as “classification in the dark.”
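
As a rough illustration of what profiling flows at the transport layer might involve, the sketch below groups packets by their 5-tuple and summarizes each flow without ever reading the payload. The article does not say which features such a method would use, so the ones chosen here (packet count, byte count, mean packet size and mean inter-arrival gap), along with the function name, the packet tuple layout and the sample addresses, are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def flow_features(packets):
    """Group packets by 5-tuple and summarize each flow; payloads are not read.

    Each packet is a tuple:
    (timestamp_s, src_ip, src_port, dst_ip, dst_port, proto, size_bytes)
    """
    flows = defaultdict(list)
    for ts, src_ip, src_port, dst_ip, dst_port, proto, size in packets:
        flows[(src_ip, src_port, dst_ip, dst_port, proto)].append((ts, size))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order by timestamp to tolerate reordered input
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_size": mean(sizes),
            "mean_gap_s": mean(gaps) if gaps else 0.0,
        }
    return features


# Hypothetical capture: three small, evenly spaced UDP packets on
# non-standard ports, plus one large TCP packet to port 80.
packets = [
    (0.00, "10.0.0.5", 34567, "198.51.100.7", 41234, "udp", 120),
    (0.02, "10.0.0.5", 34567, "198.51.100.7", 41234, "udp", 130),
    (0.04, "10.0.0.5", 34567, "198.51.100.7", 41234, "udp", 110),
    (0.50, "10.0.0.5", 51000, "203.0.113.9", 80, "tcp", 1400),
]
profile = flow_features(packets)
for key, stats in profile.items():
    print(key, stats)
```

Each direction of a connection is kept as its own flow key here, a design choice made because, with asymmetric routing, a monitor in the middle of the network may see only one direction.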

© 2006, All information contained herein is the sole property of Pipeline Publishing, LLC. Pipeline Publishing LLC reserves all rights and privileges regarding
the use of this information. Any unauthorized use, such as copying, modifying, or reprinting, will be prosecuted under the fullest extent under the governing law.