Inverted Capacity Extended Engineering Experiment (ICE3)
Traffic Classification
Traffic classification is often mentioned in the context of prioritisation. On
non-broadband links, prioritisation was unimportant as the link capacity was too
low to truly support multiple concurrent network flows. As broadband became more
widely adopted, through the availability of ADSL and DOCSIS technologies, managing
network traffic became more important. Concurrent networked applications often
interacted with each other, causing both real and user-perceived issues with
network and application performance. These issues became evident when traffic was
competing for the constrained bandwidth over the broadband last-mile link.
In order to improve outcomes for both network operators and users, prioritisation
schemes would be deployed to manage which traffic had priority when accessing the
constrained link.
It could be argued that in a high-speed broadband
environment traffic classification and subsequent prioritisation are no longer
relevant. However, there are many reasons why traffic classification is still a
useful tool.
Classification and Prioritisation in a High-Speed Broadband Environment
In an ICE3-type environment, the last-mile link may no longer be the
bottleneck link. This does not imply that bottlenecks no longer exist in the
network, just that they move to an alternate location, possibly bringing a new
set of issues. We expect that it may still be relevant to deploy traffic
prioritisation schemes within the networks of the future.
Even so, there will always be scope for obtaining an understanding of what
applications are running over existing and future network links. A better
understanding will lead to better management of existing networks, and planning for
future deployments.
Classification for Analysis Purposes
We have previously discussed why we need to analyse network traffic and how we
can process the data. However, as more bandwidth becomes available to users, new
types of networked applications, beyond web and email, are becoming more
prevalent.
These applications include real-time media applications (such as VoIP and
Skype), gaming applications, and Peer-to-Peer
(P2P) applications (such as BitTorrent). These
newer applications also use more complex protocols that often use random port
numbers and encrypted data payloads (for privacy and security reasons). This means
that it can be difficult to determine which network traffic is generated by these
applications.
We need to turn to more advanced traffic classification techniques, some of which
have been developed by CAIA, that use Machine Learning. These techniques allow for
classification of unknown traffic without having to analyse the packet contents
in detail to determine the generating application type. This also means that less
processing power is required to perform the analysis.
The outcomes of traffic classification can be used to attribute traffic to different
applications and application types on the network. This information can be used to:
- Further investigate how different application classes interact with each other
in different parts of the network
- Better determine which applications are being used in the network
- Observe how the availability of high-speed broadband changes the mix of
network traffic generated
Traffic Classification Using Machine Learning Techniques
Applications such as Skype and
BitTorrent utilise random port numbers and may
encrypt data payloads. This makes it difficult, and often impossible, to classify
traffic based solely on source/destination port information, or on the contents of
the network communications.
As such, we have to consider alternate means to determine which application class
the network traffic belongs to.
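To illustrate why port information alone falls short, here is a minimal sketch of a port-based classifier. The table and the function name `classify_by_port` are hypothetical and chosen only for illustration; the port assignments shown (25, 53, 80, 443) are the standard well-known ones. It does not represent any tool used in ICE3.

```python
# Illustrative only: a tiny table of well-known server ports.
WELL_KNOWN_PORTS = {
    25: "email (SMTP)",
    53: "DNS",
    80: "web (HTTP)",
    443: "web (HTTPS)",
}


def classify_by_port(src_port, dst_port):
    """Return an application label if either port is well known, else 'unknown'."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    # Applications that pick random ports at both ends always end up here.
    return "unknown"


print(classify_by_port(51234, 80))     # a client talking to a web server
print(classify_by_port(51234, 38817))  # both ends on arbitrary ports
```

A flow between two arbitrary high ports falls through to "unknown", which is the situation described above for applications that choose their ports at random.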
CAIA has pioneered work on real-time classification of network traffic using
statistical Machine Learning techniques. Using this approach, statistical
properties of traffic from known application classes are calculated to obtain a
"fingerprint" of the traffic properties. This fingerprint is then provided
to a Machine Learning tool in a Training Stage to build a classifier. Later,
we calculate the same statistical properties for unknown traffic, which the
previously built classifier uses to determine which application generated the traffic.
This approach was later modified to enable classification before the network flow
had terminated, essentially allowing real-time, or near real-time, classification.
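The train-then-classify workflow described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not CAIA's implementation: the chosen features (packet length and inter-arrival time statistics), the nearest-centroid classifier, the function names and the synthetic flows are all hypothetical stand-ins for whatever features and Machine Learning tool are actually used.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute a statistical 'fingerprint' for one flow.

    packets: list of (timestamp_seconds, length_bytes) in arrival order.
    """
    lengths = [length for _, length in packets]
    times = [ts for ts, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return (
        mean(lengths),
        pstdev(lengths),
        mean(gaps) if gaps else 0.0,
        pstdev(gaps) if gaps else 0.0,
    )


def train(labelled_flows):
    """Training stage: one centroid (mean feature vector) per known class."""
    by_class = {}
    for label, packets in labelled_flows:
        by_class.setdefault(label, []).append(flow_features(packets))
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_class.items()
    }


def classify(model, packets, window=None):
    """Label a flow by its nearest centroid.

    If window is given, only the first `window` packets are used, so a
    decision can be made before the flow has terminated.
    """
    if window is not None:
        packets = packets[:window]
    features = flow_features(packets)

    def squared_distance(centroid):
        # No feature scaling is applied; a real system would normalise.
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Synthetic flows: small, evenly spaced packets versus large, rapid ones.
voip_like = [(i * 0.02, 160) for i in range(50)]
bulk_like = [(i * 0.001, 1500) for i in range(50)]
model = train([("voip-like", voip_like), ("bulk-like", bulk_like)])

unknown = [(i * 0.02, 172) for i in range(200)]
print(classify(model, unknown, window=25))
```

Note that the classifier never inspects ports or payloads, only per-flow statistics, and the `window` argument mimics classifying from a short slice of the flow rather than waiting for it to end.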
The initial work in this area was performed by Dr.
Thuy Nguyen as part of her PhD. This work has been further developed in a number
of projects undertaken at CAIA:
- DSTC - A Cisco-funded
(URP) program to explore the potential
of using Machine Learning techniques to develop a dynamic traffic classification
system
- ANGEL - Within the scope
of the SIT-CRC funded by the Australian government, ANGEL provides a prototype
distributed traffic classification system that can use a generic ML-based classifier
to classify unknown traffic and instruct registered nodes (typically
gateway routers/modems) to automatically prioritise flows that match a particular
classification
- LIFE - Looks primarily at Lawful
Interception of network traffic. Much of the work on Skype and VoIP classification
also finds a home here
- DIFFUSE - A Cisco-funded
(URP) program to build a practical implementation
of a traffic classification and prioritisation framework for the FreeBSD Operating System
Within the scope of ICE3, we are currently using these traffic classification
techniques to develop models to successfully classify BitTorrent, Skype and other VoIP
network traffic.