netAI - Example Results
Overview
This page describes some of the experiments we have conducted using
netAI and the results obtained. Only key results are presented here;
please follow the links to the tech reports for more detailed
information.
Detecting Game Traffic
Overview
The Internet is experiencing an increase in the use and
commercialisation of interactive applications such as telephony and
online gaming. Online gaming in particular is expected to become a
large source of income, through either subscription-based games or
dedicated gaming services. Internet Service Providers may also
charge a premium for Quality of Service (QoS)-enhanced accounts
targeted at gamers.
Highly interactive online games, such as First Person Shooter
(FPS) games, have a narrow tolerance for network impairments such as
delay, jitter and packet loss, necessitating stricter QoS than
the best-effort service used for traditional Internet
applications such as web or email. For QoS to be effective,
however, an accurate and timely method of identifying and
classifying network gaming flows is required. As it is unlikely
that game applications will ever explicitly signal their QoS
demands to the network, the network itself must identify game flows
and establish adequate QoS for them. Once highly interactive
game traffic can be identified, it can be given higher priority
than other traffic in the network.
We evaluate the performance of several machine learning algorithms
for separating network games from generic (i.e. common) network
traffic. Although it is not the main focus, we also investigate
how effectively different games can be separated from each
other.
We find that some algorithms are able to separate the different
games from each other and other traffic with very high (>99%)
accuracy. We also find that all of the ML techniques seem to
be fast enough for real-time classification of a fairly large
number of simultaneous flows (at least several thousands per
second). Furthermore, most of the algorithms can train fast enough
to allow for frequent updates of the classifier (training took no
more than half an hour).
Tech Report
A much more detailed tech report will soon be
available for download here.
Features
We classify packets to flows based on source IP and source port,
destination IP and destination port. Flows are bidirectional and
the first packet seen by the classifier determines the forward
direction.
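The bidirectional flow mapping described above can be sketched as follows. This is an illustrative helper, not netAI's actual code: the two endpoints are ordered so that packets travelling in either direction map to the same flow, while the first packet seen defines the forward direction.

```python
# Illustrative sketch of a bidirectional flow key (assumed helper,
# not from netAI). Endpoints are sorted so both directions of a
# conversation yield the same key.

def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Return a direction-independent key for a packet's addresses and ports."""
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    # Order the endpoints canonically so direction does not matter.
    return a + b if a <= b else b + a

# Both directions of the same conversation share one key:
k_fwd = flow_key("10.0.0.1", 27015, "192.168.1.5", 50211)
k_rev = flow_key("192.168.1.5", 50211, "10.0.0.1", 27015)
assert k_fwd == k_rev
```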
Flows have limited duration. UDP flows are terminated by a 60
second flow timeout, while TCP flows are terminated upon proper
connection teardown (TCP state machine) or after a 60 second
timeout (whichever occurs first). We consider only UDP and TCP
flows that have at least 1 packet in each direction and transport
at least 1 byte of payload.
We distinguish active and idle periods of flows by using an idle
threshold, which is 1 second by default. Periods where no packets
are observed for 1 second or more are treated as idle periods.
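The active/idle splitting can be sketched as below, assuming a flow is represented as a sorted list of packet timestamps; the function name and representation are illustrative, not netAI's API.

```python
# Sketch of splitting a flow's packet timestamps into active periods
# (sub-flows) using the 1-second idle threshold described above.
# Hypothetical helper for illustration only.

IDLE_THRESHOLD = 1.0  # seconds

def split_active_periods(timestamps, idle_threshold=IDLE_THRESHOLD):
    """Group packet timestamps into active periods; a gap of
    idle_threshold seconds or more starts a new period."""
    periods = []
    current = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev >= idle_threshold:
            periods.append(current)  # close the current active period
            current = []
        current.append(cur)
    periods.append(current)
    return periods

# A 1.5-second gap splits this flow into two active periods:
print(split_active_periods([0.0, 0.2, 0.4, 1.9, 2.0]))
# [[0.0, 0.2, 0.4], [1.9, 2.0]]
```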
We compute the following features: protocol; duration; volume in
bytes and packets; average sub-flow volume in bytes and packets per
active period (sub-flows are the active parts of a flow, as described
above); number of packets with the push flag set (TCP flows only,
always 0 for UDP); packet length (minimum, mean, maximum, standard
deviation); inter-arrival times (minimum, mean, maximum, standard
deviation); and active and idle times (minimum, mean, maximum,
standard deviation). Aside from protocol and duration, all features
are computed separately for each direction of a flow. Packet-length
features are based on the IP length, excluding link-layer
overhead. Inter-arrival times are computed with microsecond
precision.
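The four statistics used for the distribution-based features can be computed as in this sketch, shown here for packet lengths; inter-arrival times and active/idle times use the same four statistics. The helper name is an assumption for illustration.

```python
# Sketch of the per-flow statistical features listed above:
# minimum, mean, maximum and standard deviation of a series.
import statistics

def four_stats(values):
    """Return (min, mean, max, population standard deviation)."""
    return (min(values),
            statistics.fmean(values),
            max(values),
            statistics.pstdev(values))

# Example: IP packet lengths in bytes for one direction of a flow.
pkt_lengths = [60, 60, 1500, 60, 120]
print(four_stats(pkt_lengths))
```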
Dataset
For our evaluation we use gaming data captured by members of
CAIA and some Command and
Conquer traffic captured by
Mark Claypool.
‘Generic’ traffic examples were taken from several
publicly available traffic traces. We predominantly focus on First
Person Shooter (FPS) games as these fast-paced games have the most
stringent QoS requirements. However, we have also included some
data of a Real Time Strategy (RTS) game. The games tested were from
the PC and Xbox platforms. It is important to include Xbox traffic
as current and next-generation console devices such as the Xbox 360
and PlayStation 3 are expected to produce a significant share of
online gaming traffic. The following table summarises the different
games.
We also use a large number of other (non-game) flows taken from
different public trace files available from
NLANR. The other class consists
mostly of web, peer-to-peer (eDonkey, Kazaa), mail (SMTP, POP) and
DNS traffic.
Table 1: Traffic classes used in experiments

Class  | Description                                   | Genre/platform
-------|-----------------------------------------------|---------------
CCG    | Command and Conquer: Generals                 | RTS / PC
HL1    | Half-Life: Death Match                        | FPS / PC
HL2-DM | Half-Life 2: Death Match                      | FPS / PC
HL2-CS | Half-Life 2: Counter Strike                   | FPS / PC
Q3     | Quake 3 Death Match                           | FPS / PC
TS     | Time Splitters                                | FPS / Xbox
HALO   | Halo: Death Match                             | FPS / Xbox
HALO2  | Halo 2: Death Match                           | FPS / Xbox
Other  | Common network protocols, e.g. HTTP, DNS, P2P |
For further details please refer to the
tech
report.
Results
We use the standard metrics of precision and recall (see the
definitions in the
tech report). We compute
precision and recall based not only on the number of flow instances
but also on the byte volume, in order to evaluate the classification
performance for large traffic flows.
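The difference between instance-based and byte-weighted metrics can be illustrated with a small sketch, assuming each flow is a (true class, predicted class, bytes) record; this record format is an assumption for illustration, not the tech report's code.

```python
# Sketch of instance-based vs byte-weighted recall for one class.
# A single large misclassified flow barely hurts instance-based
# recall but dominates the byte-weighted figure.

def recall(flows, cls, weight_by_bytes=False):
    """Fraction of class `cls` (by flows or by bytes) correctly labelled."""
    relevant = [(pred == cls, size if weight_by_bytes else 1)
                for true, pred, size in flows if true == cls]
    total = sum(w for _, w in relevant)
    hit = sum(w for ok, w in relevant if ok)
    return hit / total if total else 0.0

flows = [("game", "game", 10_000),
         ("game", "other", 90_000),  # one large misclassified flow
         ("game", "game", 5_000)]
print(recall(flows, "game"))                        # 2 of 3 instances
print(recall(flows, "game", weight_by_bytes=True))  # 15,000 of 105,000 bytes
```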
First we evaluate whether different algorithms are able to separate
game traffic from non-game traffic. We used the following
algorithms: C4.5, Naive Bayes, Bayesian Networks and NBTree. Figure
1 shows the precision and recall averaged across the two
classes, for both number of instances and number of bytes. The
results show very high precision and recall (>99%) for C4.5,
NBTree and the Bayesian Network (except for byte volume). Naive Bayes
performs worst.
Figure 1: Mean
precision and recall for game vs. non-game classes
We also evaluate whether the same algorithms can distinguish between
the different games (and the non-game traffic). Figure 2 shows the
mean precision and recall for both number of instances and number of
bytes. Mean precision and recall are slightly lower but still high
(~95%) for C4.5 and NBTree. The Bayesian Network does not perform as
well (~91%), and Naive Bayes again performs far worse than any of the
other algorithms.
Figure 2: Mean precision and recall for
individual game traffic classes vs. non-game traffic
Figure 3 shows precision and recall
for all classes based on the number of instances (top) and byte
volume (bottom) for the C4.5 classification algorithm.
Instance-based metrics are very good for all classes except for
Time Splitters (which is the smallest class). Byte-based
performance is also very good although there is a reduction for
some classes. The reduced precision and recall for the HL2-based
games is due to a number of HL2-CS in-game flows being
misclassified as HL2-DM, and a number of larger-volume Q3 flows were
also misclassified as HL2-DM.
Figure 3: Precision and recall for
individual game traffic classes based on number of instances (top)
and byte volume (bottom)
Figure 4 shows the classification performance in number of
instances per second for all algorithms. It shows the performance
of classifiers trained for two classes (game vs. non-game traffic)
and of classifiers trained with each game as a separate class. As
expected, classification performance is better with fewer classes for
all algorithms. All classifiers are reasonably fast for only two
classes, but only C4.5 and Naive Bayes can classify a large number of
flows per second when trained on all individual games separately.
C4.5 is the fastest algorithm; even the classifier trained on
individual games can classify ~90,000 flows per second.
Figure 4: Classification
performance of the different algorithms
For further results please have a look at the
tech
report.