As part of a broader organisational restructure, data networking research at Swinburne University of Technology has moved from the Centre for Advanced Internet Architecture (CAIA) to the Internet For Things (I4T) Research Lab.

Although CAIA no longer exists, this website reflects CAIA's activities and outputs between March 2002 and February 2017, and is being maintained as a service to the broader data networking research community.

netAI - Getting Started

Overview

This web page contains some simple examples of how to use netAI. The datasets and configuration files mentioned are part of the distribution.

Description

The Network Traffic based Application Identification (netAI) tool has been developed to identify the end-host applications responsible for traffic flows in a network. Unlike previous solutions that identify the application from port numbers or packet payload (either through protocol decoding or signatures), netAI computes a variety of payload-independent features (e.g. packet length statistics) for a traffic flow and uses machine learning (ML) techniques to identify the application that generated it. ML is a discipline within the wider area of Artificial Intelligence (AI). Before netAI can classify a particular application it must be trained on a representative set of traffic flows of that application. netAI can be used offline (reading packet data from a trace file) and online (capturing live on a network interface). netAI runs on Linux and BSD operating systems.

This document is a getting started guide for netAI. As netAI is heavily based on two tools called Weka and NetMate we also provide some introduction to them. For installation instructions see the netAI Installation Guide.

Home

The official homepage for netAI is http://caia.swin.edu.au/urp/dstc/netai/. Please check this page for any software updates, FAQ etc. before contacting the developers.

The netAI package can be downloaded here: http://caia.swin.edu.au/urp/dstc/netai/download

Weka

Weka is a machine learning suite written in Java that contains implementations of a large number of machine learning algorithms. We use Weka to create classification models and classify data. Weka can be accessed through a GUI, from the command line, or included as a library within Java software.

A really good primer for using Weka on the command line, and also for creating Java applications around Weka, is located at http://alex.seewald.at/WEKA/ (February 2006).

The GUI version of Weka can be executed with the command:
java -jar weka.jar

netAI does not use the Weka GUI, although the command line syntax for netAI is similar to that used with Weka. It is therefore important to understand how to use Weka at the command line, and there are some nice examples on the aforementioned primer page to help. You might run Weka as follows:
java -Xmx1024M -classpath .:/path/to/weka.jar weka.classifier.to.use -flags

For example, run java -Xmx1024M -classpath /path/to/weka.jar weka.classifiers.trees.J48 for a list of options for the J48 classifier. It is possible to add weka.jar to your Java classpath so that you do not need to specify the classpath each time.

Datasets used in Weka are in the ‘ARFF’ file format, described at http://www.cs.waikato.ac.nz/~ml/weka/arff.html (February 2006). Example ARFF files can be obtained from the Weka homepage. We have included some example ARFF files for testing with netAI. It is important to understand the relationship between the ARFF files and the statistics output by NetMate, and how to create ARFF files from NetMate output.
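To make the ARFF layout more concrete, here is a minimal Python sketch that writes a small ARFF file: a relation name, one @attribute line per column, then comma-separated rows under @data. The attribute names and values are made up for illustration and are not the feature names that NetMate or netAI actually use; check the example ARFF files in the distribution for those.

```python
def write_arff(relation, attributes, rows):
    """Return ARFF text for the given relation.

    attributes: list of (name, type) pairs; type is 'numeric' or a list of
                nominal values (as used for the class attribute).
    rows:       list of value lists, one value per attribute.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attribute names and made-up values, for illustration only.
attributes = [
    ("class", ["ssh", "dns", "http"]),
    ("min_pkt_len", "numeric"),
    ("mean_pkt_len", "numeric"),
]
rows = [["dns", 60, 75.5], ["ssh", 52, 310.2]]

arff_text = write_arff("example_flows", attributes, rows)
print(arff_text)
```

A file produced this way should load in Weka, but whether it is usable with netAI depends on the attributes matching what NetMate exports.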

NetMate

Typically NetMate is either used for live capturing on a network interface (e.g. eth0 on Linux) or reading data from a trace file (e.g. tcpdump). To tell NetMate what to do with incoming packet data a ruleset is needed. The netAI distribution contains an example ruleset in the etc subdirectory called netAI.xml. NetMate can then export statistics via TCP, UDP or write to file.

NetMate can be executed for capturing on a network interface directly:
netmate -i eth0 -r netAI.xml

Live capturing requires running NetMate as superuser unless the network interface has been made accessible to ordinary users. Alternatively, NetMate can read packet data from a tcpdump or Endace ERF file:
netmate -f somedumpfile -r netAI.xml

Both of these commands will make NetMate run in the current console. The NetMate distribution provides a wrapper called nmrsh to run NetMate in the background. For example, the following command will start a live capture in the background:
nmrsh start -i eth0 -r netAI.xml

This can be stopped by executing the following command:
nmrsh stop

The manual explains configuration files and switches etc, and can be found at the NetMate homepage at:

http://www.ip-measurement.org/tools/netmate/index.php?p=documentation (February 2006)

Using netAI

Using NetMate and netAI Classifier Separately

This example has been installed to the default installation directory install. The listing for the directory should look like this:
 
nwilliams@ccurp3:~/tools/netai/install> ll
drwxr-xr-x  2 nwilliams users    4096 2006-01-24 17:33 bin
drwxr-xr-x  4 nwilliams users    4096 2006-01-24 17:33 etc
drwxr-xr-x  3 nwilliams users    4096 2006-01-24 17:32 lib
drwxr-xr-x  3 nwilliams users    4096 2006-01-24 17:33 man
-rw-r--r--  1 nwilliams users  982822 2005-12-08 17:48 netmate-0.9.3.tar.gz
drwxr-xr-x  2 nwilliams users    4096 2006-01-24 17:33 share
drwxr-xr-x  3 nwilliams users    4096 2006-01-24 17:29 src
drwxr-xr-x  4 nwilliams users    4096 2006-01-24 17:32 var
drwxr-xr-x  5 nwilliams users    4096 2005-03-07 02:32 weka-3-4-4
-rw-r--r--  1 nwilliams users 8547671 2005-03-08 10:35 weka-3-4-4

Some directories of interest are listed below:

  •  bin contains the various executables/scripts for running netAI and NetMate.
  •  etc contains configuration files for NetMate and netAI.
  •  man contains the netAI manpage.

You shouldn’t have to play with the configuration files for the moment, as it is likely you can accomplish what you want by changing the netAI-rules.xml file.

An example netAI rules file is shown below, and was used to create the example datasets from a tcpdump file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE RULESET SYSTEM "rulefile.dtd">
<RULESET ID="1">
  <!-- global part is the default for all rules -->
  <!-- overwritten by rule specific configuration -->
  <GLOBAL>
    <ACTION NAME="netai_flowstats">
      <PREF NAME="Idle_Threshold">1000000</PREF>
    </ACTION>

    <EXPORT NAME="netai_socket">
      <PREF NAME="FlowID">yes</PREF>
      <PREF NAME="ExportStatus">yes</PREF>
      <PREF NAME="ExportHost">localhost</PREF>
      <PREF NAME="ExportPort">4837</PREF>
      <PREF NAME="ExportProtocol">tcp</PREF>
    </EXPORT>

    <EXPORT NAME="ac_file">
      <PREF NAME="Filename">/home/nwilliams/example_output.txt</PREF>
      <PREF NAME="FlowID">no</PREF>
      <PREF NAME="ExportStatus">no</PREF>
    </EXPORT>
    <!-- export interval in seconds -->
    <PREF NAME="Interval">10</PREF>
  </GLOBAL>
  <RULE ID="1">
    <!-- match tcp/udp packets with destination port 22, 53 or 8000 -->
    <FILTER NAME="SrcIP">*</FILTER>
    <FILTER NAME="SrcPort">*</FILTER>
    <FILTER NAME="DstIP">*</FILTER>
    <FILTER NAME="DstPort">22,53,8000</FILTER>
    <FILTER NAME="Proto">tcp,udp</FILTER>
    <PREF NAME="auto">yes</PREF>
    <PREF NAME="bidir">yes</PREF>
    <PREF NAME="FlowTimeout">60</PREF>
  </RULE>
</RULESET>

There are several lines which are of interest.
<EXPORT NAME="netai_socket">

This configures the socket export features of NetMate.
 <PREF NAME="ExportHost">localhost</PREF>

The host to which statistics are exported.
 <PREF NAME="ExportPort">4837</PREF>

The netAI default port on which the ExportHost will be listening, which can be changed if desired.
<EXPORT NAME="ac_file">

Configures NetMate to write statistics to a file.
 <PREF NAME="Interval">10</PREF>

NetMate will provide updates to the classifier at 10 second intervals.
 <RULE ID="1">

Instructs NetMate about which flows should be exported (filters) and several other preferences. The various filters can be used to provide a more selective set of statistics, such as for particular hosts or ports. See below for an example of filter use.
<FILTER NAME="DstPort">22,53,8000</FILTER> 
<FILTER NAME="Proto">tcp,udp</FILTER>

These filters were used to isolate ssh, dns and http flows passing through the capture interface. The IP protocols are restricted to TCP and UDP, as these are the protocols used by the applications.
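Before starting NetMate it can be handy to check what a rules file will export and which flows it selects. The following Python sketch reads a trimmed copy of the ruleset above with the standard library XML parser and prints the socket export port and the destination port filter. The function name summarise_ruleset is our own, the embedded XML is shortened for the example, and this is not how NetMate itself processes the file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example ruleset (XML declaration and DOCTYPE omitted).
RULESET = """\
<RULESET ID="1">
  <GLOBAL>
    <EXPORT NAME="netai_socket">
      <PREF NAME="ExportHost">localhost</PREF>
      <PREF NAME="ExportPort">4837</PREF>
      <PREF NAME="ExportProtocol">tcp</PREF>
    </EXPORT>
    <PREF NAME="Interval">10</PREF>
  </GLOBAL>
  <RULE ID="1">
    <FILTER NAME="DstPort">22,53,8000</FILTER>
    <FILTER NAME="Proto">tcp,udp</FILTER>
  </RULE>
</RULESET>
"""


def summarise_ruleset(xml_text):
    """Return (exports, filters) dictionaries for a ruleset string."""
    root = ET.fromstring(xml_text)
    # Map each EXPORT name to its PREF name/value pairs.
    exports = {
        export.get("NAME"): {p.get("NAME"): p.text for p in export.findall("PREF")}
        for export in root.findall("./GLOBAL/EXPORT")
    }
    # Map each RULE ID to its FILTER name/value-list pairs.
    filters = {
        rule.get("ID"): {f.get("NAME"): f.text.split(",") for f in rule.findall("FILTER")}
        for rule in root.findall("RULE")
    }
    return exports, filters


exports, filters = summarise_ruleset(RULESET)
print("Socket export port:", exports["netai_socket"]["ExportPort"])
print("Destination ports:", filters["1"]["DstPort"])
```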

Now to see how to get everything up and running. First try running the netAI_CL.jar file independently of NetMate with the example files to get a feel for using the software, after which you can try out the netAI script examples.

Simple build and classify

This example will build a classifier from a training file and then classify some flows from a statistics file previously generated by NetMate. The results are written to file.

java -jar netAI_CL.jar weka.classifiers.trees.J48 -t example.arff -c 1 -A 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43 -s example.stats

The command can be broken down as follows:
java -jar netAI_CL.jar

This points the JVM to the netAI jar file.
weka.classifiers.trees.J48

This tells netAI_CL that we will be using the J48 algorithm for classification.
-t example.arff

This is our training dataset.
-c 1 -A 5,6,7...

-c 1 indicates that the first value after -A is the class index (in this case 5). -A lists the attributes that you want to use from the data arriving from NetMate. These values are indexed from zero (0).
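Typing the long -A list by hand is error prone. As a convenience, the list can be generated. The Python sketch below builds the attribute list for a range of columns and assembles the command line shown above; the helper names attribute_list and build_command are our own and are not part of netAI. The sketch only prints the command and does not run it.

```python
import shlex


def attribute_list(first, last):
    """Return 'first,...,last' as a comma-separated string (inclusive)."""
    return ",".join(str(i) for i in range(first, last + 1))


def build_command(training_file, stats_file, first=5, last=43):
    """Assemble the netAI_CL argument list used in the example above."""
    return [
        "java", "-jar", "netAI_CL.jar",
        "weka.classifiers.trees.J48",
        "-t", training_file,
        "-c", "1",
        "-A", attribute_list(first, last),
        "-s", stats_file,
    ]


command = build_command("example.arff", "example.stats")
# shlex.join (Python 3.8+) quotes arguments where the shell requires it.
print(shlex.join(command))
```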

 Generally you will want to load the classifier from a model file rather than training it from scratch. Depending on the selected machine learning algorithm and the size of the training file, training a new model can take a very long time!

Build and then classify from a statistics file

This example builds a classifier from a model file and then reads in a statistics file generated by NetMate. We want to see the destination port of the flow printed with the prediction, so the -Y 5 argument is used. If you look at the example.stats file, you will notice that the destination port of the flow is in column 5 (counting from 0). You can include other attributes by entering their columns in comma-separated form. The results are written to the file named with -o, here foo.

java -jar netAI_CL.jar weka.classifiers.trees.J48 -m example.model -t example.arff -s example.stats -o foo -c 1 -A 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43 -Y 5

You should see something like the following output:

netAI CL Version 0.1
Outputting features: 5
Output File: foo
Classification Algorithm: weka.classifiers.trees.J48

Model created in 0.1 seconds
Ready to receive from: example.stats
File Completed

Let’s look at some of the options used here:
-m example.model -t example.arff

We use a model file to build the classifier this time. The ARFF training file is still included as it contains some information needed by netAI (in fact the -t option should be used in all cases).
-o foo

This time the output is written to a file called ‘foo’, which is more useful than dumping everything to the console. It’s just a regular text file that you can view using your favorite text editor.
-Y 5

The -Y switch allows us to specify what flow information is printed next to the flow prediction. This information is basically the same information specified with -A. Here we simply print out the destination port of the flow. Generally you might also want to see the source/destination IP addresses and ports.

Build your own model file

For convenience you can build a model from a training file using netAI. You can then use this model for testing in the future, without having to wait for training. We’ll make a J48 model, and for kicks we’ll include an algorithm-specific parameter (unpruned tree). Remember, however, that whatever options you use when you build the model are permanent; that is, you cannot change them at run time.

java -jar netAI_CL.jar weka.classifiers.trees.J48 -c 1 -t data/example.arff -B new-model.model

The main point of interest here is the -B option, which tells netAI that you want to build a model and save it under the given filename. Don’t forget to specify the class index (-c) for the training file. The output will look something like this:

netAI CL Version 0.1
Training File: data/example.arff
Classification Algorithm: weka.classifiers.trees.J48
Building model from training file, ignoring other parameters

Model created in 0.91 seconds
Saving model to file: new-model.model
Completed

The -B option overrides all other options (aside from -t), so remember not to leave it in place when performing tests.

Build and then classify from TCP

This example builds a classifier from a model file and then waits for a TCP connection from NetMate. The predictions are printed to the screen (you can always use the -o option if you want to write the classifications to a file). Use the -a option so that active flows are printed to the screen as well.

java -jar netAI_CL.jar weka.classifiers.trees.J48 -m example.model -t example.arff -c 1 -A 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43 -Y 5 -a

The output will look something like this:

netAI CL Version 0.1
Filename not specified, printing results to console
Outputting features: 5
Training File: example.model
Classification Algorithm: weka.classifiers.trees.J48

Model created in 0.89 seconds
Ready to receive from: TCP export

At this stage you will need to start NetMate. Assuming that the configuration file has been set correctly to export statistics via TCP, you might run NetMate using the following command (which reads from a tcpdump file and exports via TCP):

netmate -f ~/dumpfiles/2005-12.06-vlan102.dump -r ../etc/netAI/netAI-rules.xml

You should then see some output similar to that shown below.

dns 1 53
dns 1 53
ssh 1 22

Here we have two DNS flows (dst port 53) and one ssh flow (22) correctly classified.
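If you save predictions like these to a file, a short script can tally them and flag any that disagree with the destination port. The Python sketch below assumes each line holds a whitespace-separated prediction, probability and destination port, as in the output shown above when -Y 5 is used; other -Y settings would add or change columns. The port-to-application mapping comes from the filter description earlier (22 ssh, 53 dns, 8000 http), and the helper names are our own.

```python
from collections import Counter

# Port-to-application mapping taken from the ruleset filters described earlier.
PORT_TO_APP = {22: "ssh", 53: "dns", 8000: "http"}


def parse_prediction(line):
    """Split 'label probability dst_port' into typed values."""
    label, probability, dst_port = line.split()
    return label, float(probability), int(dst_port)


def summarise(lines):
    """Count predictions per class and collect lines that disagree with the port."""
    counts = Counter()
    mismatches = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label, _probability, port = parse_prediction(line)
        counts[label] += 1
        if PORT_TO_APP.get(port) != label:
            mismatches.append(line)
    return counts, mismatches


# The three lines from the example output above.
counts, mismatches = summarise(["dns 1 53", "dns 1 53", "ssh 1 22"])
print(dict(counts), mismatches)
```

Comparing against the port is only a rough sanity check, since the point of netAI is to classify flows without relying on port numbers.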

That’s basically it for running the jar file directly. An easier (although with fewer options) way to run netAI is using the netAI script, and there are some examples below. 

Running the netAI script

Build and then classify from tcpdump file

This example builds a classifier from a model file, then uses NetMate to read in data from a tcpdump file for classification. The netAI ruleset used when reading from file is: etc/netAI/netAI-rules-stats.xml. The results are saved into results.txt.

./netAI start -f 2005-12.06-vlan102.dump -m example.model -t example.arff -A 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43 -c 1 -o results.txt

The output will look something like this before you are returned to the prompt:

Algorithm: weka.classifiers.trees.J48
Attributes: 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43
Class index: 1
Input: 2005-12.06-vlan102.dump
Output: prediction,probability,2,3,4,5
Starting netAI
Extracting flows and features...
Performing machine learning...

Note that there are several default values when using the netAI script. For example, by default the J48 classifier is used (as we have found this classifier to work well with our data). 

Build and then classify from network interface

This example builds a classifier from a model file, then uses NetMate to read in data from a network interface. The results are saved into results.txt. The netAI ruleset used when capturing from a network interface is: etc/netAI/netAI-rules.xml. You will probably need to be superuser to perform this operation.
 
./netAI start -i eth1 -m example.model -t example.arff -A 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43 -c 1 -o results.txt

The output will look something like this before you are returned to the prompt:

Algorithm: weka.classifiers.trees.J48
Attributes: 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43
Class index: 1
Input: eth1
Output: prediction,probability,2,3,4,5
Starting netAI...
NetMate running with PID 3162
Classifier running with PID 3143

After you are returned to the prompt, netAI continues to operate in the background. The netAI status command can be used to check the PIDs while the software is running. You can view the classifications as they occur by using tail -f results.txt.

Stop the classifier using the command netAI stop.

That’s all there is to running the script, as the syntax is very similar to using netAI_CL. Don’t forget that the usage can be printed out using netAI -h, while the manpage also lists the options you can use.

Notes

The example data provided with netAI is for introductory purposes and should not be considered suitable for use in real classification scenarios. In fact it is quite possible that when the supplied model is used with new data many of the classifications will be incorrect. For this tool to be useful you should use NetMate and Weka to produce datasets and models more specific to your needs. Using NetMate and Weka is outside the scope of this document, but learning to use these tools is strongly recommended.

Do not forget to check out the manpages for a description of each of the switches and options available.

Acknowledgements

This project has been made possible in part by a grant from the Cisco University Research Program Fund at Community Foundation Silicon Valley.

netAI uses the WEKA (http://www.cs.waikato.ac.nz/ml/weka/) and NetMate (http://www.ip-measurement.org/tools/netmate/) software packages.

Authors

Sebastian Zander (szander@swin.edu.au)
Nigel Williams (niwilliams@swin.edu.au)

Copyrights & License

netAI is released under the GNU General Public License (GPL) version 2. Please see the COPYING file included in the netAI package for details of this license.

Copyright 2005-2006 Swinburne University of Technology, Melbourne, Australia

netAI is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

netAI is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this software; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA

Last Updated: Tuesday 30-Aug-2011 16:12:11 AEST | Maintained by: Sebastian Zander (szander@swin.edu.au) | Authorised by: Grenville Armitage ( garmitage@swin.edu.au)