Centre closure

As part of a broader organisational restructure, data networking research at Swinburne University of Technology has moved from the Centre for Advanced Internet Architecture (CAIA) to the Internet For Things (I4T) Research Lab.

Although CAIA no longer exists, this website reflects CAIA's activities and outputs between March 2002 and February 2017, and is being maintained as a service to the broader data networking research community.

NetSniff

Anonymisation Features

One of the key difficulties in capturing and analysing network data is that of privacy. This leads to a Catch-22 situation - in order to capture unbiased data, we need the participants involved in generating that data to behave as they normally would. However, if the data generation participants know that their online activity is being monitored, this could affect their typical behaviour. While it is not proper to capture raw network traffic traces without informing the users, it is possible to overcome privacy concerns through anonymisation of captured data.

By randomising the values of user-identifiable information, we can protect the privacy of network users. Although this discards some data, it is worth remembering that we usually care more about the type of Internet services accessed, how often, and at what times of day than about the actual server accessed or the actual content downloaded (the fact that a 100 kB image was downloaded is more important than knowing that the image was called "my_dodgy_image.jpg").

Netsniff provides a real-time anonymisation facility. As packets are read and parsed for protocol information, netsniff can anonymise any identifying information, thus protecting the identity of users and their online activity. Further, this anonymisation is consistent, meaning we can correlate data sets (such as matching a DNS lookup with later HTTP transactions) or determine the number of emails sent to a particular (anonymised) email address.

How To Invoke Anonymisation?

Anonymisation is controlled by three command line arguments (-a, -m and -k). The most important is -a, which enables anonymisation of collected data; the remaining two tell netsniff how to anonymise the captured information. The -k option specifies a file from which to read a key that pre-initialises the anonymisation algorithms, while the -m option specifies which of three algorithms (cryptopan, nullip or tcpdriv) will be used to anonymise IP addresses.
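As a rough illustration of how the three options combine, the sketch below assembles only the anonymisation-related part of a netsniff command line. The helper name anonymisation_args and the file name anon.key are hypothetical, and the sketch assumes that -a takes no value and that -m and -k each take their value as the following argument; any other arguments netsniff needs are not covered here.

```python
from typing import List, Optional

# Algorithm names exactly as listed in the text; not checked against the netsniff source.
IP_MODES = ("cryptopan", "nullip", "tcpdriv")


def anonymisation_args(mode: Optional[str] = None,
                       key_file: Optional[str] = None) -> List[str]:
    """Return the anonymisation-related arguments for a netsniff command line."""
    args = ["-a"]  # -a switches anonymisation on (assumed to take no value)
    if mode is not None:
        if mode not in IP_MODES:
            raise ValueError(f"unknown IP anonymisation mode: {mode}")
        args += ["-m", mode]  # which IP address algorithm to use
    if key_file is not None:
        args += ["-k", key_file]  # file holding the key for the anonymisers
    return args


# Show the resulting command line for CryptoPan with a key file.
print(" ".join(["netsniff"] + anonymisation_args("cryptopan", "anon.key")))
```

Leaving out key_file corresponds to relying on the default key, in which case string anonymisation is not expected to stay consistent across runs.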

IP Address Anonymisation Modes

Netsniff offers a number of different anonymisation modes which can be specified on the command line. Each option causes anonymisation to be performed in a slightly different manner.

CryptoPan Anonymisation

The CryptoPan library is used, with either a specified key or a default key of "0", to anonymise IP addresses.

Advantages

- IP addresses are anonymised consistently across separate execution runs of the netsniff application.

Disadvantages

- If the key used is available, it is possible to reverse the IP address anonymisation and retrieve the original IP addresses, thus destroying the desired privacy.
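The trade-off between consistency and reversibility can be illustrated with a small sketch. It is not the CryptoPan library: the real library is built on a block cipher, whereas this stand-in uses HMAC-SHA256 from the Python standard library as the keyed function, and the names prf_bit, anonymise_ip and reverse_with_key are hypothetical. It only aims to show why a fixed key gives the same mapping on every run, and why anyone holding that key can walk the mapping backwards.

```python
import hashlib
import hmac
import ipaddress


def prf_bit(key: bytes, prefix_bits: str) -> int:
    """Keyed pseudo-random bit derived from the leading bits of an address."""
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] & 1


def anonymise_ip(ip: str, key: bytes) -> str:
    """Flip each bit according to a keyed function of the bits before it."""
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    out = "".join(str(int(bit) ^ prf_bit(key, bits[:i]))
                  for i, bit in enumerate(bits))
    return str(ipaddress.IPv4Address(int(out, 2)))


def reverse_with_key(anon_ip: str, key: bytes) -> str:
    """Recover the original address bit by bit, which needs the key."""
    anon_bits = format(int(ipaddress.IPv4Address(anon_ip)), "032b")
    original = ""
    for bit in anon_bits:
        # The flip for this position depends only on the bits already recovered.
        original += str(int(bit) ^ prf_bit(key, original))
    return str(ipaddress.IPv4Address(int(original, 2)))


key = b"0"  # illustrative key only
anon = anonymise_ip("192.0.2.17", key)
assert anon == anonymise_ip("192.0.2.17", key)      # same key, same mapping
assert reverse_with_key(anon, key) == "192.0.2.17"  # the key undoes it
```

Because the output depends only on the key and the address, two separate runs with the same key agree, which is exactly what makes a leaked key so damaging.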

NullIP Anonymisation

All IP addresses are replaced with the IP Address 0.0.0.0.

Advantages

- Pure anonymisation is achieved: since all IP addresses map to the same IP address, there is no way to reverse the process and retrieve the original information.

Disadvantages

- It is impossible to determine the number of different servers or IP devices accessed.

- It is impossible to differentiate IP addresses on one side of the device running netsniff from those on the other side (client vs. Internet).

- Since all IP addresses map to the same IP address, it is no longer possible to correlate IP addresses across different applications or data sessions.

Tcpdpriv Style Anonymisation

Netsniff implements an anonymisation scheme similar to that used by the -A50 option of tcpdpriv. This algorithm is prefix-preserving; a thorough analysis of prefix-preserving IP address anonymisation is presented here. The approach is table based: a lookup table is kept in memory and built up gradually as IP addresses are anonymised. This is defined nicely here as:

Suppose that we have a set of <raw, anonymised> binding pairs of IP addresses. To anonymise an IP address (a = a₁a₂...aₙ) we first find the pair <x, y> with x = x₁x₂...xₙ and y = y₁y₂...yₙ with the longest prefix match k on a and x.
a is anonymised to b = b₁b₂...bₙ where b₁b₂...bₖ = y₁y₂...yₖ and bₖ₊₁bₖ₊₂...bₙ = rand(0 ... 2ⁿ⁻ᵏ - 1). In netsniff rand() is an alternating series of 0 and 1 bits. Finally the new anonymised address is added to the binding table.

As described here, this is problematic: since the anonymisation depends on the traffic sniffed and the order in which IP addresses are seen, it is inconsistent across multiple netsniff sessions. In terms of security, the same reference also shows that this algorithm is as robust as is possible for prefix-preserving IP address anonymisation.
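The table-based scheme and its order dependence can be sketched as follows. This is a minimal illustration, not netsniff's code: the class TableAnonymiser and its helpers are hypothetical, the table is searched linearly, and the filler bits are a fixed alternating 0/1 pattern standing in for rand(). One step is an assumption that is not part of the quoted definition: the bit immediately after the matched prefix is set to the complement of the corresponding bit of y, so that the new address shares exactly k leading bits with y and never collides with an existing entry.

```python
import ipaddress
from typing import Dict

ADDRESS_BITS = 32


def to_bits(ip: str) -> str:
    return format(int(ipaddress.IPv4Address(ip)), "032b")


def to_ip(bits: str) -> str:
    return str(ipaddress.IPv4Address(int(bits, 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits that two bit strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def filler(length: int) -> str:
    """Alternating 0/1 bits, standing in for rand()."""
    return ("01" * ADDRESS_BITS)[:length]


class TableAnonymiser:
    def __init__(self) -> None:
        self.table: Dict[str, str] = {}  # raw bits -> anonymised bits

    def anonymise(self, ip: str) -> str:
        a = to_bits(ip)
        if a not in self.table:
            # Find the binding <x, y> whose raw address x best matches a.
            k, y = 0, None
            for x, candidate in self.table.items():
                match = common_prefix_len(a, x)
                if y is None or match > k:
                    k, y = match, candidate
            if y is None:
                b = filler(ADDRESS_BITS)  # empty table: nothing to preserve
            else:
                # Assumption: complement bit k+1 of y, then fill the rest.
                flipped = "1" if y[k] == "0" else "0"
                b = y[:k] + flipped + filler(ADDRESS_BITS - k - 1)
            self.table[a] = b
        return to_ip(self.table[a])


# The same three addresses, seen in a different order by two sessions.
first, second = TableAnonymiser(), TableAnonymiser()
run1 = {ip: first.anonymise(ip) for ip in ("10.0.0.1", "10.0.0.2", "192.168.1.1")}
run2 = {ip: second.anonymise(ip) for ip in ("192.168.1.1", "10.0.0.2", "10.0.0.1")}
print(run1["10.0.0.1"], run2["10.0.0.1"])  # the two sessions disagree
```

Within a single session the mapping is stable and prefix-preserving, but nothing ties the tables of two sessions together, which is the inconsistency described above.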

Advantages

- Correlation is possible between multiple networked applications accessing the same server or information about the same server.

- Security against reverse address mapping is as robust as possible for this type of anonymisation algorithm.

- Network locality and subnet information are not lost, due to the prefix-preserving nature of the algorithm.

Disadvantages

- Mappings of IP addresses are not consistent across multiple execution runs of netsniff.

- If some address mappings are known, it is possible to retrieve some other partial address mappings; see here for a possible avenue of attack.

String Anonymisation

All other anonymised fields are anonymised using a String Anonymiser. This is initialised with a key specified using the -k command line option or a default random key. String anonymisation is used in output of the ARP, POP3, SMTP, FTP and DNS protocols. The algorithm uses a secure hash function.

As with IP address anonymisation, string anonymisation is consistent given the same input string. If a key file is specified on the command line (-k) then it also remains consistent across multiple execution runs of netsniff. This consistency allows correlation of interesting data: for example, while a destination email address will be anonymised, it is still possible to count the emails sent to the same address, since that email address will always be anonymised to the same value.
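The kind of correlation this allows can be sketched with a keyed hash. The text only states that a secure hash function and a key are used, so the choice of HMAC-SHA256, the truncation to 16 hexadecimal characters, and the function name anonymise_string are all assumptions made for illustration, as are the example key and addresses.

```python
import hashlib
import hmac
from collections import Counter


def anonymise_string(value: str, key: bytes) -> str:
    """Map a string to a fixed-length token that depends on the key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened token, for readability only


key = b"example-key"  # would come from the -k key file in practice
recipients = ["alice@example.org", "bob@example.org", "alice@example.org"]

# Count messages per anonymised recipient without seeing the real addresses.
counts = Counter(anonymise_string(r, key) for r in recipients)
for token, count in counts.items():
    print(token, count)
```

With the same key, the tokens match across runs; with a different (for example random) key, the same address produces a different token, so counts from separate runs can no longer be combined.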

Last Updated: Tuesday 26-Jun-2007 16:05:51 AEST | Maintained by: Jason But (jbut@swin.edu.au) | Authorised by: Grenville Armitage (garmitage@swin.edu.au)