Rapid detection of BGP anomalies
Overview
BGP updates should reflect changes in the underlying network state and routing policy. When there is no change in
network topology or routing policies, there should be no BGP updates. However, real-world BGP update traffic is far larger in volume than this would suggest. There is
substantial background traffic consisting of route announcements followed soon after by withdrawals that do not appear related to underlying network management decisions or events.
In this environment it can be difficult to define what is meant by anomalous BGP traffic. The simplest definition might be any update that does not reflect a change in the underlying BGP network or
routing policy. However, not all anomalous BGP traffic is necessarily harmful. For example, BGP traffic generated by route flapping is anomalous under this definition. Although it wastes resources,
such behaviour is not necessarily a threat to BGP's operation. Consequently, we differentiate between traffic that does not threaten BGP's ability to disseminate accurate network
information and harmful anomalous traffic that does.
We refer to the first type as instability and to the second type as anomalous traffic. For example,
route flapping, where routes are repeatedly announced and soon after withdrawn, may cause long-term instability, while traffic engineering that changes routing policy, such as
preferred paths, may result in short-term instability. Neither is a direct threat to the ability of BGP to communicate reachability information. However, a misconfiguration by BGP
router operators can result in the announcement of used and/or unused prefixes, with potentially disastrous consequences for Internet availability and reliability. In addition to wasting resources, unstable BGP traffic masks anomalous traffic that indicates potentially harmful accidental or deliberate disruptions.
A technique is needed that can rapidly distinguish between normal background and potentially harmful traffic.
The detection approach of this project is based on Recurrence Quantification Analysis (RQA). RQA is a way of extracting hidden information from statistics of dynamic systems. In our
work (Detecting BGP Instability Using Recurrence Quantification Analysis (RQA)) we have successfully used RQA to rapidly detect BGP instability caused by a high volume of BGP updates as well as hidden abnormal behaviour that may otherwise have passed unobserved.
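To give a feel for what an RQA measurement is, the following minimal sketch computes the recurrence rate (RR) of a scalar time series, that is, the fraction of pairs of points whose values lie within a threshold of each other. The function name, the threshold `eps`, the sample values, and the omission of time-delay embedding are simplifying assumptions made for illustration; this is not the configuration used in our work.

```python
def recurrence_rate(series, eps):
    """Fraction of point pairs (i, j) with |x_i - x_j| <= eps.

    Simplified for illustration: no time-delay embedding, and the
    main diagonal (i == j) is included in the count.
    """
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    recurrent = sum(
        1
        for i in range(n)
        for j in range(n)
        if abs(series[i] - series[j]) <= eps
    )
    return recurrent / (n * n)


# Hypothetical per-second update counts: a steady signal and one with a burst.
steady = [5, 5, 6, 5, 5, 6, 5, 5]
bursty = [5, 5, 6, 5, 90, 120, 5, 5]

print(recurrence_rate(steady, eps=1))  # every pair recurs
print(recurrence_rate(bursty, eps=1))  # the burst lowers the rate
```

In this toy input the burst reduces the recurrence rate because the two large values recur only with themselves. A drop of this kind is the sort of variation in an RQA measurement that can signal a change in behaviour.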
BGP Dataset
We use publicly available BGP control plane datasets to model BGP speakers and detect BGP anomalies. We refer to BGP control plane traffic as BGP traffic. BGP traffic can be obtained from public repositories such as the RouteViews project
and RIPE NCC. RouteViews collects BGP updates from many sites in North America and has provided BGP
data since 2001, while RIPE peers with many sites in Europe and has provided BGP data since 1999. The total numbers of collectors and peers change over time as vantage points (VPs) are added or removed. The RouteViews
repository provides BGP updates every 15 minutes and BGP routing tables every 2 hours. Until June 2003, RIPE provided offline BGP updates every 15 minutes and BGP routing tables every 8 hours. From 2003 it has
offered BGP updates every 5 minutes.
Each of these repositories has multiple collectors, which run BGP sessions with several routers, referred to as monitors, in many networks. Figure 1 shows an example BGP topology for the VP rrc12 at RIPE NCC,
which had 80 peers on 25 July 2016. In this example, AS2914 and AS4589 are peers, AS28573 is a source AS, and AS3257, AS174, AS6453, AS3356, and AS4230 are intermediate ASes. When AS28573 sends
a BGP update, AS2914 may receive multiple copies of this update with different paths. For example, when AS28573 announces the IPv6 prefix 2804:14d:908a::/48, AS2914 will receive this prefix with three paths,
[4230, 28573], [3356, 4230, 28573], and [6453, 4230, 28573], while the VP rrc12 will receive only one BGP update, carrying the best route for AS2914 (the path [2914, 4230, 28573] when no routing policies are applied). However,
if AS28573 repeatedly announces and then withdraws a prefix within a few seconds because of misconfiguration or other causes, the VP rrc12 may receive not the withdrawal message but an update message with an alternative path.
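The behaviour described above, where a peer receives several paths but passes only its best one to the collector, can be sketched as follows. The sketch assumes that the shortest AS path wins and that ties are broken arbitrarily by comparing AS numbers in order. A real BGP decision process applies more criteria and any configured routing policy, and the function name is hypothetical.

```python
def best_path_for_collector(peer_asn, received_paths):
    """Pick the shortest AS path and prepend the peer's own AS number,
    as the peer would when advertising the route to the collector.

    Assumption: shortest path wins; ties fall back to list comparison.
    """
    if not received_paths:
        return None  # nothing left to advertise
    best = min(received_paths, key=lambda path: (len(path), path))
    return [peer_asn] + best


# Paths received by AS2914 for the prefix announced by AS28573.
paths_at_as2914 = [
    [4230, 28573],
    [3356, 4230, 28573],
    [6453, 4230, 28573],
]

print(best_path_for_collector(2914, paths_at_as2914))
# prints [2914, 4230, 28573]

# If only the shortest path is lost, the collector sees an update
# with an alternative path rather than a withdrawal.
print(best_path_for_collector(2914, paths_at_as2914[1:]))
# prints [2914, 3356, 4230, 28573]
```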
Figure 1 Simple BGP Topology
Detection Approach
Our approach makes use of RQA by first extracting suitable BGP features and then calculating RQA measurements based on those features. Significant variations in the RQA measurements can indicate a change in 'normal' behaviour
that represents a BGP anomaly. Multiple BGP features are used in this project, such as the number of announcements (A), the number of withdrawals (W), the total number of BGP updates (V), the maximum AS-PATH length (M), and the average
AS-PATH length (AV), each calculated every second. These features are extracted continuously from BGP update messages. RQA measurements change in different ways depending on changes in the input signal. To measure the changes in BGP features
that reveal anomalies, we use the RR and V-entr measurements for the maximum and average AS-PATH length, and the TT, T2, V-entr, and L-MEAN measurements for the number of announcements,
withdrawals, and total BGP updates. Figure 2 shows the architecture of a system that could be constructed based on our approach, where a significant change in any of the RQA measurements indicates a BGP anomaly.
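The per-second feature extraction described above can be sketched as follows. The input format is an assumption made for illustration: each update is a tuple `(timestamp, kind, as_path)`, where `kind` is `"A"` for an announcement or `"W"` for a withdrawal. Parsing the raw update files from the repositories is out of scope here, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def extract_features(updates):
    """Group updates by whole second and compute A, W, V, M and AV.

    `updates` is an iterable of (timestamp, kind, as_path) tuples, where
    kind is "A" (announcement) or "W" (withdrawal) and as_path is a list
    of AS numbers (empty for withdrawals). This input format is assumed.
    """
    buckets = defaultdict(list)
    for timestamp, kind, as_path in updates:
        buckets[int(timestamp)].append((kind, as_path))

    features = {}
    for second, items in sorted(buckets.items()):
        # AS-PATH lengths are taken from announcements only.
        lengths = [len(path) for kind, path in items if kind == "A"]
        announcements = len(lengths)
        withdrawals = sum(1 for kind, _ in items if kind == "W")
        features[second] = {
            "A": announcements,
            "W": withdrawals,
            "V": announcements + withdrawals,
            "M": max(lengths, default=0),
            "AV": sum(lengths) / announcements if announcements else 0.0,
        }
    return features


# Hypothetical updates spanning two seconds.
sample_updates = [
    (100.2, "A", [2914, 4230, 28573]),
    (100.7, "A", [2914, 3356, 4230, 28573]),
    (100.9, "W", []),
    (101.1, "W", []),
]
features = extract_features(sample_updates)
for second, values in features.items():
    print(second, values)
```

Seconds in which no update arrives do not appear in the result. A continuous input signal for the RQA measurements would need those seconds filled in, for example with zeros, which is a design choice left open in this sketch.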
Figure 2 Anomaly Detection Approach