Monitoring Fast Metrics, Part 4/5, Mapping from IP to ASN

Background

Network effects are usually hard to see from the point of view of one client, but when we receive data from a number of clients on the same networks, it's much easier to see what's going on. Yet this raises the question: without knowing the details of every network in the world, how do we know which clients are on the same networks?

Even if we could collect it, knowing the specifics of every network would require managing a vast trove of information that would go out of date the moment we collected it. Instead, we can observe that these networks are typically connected to each other at the boundaries of the organizations that built them, or "Autonomous Systems"; these systems exchange routing information for each block of IP addresses they handle using a protocol called BGP. Each Autonomous System is identified by the unsurprisingly named "Autonomous System Number" or ASN, and large and medium-sized ISPs typically have one or more ASNs.

[Figure: Simplified path from a device to servers across 4 ASNs on the Internet]

There is a wealth of information about how the public Internet is connected together, but if we simply wish to resolve the ASN for IP addresses collected in our logs, we only need to understand a very small part of it. While ASNs have edge cases that cause trouble (such as a pair of routers in a garage in the UK that have their own ASN, or a multinational mega-ISP that uses a single ASN globally), in the common case they work well for rolling up data on a per-network basis.

Today's Agenda

There are a number of commercial and public APIs that will do this work for you, but today we'll have a little fun exploring how this works and build our own. Simply stated, we will build a data table that lets us map from any IP address to an ASN, so that we can later aggregate our log data by ASN. To limit the complexity of this task, we are going to bake with the following ingredients today:

  1. A recent copy of the routing table of a single BGP-enabled router (in MRT format),
  2. The RIPE bgpdump tool to read that table,
  3. A script to extract the latest route information,
  4. (for labeling) A list of the owner names for all ASNs, and
  5. We will only deal with IPv4 addresses.

The same techniques work with IPv6, but I would rather keep the implementation code simple, stick to the sort of 32-bit math that works easily in JavaScript, and save the big-number math for another day.
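As a quick illustration of the 32-bit caveat: JavaScript's bitwise operators work on signed 32-bit integers, so any address whose first octet is 128 or higher comes out negative unless the result is reinterpreted as unsigned with `>>> 0` (which is why the ipToInt helper used later ends that way). The function names below are hypothetical, just for this sketch:

```javascript
// Bitwise operators convert their operands to signed 32-bit integers,
// so shifting a high first octet into the top byte flips the sign bit.
function ipToSignedInt(ip) {
    return ip.split(".").reduce(function (acc, octet) {
        return (acc << 8) | parseInt(octet, 10);
    }, 0);
}

// ">>> 0" reinterprets the same 32 bits as an unsigned value (0 .. 2^32 - 1).
function ipToUnsignedInt(ip) {
    return ipToSignedInt(ip) >>> 0;
}

console.log(ipToSignedInt("255.255.255.255"));   // -1
console.log(ipToUnsignedInt("255.255.255.255")); // 4294967295
console.log(ipToUnsignedInt("8.8.8.8"));         // 134744072
```

IPv6 addresses are 128 bits wide, so they don't fit in this trick at all; that is where the big-number math would come in.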

The Pieces

I. Tools and data for BGP snapshots
Internet routing information is normally passed from router to router between Autonomous Systems via the Border Gateway Protocol (BGP), after which each router updates its internal routing table to record which route to use for traffic to a given IP address. A number of large BGP routers (including the RouteViews route collectors we'll use below) export these routing tables as large "RIB databases" (Routing Information Base dumps), which are snapshots of the routing table from that router's point of view at a given time.

To create a basic IP to ASN lookup mapping from this data, let's pick up one of these RIB files and transform it into a format that's easier to query for our purposes, namely discovering which ASN owns a given IPv4 address. I'll assume you're doing this on the same Linux server that we set up for logging in Part 1, but even a $10 Raspberry Pi Zero W can handle the processing we need to do.

Since we're only going to use one RIB file, let's pick the one from RouteViews' Equinix collector (route-views.eqix), which peers with a fairly busy set of BGP routers. Choose a recent extract from the RouteViews archive; for this article, I'll use this one:

# We only need one shell today:
mkdir bgpprocessing
cd bgpprocessing
wget ftp://archive.routeviews.org/route-views.eqix/bgpdata/2017.07/RIBS/rib.20170712.1600.bz2
bzip2 -d rib.20170712.1600.bz2

We need a special tool to read the RIB files (specifically, binary MRT files). Let's use the bgpdump tool from RIPE, the European IP address registry, and build it:

# find the latest commit from:
# https://bitbucket.org/ripencc/bgpdump/downloads/
# for example: 6a91fdb1c7568d0b08c3d33f08c8d23b7e2a842c
wget 'https://bitbucket.org/ripencc/bgpdump/get/6a91fdb1c756.zip'
unzip 6a91fdb1c756.zip
cd ripencc-bgpdump-6a91fdb1c756
./bootstrap.sh
make -j3
# we can check the binary with:
./bgpdump -T
cd ..

II. Understanding router entries
If you look at the output from bgpdump -m, you'll see a lot of single-line records with fields separated by the pipe character ("|"). This text output is easier to work with than the binary format itself, so let's look at a few records from this file:

./ripencc-bgpdump-6a91fdb1c756/bgpdump -m rib.20170712.1600 | head -n 5

A typical record will look like this:

TABLE_DUMP2|1499875200|B|206.126.236.47|19151|1.0.128.0/19|19151 174 2914 38040 9737|IGP|206.126.236.47|0|0|174:21000 174:22013 19151:1000 19151:61003 19151:65050|AG|9737 203.113.12.254|

We're most interested in fields 6 and 7 (counting from 1). Field 6 shows the network block (the lowest IP address in a range of IPs) along with the CIDR prefix length, which shows how big the block is. Field 7 shows the path through ASNs to send traffic to an IP address in the range described by field 6. From our earlier line:

Field 6: "1.0.128.0/19". The network block starts at 1.0.128.0, and the /19 prefix leaves 32 - 19 = 13 bits for host addresses, or 2^13 = 8,192 addresses. From this we can work out that the range is 1.0.128.0 - 1.0.159.255.

Field 7: "19151 174 2914 38040 9737". From the point of view of this router, we need to send traffic through ASN 19151 to ASN 174 to ASN 2914 to ASN 38040 and finally to ASN 9737. We're going to make a gross simplification here and assume that the final ASN in the path (the "origin" ASN, here 9737) owns every IP address in this range.
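To make the field numbering concrete, here is a minimal sketch that splits the sample record above on "|", pulls out the prefix and the last ASN of the path, and expands the prefix into its first and last addresses. The helper names (parseRibLine, cidrToRange, dottedToInt, intToDotted) are hypothetical. It also assumes something not covered above: an AS path can end in an AS_SET, shown in braces such as `{64512,64513}`, and this sketch simply takes the first member of the set.

```javascript
// Unsigned 32-bit value of a dotted quad (arithmetic, so no sign surprises).
function dottedToInt(ip) {
    return ip.split(".").reduce(function (acc, octet) {
        return acc * 256 + parseInt(octet, 10);
    }, 0);
}

// Dotted quad for an unsigned 32-bit value.
function intToDotted(n) {
    return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
}

// Split one "bgpdump -m" record. The prose counts fields from 1, so field 6
// (the prefix) is index 5 and field 7 (the AS path) is index 6.
function parseRibLine(line) {
    var fields = line.split("|");
    if (fields.length < 7) {
        return null;
    }
    var path = fields[6].trim().split(" ");
    // Assumption: if the path ends in an AS_SET like "{64512,64513}",
    // use the first member of the set as the origin.
    var token = path[path.length - 1].replace(/[{}]/g, "").split(",")[0];
    var origin = token === "" ? NaN : Number(token);
    return { prefix: fields[5], originAsn: Number.isInteger(origin) ? origin : null };
}

// A /N prefix leaves 32 - N host bits, i.e. 2^(32 - N) addresses starting
// at the network address.
function cidrToRange(cidr) {
    var parts = cidr.split("/");
    var first = dottedToInt(parts[0]);
    var size = Math.pow(2, 32 - Number(parts[1]));
    return { first: intToDotted(first), last: intToDotted(first + size - 1), size: size };
}

var sample = "TABLE_DUMP2|1499875200|B|206.126.236.47|19151|1.0.128.0/19|" +
    "19151 174 2914 38040 9737|IGP|206.126.236.47|0|0|" +
    "174:21000 174:22013 19151:1000 19151:61003 19151:65050|AG|9737 203.113.12.254|";
var route = parseRibLine(sample);
console.log(route);                     // { prefix: '1.0.128.0/19', originAsn: 9737 }
console.log(cidrToRange(route.prefix)); // { first: '1.0.128.0', last: '1.0.159.255', size: 8192 }
```

Using multiplication rather than bit shifts in dottedToInt sidesteps the signed 32-bit issue entirely, at the cost of looking a little less like typical bit-twiddling code.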

There is one major caveat to this process: the IP address ranges described in routing entries can (and often do) overlap with each other. Again, for simplicity, we will assume that the tightest IP range (the most specific, i.e. smallest, IP block that covers our IP) is the right one; this mirrors the longest-prefix-match rule routers use to forward packets. Routes are replaced all day as links go up and down, so this only reflects where traffic to that IP happened to be going through that router at that moment, but that should suffice for our analysis. Taking 1.0.128.1 from the above range as an example, we will see that it's actually covered by 5 blocks in this RIB file:

# 0.0.0.0/0 on ASN 11039  <-- this is a catch-all route that matches all IPs!
# 1.0.128.0/24 on ASN 23969
# 1.0.128.0/19 on ASN 9737
# 1.0.128.0/18 on ASN 9737
# 1.0.128.0/17 on ASN 9737

We will use 23969 here, since the /24 is the narrowest network block. Fortunately this matches what Hurricane Electric's IP to ASN service says, so we're in business!

[Figure: Hurricane Electric IP to ASN lookup results for 1.0.128.1]
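Picking the tightest block is the same longest-prefix-match rule that routers use when forwarding packets. Here is a hedged sketch of that rule over the five blocks listed above that cover 1.0.128.1; the helper names are hypothetical, and the route list is just those five entries rather than the full RIB:

```javascript
// Unsigned 32-bit value of a dotted quad, using arithmetic to avoid sign issues.
function dottedToInt(ip) {
    return ip.split(".").reduce(function (acc, octet) {
        return acc * 256 + parseInt(octet, 10);
    }, 0);
}

// The five routes covering 1.0.128.1 in the RIB file discussed above.
var routes = [
    { cidr: "0.0.0.0/0", asn: 11039 },
    { cidr: "1.0.128.0/24", asn: 23969 },
    { cidr: "1.0.128.0/19", asn: 9737 },
    { cidr: "1.0.128.0/18", asn: 9737 },
    { cidr: "1.0.128.0/17", asn: 9737 }
];

// Return the containing route with the longest prefix, or null if none match.
function longestPrefixMatch(ip, table) {
    var target = dottedToInt(ip);
    var best = null;
    table.forEach(function (route) {
        var parts = route.cidr.split("/");
        var prefixLength = Number(parts[1]);
        var first = dottedToInt(parts[0]);
        var last = first + Math.pow(2, 32 - prefixLength) - 1;
        var contains = target >= first && target <= last;
        if (contains && (best === null || prefixLength > best.prefixLength)) {
            best = { cidr: route.cidr, asn: route.asn, prefixLength: prefixLength };
        }
    });
    return best;
}

console.log(longestPrefixMatch("1.0.128.1", routes).asn); // 23969
console.log(longestPrefixMatch("1.0.200.1", routes).asn); // 9737 (only the /17 and /0 match)
```

The SQL query later in this article applies the same rule by sorting candidate rows on the prefix length.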

III. Extracting IP ranges / ASNs to a TSV
Text processing is definitely not what I want to spend your time on, so instead I wrote a very simple Node.js script to process the text output from bgpdump (including some globals and other code naughtiness). I had to strongly resist the urge to Perl golf a solution, but I would love to hear what sorts of wonders you can do with RIBs and a compact one-liner!

Note that we convert all IPs in dotted-quad notation ("1.2.3.4") into their underlying 32-bit numeric representation ("16909060", i.e. 1*2^24 + 2*2^16 + 3*2^8 + 4). Not only is this more compact, but it also makes it easy to determine which ranges contain an IP address (the "hit box" calculation), since we can just compare ordinary numbers instead of fussing around with human-readable dotted quads.

# Save the RIB database as text in bgp.mformat:
./ripencc-bgpdump-6a91fdb1c756/bgpdump -m rib.20170712.1600 > bgp.mformat

# load up our JS
cat > asnextracttransform.js <<'EOF'
var fs = require("fs");
var readline = require("readline");

// borrowed from https://stackoverflow.com/a/23589224
// Returns the unsigned 32-bit value of a dotted quad, or undefined if the
// string is not an IPv4 address (e.g. IPv6 prefixes in the same dump).
function ipToInt(ip){
    var ipl=0;
    var splitIp = ip.split(".");
    if (splitIp.length != 4) {
        return undefined;
    }
    splitIp.forEach(function(octet) {
        ipl <<= 8;
        ipl += parseInt(octet, 10);
    });
    // bitwise ops are signed 32-bit; ">>> 0" makes the result unsigned
    return(ipl >>>0);
}

function intToIp(ipl){
    return ( (ipl>>>24) + '.' +
        (ipl>>16 & 255) + '.' +
        (ipl>>8  & 255) + '.' +
        (ipl & 255) );
}

var ipAsnMapping = {};

// get one entry for each netblock/size combo
function addRouteEntry(when, asn, netblock, cidrPrefix) {
    var key = "" + netblock + "-" + cidrPrefix;
    if (key in ipAsnMapping) {
        // replace if the next entry is newer or equal
        if (ipAsnMapping[key].when <= when) {
            ipAsnMapping[key] = {when: when, asn: asn, netblock: netblock, cidrPrefix: cidrPrefix};
        }
    } else {
        ipAsnMapping[key] = {when: when, asn: asn, netblock: netblock, cidrPrefix: cidrPrefix};
    }
}

// Handle one "bgpdump -m" record. Fields are 0-indexed here, so splitLine[5]
// is the prefix (field 6) and splitLine[6] is the AS path (field 7).
// Named processLine so it doesn't shadow Node's global "process".
function processLine(line) {
    var splitLine = line.split("|");
    if (splitLine.length < 7) {
        return;
    }
    //var routePublishTime = new Date(splitLine[1] * 1000);
    var routePublishTime = Number(splitLine[1]);
    var cidrBlock = splitLine[5];
    var netblock = cidrBlock.split("/")[0];
    var blockSize = Number(cidrBlock.split("/")[1]);
    var pathList = splitLine[6].split(" ");
    // The last ASN in the path is the origin. If the path ends in an AS_SET
    // such as "{64512,64513}", take its first member as a simplification.
    var lastHop = pathList[pathList.length-1].replace(/[{}]/g, "").split(",")[0];
    var finalAsn = Number(lastHop);
    if (!lastHop || isNaN(finalAsn)) {
        return; // empty or unparseable AS path
    }

    var netblockAsInt = ipToInt(netblock);
    if (typeof netblockAsInt !== "undefined") {
        addRouteEntry(routePublishTime, finalAsn, netblockAsInt, blockSize);
    }
}

// Print one TSV row per block: time, ASN, first IP, last IP, prefix length.
function unrollMappingForLookup() {
    Object.keys(ipAsnMapping).forEach(function(key) {
        var entry = ipAsnMapping[key];
        var numberOfAddressesInBlock = Math.pow(2, (32 - entry.cidrPrefix));
        var lastAddress = entry.netblock + numberOfAddressesInBlock - 1;
        console.log([entry.when, entry.asn, entry.netblock, lastAddress, entry.cidrPrefix].join("\t"));
    });
}

function processReadline(filename) {
    var rl = readline.createInterface({
        input: fs.createReadStream(filename)
    });

    rl.on("line", processLine);
    rl.on("close", unrollMappingForLookup);
}

processReadline("bgp.mformat");
EOF

# process it!
node asnextracttransform.js > iptoasn.tsv

If everything went right, iptoasn.tsv should now contain tab-separated rows of publish time, ASN, first IP, last IP, and CIDR prefix length, which look like this:

1499875200      11039          0      4294967295         0
1499875200      56203   16778240        16779263        22
1499875200      56203   16778240        16778495        24
1499875200      56203   16778496        16778751        24
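Because the TSV stores addresses as integers, it can be handy to turn a row back into something readable when spot-checking the output. A small sketch, assuming the five tab-separated columns written by the script above; describeRow is a hypothetical name:

```javascript
// Convert an unsigned 32-bit value back into a dotted quad.
function intToDotted(n) {
    return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
}

// Columns: publish time, origin ASN, first IP, last IP, CIDR prefix length.
function describeRow(row) {
    var cols = row.split("\t").map(Number);
    var first = intToDotted(cols[2]);
    var last = intToDotted(cols[3]);
    return first + "/" + cols[4] + " (" + first + " - " + last + ") AS" + cols[1];
}

console.log(describeRow("1499875200\t56203\t16778240\t16779263\t22"));
// 1.0.4.0/22 (1.0.4.0 - 1.0.7.255) AS56203
```

Decoding the second sample row this way shows 1.0.4.0/22, and the third and fourth rows are the two /24s that sit inside it.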

If you wanted to get fancy, you could have the script read the RIB file itself and write the data straight into the RDBMS, but fancy is for another day.

IV. Querying the data with SQLite3
With the TSV file, we can load the data into a simple SQL database and run some queries. sqlite3 is available for nearly every modern OS (and preinstalled on many), and it doesn't require any setup work, which is quite convenient:

# still at the shell:
sqlite3 iptoasn.db

-- in the sqlite interpreter:
CREATE TABLE ipasnmap (
  ts INTEGER NOT NULL,
  asn INTEGER NOT NULL,
  netblock INTEGER NOT NULL,
  lastipinblock INTEGER NOT NULL,
  cidrprefix INTEGER NOT NULL,
  PRIMARY KEY (netblock, lastipinblock)
);

.separator "\t"
.import 'iptoasn.tsv' ipasnmap

If we want a handy way to convert dotted-quad IP addresses to their 32-bit numeric representation, we can reuse the ipToInt function in the extract / transform code above and make a simple convertip tool:

# in another shell:
cat > convertip.js <<EOF
// borrowed from https://stackoverflow.com/a/23589224
function ipToInt(ip){
  var ipl=0;
  var splitIp = ip.split(".");
  if (splitIp.length != 4) {
      return undefined;
  }
  splitIp.forEach(function(octet) {
      ipl <<= 8;
      ipl += parseInt(octet);
  });
  return(ipl >>>0);
}

console.log(ipToInt(process.argv[2]));
EOF

node convertip.js 8.8.8.8
# should get the output "134744072"

Then we can query the table, plugging in the number printed by convertip.js:

-- back in the sqlite3 interpreter:
SELECT asn FROM ipasnmap
WHERE 134744072 >= netblock
  AND 134744072 <= lastipinblock
ORDER BY cidrprefix DESC,ts DESC
LIMIT 1;

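And there we have it! The query returns the ASN of the most specific block containing the address, breaking ties by the newest timestamp.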

Next Time

There are many ways to improve the quality of our IP to ASN data, such as loading multiple route views, analyzing bogon routing, improving subnet aggregation, and quality-checking the data, but if you require that level of precision, I'd consider using one of the commercial or public services mentioned earlier. In the next article, we will tie our new IP to ASN mapping together with our Kafka logs and look at how we can analyze field data.
