IP subnet filtering with Python |

In this blog post I want to share, and talk about, Python program I wrote to solve problem of filtering out child prefixes.

Problem description
Solution
- Naïve algorithm
- Patricia tree aka binary radix tree
Python implementation
- Load testing
- Module and cli version
Closing thoughts
References

Problem description

So what exactly is the problem we're trying to solve?

Imagine that you have a group of prefixes, assigned to servers, or clients, and you want to only keep prefixes that are not subnets of any other prefix. This could be because of your routing requirements or because you want to minimise number of security rules.

Concretely, given the below example prefixes:

10.0.0.0/25
10.0.0.0/29
10.0.0.8/29
10.0.0.16/29
10.0.0.128/26
10.0.0.160/27
10.0.0.192/26
10.0.0.248/30

We only want to be left with the ones that are not contained within any other prefix on the list:

10.0.0.0/25
10.0.0.128/26
10.0.0.192/26

There are a number of use cases for getting rid of child prefixes but hopefully the ones I presented help in illlustrating the idea.

Solution

Now that we know what we're trying to solve, let's talk about how we're going to do it.

Naïve algorithm

The simplest, and also least efficient, algorithm that comes to mind is to loop over all prefixes and then for each prefix in the outer loop have inner loop, again going over all prefixes.

Inside inner loop we check if the outer and inner prefixes are different, since each prefix is a subnet of itself.

If outer prefix is a subnet of any other prefix in the inner loop we stop iteration, otherwise we've checked outer prefix against all other prefixes and we can safely mark it as not being a subnet of any other.

It is clear from the pseudocode included below that this algorithm is O(n^2) since outer loop executes n-times and inner loop also executes up to n times. In other words, this is not stellar and will be slow even with n being fairly small.

Below algorithm also won't correctly deal with duplicate prefixes as the equality is checked between values and not indexes but we won't concern ourselves with that since this is just to start the discussion.

for opfx in pfxs:
    for ipfx in pfxs:
        if opfx == ipfx:
            continue
        if  opfx.subnet_of(ipfx):
            break
    else:
        no_children.append(opfx)

Ok, now that we know that the first idea is not so great after all, we'll move to a more promising solution.

Patricia tree aka binary radix tree

Many problems in computing keep coming up in different scenarios and there already are libraries out there that can help us solve these problems in an efficient way.

In our case I'll use an implementation of Patricia tree, also known as binary radix tree, a data structure commonly used in implementing routing tables ^[1]. Since routing tables rely on quick look ups of longest prefix match, we should be confident this data structure will lead to a faster algorithm.

For my little Python application I used pytricia library ^[2] which also provides very helpful methods "children" and "parent". With these in hand we can now proceed to building our algorithm.

New solution, using patricia trees, proceeds as follows:

We will have a loop going over all prefixes. Inside of a loop we add each of the prefixes to the tree.

	pyt.insert(p, p)

After insertion we check if newly inserted prefix has a parent, and if so, we delete it from the tree:

if pyt.parent(p):
    del pyt[p]

If the new prefix has no parent then we check if it hasn't become a parent itself. This can happen because we might have already had some prefixes in the tree.
In the case that this prefix is a parent to any other prefixes, we retrieve them and delete one by one.

elif pyt.children(p):
    for child in pyt.children(p):
        del pyt[child]

Finally, once the loop completes, the only prefixes left on the tree will be the ones without any parents. Which is exactly what we were looking for!

The idea behind the solution is that instead of using brute force we can adjust an existing, very efficient, algorithm to our needs. This results in a faster code. We also get to learn about one of the algorithms powering networking, something that I personally find fascinating.

To top it off we get full IPv4 and IPv6 support out of the box, no need to worry about any of the details!

Python implementation

Below is the full implementation of the function taking iterable with prefixes and returning list with removed subnet prefixes.

from typing import Iterable, List

import pytricia


def filter_out_subnets(prefixes: Iterable[str], ipv6=False) -> List[str]:
    """
    Goes through prefixes and filters out any prefix that is a subnet of any other
    Uses Patricia Tree for efficient prefix lookups
    :param prefixes: iterable with prefixes
    :param ipv6: set to True for IPv6 prefixes
    :return: list of filtered out prefixes
    """
    if ipv6:
        pyt = pytricia.PyTricia(128)
    else:
        pyt = pytricia.PyTricia()
    for p in prefixes:
        pyt.insert(p, p)
        if pyt.parent(p):
            del pyt[p]
        elif pyt.children(p):
            for child in pyt.children(p):
                del pyt[child]

    return [pyt[p] for p in pyt]

Below are examples for IPv4 and IPv6, showing completed code in action:

>>> ippfxs = ["10.0.1.0/24", "10.0.1.128/25", "10.0.1.192/28", "10.0.2.0/24"]
>>> ippfxs_filt = filter_out_subnets(ippfxs)
>>> print(ippfxs_filt)
['10.0.1.0/24', '10.0.2.0/24']

>>> ipv6pfxs = ["2001:db8::/48", "2001:db8:0:a::/64", "2001:db8:0:b::/64", "2001:db8:1:a::/64"]
>>> ipv6pfxs_filt = filter_out_subnets(ipv6pfxs, ipv6=True)
ipv6
>>> ipv6pfxs_filt = filter_out_subnets(ipv6pfxs, ipv6=True)
ipv6
>>> print(ipv6pfxs_filt)
['2001:db8::/48', '2001:db8:1:a::/64']

Load testing

I like testing my stuff, so of course I had to run some load testing against the code. I did it for randomly generated IPv4 prefixes, getting the following results:

It took 0.00205s to process 1,000 random prefixes.
It took 0.01696s to process 10,000 random prefixes.
It took 0.13466s to process 100,000 random prefixes.
It took 1.32733s to process 1,000,000 random prefixes.

As a final test I downloaded full Internet routing table for IPv4 and fed it to my little gem:

Now something more fun. Full Internet table!
It took 1.12990s to process 788,533 Internet prefixes.
After filtering out subnets we are left with 367,254 prefixes.

Module and cli version

In my GitHub repository you can also find the above function packaged into a module ^[3]. GitHub version can be used as a standalone program or you can import it into your own program. More detailed instructiona are included in the README in the repo.

Closing thoughts

Seeing as I'll probably only ever deal with prefixes in low 1000s this might be an overkill. I do however believe that sometimes it's worth to spend a bit more time to better understand the problem and make sure our solution will stand the test of time.

I also enjoy little diversions resulting from researching the problem. Learning new things and finding out more about algorithms and mathematics driving computing, and by extension computer networks, is time well spent.

References

Sklower, K. (1991). A Tree-Based Packet Routing Table for Berkeley Unix. USENIX Winter. Available at: http://www.cs.columbia.edu/~ji/F02/ir04/routing.pdf ↩︎
Pytricia Python module. Available at: https://github.com/jsommers/pytricia ↩︎
Subnet-removal GitHub repo. Available at: https://github.com/progala/subnet-removal ↩︎

TTL255 - Przemek Rogala's blog

Computer Networks, Python and Automation

IP subnet filtering with Python

Contents