IP Address Validation: Principles and Best Practices


Any application that interacts with a network in any way must validate the correctness of IP addresses. This is more complicated than it might seem. It’s easy to go to extremes: with overly strict validation, users won’t be able to enter valid data; with insufficient validation, they’ll be left dealing with low-level error messages (if they’re even shown at all). In this article, we’ll look at some of the challenges that arise when validating addresses, and then review some ready-made libraries that can help.

Address Validation

Errors in addresses can occur in three ways:

  • Typos
  • Misunderstandings
  • Intentional attempts to break the application

Address validation alone won’t protect you from attempts to break your application. It can make such attempts more difficult, but it’s no substitute for proper authorization checks and error handling at every stage of your program. So, improved security should be seen more as a useful side effect. The main goal is to make life easier for users who accidentally enter an incorrect address or misunderstand what’s required.

Checks can be roughly divided into format checks and substantive checks. The goal of a format check is to ensure that the string entered by the user could possibly be a valid address. Many programs stop there. We’ll go further and see how to check not only that an address is well-formed, but also that it’s appropriate for its intended purpose. More on that later.

Format Checks

Checking the correctness of the format may seem like a job for a simple regular expression—but in reality, it’s not that easy.

With IPv4, the difficulties start with the standard for this format: there isn’t one. The dot-decimal format (0.0.0.0–255.255.255.255) is universally used, but it was never standardized as the notation for IPv4 addresses. The IPv4 standard itself (RFC 791) doesn’t mention address notation at all. RFCs that do use the dot-decimal form, such as RFC 3986 for URIs, define it only for their own purposes. The commonly used format is therefore just a convention.

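And it’s not the only convention. The inet_aton() function also accepts addresses with fewer than four parts, in which case the last part fills all the remaining bytes. For example, 192.0.2 is interpreted as 192.0.0.2 (not 192.0.2.0), and 10.1 as 10.0.0.1. It also accepts the address as a single integer, e.g., 511 = 0.0.1.255.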
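
Python’s socket.inet_aton() hands its argument to the platform’s C library, so it is a convenient way to see these shortcuts in action. The sketch below assumes a glibc-based Linux system; other C libraries may differ. The helper name normalize_like_inet_aton is purely illustrative.

```python
import socket

def normalize_like_inet_aton(s):
    """Parse s with the C library's inet_aton() and return full dot-decimal form."""
    # inet_aton() returns 4 packed bytes; inet_ntoa() turns them back into a.b.c.d
    return socket.inet_ntoa(socket.inet_aton(s))

for shorthand in ("192.0.2", "10.1", "511"):
    print(shorthand, "->", normalize_like_inet_aton(shorthand))
```

On glibc this prints 192.0.0.2, 10.0.0.1 and 0.0.1.255. If your application hands user input to inet_aton(), these forms will slip through unless you reject them yourself.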

INFO: Can a host address end in zero? Of course it can. Any network of size /23 or larger (that is, with a prefix length of 23 or less) has at least one such address. For example, 192.168.0.0/23 contains host addresses 192.168.0.1–192.168.1.254, including 192.168.1.0.
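
The /23 example can be checked with Python’s standard ipaddress module. The hosts() method yields the usable host addresses of a network and leaves out the network and broadcast addresses.

```python
import ipaddress

net = ipaddress.ip_network("192.168.0.0/23")
hosts = list(net.hosts())  # every address except 192.168.0.0 and 192.168.1.255

print(hosts[0], hosts[-1])                           # 192.168.0.1 192.168.1.254
print(ipaddress.ip_address("192.168.1.0") in hosts)  # True
```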

If you only support the full dot-decimal format with four groups and none of the shortened forms, matching the whole string against the expression (\d+)\.(\d+)\.(\d+)\.(\d+) catches a significant number of typos. You can write a regular expression that accepts only valid addresses, but it will be quite unwieldy. A better approach is to exploit the fact that the address splits easily into groups, and to check explicitly that each group is in the 0–255 range:

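def check_ipv4(s):
    """Raise ValueError unless s is a full four-part dot-decimal IPv4 address."""
    groups = s.split('.')
    if len(groups) != 4:
        raise ValueError("Invalid IPv4 address format")
    for g in groups:
        # int() alone would also accept strings like " 1", "+1" or "1_0",
        # so require one to three ASCII digits before converting.
        # Leading zeros ("010") are accepted and read as decimal here,
        # whereas inet_aton() would treat them as octal.
        if not (1 <= len(g) <= 3 and g.isascii() and g.isdigit()):
            raise ValueError(f"Invalid octet: {g!r}")
        if int(g) > 255:
            raise ValueError(f"Octet out of range: {g}")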
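
If you prefer to start from the regular expression mentioned earlier, here is a minimal sketch. Two Python details matter. First, re.fullmatch() rejects trailing junk, including a trailing newline that a pattern ending in $ would let through. Second, re.ASCII stops \d from matching non-ASCII digits. The name looks_like_ipv4 is illustrative.

```python
import re

# With re.ASCII, \d matches only 0-9; without it, Python 3 also accepts
# other Unicode decimal digits.
IPV4_SHAPE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", re.ASCII)

def looks_like_ipv4(s):
    """Return True if s is four dot-separated decimal numbers, each 0-255."""
    m = IPV4_SHAPE.fullmatch(s)  # the whole string must match
    return m is not None and all(int(g) <= 255 for g in m.groups())

print(looks_like_ipv4("192.0.2.1"))    # True
print(looks_like_ipv4("192.0.2.256"))  # False
print(looks_like_ipv4("192.0.2.1\n"))  # False
```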

With IPv6, things are both simpler and more complex. Simpler, because the authors of IPv6 learned from IPv4 and defined the text representation of addresses in RFC 4291 (section 2.2), so any alternative formats can be safely ignored as non-standard. More complex, because the formats themselves are more complicated. The main challenge is the compressed notation: one or more consecutive all-zero groups can be replaced with ::, for example, 2001:db8::1 instead of 2001:db8:0:0:0:0:0:1. This is convenient for users, but for developers it means you can’t just split the address by colons; you need considerably more complex logic. The standard also allows :: to appear only once in an address, because otherwise the number of omitted groups would be ambiguous, and the parser has to enforce that as well.

So, if your application supports IPv6, you’ll need a full-fledged parser for address validation. There’s no point in writing your own, since there are ready-made libraries that provide this and other useful functions.
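
The sketch below shows both points with Python’s standard ipaddress module. Naive splitting turns :: into an empty group. A real parser expands the address, and it rejects a second ::.

```python
import ipaddress

addr = "2001:db8::1"
# Naive splitting: "::" becomes an empty string and only four parts remain
print(addr.split(":"))  # ['2001', 'db8', '', '1']

ip = ipaddress.IPv6Address(addr)
print(ip.exploded)    # 2001:0db8:0000:0000:0000:0000:0000:0001
print(ip.compressed)  # 2001:db8::1

try:
    ipaddress.IPv6Address("2001:db8::1::2")  # "::" used twice
except ipaddress.AddressValueError as e:     # a subclass of ValueError
    print("rejected:", e)
```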

Substantive Checks

If you’re already using a library and parsing addresses, let’s look at what additional checks you can perform to filter out invalid values and make error messages more informative.

The necessary checks depend on how the address will be used. For example, suppose a user wants to enter 124.1.2.3 as a DNS server address, but a typo turns it into 224.1.2.3. A format check won’t catch this, because the format is valid. However, this address can’t be a DNS server address: 224.0.0.0/4 is reserved for multicast, and the DNS server your application sends queries to must have a unicast address.
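
A minimal sketch of such a purpose-specific check, using the standard ipaddress module. The function name check_dns_server is hypothetical. It rejects multicast and leaves room for any other checks your application needs, and its error message names the actual problem rather than a generic “invalid address”.

```python
import ipaddress

def check_dns_server(s):
    """Hypothetical validator for a DNS server setting (IPv4 or IPv6)."""
    ip = ipaddress.ip_address(s)  # raises ValueError if s is malformed
    if ip.is_multicast:
        raise ValueError(f"{ip} is a multicast address and cannot be a DNS server")
    # ...further purpose-specific checks would go here
    return ip

check_dns_server("124.1.2.3")      # accepted
try:
    check_dns_server("224.1.2.3")  # the typo from the example above
except ValueError as e:
    print(e)
```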

If you want to filter out all addresses that can’t be host addresses on the public internet, RFC 5735 (Special Use IPv4 Addresses) gives an almost complete list of special-use networks. “Almost complete” because it doesn’t include 100.64.0.0/10, which RFC 6598 reserves as shared address space for carrier-grade NAT (CG-NAT). RFC 6890, which obsoletes RFC 5735, gives a more complete list covering both IPv4 and IPv6, though it’s not as conveniently organized.
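
Here is one way to turn such a list into a check that also produces an informative error message. This is a sketch using the standard ipaddress module. The network list is transcribed from the RFC 5735 table plus 100.64.0.0/10 from RFC 6598, so verify it against the current IANA special-purpose registry before relying on it. The name special_use_network is illustrative.

```python
import ipaddress

# Special-use IPv4 blocks from RFC 5735, plus CG-NAT shared space (RFC 6598).
# 255.255.255.255/32 is listed before 240.0.0.0/4, which contains it,
# so that the more specific block is reported.
SPECIAL_USE_V4 = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.88.99.0/24", "192.168.0.0/16", "198.18.0.0/15",
    "198.51.100.0/24", "203.0.113.0/24", "224.0.0.0/4",
    "255.255.255.255/32", "240.0.0.0/4",
)]

def special_use_network(s):
    """Return the special-use network that contains s, or None."""
    ip = ipaddress.IPv4Address(s)  # raises ValueError if s is malformed
    for net in SPECIAL_USE_V4:
        if ip in net:
            return net
    return None

net = special_use_network("100.64.1.1")
if net is not None:
    print(f"100.64.1.1 belongs to the special-use network {net}")
```

Returning the matching network, rather than just True or False, makes it easy to tell the user why an address was rejected.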

Pay attention to prefix lengths. Some people think the private-use network is 172.16.0.0/16 (172.16.0.0–172.16.255.255). Reading RFC 5735 quickly dispels this myth: it’s actually much larger, 172.16.0.0/12 (172.16.0.0–172.31.255.255). GoatCounter made exactly this mistake in practice: its statistics collection script miscounted visits from within the local network.
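
The difference is easy to see with the standard ipaddress module. Here 172.20.1.1 is an arbitrary example address: it lies outside the mistaken /16 but inside the real /12 block.

```python
import ipaddress

addr = ipaddress.ip_address("172.20.1.1")
wrong = ipaddress.ip_network("172.16.0.0/16")  # the common misconception
right = ipaddress.ip_network("172.16.0.0/12")  # the actual private-use block

print(addr in wrong, addr in right)                     # False True
print(right.network_address, right.broadcast_address)  # 172.16.0.0 172.31.255.255
print(wrong.num_addresses, right.num_addresses)         # 65536 1048576
```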

Also, keep in mind that networks “reserved for future use” may stop being reserved. Most networks in RFC 5735, such as the private, loopback, and documentation ranges, have permanent special-purpose assignments and are safe in this sense. The list does, however, include 240.0.0.0/4, which is itself only reserved for future use. The developers of Hamachi, a virtual network service once popular with gamers, decided they could use 5.0.0.0/8 because it was reserved for future use. Then the future arrived, and IANA allocated the block to RIPE NCC.

Libraries

netaddr

Python 3’s standard library already includes the ipaddress module, but if you can install a third-party library, netaddr can make your life much easier. For example, it has built-in functions for checking if an address belongs to reserved ranges:

>>> import netaddr
>>> def is_public_ip(s):
...     ip = netaddr.IPAddress(s)
...     return (ip.is_unicast() and not ip.is_private() and not ip.is_reserved())
...
>>> is_public_ip('192.0.2.1') # Reserved for documentation
False
>>> is_public_ip('172.16.1.2') # Reserved for private networks
False
>>> is_public_ip('224.0.0.5') # Multicast
False
>>> is_public_ip('8.8.8.8')
True
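
If you can’t install netaddr, the standard ipaddress module gets you close. This is a rough sketch, not an exact equivalent. The name is_public_ip_stdlib is illustrative, and the set of ranges behind is_global has been revised between Python releases. In the versions we’ve looked at, is_global does not cover multicast, so the sketch excludes multicast explicitly.

```python
import ipaddress

def is_public_ip_stdlib(s):
    """Rough standard-library counterpart of the netaddr-based is_public_ip()."""
    ip = ipaddress.ip_address(s)
    # is_global excludes private, documentation, loopback and similar ranges;
    # multicast has to be ruled out separately.
    return ip.is_global and not ip.is_multicast

for s in ("192.0.2.1", "172.16.1.2", "224.0.0.5", "8.8.8.8"):
    print(s, is_public_ip_stdlib(s))
```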

Even if these functions didn’t exist, you could easily implement them yourself. The library makes excellent use of magic methods to make the interface as convenient as Python’s built-in objects. For example, you can check if an address belongs to a network or range using the in operator, making it as easy to work with as lists or dictionaries:

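import netaddr

# Networks to reject; "..." stands for the remaining reserved ranges
LOOPBACK_NET = netaddr.IPNetwork('127.0.0.0/8')
MULTICAST_NET = netaddr.IPNetwork('224.0.0.0/4')
...

def check_public_ip(s):
    """Raise ValueError if s falls into one of the networks above."""
    ip = netaddr.IPAddress(s)
    # IPNetwork implements __contains__, so "in" works just as with a list
    if ip in MULTICAST_NET:
        raise ValueError("Multicast address found")
    elif ip in LOOPBACK_NET:
        raise ValueError("Loopback address found")
    ...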

libcidr

Even for pure C, you can find a library with a convenient interface, such as libcidr by Matthew Fuller. In Debian, you can install it from the repositories. For example, here’s how you might write a check for whether an address belongs to the multicast network, in a file called is_multicast.c:

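#include <stdio.h>
#include <stdlib.h>
#include <libcidr.h>

int main(int argc, char** argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s ADDRESS\n", argv[0]);
        return EXIT_FAILURE;
    }
    CIDR* ip = cidr_from_str(argv[1]);    /* NULL if the string can't be parsed */
    if (ip == NULL) {
        fprintf(stderr, "Invalid address: %s\n", argv[1]);
        return EXIT_FAILURE;
    }
    CIDR* multicast_net = cidr_from_str("224.0.0.0/4");
    /* cidr_contains(big, little) returns 0 if little lies within big */
    if (cidr_contains(multicast_net, ip) == 0) {
        printf("The argument is an IPv4 multicast address\n");
    } else {
        printf("The argument is not an IPv4 multicast address\n");
    }
    cidr_free(ip);
    cidr_free(multicast_net);
    return EXIT_SUCCESS;
}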

$ sudo aptitude install libcidr-dev
$ gcc -o is_multicast is_multicast.c -lcidr
$ ./is_multicast 8.8.8.8
The argument is not an IPv4 multicast address
$ ./is_multicast 239.1.2.3
The argument is an IPv4 multicast address

Conclusion

Address validation and providing informative error messages about incorrect settings may seem like a minor part of the interface, but attention to detail is a sign of professionalism—especially since ready-made libraries make this task much easier.
