Link to home
Start Free TrialLog in
Avatar of summer_soccer
summer_soccer

asked on

how to do IP prefix matching efficiently in Python?

I asked a similar question previously, and got some answers from Mish. I got to implement code as suggested in the link Mish provided, but I found it is very inefficient. My code is attached.

The basic idea is, first I read in all prefixes into a list. There are about 300k such prefixes. Then for each IP address, I need to do a sequential comparison with all such 300k prefixes to find a match. This can take a long time. Further, because I want the longest matching prefix, I cannot stop in the middle when a match is found, because it might not be the longest matching prefix.

I have found an package pysubnettree-0.11 on the Internet, but it requires root privilege to get installed. My code will run on multiple hosts without root privilege.

Does anybody have any ideas how to resolve this problem? Many thanks!
# convert an IP address from its dotted-quad format to its 
    # 32 binary digit representation 
    def ip2bin(self, ip): 
        b = "" 
        inQuads = ip.split(".")
        outQuads = 4
        for q in inQuads:
            if q != "":
                b += self.dec2bin(int(q),8)
                outQuads -= 1
        while outQuads > 0:
            b += "00000000"
            outQuads -= 1
        return b
 
    # convert a decimal number to binary representation
    # if d is specified, left-pad the binary number with 0s to that length
    def dec2bin(self, n,d=None):
        s = ""
        while n>0:
            if n&1:
                s = "1"+s
            else:
                s = "0"+s
            n >>= 1
        if d is not None:
            while len(s)<d:
                s = "0"+s
        if s == "": s = "0"
        return s
 
    def match_ip_prefix(self, ip, prefix):
        #print 'the prefix is ', prefix
        #print 'the ip is ', ip
        parts = prefix.split("/")
        baseIP = self.ip2bin(parts[0])
        subnet = int(parts[1])
        # Python string-slicing weirdness:
        # "myString"[:-1] -> "myStrin" but "myString"[:0] -> "" 
        
        if subnet == 32: 
            if self.ip2bin(ip) == baseIP:
                return True
            else:
                return False
 
        # for any other size subnet, print a list of IP addresses by concatenating 
        # the prefix with each of the suffixes in the subnet 
        else: 
            ipPrefix = baseIP[:-(32-subnet)] 
            ipinbin = self.ip2bin(ip) 
            partip = ipinbin[:-(32-subnet)] 
            
            #print 'partIP is ', partip, 'ipPrefix is ', ipPrefix 
            
            if ipPrefix == partip: 
                #print 'they match' 
                return True 
            else: 
                #print 'they do not match' 
                return False

Open in new window

Avatar of ramrom
ramrom
Flag of United States of America image

Use a dictionary rather than a list.

The code you posted does not show the file reading or the list searching! Please post those.

Please provide a few examples of prefixes and addresses so I can understand exactly what you mean by "longest matching prefix".
Avatar of summer_soccer
summer_soccer

ASKER

OK, let me give some more details.

The prefix file is in the following format:

1.8.3.0/24                   1033
200.37.0.0/16             251
..... .......

There are more than 300k such lines. Each line has two fields. The left field is the IP prefix with the mask length, and the right field is the mapped value of that prefix.

I have a list of IP addresses,  about 200 such IPs.
158.112.13.1
133.101.1.1
200.37.25.3
... ...

What I need to do is, for each of these 200 IPs, I need to look up and find the prefix that matches the IP address and has the longest mask length if there are multiple matches (so called longest matching prefix), and returned the mapped value for that IP.

For example, IP 200.37.25.3 matches the prefix 200.37.0.0/16, therefore, we should find 251 for IP 200.37.25.3

Is my explanation clear?
Is the mask length always a multiple of 8?

In other words are we always matching complete octets?

If so the take using a dictionary is fairly easy.
Not necessarily always multiple of 8, it can be 23, 25, any integer between 8 and 31.
ASKER CERTIFIED SOLUTION
Avatar of ramrom
ramrom
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you, very good solution. I thought about using B-tree to implement this, but it is too complicated.