summer_soccer
asked on
how to do IP prefix matching efficiently in Python?
I asked a similar question previously, and got some answers from Mish. I got to implement code as suggested in the link Mish provided, but I found it is very inefficient. My code is attached.
The basic idea is, first I read in all prefixes into a list. There are about 300k such prefixes. Then for each IP address, I need to do a sequential comparison with all such 300k prefixes to find a match. This can take a long time. Further, because I want the longest matching prefix, I cannot stop in the middle when a match is found, because it might not be the longest matching prefix.
I have found an package pysubnettree-0.11 on the Internet, but it requires root privilege to get installed. My code will run on multiple hosts without root privilege.
Does anybody have any ideas how to resolve this problem? Many thanks!
The basic idea is, first I read in all prefixes into a list. There are about 300k such prefixes. Then for each IP address, I need to do a sequential comparison with all such 300k prefixes to find a match. This can take a long time. Further, because I want the longest matching prefix, I cannot stop in the middle when a match is found, because it might not be the longest matching prefix.
I have found an package pysubnettree-0.11 on the Internet, but it requires root privilege to get installed. My code will run on multiple hosts without root privilege.
Does anybody have any ideas how to resolve this problem? Many thanks!
# convert an IP address from its dotted-quad format to its
# 32 binary digit representation
def ip2bin(self, ip):
b = ""
inQuads = ip.split(".")
outQuads = 4
for q in inQuads:
if q != "":
b += self.dec2bin(int(q),8)
outQuads -= 1
while outQuads > 0:
b += "00000000"
outQuads -= 1
return b
# convert a decimal number to binary representation
# if d is specified, left-pad the binary number with 0s to that length
def dec2bin(self, n,d=None):
s = ""
while n>0:
if n&1:
s = "1"+s
else:
s = "0"+s
n >>= 1
if d is not None:
while len(s)<d:
s = "0"+s
if s == "": s = "0"
return s
def match_ip_prefix(self, ip, prefix):
#print 'the prefix is ', prefix
#print 'the ip is ', ip
parts = prefix.split("/")
baseIP = self.ip2bin(parts[0])
subnet = int(parts[1])
# Python string-slicing weirdness:
# "myString"[:-1] -> "myStrin" but "myString"[:0] -> ""
if subnet == 32:
if self.ip2bin(ip) == baseIP:
return True
else:
return False
# for any other size subnet, print a list of IP addresses by concatenating
# the prefix with each of the suffixes in the subnet
else:
ipPrefix = baseIP[:-(32-subnet)]
ipinbin = self.ip2bin(ip)
partip = ipinbin[:-(32-subnet)]
#print 'partIP is ', partip, 'ipPrefix is ', ipPrefix
if ipPrefix == partip:
#print 'they match'
return True
else:
#print 'they do not match'
return False
ASKER
OK, let me give some more details.
The prefix file is in the following format:
1.8.3.0/24 1033
200.37.0.0/16 251
..... .......
There are more than 300k such lines. Each line has two fields. The left field is the IP prefix with the mask length, and the right field is the mapped value of that prefix.
I have a list of IP addresses, about 200 such IPs.
158.112.13.1
133.101.1.1
200.37.25.3
... ...
What I need to do is, for each of these 200 IPs, I need to look up and find the prefix that matches the IP address and has the longest mask length if there are multiple matches (so called longest matching prefix), and returned the mapped value for that IP.
For example, IP 200.37.25.3 matches the prefix 200.37.0.0/16, therefore, we should find 251 for IP 200.37.25.3
Is my explanation clear?
The prefix file is in the following format:
1.8.3.0/24 1033
200.37.0.0/16 251
..... .......
There are more than 300k such lines. Each line has two fields. The left field is the IP prefix with the mask length, and the right field is the mapped value of that prefix.
I have a list of IP addresses, about 200 such IPs.
158.112.13.1
133.101.1.1
200.37.25.3
... ...
What I need to do is, for each of these 200 IPs, I need to look up and find the prefix that matches the IP address and has the longest mask length if there are multiple matches (so called longest matching prefix), and returned the mapped value for that IP.
For example, IP 200.37.25.3 matches the prefix 200.37.0.0/16, therefore, we should find 251 for IP 200.37.25.3
Is my explanation clear?
Is the mask length always a multiple of 8?
In other words are we always matching complete octets?
If so the take using a dictionary is fairly easy.
In other words are we always matching complete octets?
If so the take using a dictionary is fairly easy.
ASKER
Not necessarily always multiple of 8, it can be 23, 25, any integer between 8 and 31.
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER
Thank you, very good solution. I thought about using B-tree to implement this, but it is too complicated.
The code you posted does not show the file reading or the list searching! Please post those.
Please provide a few examples of prefixes and addresses so I can understand exactly what you mean by "longest matching prefix".