Solved

Asynchronous DNS lookup

Posted on 2015-01-13
Last Modified: 2015-01-30
Good morning,

I would like to find a way to resolve about 2500 IPs to names using nslookup. The problem with my code is that it uses nslookup to query the DNS server for each computer name, and I have to wait until each query is done. It costs me a lot of time even when I query only about 100 hosts.
I found a Python approach called async, and through Google I found this:
http://www.catonmat.net/blog/asynchronous-dns-resolution/
and this:
http://blog.schmichael.com/2007/09/18/a-lesson-on-python-dns-and-threads/
https://www.geekynick.co.uk/bulk-dns-lookup-in-windows-powershell-better-than-nslookup-2/2/
Question by:totoroha
 
LVL 76

Expert Comment

by:arnold
ID: 40548140
You can store the data you get back in a searchable DB.
Do you have a local DNS server installed?
I'm not sure what your scripting skill level is, but using an external command, while fine, requires you to parse its output to get the info.
Using Python/Perl might speed up the process somewhat.

Are you looking up the same IP every time it appears, or does your system/setup only look up an IP once?
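
If the same IP can appear more than once in the input, one simple option is to look each distinct address up only once and reuse the result. This is a minimal Python 3 sketch, not part of anyone's script in this thread; lookup_unique and the resolve callable are hypothetical names, and resolve stands for whatever lookup function you already have.

```python
def lookup_unique(ips, resolve):
    """Call resolve() once per distinct IP and return {ip: result}.

    `resolve` is a placeholder (an assumption) for your own lookup
    function; it takes one IP string and returns whatever you want stored.
    """
    cache = {}
    for ip in ips:
        if ip not in cache:          # skip addresses already resolved
            cache[ip] = resolve(ip)
    return cache
```

The returned dictionary could then be written to a DB or DBM file if the results need to survive between runs.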
 
LVL 23

Expert Comment

by:savone
ID: 40548150
Is there a reason you need to use python?  I do stuff like this right from the command line using bash and dig.

for example, you can put all 2500 IP addresses in a file called ips and loop through it.

for i in `cat ips`; do echo -n "$i - "; dig +short -x $i; done

example file:
73.56.45.12
76.85.54.56
69.68.54.23
68.54.56.59


example output:
73.56.45.12 - m001311d9c4d5.atlt5.ga.comcast.net.
76.85.54.56 - mta-76-85-54-56.neb.rr.com.
69.68.54.23 - oh-69-68-54-23.sta.embarqhsd.net.
68.54.56.59 - c-68-54-56-59.hsd1.fl.comcast.net.
 

Author Comment

by:totoroha
ID: 40548273
I have my own script now and it works pretty well. The only thing is that you have to run cat lists.txt | script.py.
My scripting skill is not really good, so I want to ask one question: how do I read a file line by line and run each line through my script? If you want, I can share the script here and we can modify it to read from a text file at a given location and run the queries.
In my opinion, we can use Python's asynchronous library or Twisted, and it will speed up the queries a lot when you have a bulk of hostnames or IP addresses.
 
LVL 76

Expert Comment

by:arnold
ID: 40548282
I'm not familiar with Python, but see:

https://docs.python.org/2/tutorial/inputoutput.html
f = open('filename', 'r')

You can possibly pass the filename on the command line
(script.py filename) if it changes.

You need to look at storing the data in a faster-access form, i.e. a DB/DBM, or in a tree-type structure.
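
To make the file-reading suggestion concrete, here is a minimal Python 3 sketch. The function names read_targets and load_targets are made up for illustration and are not from the asker's script. It reads one entry per line from the file named on the command line and falls back to standard input, so the existing cat lists.txt | script.py usage keeps working.

```python
import sys

def read_targets(lines):
    """Return the non-empty entries from an iterable of text lines."""
    return [line.strip() for line in lines if line.strip()]

def load_targets(argv):
    """Read targets from the file named in argv[1], else from stdin."""
    if len(argv) > 1:
        with open(argv[1], "r") as handle:   # file is closed automatically
            return read_targets(handle)
    return read_targets(sys.stdin)
```

In a script you would call load_targets(sys.argv) once and then loop over the returned list.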
 
LVL 32

Assisted Solution

by:DrDamnit
DrDamnit earned 250 total points
ID: 40556269
You want threading:

import time
import dns.resolver
from threading import Thread

# One thread per name: each thread resolves its name and prints the A records.
class dnsRecord(Thread):
	def __init__ (self,url):
		Thread.__init__(self)
		self.url = url
	def run(self):
		answer = dns.resolver.query(self.url,'A')
		for a in answer:
			print "%s resolves to: %s\n" % (self.url,a)

print time.ctime()

urllist = ['www.google.com', 'mail.google.com', 'www.yahoo.com', 'experts-exchange.com']

# Start a thread for every name without waiting for any of them to finish.
for url in urllist:
	current = dnsRecord(url)
	current.start()

print time.ctime()



This, of course, requires that you have dnspython installed. (Under Linux / Debian Wheezy, use apt-get install python-dnspython.)

This loops through the list of URLs you want to resolve, passes each of them to a threaded object, starts the object, and moves on to the next.

It should be quite fast. I would imagine you should be able to complete 2500 queries in a second or two. I would test it, but I don't have a list of 2500 URLs to run. :-)

You'll also notice that the two time statements will appear before your first answer (in most cases). That's because we have delegated the lookup to a thread. So, once it has created the 2500 threads, it will tell you "I'm done. Here's the second time stamp", and the 2,500 query results will come in as they are completed.
 
LVL 25

Accepted Solution

by:clockwatcher
clockwatcher earned 250 total points
ID: 40556454
Wow... 2500 threads, huh?  I'm definitely not sure I'd recommend starting up 2500 threads.

Typically, you'd put your data into a thread-safe structure (Python's Queue would be a good pick -- https://docs.python.org/2/library/queue.html), then start up a reasonable number of worker threads (10 to 20) and have them pull from the queue and process items as they are free until the queue is empty.

Modifying Michael's answer just a bit... Other than the number of threads he's starting, he's also got an issue with his timer: his main thread will just blow through and finish while his workers are still processing, so the timestamps won't measure anything other than the time it takes to start up all his threads.  Another point, not a real issue but a bit of a pet peeve, is that he's sharing stdout between multiple threads -- not a big fan of that.  And lastly, while printing to the screen is nice for a demo, in a real program you'd typically want to do something with those results, so you need a way to get them back into the main thread.  To do that, you need a thread-safe structure that all your threads can dump data into.

Here's Michael's answer cleaned up a bit.

With no arguments, the following downloads a list of the 500 most popular sites on the internet and resolves them.  If given an argument, it opens a file with a single DNS name per line and resolves those.

import csv
import sys
import time
import urllib2

import dns.exception
import dns.resolver
from Queue import Queue, Empty
from threading import Thread

class dnsRecord(Thread):
    """Worker thread: pulls names from q_in and puts (name, result) on q_out."""
    def __init__ (self, q_in, q_out):
        Thread.__init__(self)
        self.q_in = q_in
        self.q_out = q_out

    def run(self):
        while True:
            try:
                # get_nowait() avoids a race between empty() and get()
                url = self.q_in.get_nowait()
            except Empty:
                break
            try:
                answer = dns.resolver.query(url,'A')
                for a in answer:
                    self.q_out.put((url, a))
            except dns.resolver.NoAnswer:
                self.q_out.put((url, "NO ANSWER"))
            except dns.exception.DNSException as e:
                # NXDOMAIN, timeouts, etc. must not kill the worker
                self.q_out.put((url, "ERROR: %s" % e.__class__.__name__))

def resolveUrls(urls, num_of_threads=10):
    q_in = Queue()
    q_out = Queue()
    for url in urls:
        name = url.strip()
        if name:
            q_in.put(name)

    thread_pool = list()
    for i in range(num_of_threads):
        t = dnsRecord(q_in, q_out)
        t.start()
        thread_pool.append(t)

    # Wait for every worker before collecting results
    for t in thread_pool:
        t.join()

    # Return a list, not a generator, so the lookups run before the timer stops
    results = []
    while not q_out.empty():
        results.append(q_out.get())
    return results


if __name__ == '__main__':
    if len(sys.argv) > 1:
        with open(sys.argv[1], "r") as f:
            urllist = f.readlines()
    else:
        urlcsv = csv.DictReader(urllib2.urlopen('http://moz.com/top500/domains/csv'))
        urllist = [url["URL"][:-1] for url in urlcsv]

    start = time.time()  # wall-clock time; time.clock() is CPU time on Unix
    results = resolveUrls(urllist)
    time_taken = time.time() - start

    for (url, address) in results:
        print "{url} => {address}".format(url=url, address=address)

    print "Seconds: {0}".format(time_taken)

 
LVL 32

Expert Comment

by:DrDamnit
ID: 40556488
Yeah... I thought 2500 threads was excessive, but in the end, they are simple UDP requests, which should be tiny.

That being said, I also assumed that if it was too much overhead, the asker would mod the code to do it in batches.

But I do like clockwatcher's modifications.
 

Author Comment

by:totoroha
ID: 40578555
This would be an excellent sample for me next time I want to check the DNS records of URLs. However, in this case, I just need to look up the computer name for a given IP address or, vice versa, the IP address for a given computer name.

@Michael: It would be great if you can help me with another thread about python.
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40578878
That's exactly what that code does.  There is no such thing as a DNS record for a URL.  The code resolves a list of computer names to their IP addresses.  The choice of variable names wasn't the best.
 

Author Comment

by:totoroha
ID: 40579724
Good morning, clockwatcher.

What format does the CSV file need to follow in order to resolve computer names (pcname.domain.com) to IP addresses and vice versa?
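
For reference, clockwatcher's script reads a plain text file with one DNS name per line when given a filename, so no CSV layout is needed for that path. To handle a file that mixes names and IP addresses, each line first has to be classified. The following is a minimal Python 3 sketch using only the standard library; classify is a hypothetical helper name, not something from the scripts above. For an IP address it returns the in-addr.arpa name that a PTR (reverse) query would use.

```python
import ipaddress

def classify(entry):
    """Return ('ip', PTR query name) for an IP address, else ('name', entry)."""
    entry = entry.strip()
    try:
        address = ipaddress.ip_address(entry)
    except ValueError:
        # Not a valid IPv4/IPv6 literal, so treat it as a host name
        return ("name", entry)
    return ("ip", address.reverse_pointer)
```

Entries tagged 'name' would go to a forward (A record) lookup and entries tagged 'ip' to a reverse (PTR) lookup.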
 
LVL 32

Expert Comment

by:DrDamnit
ID: 40579736
The problem is the "vice versa". You're trying to do a reverse DNS lookup, which may or may not work depending on whether PTR (pointer) records have been properly set up.
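
As a rough illustration of the reverse direction, here is a Python 3 sketch that uses the standard library's socket.gethostbyaddr with a bounded thread pool instead of dnspython. It is an assumption-laden sketch rather than a drop-in replacement for the code in this thread: reverse_lookup and reverse_lookup_all are hypothetical names, the resolver parameter exists only so the lookup function can be swapped out, and addresses with no resolvable name come back as None instead of raising.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (ip, hostname), or (ip, None) when the address has no name."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return (ip, hostname)
    except (socket.herror, socket.gaierror):
        # Lookup failed, e.g. no PTR record is set up for this address
        return (ip, None)

def reverse_lookup_all(ips, workers=20, resolver=socket.gethostbyaddr):
    """Resolve many IPs with a bounded pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda ip: reverse_lookup(ip, resolver), ips))
```

The default of 20 workers follows the 10 to 20 range suggested earlier in the thread; pool.map returns results in the same order as the input list.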
 

Author Comment

by:totoroha
ID: 40579775
You're right, Michael. I'm having trouble understanding your code and clockwatcher's code. Would you please elaborate on it?