Asynchronous DNS lookup

Posted on 2015-01-13
Last Modified: 2015-01-30
Good morning,

I would like to resolve about 2500 IPs to names using nslookup. The problem with my code is that it uses nslookup to query the DNS server for each computer name, and I have to wait until each query is done. It costs me a lot of time even when I query only about 100 hosts.
I found a possible solution in Python called async, and from Google I found this:
and this:
Question by: totoroha

Expert Comment

ID: 40548140
You can store the data you get back in a searchable DB.
Do you have a local DNS server installed?
I'm not sure what your scripting skill level is, but calling an external command, while fine, means you have to parse its output to get the info.
Using Python/Perl might speed up the process somewhat.

Are you looking up the same IP every time it appears, or does your system/setup only look up each IP once?
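As a sketch of the last two points (Python 3, not code from the thread): the standard library's `socket.gethostbyaddr` returns the PTR name directly, so there is no nslookup output to parse, and `functools.lru_cache` makes sure each distinct IP is looked up only once per run. The name `reverse_lookup` is illustrative.

```python
import functools
import socket


@functools.lru_cache(maxsize=None)  # memoize: each distinct IP is resolved at most once
def reverse_lookup(ip):
    """Return the hostname from the PTR record for ip, or None if there is none.

    reverse_lookup is a hypothetical helper name, not from the original thread.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
```

Calling `reverse_lookup(ip)` inside your existing loop is enough; repeated IPs are answered from memory, and `reverse_lookup.cache_info()` shows the hit/miss counts. Note that failures (None) are cached too, so a transient DNS error is not retried within the same run.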

Expert Comment

ID: 40548150
Is there a reason you need to use Python? I do stuff like this right from the command line using bash and dig.

For example, you can put all 2500 IP addresses in a file called ips and loop through it.

for i in `cat ips`; do echo -n "$i - "; dig +short -x $i; done

example file:

example output:
- m001311d9c4d5.atlt5.ga.comcast.net.
- mta-76-85-54-56.neb.rr.com.
- oh-69-68-54-23.sta.embarqhsd.net.
- c-68-54-56-59.hsd1.fl.comcast.net.

Author Comment

ID: 40548273
I have my own script now and it works pretty well. The only thing is that you have to run it as cat lists.txt | script.py.
My scripting skills are not very good, so I want to ask one question: how do I read a file line by line and run each line through my script? If you want, I can share the script here and we can modify it to read from a text file given its location and run the queries.
In my opinion, we could use Python's asynchronous libraries or Twisted, which would speed up the queries a lot
in case you have a bulk list of hostnames or IPs.
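A minimal Python 3 `asyncio` sketch of that idea (an assumption, not the poster's script): the standard-library resolver calls block, so each `socket.gethostbyaddr` call is handed to asyncio's default thread pool with `loop.run_in_executor`, and an `asyncio.Semaphore` caps how many lookups are in flight. Fully non-blocking DNS would need a third-party resolver, such as Twisted's, which is not shown here. `resolve_all`, `blocking_reverse` and the `limit` parameter are hypothetical names.

```python
import asyncio
import socket


def blocking_reverse(ip):
    """Ordinary blocking PTR lookup (hypothetical helper); None when no name is found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


async def resolve_all(ips, lookup=blocking_reverse, limit=20):
    """Resolve every IP concurrently with at most `limit` lookups in flight.

    Returns a dict {ip: hostname-or-None}. Names are illustrative, not from the thread.
    """
    loop = asyncio.get_running_loop()
    gate = asyncio.Semaphore(limit)

    async def resolve_one(ip):
        async with gate:
            # the blocking call runs in the event loop's default ThreadPoolExecutor
            name = await loop.run_in_executor(None, lookup, ip)
        return ip, name

    pairs = await asyncio.gather(*(resolve_one(ip) for ip in ips))
    return dict(pairs)
```

Usage would be along the lines of `names = asyncio.run(resolve_all(list_of_ips))`.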

Expert Comment

ID: 40548282
I'm not familiar with Python, but you could pass the filename on the command line, e.g. script.py filename, in case it changes.
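Following the filename-on-the-command-line suggestion, a small Python 3 sketch: the standard library's `fileinput.FileInput` reads the files named in `sys.argv[1:]`, or standard input when there are none, so the same script works both as `script.py lists.txt` and as `cat lists.txt | script.py`. `read_hosts` is a hypothetical helper; skipping blank lines and `#` comment lines is an added assumption.

```python
import fileinput


def read_hosts(files=None):
    """Yield one IP or hostname per line (hypothetical helper).

    files=None means: the files named on the command line, or stdin if none.
    Blank lines and lines starting with '#' are skipped.
    """
    with fileinput.FileInput(files=files) as lines:
        for line in lines:
            entry = line.strip()
            if entry and not entry.startswith("#"):
                yield entry
```

In a script, `for host in read_hosts(): ...` then replaces the hard-coded list.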

You should also look at storing the data in a faster-access form, i.e. a DB/DBM, or in a tree-type structure.
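For the DB/DBM option, a sketch using Python 3's standard `dbm` module, which picks whichever DBM backend is available: results are written once and can be read back later without touching DNS again. `save_hosts` and `saved_name` are illustrative names, and storing an empty string for IPs without a PTR record is an assumption.

```python
import dbm


def save_hosts(path, results):
    """Write an {ip: hostname-or-None} dict to a DBM file (hypothetical helper).

    IPs without a name are stored as an empty value.
    """
    with dbm.open(path, "c") as db:  # "c": create the file if it doesn't exist
        for ip, name in results.items():
            db[ip.encode()] = (name or "").encode()


def saved_name(path, ip):
    """Return the stored hostname, '' if stored without a name, None if never stored."""
    with dbm.open(path, "r") as db:
        key = ip.encode()
        return db[key].decode() if key in db else None
```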

Assisted Solution

DrDamnit earned 1000 total points
ID: 40556269
You want threading:

# Python 2, dnspython 1.x (dns.resolver.query)
import time
import dns.resolver
from threading import Thread

class dnsRecord(Thread):
	def __init__ (self, url):
		Thread.__init__(self)  # the Thread base class must be initialised
		self.url = url
	def run(self):
		answer = dns.resolver.query(self.url, 'A')
		for a in answer:
			print "%s resolves to: %s\n" % (self.url, a)

print time.ctime()

urllist = ['www.google.com', 'mail.google.com', 'www.yahoo.com', 'experts-exchange.com']

for url in urllist:
	current = dnsRecord(url)
	current.start()  # run() only executes, in a new thread, once start() is called

print time.ctime()


This, of course, requires the dnspython module (under Debian Wheezy: apt-get install python-dnspython).

This loops through the list of URLs you want to resolve, passes each one to a threaded object, starts the object, and moves on to the next.

It should be quite fast. I would imagine you could complete 2500 queries in a second or two. I would test it, but I don't have a list of 2500 URLs to run. :-)

You'll also notice that the two time statements will appear before your first answer (in most cases). That's because the lookups have been delegated to threads: once the script has created the 2500 threads, it prints the second timestamp ("I'm done") and the 2,500 answers come in as they complete.

Accepted Solution

clockwatcher earned 1000 total points
ID: 40556454
Wow... 2500 threads, huh? I definitely wouldn't recommend starting up 2500 threads.

Typically, what you'd do is put your data into a thread-safe structure (Python's Queue is a good pick: https://docs.python.org/2/library/queue.html), then start a reasonable number of worker threads (10 to 20) and have them pull from the queue and process items as they become free, until the queue is empty.

Modifying Michael's answer just a bit... Other than the number of threads he's starting, he also has an issue with his timer: his main thread will just blow through and finish while his workers are still processing, so the timestamps only measure how long it takes to start up all the threads. Another non-issue, but a pet peeve of mine, is that he's sharing stdout between multiple threads; I'm not a big fan of that. And lastly, while printing to the screen is nice for a demo, in a real program you'd typically want to do something with the results, so you need a way to get them back into the main thread. To do that, you need a thread-safe structure that all your threads can put data into.

Here's Michael's answer cleaned up a bit.  

With no arguments, the following downloads the list of the 500 most popular sites on the internet and resolves them. If given an argument, it opens that file, which should contain one DNS name per line, and resolves those instead.

# Python 2, dnspython 1.x
import time
import sys
import csv
import urllib2
import dns.resolver
from threading import Thread
from Queue import Queue, Empty

class dnsRecord(Thread):
    def __init__(self, q_in, q_out):
        Thread.__init__(self)  # the Thread base class must be initialised
        self.q_in = q_in
        self.q_out = q_out

    def run(self):
        while True:
            try:
                # checking empty() and then calling get() races with the other
                # workers; get_nowait() either returns an item or raises Empty
                url = self.q_in.get_nowait()
            except Empty:
                return  # nothing left to do
            try:
                answer = dns.resolver.query(url, 'A')
                for a in answer:
                    self.q_out.put((url, a))
            except dns.resolver.NoAnswer:
                self.q_out.put((url, "NO ANSWER"))
            except dns.resolver.NXDOMAIN:
                self.q_out.put((url, "NXDOMAIN"))

def resolveUrls(urls, num_of_threads=10):
    q_in = Queue()
    q_out = Queue()
    for url in urls:
        url = url.strip()
        if url:  # skip blank lines
            q_in.put(url)
    thread_pool = list()
    for i in range(num_of_threads):
        t = dnsRecord(q_in, q_out)
        t.start()
        thread_pool.append(t)

    for t in thread_pool:
        t.join()  # wait until every worker has drained the input queue

    while not q_out.empty():
        yield q_out.get()

if __name__ == '__main__':
    if len(sys.argv) > 1:
        urllist = open(sys.argv[1], "r")
    else:
        urlcsv = csv.DictReader(urllib2.urlopen('http://moz.com/top500/domains/csv'))
        urllist = [url["URL"][:-1] for url in urlcsv]
    start = time.time()  # wall-clock time; time.clock() is CPU time on Unix
    urls = list(resolveUrls(urllist))  # list() makes the lookups run inside the timed section
    time_taken = time.time() - start
    for (url, address) in urls:
        print "{url} => {address}".format(url=url, address=address)

    print "Seconds: {0}".format(time_taken)
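For anyone on Python 3, the same worker-pool idea (a fixed number of threads pulling from shared work, results collected back in the main thread) is built in as `concurrent.futures.ThreadPoolExecutor`. The sketch below does reverse lookups with the standard `socket.gethostbyaddr` instead of dnspython; `resolve_many` and the default of 20 workers are assumptions, in line with the 10 to 20 threads suggested above.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Blocking PTR lookup; None if no name is found (hypothetical helper)."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def resolve_many(ips, lookup=reverse_lookup, workers=20):
    """Return [(ip, name), ...] in input order using a bounded pool of threads."""
    ips = list(ips)  # may be a file or generator; it is iterated twice below
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands items to free workers and yields results in input order
        names = list(pool.map(lookup, ips))
    return list(zip(ips, names))
```

Because the `with` block only exits after every worker has finished, timing a call to `resolve_many` measures the actual lookups, which avoids the timer problem described above.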



Expert Comment

ID: 40556488
Yeah... I thought 2500 threads was excessive, but in the end they are simple UDP requests, which should be tiny.

That being said, I also figured that if it was too much overhead, the asker would modify the code to do it in batches.

But I do like clockwatcher's modifications.
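The batch approach mentioned here could look roughly like this Python 3 sketch (hypothetical helper; `lookup` stands for any blocking resolver function that handles its own errors): start one thread per host for a batch, wait for the whole batch with `join()`, then move on. Compared with a worker pool, each batch runs only as fast as its slowest lookup, which is one reason the queue-based version is usually preferred.

```python
import threading


def resolve_in_batches(hosts, lookup, batch_size=50):
    """Resolve hosts with at most batch_size threads alive at once (hypothetical helper).

    Returns {host: lookup(host)}.
    """
    hosts = list(hosts)
    results = {}
    lock = threading.Lock()

    def worker(host):
        name = lookup(host)
        with lock:  # guard the shared results dict
            results[host] = name

    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        threads = [threading.Thread(target=worker, args=(h,)) for h in batch]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for the whole batch before starting the next one
    return results
```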

Author Comment

ID: 40578555
This would be an excellent sample for me next time I want to check the DNS records of URLs. However, in this case, I just need to look up the computer name for a given IP address, or vice versa, the IP address for a given computer name.

@Michael: It would be great if you can help me with another thread about python.

Expert Comment

ID: 40578878
That's exactly what that code does. There is no such thing as a DNS record for a URL. The code resolves a list of computer names to their IP addresses. The choice of variable names wasn't the best.

Author Comment

ID: 40579724
Good morning, clockwatcher.

What is the format of the csv file that I need to follow in order to resolve the computer names (pcname.domain.com) to IP address and vice versa?

Expert Comment

ID: 40579736
The problem is the "vice versa". You're trying to do a reverse DNS lookup, which may or may not work depending on whether PTR (pointer) records have been properly set up.
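To make the "vice versa" concrete, a Python 3 sketch using only the standard library (function names are illustrative): a reverse lookup is really a query for a PTR record under `in-addr.arpa`, which is the name `dig -x` asks for, and it only succeeds if whoever manages that address range has published the record. The forward direction goes through the system resolver, so the hosts file may be consulted as well as DNS.

```python
import ipaddress
import socket


def ptr_name(ip):
    """DNS name queried by a reverse lookup, e.g. 192.0.2.10 -> 10.2.0.192.in-addr.arpa"""
    return ipaddress.ip_address(ip).reverse_pointer


def name_for_ip(ip):
    """Reverse (PTR) lookup (hypothetical helper); None if no record or the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def ip_for_name(name):
    """Forward lookup of an IPv4 address (hypothetical helper); None if it doesn't resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None
```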

Author Comment

ID: 40579775
You're right, Michael. I have trouble understanding your code and clockwatcher's code. Would you please elaborate on it?
