Solved

asynchronous dns lookup

Posted on 2015-01-13
Last Modified: 2015-01-30
Good morning,

I would like to resolve about 2500 IP addresses to names using nslookup. The problem is that my code calls nslookup to query the DNS server for each computer name and has to wait until each query finishes, so it takes a long time even for about 100 hosts.
I read that Python can do this asynchronously, and through Google I found this:
http://www.catonmat.net/blog/asynchronous-dns-resolution/
and this:
http://blog.schmichael.com/2007/09/18/a-lesson-on-python-dns-and-threads/
https://www.geekynick.co.uk/bulk-dns-lookup-in-windows-powershell-better-than-nslookup-2/2/
Question by:totoroha
12 Comments
 
LVL 78

Expert Comment

by:arnold
ID: 40548140
You can store the data you get back in a searchable DB.
Do you have a local DNS server installed?
I'm not sure what your scripting skill level is, but using an external command, while fine, means you have to parse its output to get the information.
Using Python/Perl might speed up the process somewhat.

Are you looking up the same IP every time it appears, or does your setup look up each IP only once?
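
To illustrate the "look up each IP only once" idea, here is a minimal Python 3 sketch of an in-memory cache. The function name make_cached_resolver and its lookup parameter are made up for this example; the default lookup uses the standard library's socket.gethostbyaddr(), which goes through the system resolver rather than nslookup.

```python
import socket

def make_cached_resolver(lookup=None):
    """Return a function that resolves each IP at most once.

    `lookup` is a hypothetical hook (handy for testing); by default the
    standard library's socket.gethostbyaddr() is used.
    """
    if lookup is None:
        def lookup(ip):
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            return socket.gethostbyaddr(ip)[0]
    cache = {}

    def resolve(ip):
        if ip not in cache:
            try:
                cache[ip] = lookup(ip)
            except OSError:
                # socket.herror and socket.gaierror are OSError subclasses
                cache[ip] = None  # remember failures so they are not retried
        return cache[ip]

    return resolve
```

A persistent store (DB/DBM) would follow the same pattern, with the dictionary swapped for the store.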
 
LVL 23

Expert Comment

by:savone
ID: 40548150
Is there a reason you need to use python?  I do stuff like this right from the command line using bash and dig.

For example, you can put all 2500 IP addresses in a file called ips and loop through it:

for i in `cat ips`; do echo -n "$i - "; dig +short -x $i; done

example file:
73.56.45.12
76.85.54.56
69.68.54.23
68.54.56.59


example output:
73.56.45.12 - m001311d9c4d5.atlt5.ga.comcast.net.
76.85.54.56 - mta-76-85-54-56.neb.rr.com.
69.68.54.23 - oh-69-68-54-23.sta.embarqhsd.net.
68.54.56.59 - c-68-54-56-59.hsd1.fl.comcast.net.
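
For comparison, a rough Python 3 equivalent of the shell loop might look like the sketch below. It assumes the standard library's socket.gethostbyaddr() is an acceptable stand-in for dig -x (it uses the system resolver, so it may also consult the hosts file, and it returns names without the trailing dot). The names reverse_lookup and format_results are invented for the example.

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the name for ip, or None when no reverse entry is found."""
    try:
        # resolver returns (hostname, aliaslist, ipaddrlist)
        return resolver(ip)[0]
    except OSError:
        return None

def format_results(ips, resolver=socket.gethostbyaddr):
    """Yield 'ip - name' lines, mirroring the layout of the shell output."""
    for ip in ips:
        ip = ip.strip()
        if not ip:
            continue  # skip blank lines in the input file
        name = reverse_lookup(ip, resolver)
        yield "{0} - {1}".format(ip, name if name else "")
```

Calling format_results(open("ips")) and printing each line would run the lookups one after another, so on its own this is no faster than the shell loop.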
 

Author Comment

by:totoroha
ID: 40548273
I have my own script now and it works pretty well. The only catch is that you have to run it as cat lists.txt | script.py.
My scripting skill is not very good, so I want to ask one question: how do I read a file line by line and run each line through my script? If you want, I can share the script here and we can modify it to read from a text file at a given location and return the query results.
In my opinion, we could use an asynchronous Python library or Twisted, which should speed up the queries considerably when you have a large batch of hostnames or IP addresses.
 
LVL 78

Expert Comment

by:arnold
ID: 40548282
I'm not familiar with Python, but see:

https://docs.python.org/2/tutorial/inputoutput.html
f = open('filename', 'r')

You can probably pass the filename on the command line
(script.py filename) if it changes.

You need to look at storing the data in a faster-access form, i.e. a DB/DBM, or in a tree-type structure.
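
As a sketch of the "pass the filename on the command line" suggestion, the Python 3 function below reads targets from the file named in the first argument and falls back to standard input, so cat lists.txt | script.py would keep working. The name read_targets and its stdin parameter are assumptions made for the example.

```python
import sys

def read_targets(argv, stdin=sys.stdin):
    """Return the non-empty, stripped lines from the file named in argv[1],
    or from stdin when no filename was given."""
    if len(argv) > 1:
        # 'r' opens the file for reading; 'with' closes it afterwards
        with open(argv[1], "r") as handle:
            lines = handle.readlines()
    else:
        lines = stdin.readlines()
    return [line.strip() for line in lines if line.strip()]
```

In a script this would typically be called as read_targets(sys.argv), and each returned entry handed to whatever does the lookup.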
 
LVL 32

Assisted Solution

by:DrDamnit
DrDamnit earned 250 total points
ID: 40556269
You want threading:

import os
import re
import time
import sys
import dns.resolver
from threading import Thread

class dnsRecord(Thread):
	def __init__ (self,url):
		Thread.__init__(self)
		self.url = url
	def run(self):
		answer = dns.resolver.query(self.url,'A')
		for a in answer:
			print "%s resolves to: %s\n" % (self.url,a)

print time.ctime()

urllist = ['www.google.com', 'mail.google.com', 'www.yahoo.com', 'experts-exchange.com']

for url in urllist:
	current = dnsRecord(url)
	current.start()

print time.ctime()



This, of course, requires that you have the Python DNS library (dnspython) installed. (Under Linux / Debian Wheezy, use apt-get install python-dnspython.)

This loops through the list of URLs you want to resolve, passes each one to a threaded object, starts the object, and moves on to the next.

It should be quite fast. I would imagine you should be able to complete 2500 queries in a second or two. I would test it, but I don't have a list of 2500 URLs to run. :-)

You'll also notice that both time statements will usually print before your first answer. That's because we have delegated each lookup to a thread. So, once it has created the 2500 threads, the script will effectively say "I'm done. Here's the second timestamp", and the 2,500 query results will come in as they complete.
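
If you want the second timestamp to cover the lookups themselves, the main thread has to wait for the workers. A minimal Python 3 sketch of that, assuming a hypothetical helper run_and_wait and an arbitrary work function (neither is part of the script above):

```python
import threading
import time

def run_and_wait(targets, work):
    """Start one thread per target, block until all finish,
    and return (results, elapsed_seconds)."""
    results = []
    lock = threading.Lock()

    def worker(target):
        value = work(target)
        with lock:  # protect the shared list
            results.append((target, value))

    threads = [threading.Thread(target=worker, args=(t,)) for t in targets]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # without join(), the main thread would carry on immediately
    elapsed = time.time() - start
    return results, elapsed
```

Results arrive in completion order, not input order, so sort them if the order matters.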
 
LVL 25

Accepted Solution

by:clockwatcher
clockwatcher earned 250 total points
ID: 40556454
Wow... 2500 threads, huh?  I'm definitely not sure I'd recommend starting up 2500 threads.

Typically, you'd put your data into a thread-safe structure (Python's Queue would be a good pick: https://docs.python.org/2/library/queue.html), then start up a reasonable number of worker threads (10 to 20) and have them pull from the queue and process items as they become free, until the queue is empty.

Modifying Michael's answer just a bit... Other than the number of threads he's starting, he's also got an issue with his timer: his main thread will just blow through and finish while his workers are still processing, so the timestamps won't measure anything other than the time it takes to start all the threads. Another point, not really a problem but a bit of a pet peeve, is that he's sharing stdout between multiple threads; I'm not a big fan of that. And lastly, while printing to the screen is nice for a demo, in a real program you'd typically want to do something with those results, so you need a way to get them back into the main thread. To do that, you need a thread-safe structure that all your threads can dump data into.

Here's Michael's answer cleaned up a bit.

With no arguments, the following downloads a list of the 500 most popular sites on the internet and resolves them. If given an argument, it opens that file, which should contain a single DNS name per line, and resolves those.

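import time
import dns.exception
import dns.resolver
from threading import Thread
from Queue import Queue, Empty
import sys
import urllib2
import csv

class dnsRecord(Thread):
    def __init__ (self, q_in, q_out):
        Thread.__init__(self)
        self.q_in = q_in
        self.q_out = q_out

    def run(self):
        while True:
            # get_nowait() avoids the race between empty() and get():
            # another worker may take the last item in between.
            try:
                url = self.q_in.get_nowait()
            except Empty:
                break
            try:
                answer = dns.resolver.query(url,'A')
                for a in answer:
                    self.q_out.put((url, a))
            except dns.resolver.NoAnswer:
                self.q_out.put((url, "NO ANSWER"))
            except dns.exception.DNSException as e:
                # NXDOMAIN, timeouts, etc. should not kill the worker thread
                self.q_out.put((url, "ERROR: %s" % e.__class__.__name__))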

def resolveUrls(urls, num_of_threads=10):
    q_in = Queue()
    q_out = Queue()
    for url in urls:
        q_in.put(url.rstrip(), block=True, timeout=5)
    
    thread_pool = list()
    for i in range(num_of_threads):
        t = dnsRecord(q_in, q_out)
        t.start()
        thread_pool.append(t)

    for t in thread_pool:
        t.join()

    while not q_out.empty():
        yield q_out.get()

        
if __name__ == '__main__':
    if len(sys.argv) > 1:
        urllist = open(sys.argv[1], "r")
    else:
        urlcsv = csv.DictReader(urllib2.urlopen('http://moz.com/top500/domains/csv'))
        urllist = [url["URL"][:-1] for url in urlcsv]
    
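    # Use wall-clock time; time.clock() reports CPU time on Unix.
    start = time.time()
    # resolveUrls is a generator, so list() is needed to make the
    # lookups actually run inside the timed region.
    urls = list(resolveUrls(urllist))
    time_taken = time.time() - start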
     
    for (url, address) in urls:
        print "{url} => {address}".format(url=url, address=address)

    print "Seconds: {0}".format(time_taken)
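
For anyone on Python 3, the same bounded-worker idea can be sketched with the standard library's concurrent.futures instead of hand-written Thread and Queue classes. This is only an illustration, not a port of the script above: it assumes socket.gethostbyname() (the system resolver, returning one IPv4 address per name) is good enough in place of dnspython, and the names lookup and resolve_all are invented for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def lookup(name, resolver=socket.gethostbyname):
    """Resolve one name; return (name, address) or (name, error text)."""
    try:
        return name, resolver(name)
    except OSError as exc:
        # socket.gaierror is an OSError subclass
        return name, "ERROR: {0}".format(exc)

def resolve_all(names, num_of_threads=10, resolver=socket.gethostbyname):
    """Resolve names with a bounded pool of worker threads.
    pool.map() returns results in input order."""
    cleaned = [n.strip() for n in names if n.strip()]
    with ThreadPoolExecutor(max_workers=num_of_threads) as pool:
        return list(pool.map(lambda n: lookup(n, resolver), cleaned))
```

The with block does not exit until every lookup has finished, so timing a call to resolve_all() measures the real work.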

 
LVL 32

Expert Comment

by:DrDamnit
ID: 40556488
Yeah... I thought 2500 threads was excessive, but in the end, they are simple UDP requests, which should be tiny.

That being said, I also assumed that if it was too much overhead, the asker would modify the code to do it in batches.

But I do like clockwatcher's modifications.
 

Author Comment

by:totoroha
ID: 40578555
This would be an excellent sample for me next time I want to check the DNS records of URLs. However, in this case, I just need to find the computer name for a given IP address or, vice versa, the IP address for a given computer name.

@Michael: It would be great if you can help me with another thread about python.
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40578878
That's exactly what that code does.  There is no such thing as a DNS record for a URL.  The code resolves a list of computer names to their IP addresses.  The choice of variable names wasn't the best.
 

Author Comment

by:totoroha
ID: 40579724
Good morning, clockwatcher.

What is the format of the csv file that I need to follow in order to resolve the computer names (pcname.domain.com) to IP address and vice versa?
 
LVL 32

Expert Comment

by:DrDamnit
ID: 40579736
The problem is the "vice versa". You're trying to do a reverse DNS lookup, which may or may not work depending on whether pointer (PTR) records have been properly set up.
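
To make the forward/reverse distinction concrete, here is a small Python 3 sketch that decides, per input line, which kind of lookup is needed and builds the name a PTR query would ask for. The function name classify is made up for the example; ipaddress is in the standard library, and reverse_pointer needs Python 3.5 or later.

```python
import ipaddress

def classify(entry):
    """Return ('reverse', ptr_name) for an IP address,
    or ('forward', hostname) for anything else."""
    entry = entry.strip()
    try:
        address = ipaddress.ip_address(entry)
    except ValueError:
        # not an IP address, so treat it as a host name
        return ("forward", entry)
    # reverse_pointer builds the in-addr.arpa (or ip6.arpa) name
    return ("reverse", address.reverse_pointer)
```

So 73.56.45.12 maps to a PTR query for 12.45.56.73.in-addr.arpa, and whether that query returns anything depends on the owner of the address block having published the record.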
 

Author Comment

by:totoroha
ID: 40579775
You're right, Michael. I'm having trouble understanding your code and clockwatcher's code. Would you please explain them?