  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 564

Asynchronous DNS lookup

Good morning,

I would like to resolve about 2500 IPs to names using nslookup. The problem with my code is that it calls nslookup to query the DNS server for each computer name and has to wait until each query is done. That costs a lot of time even when I query only about 100 hosts.
While searching I found that Python can do this asynchronously, and Google turned up this:
http://www.catonmat.net/blog/asynchronous-dns-resolution/
and this:
http://blog.schmichael.com/2007/09/18/a-lesson-on-python-dns-and-threads/
https://www.geekynick.co.uk/bulk-dns-lookup-in-windows-powershell-better-than-nslookup-2/2/
Asked by: totoroha
2 Solutions
 
arnold Commented:
You can store the data you get back in a searchable DB.
Do you have a local DNS server installed?
I'm not sure what your scripting skill level is, but using an external command, while fine, requires you to parse its output to get the info.
Using Python or Perl might speed up the process somewhat.

Are you looking up the same IP every time it appears, or does your setup look up each IP only once?
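
A rough sketch of the "store it in a searchable DB" idea, assuming Python 3 and the standard-library sqlite3 module. The table name ptr_cache and the functions open_cache, system_reverse_lookup, and cached_lookup are hypothetical names chosen for illustration. Each IP is resolved at most once, and repeats are served from the table.

```python
import socket
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a small SQLite table mapping IP -> hostname."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ptr_cache (ip TEXT PRIMARY KEY, name TEXT)"
    )
    return conn


def system_reverse_lookup(ip):
    """Reverse-resolve one IP with the system resolver; '' if it has no name."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ""


def cached_lookup(conn, ip, resolve=system_reverse_lookup):
    """Return the cached name for ip, calling resolve(ip) only on a cache miss."""
    row = conn.execute(
        "SELECT name FROM ptr_cache WHERE ip = ?", (ip,)
    ).fetchone()
    if row is not None:
        return row[0]
    name = resolve(ip)
    conn.execute("INSERT INTO ptr_cache (ip, name) VALUES (?, ?)", (ip, name))
    conn.commit()
    return name
```

Passing a file path to open_cache instead of the default ":memory:" would keep the results between runs, which only helps if the same IPs come up again.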

savone Commented:
Is there a reason you need to use Python?  I do stuff like this right from the command line using bash and dig.

For example, you can put all 2500 IP addresses in a file called ips and loop through it:

for i in `cat ips`; do echo -n "$i - "; dig +short -x "$i"; done

example file:
73.56.45.12
76.85.54.56
69.68.54.23
68.54.56.59


example output:
73.56.45.12 - m001311d9c4d5.atlt5.ga.comcast.net.
76.85.54.56 - mta-76-85-54-56.neb.rr.com.
69.68.54.23 - oh-69-68-54-23.sta.embarqhsd.net.
68.54.56.59 - c-68-54-56-59.hsd1.fl.comcast.net.

totoroha (Author) Commented:
I have my own script now and it works pretty well. The only catch is that you have to run it as cat lists.txt | script.py.
My scripting skills are not very good, so I want to ask one question: how do I make my script read a file line by line? If you want, I can share the script here and we can modify it so it takes the text file's location and runs the queries.
In my opinion, we could use Python's asynchronous library or Twisted, which should speed up the queries considerably when you have a large batch of hostnames or IP addresses.
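
A minimal sketch of the asynchronous approach, assuming Python 3.7+ and the standard-library asyncio module rather than Twisted. The names reverse_lookup and resolve_all and the limit parameter are hypothetical. In CPython, loop.getnameinfo hands the blocking call to the loop's default thread pool, so this is asyncio-style concurrency over blocking lookups and not a native asynchronous resolver.

```python
import asyncio
import socket


async def reverse_lookup(ip):
    """Resolve one IP to a name; returns None if no name is found."""
    loop = asyncio.get_running_loop()
    try:
        # NI_NAMEREQD makes the call fail instead of echoing the IP back.
        host, _service = await loop.getnameinfo((ip, 0), socket.NI_NAMEREQD)
        return host
    except OSError:
        return None


async def resolve_all(ips, lookup=reverse_lookup, limit=50):
    """Run lookup() for every IP with at most `limit` coroutines in flight."""
    gate = asyncio.Semaphore(limit)

    async def one(ip):
        async with gate:
            return ip, await lookup(ip)

    # gather() keeps the input order, so the dict is filled in that order.
    return dict(await asyncio.gather(*(one(ip) for ip in ips)))
```

It could be driven with something like asyncio.run(resolve_all(list_of_ips)), which returns a dict mapping each IP to its name or None.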

 
arnold Commented:
I'm not familiar with Python, but see:

https://docs.python.org/2/tutorial/inputoutput.html
f = open('filename', 'r')

You can also pass the filename on the command line, e.g.
script.py filename
if it changes between runs.

You need to look at storing the data in a faster-access form, i.e. a DB/DBM or a tree-type structure.
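
To make the file-reading suggestion concrete, here is a small sketch, assuming Python 3. The helper names read_targets and load_targets are hypothetical. It reads one entry per line from the file named on the command line, and falls back to standard input when no filename is given.

```python
import sys


def read_targets(stream):
    """Yield one stripped, non-empty entry per line of an open text stream."""
    for line in stream:
        entry = line.strip()
        if entry:
            yield entry


def load_targets(argv):
    """Read from the file named in argv[1] if present, otherwise from stdin."""
    if len(argv) > 1:
        with open(argv[1], "r") as handle:
            return list(read_targets(handle))
    return list(read_targets(sys.stdin))
```

Calling load_targets(sys.argv) at the top of a script would let both script.py lists.txt and cat lists.txt | script.py work.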

DrDamnit Commented:
You want threading:

import os
import re
import time
import sys
import dns.resolver
from threading import Thread

class dnsRecord(Thread):
	def __init__ (self,url):
		Thread.__init__(self)
		self.url = url
	def run(self):
		answer = dns.resolver.query(self.url,'A')
		for a in answer:
			print "%s resolves to: %s\n" % (self.url,a)

print time.ctime()

urllist = ['www.google.com', 'mail.google.com', 'www.yahoo.com', 'experts-exchange.com']

for url in urllist:
	current = dnsRecord(url)
	current.start()

print time.ctime()


This, of course, requires that you have the Python DNS library installed. (Under Linux / Debian Wheezy, use apt-get install python-dnspython.)

This loops through the list of URLs you want to resolve, passes each one to a threaded object, starts the object, and moves on to the next.

It should be quite fast. I would imagine you should be able to complete 2500 queries in a second or two. I would test it, but I don't have a list of 2500 URLs to run. :-)

You'll also notice that the two time statements will appear before your first answer (in most cases). That's because we have delegated each lookup to a thread. So, once it has created the 2500 threads, it prints the second timestamp ("I'm done"), and the 2,500 query results come in as they complete.

clockwatcher Commented:
Wow... 2500 threads, huh?  I'm definitely not sure I'd recommend starting up 2500 threads.

Typically, what you'd do is put your data into a thread-safe structure (Python's Queue would be a good pick: https://docs.python.org/2/library/queue.html), then start up a reasonable number of worker threads (10 to 20) and have them pull from the queue and process items as they become free until the queue is empty.

Modifying Michael's answer just a bit... Other than the number of threads he's starting, he's also got an issue with his timer: his main thread will just blow through and finish while his workers are still processing. So the timestamps won't actually measure anything other than the time it takes to start up all the threads. Another point, not really a bug but a bit of a pet peeve, is that he's sharing stdout between multiple threads; I'm not a big fan of that. And lastly, while printing to the screen is nice for a demo, in a real program you'd typically want to do something with those results, so you need a way to get them back into the main thread. To do that, you need a thread-safe structure that all your threads can dump data into.

Here's Michael's answer cleaned up a bit.

With no arguments, the following downloads the list of the 500 most popular sites on the internet and resolves them. If given an argument, it opens a file with a single DNS name per line and resolves those.

import time
import dns.exception
import dns.resolver
from threading import Thread
from Queue import Queue, Empty
import sys
import urllib2
import csv

class dnsRecord(Thread):
    """Worker thread: pulls names from q_in and puts (name, result) on q_out."""
    def __init__ (self, q_in, q_out):
        Thread.__init__(self)
        self.q_in = q_in
        self.q_out = q_out

    def run(self):
        while True:
            try:
                # get_nowait() avoids the race between empty() and get()
                # when another worker takes the last item first.
                url = self.q_in.get_nowait()
            except Empty:
                return
            try:
                answer = dns.resolver.query(url,'A')
                for a in answer:
                    self.q_out.put((url, a))
            except dns.resolver.NoAnswer:
                self.q_out.put((url, "NO ANSWER"))
            except dns.exception.DNSException as e:
                # NXDOMAIN, timeouts, and other resolver failures
                self.q_out.put((url, "FAILED: " + e.__class__.__name__))

def resolveUrls(urls, num_of_threads=10):
    q_in = Queue()
    q_out = Queue()
    for url in urls:
        q_in.put(url.rstrip())

    thread_pool = list()
    for i in range(num_of_threads):
        t = dnsRecord(q_in, q_out)
        t.start()
        thread_pool.append(t)

    for t in thread_pool:
        t.join()

    # All workers have finished, so q_out can be drained without blocking.
    results = []
    while not q_out.empty():
        results.append(q_out.get())
    return results


if __name__ == '__main__':
    if len(sys.argv) > 1:
        with open(sys.argv[1], "r") as f:
            urllist = f.readlines()
    else:
        urlcsv = csv.DictReader(urllib2.urlopen('http://moz.com/top500/domains/csv'))
        urllist = [url["URL"][:-1] for url in urlcsv]

    # time.time() measures wall-clock time; resolveUrls returns only after
    # every worker has joined, so the interval covers the actual lookups.
    start = time.time()
    urls = resolveUrls(urllist)
    time_taken = time.time() - start

    for (url, address) in urls:
        print "{url} => {address}".format(url=url, address=address)

    print "Seconds: {0}".format(time_taken)

DrDamnit Commented:
Yeah... I thought 2500 threads was excessive, but in the end, they are simple UDP requests, which should be tiny.

That being said, I also assumed that if it was too much overhead, the asker would mod the code to do it in batches.

But I do like clockwatcher's modifications.

totoroha (Author) Commented:
This would be an excellent sample for me next time I want to check the DNS records of URLs. However, in this case, I just need to look up the computer name for a given IP address or, vice versa, the IP address for a given computer name.

@Michael: It would be great if you could help me with another thread about Python.

clockwatcher Commented:
That's exactly what that code does.  There is no such thing as a DNS record for a URL.  The code resolves a list of computer names to their IP addresses.  The choice of variable names wasn't the best.

totoroha (Author) Commented:
Good morning, clockwatcher.

What is the format of the csv file that I need to follow in order to resolve the computer names (pcname.domain.com) to IP address and vice versa?

DrDamnit Commented:
The problem is the "vice versa". You're trying to do a reverse DNS lookup, which may or may not work depending on whether PTR (pointer) records have been properly set up.
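
A hedged sketch of handling both directions with a bounded pool of threads, assuming Python 3 and only the standard library (socket and concurrent.futures) instead of dnspython. The function names ptr_lookup, forward_lookup, and resolve_many are hypothetical. An IP with no PTR record comes back as None instead of raising.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def ptr_lookup(ip):
    """Return the reverse (PTR) name for ip, or None when the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror / socket.gaierror: no PTR or lookup error
        return None


def forward_lookup(name):
    """Return an IPv4 address for name, or None when it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def resolve_many(items, lookup, workers=20):
    """Apply lookup to every item using a bounded pool of worker threads."""
    items = list(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results pair up with items.
        return list(zip(items, pool.map(lookup, items)))
```

For example, resolve_many(ips, ptr_lookup) would give a list of (ip, name_or_None) pairs, and resolve_many(names, forward_lookup) would do the forward direction.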

totoroha (Author) Commented:
You're right, Michael. I'm having trouble understanding your code and clockwatcher's code. Would you please explain it?