Need to programmatically capture information from a website.

Posted on 2005-04-21
Last Modified: 2012-12-25
I am trying to gather information from a website (that I don't own).

I log in manually and then, using EFGrabber and WinBatch, run a macro I created that steps through the records returned by my search and extracts the information I am looking for into an Excel file (so I don't have to type each record one by one).

The challenge is this: because it's a click-location-based macro, if the Next button is even slightly off, the macro hangs.

I am wondering if there is a better way to do this. Can I use XML, or write a script, or something like that?

Any direction would be greatly appreciated. I'm kind of at a loss on how to proceed.

FYI: I am proficient in ASP and .NET, VB, and VBScript, but could use another language :sigh: if absolutely necessary.

Question by:VeeVan
    LVL 29

    Expert Comment

    Have permission?
    LVL 12

    Assisted Solution

    There's a Microsoft component called MSXML 4.0.
    You can download it free from Microsoft.

    I've copied this tidbit from the SDK docs:
    var xmlhttp = new ActiveXObject("Msxml2.XMLHTTP.4.0");
    xmlhttp.open("GET", "http://myserver/save.asp", false);
    xmlhttp.send();

    or this, using the server version of the control:
    var srvXmlHttp;
    srvXmlHttp = Server.CreateObject("Msxml2.ServerXMLHTTP.4.0");
    srvXmlHttp.open("GET", "http://myserver/myresponse.asp", false);
    srvXmlHttp.send();
    var newsElement = srvXmlHttp.responseXML.selectSingleNode("/news/story1");

    <p>Top News Story<p>

    Basically, these controls let you open a web page and work with the results. If the page has consistent formatting (i.e., let's say the data you need is in a table that always has the same ID), you can use that to pull out the specific pieces of the page you need.
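
    As a rough illustration of the "table that always has the same ID" idea, here is a minimal sketch in plain JavaScript (runnable under Node.js, not ASP). The function name `cellsOfTable`, the id `details`, and the sample markup are all made up for illustration; it assumes simple ids and no nested tables, and a real page may well need a proper HTML parser instead.

```javascript
// Sketch only: collect the text of every <td> in the table that has the given id.
// Assumes the id contains no regex special characters and the table is not nested.
function cellsOfTable(html, tableId) {
  // Locate the opening tag of the table carrying the id we want.
  const openPattern = new RegExp('<table[^>]*\\bid\\s*=\\s*["\']?' + tableId + '\\b', 'i');
  const open = html.search(openPattern);
  if (open === -1) return [];

  // Keep only the markup up to the first closing </table> after that point.
  const rest = html.slice(open);
  const end = rest.search(/<\/table>/i);
  const table = end === -1 ? rest : rest.slice(0, end);

  const cells = [];
  const cellPattern = /<td[^>]*>([\s\S]*?)<\/td>/gi;
  let match;
  while ((match = cellPattern.exec(table)) !== null) {
    // Strip any tags inside the cell and trim surrounding whitespace.
    cells.push(match[1].replace(/<[^>]+>/g, '').trim());
  }
  return cells;
}

// Hypothetical page fragment, for illustration only.
const samplePage =
  '<table id="other"><tr><td>x</td></tr></table>' +
  '<table id="details"><tr><td>Owner</td><td><b>J. Smith</b></td></tr>' +
  '<tr><td>Parcel</td><td> 12-34 </td></tr></table>';

console.log(cellsOfTable(samplePage, 'details'));
```

    The same approach should carry over to JScript or VBScript using their own regular expression objects, since it only needs the page source as a string.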

    - Jack

    LVL 1

    Author Comment

    Can I use the XML to jump from one page to the next?

    Here's the scenario:
    1.  I manually do a search for the information that I want.
    2.  It then comes up in a list format that has partial information.
    3.  I click on one of the detail records and pull out a couple of fields into Excel using EFGrabber.
    4.  Then, I click next to go to the next detail page.
    5.  I repeat steps 3 and 4 until I have all the info that I want.

    BTW: The stuff I'm searching is in the public domain (County Property Appraiser's Website). I'm just trying to avoid the manual process of copying, pasting, or typing. It takes a really long time.

    So what I need is a method to both grab the information (for which it seems XML should work nicely) and advance to the next page programmatically, and I don't know if that's possible.

    Thanks again for all your assistance.

    LVL 29

    Accepted Solution

    >BTW: The stuff I'm searching is in the public domain (County Property Appraiser's Website). I'm just trying to avoid the manual process of copying, pasting, or typing. It takes a really long time.

    Thank you.
    Using the XMLHTTP object, you can emulate a web browser, which means you can use both the POST and GET methods.  The XML part of the name isn't really relevant to you here; it is the HTTP part of the name you need.

    What you want to concentrate on is the links to the detail records.

    If you can predict the links for the properties you need the details for, you can write a script to fetch those pages using the XMLHTTP object as the go-between.
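
    To make "predicting the links" concrete, here is a small sketch in plain JavaScript (Node.js). Everything specific in it is an assumption: the address `http://example.com/detail.asp`, the `parcel` query parameter, and the helper names `detailUrls` and `nextLink` are invented for illustration, and the real site will use its own URL scheme. The second helper covers the case where the links cannot be predicted, by reading the address of the Next link out of the page that was just fetched.

```javascript
// Build one detail-page URL per record id.
// The base URL and the "parcel" parameter name are hypothetical.
function detailUrls(baseUrl, parcelIds) {
  return parcelIds.map(id => baseUrl + '?parcel=' + encodeURIComponent(id));
}

// Find the href of the anchor whose visible text is "Next".
// Returns null when the page has no such link (for example, on the last record).
function nextLink(html) {
  const pattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>\s*Next\s*<\/a>/i;
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

// Hypothetical navigation markup, for illustration only.
const navMarkup =
  '<a href="detail.asp?id=7">Prev</a> <a class="nav" href="detail.asp?id=9">Next</a>';

console.log(detailUrls('http://example.com/detail.asp', ['1001', 'A 1']));
console.log(nextLink(navMarkup));
```

    Either way, the script works from addresses rather than screen positions, so a button that moves a few pixels no longer matters.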

    It will return the content of the page and you can use string manipulation functions (Left, Right, Mid, InStr) to carve out the part you want.
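
    The carving step might look something like the sketch below, written in plain JavaScript (Node.js), where `indexOf` plays the role of InStr and `substring` the role of Mid. The helper name `textBetween`, the marker strings, and the sample page are assumptions for illustration; the real markers depend on the HTML the site actually returns.

```javascript
// Return the text found between startMarker and endMarker,
// or null if either marker is missing from the page.
function textBetween(page, startMarker, endMarker) {
  const start = page.indexOf(startMarker);       // comparable to InStr
  if (start === -1) return null;

  const from = start + startMarker.length;       // first character after the start marker
  const end = page.indexOf(endMarker, from);     // search only past the start marker
  if (end === -1) return null;

  return page.substring(from, end).trim();       // comparable to Mid, then Trim
}

// Hypothetical detail-page fragment, for illustration only.
const detailPage =
  '<td>Owner:</td><td> J. Smith </td><td>Value:</td><td>$120,000</td>';

console.log(textBetween(detailPage, 'Owner:</td><td>', '</td>'));
console.log(textBetween(detailPage, 'Value:</td><td>', '</td>'));
```

    Choosing a start marker that includes the field label keeps the match tied to the right field even if other cells move around.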

    Add that to the FileSystemObject and you can write the results to a CSV file on your machine, which Excel will open as though it were a native XLS file.
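
    For the CSV step, a minimal sketch in plain JavaScript follows. It uses Node.js's `fs` module where the post above would use the FileSystemObject, and the helper names `csvField`, `toCsv`, and `writeCsv` are invented for illustration. The one detail worth getting right is quoting: a value such as "$120,000" contains a comma and would otherwise be split across two columns when Excel opens the file.

```javascript
const fs = require('fs');

// Quote a field only when it contains a comma, a double quote, or a line break;
// embedded double quotes are doubled, which is the usual CSV convention.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Turn an array of rows (each an array of values) into CSV text.
function toCsv(rows) {
  return rows.map(row => row.map(csvField).join(',')).join('\r\n') + '\r\n';
}

// Write the rows to a file at the given path.
function writeCsv(path, rows) {
  fs.writeFileSync(path, toCsv(rows), 'utf8');
}

// Made-up records, for illustration only.
const sampleRows = [
  ['Owner', 'Value'],
  ['Smith, J.', '$120,000'],
];

console.log(toCsv(sampleRows));
```

    Calling `writeCsv('results.csv', sampleRows)` would then produce a file that opens directly in Excel; the file name is, again, only an example.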

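    <%@ Language = VBScript %>
    <%
    Response.Buffer = True
    Dim objHTTP, myVariable

    ' Create an xmlhttp object:
    Set objHTTP = Server.CreateObject("Microsoft.XMLHTTP")

    ' Open a synchronous GET request (the URL is blank in the original post), then send it:
    objHTTP.Open "GET", "", False
    objHTTP.Send

    ' responseText holds the page source as a plain string:
    myVariable = objHTTP.responseText
    Set objHTTP = Nothing
    %>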

    After the above code, myVariable has everything that was output on the web page in a plain old character string.  Now you can commence to strip out what you need.

    LVL 1

    Author Comment

    That's exactly what I was looking for. Thanks for your help.

    I have dabbled in XML in the past, and had a sneaking suspicion that it would do what I wanted, but wasn't sure.

    One last simple Q -- Do you know, can I use XML in .NET? I think I can. (I think I can, I think I can....)

    I appreciate your help!

    LVL 29

    Expert Comment

    Okay, this is an object that happens to be able to pull a document via HTTP like a browser, which would include an XML document, but in this case you are only retrieving the HTML source of the page, not XML.

    Just to clarify.

    Yes, you can use it in .NET, PHP, classic ASP, JScript, JavaScript, etc.  It has become quite ubiquitous (sp?)

    Have fun,