Need to programmatically capture information from a website.

I am trying to gather information from a website (that I don't own).

I log in manually, and then, using EFGrabber and WinBatch, I run a macro that scrolls through the records returned by my search and extracts the information I am looking for into an Excel file (so I don't have to type each record one by one).

The challenge is this: because it's a macro based on click locations, if the Next button is even slightly off, the macro hangs.

I am wondering if there is a better way to do this. Can I use XML, or write a script, or something like that?

Any direction would be greatly appreciated. I'm kind of at a loss on how to proceed.

FYI: I am proficient in ASP and .NET, VB, and VBScript, but could use another language :sigh: if absolutely necessary.

rdivilbiss Commented:
>BTW: The stuff I'm searching is in public domain (County Property Appraiser's Website) I'm just trying to avoid the manual process of copying, pasting, or typing. Takes a really long time.

Thank you.
Using the XMLHTTP object, you can emulate a web browser, which means you can use both the POST and GET methods.  The XML part of the name isn't really relevant to you here; it is the HTTP part of the name you need.

What you want to concentrate on is the links to the detail records.

If you can predict the links for the properties you need the details for, you can write a script to fetch those pages using the XMLHTTP object as the go between.
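As a rough sketch of that idea, written in plain JavaScript so the same syntax works as JScript: the loop below builds one URL per property and hands it to a fetch function. The "?parcel=" query pattern, the names buildDetailUrl and fetchAll, and the fetchPage callback are all assumptions for illustration; the real address pattern has to be read off the site's own detail links.

```javascript
// Hypothetical URL pattern: copy the real one from a detail link on the site.
function buildDetailUrl(baseUrl, parcelId) {
    return baseUrl + "?parcel=" + encodeURIComponent(parcelId);
}

// fetchPage is any function that takes a URL and returns the page source as
// a string, for example a wrapper around XMLHTTP open/send/responseText.
function fetchAll(baseUrl, parcelIds, fetchPage) {
    var pages = [];
    for (var i = 0; i < parcelIds.length; i++) {
        var url = buildDetailUrl(baseUrl, parcelIds[i]);
        pages.push({ id: parcelIds[i], html: fetchPage(url) });
    }
    return pages;
}
```

Passing the fetcher in as a parameter keeps the loop independent of whichever HTTP object ends up doing the request.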

It will return the content of the page and you can use string manipulation functions (Left, Right, Mid, InStr) to carve out the part you want.
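A minimal sketch of that carving step, here in JavaScript using indexOf and substring (the counterparts of InStr and Mid). The function name extractBetween and the sample markup are made up for illustration; the real page's tags and labels will differ.

```javascript
// Returns the text between the first startMarker and the next endMarker,
// or null when either marker is missing.
function extractBetween(source, startMarker, endMarker) {
    var start = source.indexOf(startMarker);
    if (start === -1) { return null; }
    start += startMarker.length;
    var end = source.indexOf(endMarker, start);
    if (end === -1) { return null; }
    return source.substring(start, end);
}

// Hypothetical markup standing in for a fetched detail page.
var sample = '<td>Owner:</td><td class="val">SMITH JOHN</td>';
var owner = extractBetween(sample, '<td class="val">', '</td>');
```

This only works as long as the markers around the value stay the same from page to page, so pick markers that are as specific as possible.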

Add that to the file system object and you can write the results to a CSV file on your machine which Excel will open as though it were native XLS.
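One thing to watch when building the CSV by hand: addresses and owner names often contain commas or quotes, which need escaping or Excel will split the columns wrongly. Below is a small sketch in JavaScript that builds one properly quoted line; the names csvField and csvLine are illustrative, and the resulting string would then be written out with whatever file object you are using.

```javascript
// Quotes a single value if it contains a comma, a double quote, or a line
// break; embedded double quotes are doubled.
function csvField(value) {
    var text = String(value);
    if (/[",\r\n]/.test(text)) {
        return '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
}

// Joins the fields of one record into a single CSV line ending in CRLF.
function csvLine(fields) {
    var out = [];
    for (var i = 0; i < fields.length; i++) {
        out.push(csvField(fields[i]));
    }
    return out.join(",") + "\r\n";
}
```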

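<%@ Language = VBScript %>
<%
Response.Buffer = True
Dim objHTTP, myVariable

' Create an XMLHTTP object:
Set objHTTP = Server.CreateObject("Microsoft.XMLHTTP")

' Open a synchronous GET request (third argument False = wait for the reply).
' The URL is incomplete here; supply the full address of the page to fetch.
objHTTP.Open "GET", "...&param2=val2", False
objHTTP.Send

' responseText holds the page source as a plain string.
myVariable = objHTTP.responseText
Set objHTTP = Nothing
%>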

After the above code, myVariable has everything that was output on the web page in a plain old character string.  Now you can commence to strip out what you need.

Have permission?
fruhj Commented:
There's a Microsoft component called MSXML 4.0.
You can download it free from Microsoft.

I've copied this tidbit from the SDK docs:
var xmlhttp = new ActiveXObject("Msxml2.XMLHTTP.4.0");
xmlhttp.open("GET", "http://myserver/save.asp", false);
xmlhttp.send();

or this, using the server version of the control:
var srvXmlHttp;
srvXmlHttp = Server.CreateObject("Msxml2.ServerXMLHTTP.4.0");
srvXmlHttp.open("GET", "http://myserver/myresponse.asp", false);
srvXmlHttp.send();
var newsElement = srvXmlHttp.responseXML.selectSingleNode("/news/story1");

<p>Top News Story<p>

Basically, these controls let you open a web page and work with the results. If the page has consistent formatting (for example, the data you need is in a table that always has the same ID), you can use that to pull out the specific pieces of the page you need.
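To make the table-ID idea concrete, here is a hedged sketch in JavaScript that works on the raw page text. It assumes the id sits on the table tag itself, is written in lowercase with double quotes, and that the table contains no nested tables; extractTableById and cellTexts are illustrative names, not part of MSXML.

```javascript
// Returns the markup from the <table> tag carrying the given id up to the
// first </table> after it, or null if the table is not found.
function extractTableById(html, id) {
    var idPos = html.indexOf('id="' + id + '"');
    if (idPos === -1) { return null; }
    var start = html.lastIndexOf("<table", idPos);
    var end = html.indexOf("</table>", idPos);
    if (start === -1 || end === -1) { return null; }
    return html.substring(start, end + "</table>".length);
}

// Collects the text of each <td> cell in a fragment, dropping inner tags
// and trimming surrounding whitespace.
function cellTexts(fragment) {
    var cells = [];
    var re = /<td[^>]*>([\s\S]*?)<\/td>/gi;
    var m;
    while ((m = re.exec(fragment)) !== null) {
        cells.push(m[1].replace(/<[^>]+>/g, "").replace(/^\s+|\s+$/g, ""));
    }
    return cells;
}
```

If the page happens to be well-formed XML, loading it into a DOM and using selectSingleNode is the more robust route; the string approach is the fallback for ordinary, messy HTML.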

- Jack

VeeVan (Author) Commented:
Can I use the XML to jump from one page to the next?

Here's the scenario:
1.  I manually do a search for the information that I want.
2.  It then comes up in a list format that has partial information.
3.  I click on one of the detail records and pull out a couple of fields into Excel using EFGrabber.
4.  Then, I click next to go to the next detail page.
5.  I repeat steps 3 and 4 until I have all the info that I want.

BTW: The stuff I'm searching is in the public domain (County Property Appraiser's Website). I'm just trying to avoid the manual process of copying, pasting, or typing, which takes a really long time.

So what I need is a method to both grab the information, for which it seems XML should work nicely, and also move forward to the next page programmatically, and I don't know if that's possible.

Thanks again for all your assistance.

VeeVan (Author) Commented:
That's exactly what I was looking for. Thanks for your help.

I have dabbled in XML in the past, and had a sneaking suspicion that it would do what I wanted, but wasn't sure.

One last simple Q -- Do you know, can I use XML in .NET? I think I can. (I think I can, I think I can....)

I appreciate your help!

Okay, this is an object that happens to be able to pull a document via HTTP like a browser, which would include an XML document, but in this case you are only retrieving the HTML source of the page.  Not XML.

Just to clarify.

Yes, you can use it in .NET, PHP, classic ASP, JScript, JavaScript, etc.  It has become quite ubiquitous (sp?)

Have fun,