Solved

How do I use VBA to loop through linked HTML pages

Posted on 2007-11-14
13
925 Views
Last Modified: 2013-11-27
I have a database that accesses web pages, using VBA, not mine and I have no control over the web pages.  One of the web page queries has returned a page that has several more linked to it using href's.  One of the hrefs will return the next page until there is no more.  Problem is I have no idea how to use the href to access the next page using VBA.

Please provide specific example on how to do this.

Thanks
0
Comment
Question by:c7c4c7
  • 6
  • 5
13 Comments
 
LVL 16

Expert Comment

by:Rick_Rickards
Comment Utility
There's a great many ways this could be done.  Without knowing exactly what you're starting with I've prepared an example that makes some assumptions.  Hopefully it's close enough to illustrate how to do what you want to do by merely altering the code below to fit your situation.

In this case it is assumed that you've loaded each web page's HTML code into a memo field located in a table named "tblWeb" with a field named "HTML".  Simply pushing the data into a string is enough as long as you have built in a way to loop thru all the pages you're interested in.

First you start off with the entire HTML text found on the web page.  This example I'm going to assume you've loaded the entire HTML text from each page into a single memo field inside an Access Table.  Where the data comes from isn't partifcularly important, I just used the table/recordset sample to illustrate the idea.  

From there we can easily extract the link for every "<a href=" tag found within the the HTML code.  We can do this by using the Split() function to create a string array containing the tag "<a href=".  Naturally there will be a bunch of garbage at the end of each string that is why we clean up each element within the array within a loop.  Whether to start counting at 0 or 1 depends on whether your HTML code started with "<a href=" which is why you see the use of...

Abs(Left(rst(0).Value, 9) <> "<a href=""")

... within the for Next loop.  If it starts that way it will start counting at 0 since the very first thing it finds is the tag you're looking for.  Otherwise it starts at 1 since the first element will contain nothing but garbage data that you're not seeking out.

As we loop thru the array we look for the lclose of the "<a href=" tag which appears as ">

Once found we strip everything from the 2nd character to the location of "> less 2 characters so we don't end up with "> at the end of the string.

When done, the entire array is nothing but a cleaned up list of every link found within the HTML document.

Good luck!!!



.  We drop the first character and then we drop everything that extends beyond the characters ">.

The line that reads...

Debug.Print str(lngCount)

... will spit out each of the web links found withing the HTML document.  

Of course you can get fancy and then pull HTML documents from each web link found and repeat the process until you've reached a page w/o any links and/or one that contains links already examined or otherwise deemed of no importance to you.

Hope this helps and proves to be easier than you might have anticipated.  It is probably worth noting that href links are not always the only type of data people find valuable when mining for data in an HTML document but once you've mastered this technique it isn't hard to see how you could use the same technique to extract any other part you were interested in.

Rick


Sub ExtractingHrefs()

    Dim db As DAO.Database

    Dim rst As DAO.Recordset

    Dim str() As String

    Dim lngCount As Long

    Set db = CurrentDb

    Set rst = db.OpenRecordset("SELECT tblWeb.HTML FROM tblWeb;")

    If rst.RecordCount Then

        Do Until rst.EOF

            str() = Split(rst(0), "<a href=")

            For lngCount = Abs(Left(rst(0).Value, 9) <> "<a href=""") To UBound(str)

                str(lngCount) = (Mid(str(lngCount), 2, (InStr(2, str(lngCount), """>") - 2)))

                Debug.Print str(lngCount)

            Next lngCount

            rst.MoveNext

        Loop

    End If

End Sub

Open in new window

0
 
LVL 16

Expert Comment

by:Rick_Rickards
Comment Utility
Sorry about the extra text that follows Good Luck!!!.  Seems I lost track of where I was in the message and left text I rewrote in by accident.  In any case just look to everything above the line that reads Good Luck!!! and you should find your answer.  Code supplied to illustrate it step by step.

Rick
0
 

Author Comment

by:c7c4c7
Comment Utility
I think the title of this question has misled you.  Your statement here is what i am trying to solve.

Of course you can get fancy and then pull HTML documents from each web link found and repeat the process until you've reached a page w/o any links and/or one that contains links already examined or otherwise deemed of no importance to you.

The real meat of the question is this portion

I have no idea how to use the href to access the next page using VBA.

Thanks
0
 
LVL 16

Expert Comment

by:Rick_Rickards
Comment Utility
How are you viewing the web page now?  There is an Internet Exploer Active X control that works really well for displaying and navigating thru web pages, (I know as I use it within my own apps), but before I assume to much, perhaps I should be clear on how you are viewing the web pages now that you wish to then click and/or execute a given href on.

Rick
0
 

Author Comment

by:c7c4c7
Comment Utility
The actual view is through the existing VBA program, that does screen scraping.  The problem I'm trying to get past is how to access the rest of the pages that are referenced via href's.

To answer your question though both IE and Firefox are on the machine though

Thanks for the patience
0
 
LVL 16

Expert Comment

by:Rick_Rickards
Comment Utility
To be honest I'm not sure I'm following the question exactly.  My current guess is you have a hyperlink you'd like your web browser to open.  Below is a very simple example of how to do that, (obviously it can be modified any number of ways to work in a more automated scenario but I thought a very basic example would be best to illustrate the basic idea.

This example assumes you have a form with a command button on it named cmdHyperlink.  Other than that you just place the following code behind the form's Load Event, (obviously it doesn't have to be a form's load event, it's merely something I arbitrarily used for the example.

The result is, when the form loads it will open your web browser to the web address identified as the .HyperlinkAddress of the command button.  In this case it is http://www.microsoft.com/

Is this what you were looking for or am I still missing something?
Private Sub Form_Load()

    Dim ctl As CommandButton

    Set ctl = Me.cmdHyperlink

    With ctl

        .Visible = False

        .HyperlinkAddress = "http://www.microsoft.com/"

        .Hyperlink.Follow

    End With

End Sub

Open in new window

0
Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

 

Author Comment

by:c7c4c7
Comment Utility
Here is a sample of a href I want to follow until there are no more

<a href="list.asp?catid=3&man=MyValue&PG=2&guid=10d56e80563040ab900a0d2669163072"

My question is what is the VBA code to get the  "<a href" value to return the next page so I can screen scrape it?

Thanks
0
 
LVL 16

Expert Comment

by:Rick_Rickards
Comment Utility
So is that value one of many values in a table of many rows or is it one long contiguous string of <a href values that need to be parsed out and followed?
0
 

Author Comment

by:c7c4c7
Comment Utility
The href's are not in the database.  
The hrefs are in the html pages that I am pulling the information for the database from
I just need to understand how to code VBA, with an href,  so when I submit it to the internet I get the html page it references
0
 
LVL 16

Expert Comment

by:Rick_Rickards
Comment Utility
What exactly do you mean when you say... "I just need to understand how to code VBA, with an href,  so when I submit it to the internet I get the html page it references"

Specifically, are you opening the web page in IE, FireFox or some other such browser or for example, opening the web page within an ActiveX Control like the one Windows Provides for IE?

Rick
0
 

Accepted Solution

by:
c7c4c7 earned 0 total points
Comment Utility
I finally figured this out.

Here is what I found

The original page is loaded into the browser by submitting something like this
http://www.mywebsite.com/listings/forsale/list.asp?catid=3&man=targetvalue&guid=D2F2CB92B3DD43F9A7C47C5FB40CA285

the above was all on one line and there were 6 more pages returned from the initial http request.  

My question was how do I get to the other 6 pages.  The answer is replace this  value

man=targetvalue&guid=D2F2CB92B3DD43F9A7C47C5FB40CA285

with the ones associated with the other pages that were pointed to by values like this

a href="list.asp?catid=3&man=NEXTPAGE&PG=2&guid=9124AF3551244CED94D4DBC7F4A47E8E

so the request for the second page would look like this

http://www.mywebsite.com/listings/forsale/list.asp?catid=3&man=NEXTPAGE&PG=2&guid=9124AF3551244CED94D4DBC7F4A47E8E

again it would all be on one line, then submit the request for the next page from VBA

on and on until all the pages were displayed
0
 
LVL 1

Expert Comment

by:modus_operandi
Comment Utility
Closed, 500 points refunded.
modus_operandi
EE Moderator
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Today's users almost expect this to happen in all search boxes. After all, if their favourite search engine juggles with tens of thousand keywords while they type, and suggests matching phrases on the fly, why shouldn't they expect the same from you…
In the article entitled Working with Objects – Part 1 (http://www.experts-exchange.com/Microsoft/Development/MS_Access/A_4942-Working-with-Objects-Part-1.html), you learned the basics of working with objects, properties, methods, and events. In Work…
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
What’s inside an Access Desktop Database. Will look at the basic interface, Navigation Pane (Database Container), Tables, Queries, Forms, Report, Macro’s, and VBA code.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now