Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

How do I use VBA to loop through linked HTML pages

Posted on 2007-11-14
13
Medium Priority
?
939 Views
Last Modified: 2013-11-27
I have a database that accesses web pages, using VBA, not mine and I have no control over the web pages.  One of the web page queries has returned a page that has several more linked to it using href's.  One of the hrefs will return the next page until there is no more.  Problem is I have no idea how to use the href to access the next page using VBA.

Please provide specific example on how to do this.

Thanks
0
Comment
Question by:c7c4c7
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
13 Comments
 
LVL 16

Expert Comment

by:Rick_Rickards
ID: 20305753
There's a great many ways this could be done.  Without knowing exactly what you're starting with I've prepared an example that makes some assumptions.  Hopefully it's close enough to illustrate how to do what you want to do by merely altering the code below to fit your situation.

In this case it is assumed that you've loaded each web page's HTML code into a memo field located in a table named "tblWeb" with a field named "HTML".  Simply pushing the data into a string is enough as long as you have built in a way to loop thru all the pages you're interested in.

First you start off with the entire HTML text found on the web page.  This example I'm going to assume you've loaded the entire HTML text from each page into a single memo field inside an Access Table.  Where the data comes from isn't partifcularly important, I just used the table/recordset sample to illustrate the idea.  

From there we can easily extract the link for every "<a href=" tag found within the the HTML code.  We can do this by using the Split() function to create a string array containing the tag "<a href=".  Naturally there will be a bunch of garbage at the end of each string that is why we clean up each element within the array within a loop.  Whether to start counting at 0 or 1 depends on whether your HTML code started with "<a href=" which is why you see the use of...

Abs(Left(rst(0).Value, 9) <> "<a href=""")

... within the for Next loop.  If it starts that way it will start counting at 0 since the very first thing it finds is the tag you're looking for.  Otherwise it starts at 1 since the first element will contain nothing but garbage data that you're not seeking out.

As we loop thru the array we look for the lclose of the "<a href=" tag which appears as ">

Once found we strip everything from the 2nd character to the location of "> less 2 characters so we don't end up with "> at the end of the string.

When done, the entire array is nothing but a cleaned up list of every link found within the HTML document.

Good luck!!!



.  We drop the first character and then we drop everything that extends beyond the characters ">.

The line that reads...

Debug.Print str(lngCount)

... will spit out each of the web links found withing the HTML document.  

Of course you can get fancy and then pull HTML documents from each web link found and repeat the process until you've reached a page w/o any links and/or one that contains links already examined or otherwise deemed of no importance to you.

Hope this helps and proves to be easier than you might have anticipated.  It is probably worth noting that href links are not always the only type of data people find valuable when mining for data in an HTML document but once you've mastered this technique it isn't hard to see how you could use the same technique to extract any other part you were interested in.

Rick


Sub ExtractingHrefs()
    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim str() As String
    Dim lngCount As Long
    Set db = CurrentDb
    Set rst = db.OpenRecordset("SELECT tblWeb.HTML FROM tblWeb;")
    If rst.RecordCount Then
        Do Until rst.EOF
            str() = Split(rst(0), "<a href=")
            For lngCount = Abs(Left(rst(0).Value, 9) <> "<a href=""") To UBound(str)
                str(lngCount) = (Mid(str(lngCount), 2, (InStr(2, str(lngCount), """>") - 2)))
                Debug.Print str(lngCount)
            Next lngCount
            rst.MoveNext
        Loop
    End If
End Sub

Open in new window

0
 
LVL 16

Expert Comment

by:Rick_Rickards
ID: 20305765
Sorry about the extra text that follows Good Luck!!!.  Seems I lost track of where I was in the message and left text I rewrote in by accident.  In any case just look to everything above the line that reads Good Luck!!! and you should find your answer.  Code supplied to illustrate it step by step.

Rick
0
 

Author Comment

by:c7c4c7
ID: 20306440
I think the title of this question has misled you.  Your statement here is what i am trying to solve.

Of course you can get fancy and then pull HTML documents from each web link found and repeat the process until you've reached a page w/o any links and/or one that contains links already examined or otherwise deemed of no importance to you.

The real meat of the question is this portion

I have no idea how to use the href to access the next page using VBA.

Thanks
0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
LVL 16

Expert Comment

by:Rick_Rickards
ID: 20306524
How are you viewing the web page now?  There is an Internet Exploer Active X control that works really well for displaying and navigating thru web pages, (I know as I use it within my own apps), but before I assume to much, perhaps I should be clear on how you are viewing the web pages now that you wish to then click and/or execute a given href on.

Rick
0
 

Author Comment

by:c7c4c7
ID: 20308334
The actual view is through the existing VBA program, that does screen scraping.  The problem I'm trying to get past is how to access the rest of the pages that are referenced via href's.

To answer your question though both IE and Firefox are on the machine though

Thanks for the patience
0
 
LVL 16

Expert Comment

by:Rick_Rickards
ID: 20308698
To be honest I'm not sure I'm following the question exactly.  My current guess is you have a hyperlink you'd like your web browser to open.  Below is a very simple example of how to do that, (obviously it can be modified any number of ways to work in a more automated scenario but I thought a very basic example would be best to illustrate the basic idea.

This example assumes you have a form with a command button on it named cmdHyperlink.  Other than that you just place the following code behind the form's Load Event, (obviously it doesn't have to be a form's load event, it's merely something I arbitrarily used for the example.

The result is, when the form loads it will open your web browser to the web address identified as the .HyperlinkAddress of the command button.  In this case it is http://www.microsoft.com/

Is this what you were looking for or am I still missing something?
Private Sub Form_Load()
    Dim ctl As CommandButton
    Set ctl = Me.cmdHyperlink
    With ctl
        .Visible = False
        .HyperlinkAddress = "http://www.microsoft.com/"
        .Hyperlink.Follow
    End With
End Sub

Open in new window

0
 

Author Comment

by:c7c4c7
ID: 20309337
Here is a sample of a href I want to follow until there are no more

<a href="list.asp?catid=3&man=MyValue&PG=2&guid=10d56e80563040ab900a0d2669163072"

My question is what is the VBA code to get the  "<a href" value to return the next page so I can screen scrape it?

Thanks
0
 
LVL 16

Expert Comment

by:Rick_Rickards
ID: 20309542
So is that value one of many values in a table of many rows or is it one long contiguous string of <a href values that need to be parsed out and followed?
0
 

Author Comment

by:c7c4c7
ID: 20310539
The href's are not in the database.  
The hrefs are in the html pages that I am pulling the information for the database from
I just need to understand how to code VBA, with an href,  so when I submit it to the internet I get the html page it references
0
 
LVL 16

Expert Comment

by:Rick_Rickards
ID: 20379077
What exactly do you mean when you say... "I just need to understand how to code VBA, with an href,  so when I submit it to the internet I get the html page it references"

Specifically, are you opening the web page in IE, FireFox or some other such browser or for example, opening the web page within an ActiveX Control like the one Windows Provides for IE?

Rick
0
 

Accepted Solution

by:
c7c4c7 earned 0 total points
ID: 20379891
I finally figured this out.

Here is what I found

The original page is loaded into the browser by submitting something like this
http://www.mywebsite.com/listings/forsale/list.asp?catid=3&man=targetvalue&guid=D2F2CB92B3DD43F9A7C47C5FB40CA285

the above was all on one line and there were 6 more pages returned from the initial http request.  

My question was how do I get to the other 6 pages.  The answer is replace this  value

man=targetvalue&guid=D2F2CB92B3DD43F9A7C47C5FB40CA285

with the ones associated with the other pages that were pointed to by values like this

a href="list.asp?catid=3&man=NEXTPAGE&PG=2&guid=9124AF3551244CED94D4DBC7F4A47E8E

so the request for the second page would look like this

http://www.mywebsite.com/listings/forsale/list.asp?catid=3&man=NEXTPAGE&PG=2&guid=9124AF3551244CED94D4DBC7F4A47E8E 

again it would all be on one line, then submit the request for the next page from VBA

on and on until all the pages were displayed
0
 
LVL 1

Expert Comment

by:modus_operandi
ID: 20411505
Closed, 500 points refunded.
modus_operandi
EE Moderator
0

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Access custom database properties are useful for storing miscellaneous bits of information in a format that persists through database closing and reopening.  This article shows how to create and use them.
Microsoft Access is a place to store data within tables and represent this stored data using multiple database objects such as in form of macros, forms, reports, etc. After a MS Access database is created there is need to improve the performance and…
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
Have you created a query with information for a calendar? ... and then, abra-cadabra, the calendar is done?! I am going to show you how to make that happen. Visualize your data!  ... really see it To use the code to create a calendar from a q…

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question