Solved

VBA controling Internet Explorer - problem getting 'next page' of results

Posted on 2011-09-12
2
442 Views
Last Modified: 2012-05-12
I am trying capture the links to some publications through a search engine.  There are usually more results than can be returned on one page.  You can not use the URL to get to the next page of results (as far as I can tell).  

I have been able to get IE to click the 'Next' button to get to the next page.  I have put the code in the order that I would expect it to perform the different operations.  Basically,
1. Perform search - puts you on first page (results 1-20)
2. Get desired links to publications 1- 20
3. Click 'Next'
4. Go to page with the next 20 pubs (21-40)
5. Get desired links to publication 21-40

It appears that it is getting 1-20 twice and then advancing to the next page.

Why would it do this and how can I make it perform correctly?

I have provided the code and highlighted and tried to label what I think are the problem areas.


Thanks!
Sub PMCSearch()

Dim appIE As Object ' InternetExplorer.Application
Dim sURL As String
Dim SearchTerm As String
Dim NumOfYears As Integer
Dim PubCount As Integer

Dim WorkSheetName As String
'WorkSheetName = SearchTerm & CStr(Round(Time, 0))

Dim Hits As Integer
Dim LoopCount As Integer

'-----------------------------------------------------------------------------------------------
'HOUSE KEEPING - NO QUESTIONS HERE (I THINK)
' URL input
SearchTerm = "origene"

'Create new sheet named after searchterm
WorkSheetName = SearchTerm & CStr(Hour(Time)) & CStr(Second(Time))
ActiveWorkbook.Worksheets.Add
If Len(WorkSheetName) > 31 Then
    WorkSheetName = Left(WorkSheetName, 30)
End If
ActiveSheet.Name = WorkSheetName

'Clear active sheet
ActiveSheet.Cells.Clear

'Start in A1
Range("A1").Select

'-----------------------------------------------------------------------------------------------
'THIS PART WORKS - NOQUESTIONS (I THINK)
'Start Internet Explorer
Application.ScreenUpdating = False
Set appIE = CreateObject("InternetExplorer.Application")

'USE URL TO PERFORM SEARCH

    sURL = "http://www.ncbi.nlm.nih.gov/pmc/?term=" & SearchTerm & "%5BAll+Fields%5D+AND+(%222008%2F09%2F11%22%5BPubDate%5D+%3A+%222011%2F09%2F10%22%5BPubDate%5D)&cmd=DetailsSearch"
    Debug.Print sURL
    With appIE
        .navigate sURL
        ' uncomment the line below if you want to watch the code execute, or for debugging
        .Visible = True
    End With

    ' loop until the page finishes loading
    Do While appIE.readyState <> 4
        DoEvents
    Loop

    Set HTMLDoc = appIE.document

'GET LINKS FROM PAGE AND PUT IN SPREADSHEET
    Set Publinks = HTMLDoc.Links
    Debug.Print "Links length  = " & Publinks.Length
    PubLinksLength = Publinks.Length
    For i = 3 To PubLinksLength - 1
        'THE NEXT 4 LINES ARE REQUIRED TO DORT OUR THE LINKS I WANT FROM THE REST
        Test1 = Publinks.Item(i).href
        Test2 = Publinks.Item(i - 2).href
        If Right(Publinks.Item(i).href, 4) = "trez" And InStr(1, Publinks.Item(i), "pdf") = 0 And Test1 <> Test2 Then
        Debug.Print "number " & i & " " & Publinks.Item(i).href
        'PUT THEM IN THE SPREASHEET
        ActiveCell.Value = Publinks.Item(i).href
        ActiveCell.Offset(1, 0).Select
        
        End If
    Next i
        
'-----------------------------------------------------------------------------------------------
'***PROBLEM IN THIS AREA - THIS APPEARS TO BE WHERE IT DOES NOT FUNTION LIKE I WANT IT TO

'GOTO NEXT PAGE BY CLICKING LINK
        'FIND THE 'NEXT PAGE' LINK
        Set NextButton = HTMLDoc.getElementById("EntrezSystem2.PEntrez.Pmc.Pmc_ResultsPanel.Entrez_Pager.Page")
        NextButton.Click
                
        'LET THE NEXT PAGE LOAD
        Do While appIE.readyState <> 4
            DoEvents
        Loop
        
        'GET THE NEXT PAGE AS A DOCUMENT OBJECT
        Set HTMLDoc = appIE.document

        'GET LINKS FROM PAGE AND PUT IN SPREADSHEET
'*** PROBLEM HERE???  - THIS IS STILL RETURNS THE LINKS FROM THE FIRST PAGE
'DUPLICATE CODE FROM ABOVE BUT I EXPECT THE DOCUMENT TO HAVE THE NEXT PAGE OF RESULTS FOLLOWING THE NextButton.Click
    Set Publinks = HTMLDoc.Links
    Debug.Print "Links length  = " & Publinks.Length
    PubLinksLength = Publinks.Length
    For i = 3 To PubLinksLength - 1
        'THE NEXT 4 LINES ARE REQUIRED TO DORT OUR THE LINKS I WANT FROM THE REST
        Test1 = Publinks.Item(i).href
        Test2 = Publinks.Item(i - 2).href
        If Right(Publinks.Item(i).href, 4) = "trez" And InStr(1, Publinks.Item(i), "pdf") = 0 And Test1 <> Test2 Then
        Debug.Print "number " & i & " " & Publinks.Item(i).href
        'PUT THEM IN THE SPREASHEET
        ActiveCell.Value = Publinks.Item(i).href
        ActiveCell.Offset(1, 0).Select
        
        End If
    Next i
 
End Sub

Open in new window

0
Comment
Question by:Ben210
2 Comments
 
LVL 16

Accepted Solution

by:
carsRST earned 500 total points
ID: 36524789
My guess is the page is still loading, even though you have the "appIE.readyState <> 4"


You might try this...

Do While appIE.Busy Or appIE.readyState <> 4
   DoEvents
Loop



If that doesn't work, then you might start testing for something unique on the 21-40 page that you know is there.  When you see that the value show up, then proceed with 21-40.



Third option would just be to kill enough time through "do while" loops that you know for a fact the page has loaded.  Then proceed on with your 21-40 code.
0
 

Author Comment

by:Ben210
ID: 36524860
Adding 'Do while appIE.Busy Or ' worked!!!

I love a simple solution!

Thanks so much.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Background What I'm presenting in this article is the result of 2 conditions in my work area: We have a SQL Server production environment but no development or test environment; andWe have an MS Access front end using tables in SQL Server but we a…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
The view will learn how to download and install SIMTOOLS and FORMLIST into Excel, how to use SIMTOOLS to generate a Monte Carlo simulation of 30 sales calls, and how to calculate the conditional probability based on the results of the Monte Carlo …
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now