Solved

VBA controling Internet Explorer - problem getting 'next page' of results

Posted on 2011-09-12
2
452 Views
Last Modified: 2012-05-12
I am trying capture the links to some publications through a search engine.  There are usually more results than can be returned on one page.  You can not use the URL to get to the next page of results (as far as I can tell).  

I have been able to get IE to click the 'Next' button to get to the next page.  I have put the code in the order that I would expect it to perform the different operations.  Basically,
1. Perform search - puts you on first page (results 1-20)
2. Get desired links to publications 1- 20
3. Click 'Next'
4. Go to page with the next 20 pubs (21-40)
5. Get desired links to publication 21-40

It appears that it is getting 1-20 twice and then advancing to the next page.

Why would it do this and how can I make it perform correctly?

I have provided the code and highlighted and tried to label what I think are the problem areas.


Thanks!
Sub PMCSearch()

Dim appIE As Object ' InternetExplorer.Application
Dim sURL As String
Dim SearchTerm As String
Dim NumOfYears As Integer
Dim PubCount As Integer

Dim WorkSheetName As String
'WorkSheetName = SearchTerm & CStr(Round(Time, 0))

Dim Hits As Integer
Dim LoopCount As Integer

'-----------------------------------------------------------------------------------------------
'HOUSE KEEPING - NO QUESTIONS HERE (I THINK)
' URL input
SearchTerm = "origene"

'Create new sheet named after searchterm
WorkSheetName = SearchTerm & CStr(Hour(Time)) & CStr(Second(Time))
ActiveWorkbook.Worksheets.Add
If Len(WorkSheetName) > 31 Then
    WorkSheetName = Left(WorkSheetName, 30)
End If
ActiveSheet.Name = WorkSheetName

'Clear active sheet
ActiveSheet.Cells.Clear

'Start in A1
Range("A1").Select

'-----------------------------------------------------------------------------------------------
'THIS PART WORKS - NOQUESTIONS (I THINK)
'Start Internet Explorer
Application.ScreenUpdating = False
Set appIE = CreateObject("InternetExplorer.Application")

'USE URL TO PERFORM SEARCH

    sURL = "http://www.ncbi.nlm.nih.gov/pmc/?term=" & SearchTerm & "%5BAll+Fields%5D+AND+(%222008%2F09%2F11%22%5BPubDate%5D+%3A+%222011%2F09%2F10%22%5BPubDate%5D)&cmd=DetailsSearch"
    Debug.Print sURL
    With appIE
        .navigate sURL
        ' uncomment the line below if you want to watch the code execute, or for debugging
        .Visible = True
    End With

    ' loop until the page finishes loading
    Do While appIE.readyState <> 4
        DoEvents
    Loop

    Set HTMLDoc = appIE.document

'GET LINKS FROM PAGE AND PUT IN SPREADSHEET
    Set Publinks = HTMLDoc.Links
    Debug.Print "Links length  = " & Publinks.Length
    PubLinksLength = Publinks.Length
    For i = 3 To PubLinksLength - 1
        'THE NEXT 4 LINES ARE REQUIRED TO DORT OUR THE LINKS I WANT FROM THE REST
        Test1 = Publinks.Item(i).href
        Test2 = Publinks.Item(i - 2).href
        If Right(Publinks.Item(i).href, 4) = "trez" And InStr(1, Publinks.Item(i), "pdf") = 0 And Test1 <> Test2 Then
        Debug.Print "number " & i & " " & Publinks.Item(i).href
        'PUT THEM IN THE SPREASHEET
        ActiveCell.Value = Publinks.Item(i).href
        ActiveCell.Offset(1, 0).Select
        
        End If
    Next i
        
'-----------------------------------------------------------------------------------------------
'***PROBLEM IN THIS AREA - THIS APPEARS TO BE WHERE IT DOES NOT FUNTION LIKE I WANT IT TO

'GOTO NEXT PAGE BY CLICKING LINK
        'FIND THE 'NEXT PAGE' LINK
        Set NextButton = HTMLDoc.getElementById("EntrezSystem2.PEntrez.Pmc.Pmc_ResultsPanel.Entrez_Pager.Page")
        NextButton.Click
                
        'LET THE NEXT PAGE LOAD
        Do While appIE.readyState <> 4
            DoEvents
        Loop
        
        'GET THE NEXT PAGE AS A DOCUMENT OBJECT
        Set HTMLDoc = appIE.document

        'GET LINKS FROM PAGE AND PUT IN SPREADSHEET
'*** PROBLEM HERE???  - THIS IS STILL RETURNS THE LINKS FROM THE FIRST PAGE
'DUPLICATE CODE FROM ABOVE BUT I EXPECT THE DOCUMENT TO HAVE THE NEXT PAGE OF RESULTS FOLLOWING THE NextButton.Click
    Set Publinks = HTMLDoc.Links
    Debug.Print "Links length  = " & Publinks.Length
    PubLinksLength = Publinks.Length
    For i = 3 To PubLinksLength - 1
        'THE NEXT 4 LINES ARE REQUIRED TO DORT OUR THE LINKS I WANT FROM THE REST
        Test1 = Publinks.Item(i).href
        Test2 = Publinks.Item(i - 2).href
        If Right(Publinks.Item(i).href, 4) = "trez" And InStr(1, Publinks.Item(i), "pdf") = 0 And Test1 <> Test2 Then
        Debug.Print "number " & i & " " & Publinks.Item(i).href
        'PUT THEM IN THE SPREASHEET
        ActiveCell.Value = Publinks.Item(i).href
        ActiveCell.Offset(1, 0).Select
        
        End If
    Next i
 
End Sub

Open in new window

0
Comment
Question by:Ben210
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 16

Accepted Solution

by:
carsRST earned 500 total points
ID: 36524789
My guess is the page is still loading, even though you have the "appIE.readyState <> 4"


You might try this...

Do While appIE.Busy Or appIE.readyState <> 4
   DoEvents
Loop



If that doesn't work, then you might start testing for something unique on the 21-40 page that you know is there.  When you see that the value show up, then proceed with 21-40.



Third option would just be to kill enough time through "do while" loops that you know for a fact the page has loaded.  Then proceed on with your 21-40 code.
0
 

Author Comment

by:Ben210
ID: 36524860
Adding 'Do while appIE.Busy Or ' worked!!!

I love a simple solution!

Thanks so much.
0

Featured Post

Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Since upgrading to Office 2013 or higher installing the Smart Indenter addin will fail. This article will explain how to install it so it will work regardless of the Office version installed.
How to get Spreadsheet Compare 2016 working with the 64 bit version of Office 2016
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question