Solved

VBA controling Internet Explorer - problem getting 'next page' of results

Posted on 2011-09-12
2
446 Views
Last Modified: 2012-05-12
I am trying capture the links to some publications through a search engine.  There are usually more results than can be returned on one page.  You can not use the URL to get to the next page of results (as far as I can tell).  

I have been able to get IE to click the 'Next' button to get to the next page.  I have put the code in the order that I would expect it to perform the different operations.  Basically,
1. Perform search - puts you on first page (results 1-20)
2. Get desired links to publications 1- 20
3. Click 'Next'
4. Go to page with the next 20 pubs (21-40)
5. Get desired links to publication 21-40

It appears that it is getting 1-20 twice and then advancing to the next page.

Why would it do this and how can I make it perform correctly?

I have provided the code and highlighted and tried to label what I think are the problem areas.


Thanks!
Sub PMCSearch()

Dim appIE As Object ' InternetExplorer.Application
Dim sURL As String
Dim SearchTerm As String
Dim NumOfYears As Integer
Dim PubCount As Integer

Dim WorkSheetName As String
'WorkSheetName = SearchTerm & CStr(Round(Time, 0))

Dim Hits As Integer
Dim LoopCount As Integer

'-----------------------------------------------------------------------------------------------
'HOUSE KEEPING - NO QUESTIONS HERE (I THINK)
' URL input
SearchTerm = "origene"

'Create new sheet named after searchterm
WorkSheetName = SearchTerm & CStr(Hour(Time)) & CStr(Second(Time))
ActiveWorkbook.Worksheets.Add
If Len(WorkSheetName) > 31 Then
    WorkSheetName = Left(WorkSheetName, 30)
End If
ActiveSheet.Name = WorkSheetName

'Clear active sheet
ActiveSheet.Cells.Clear

'Start in A1
Range("A1").Select

'-----------------------------------------------------------------------------------------------
'THIS PART WORKS - NOQUESTIONS (I THINK)
'Start Internet Explorer
Application.ScreenUpdating = False
Set appIE = CreateObject("InternetExplorer.Application")

'USE URL TO PERFORM SEARCH

    sURL = "http://www.ncbi.nlm.nih.gov/pmc/?term=" & SearchTerm & "%5BAll+Fields%5D+AND+(%222008%2F09%2F11%22%5BPubDate%5D+%3A+%222011%2F09%2F10%22%5BPubDate%5D)&cmd=DetailsSearch"
    Debug.Print sURL
    With appIE
        .navigate sURL
        ' uncomment the line below if you want to watch the code execute, or for debugging
        .Visible = True
    End With

    ' loop until the page finishes loading
    Do While appIE.readyState <> 4
        DoEvents
    Loop

    Set HTMLDoc = appIE.document

'GET LINKS FROM PAGE AND PUT IN SPREADSHEET
    Set Publinks = HTMLDoc.Links
    Debug.Print "Links length  = " & Publinks.Length
    PubLinksLength = Publinks.Length
    For i = 3 To PubLinksLength - 1
        'THE NEXT 4 LINES ARE REQUIRED TO DORT OUR THE LINKS I WANT FROM THE REST
        Test1 = Publinks.Item(i).href
        Test2 = Publinks.Item(i - 2).href
        If Right(Publinks.Item(i).href, 4) = "trez" And InStr(1, Publinks.Item(i), "pdf") = 0 And Test1 <> Test2 Then
        Debug.Print "number " & i & " " & Publinks.Item(i).href
        'PUT THEM IN THE SPREASHEET
        ActiveCell.Value = Publinks.Item(i).href
        ActiveCell.Offset(1, 0).Select
        
        End If
    Next i
        
'-----------------------------------------------------------------------------------------------
'***PROBLEM IN THIS AREA - THIS APPEARS TO BE WHERE IT DOES NOT FUNTION LIKE I WANT IT TO

'GOTO NEXT PAGE BY CLICKING LINK
        'FIND THE 'NEXT PAGE' LINK
        Set NextButton = HTMLDoc.getElementById("EntrezSystem2.PEntrez.Pmc.Pmc_ResultsPanel.Entrez_Pager.Page")
        NextButton.Click
                
        'LET THE NEXT PAGE LOAD
        Do While appIE.readyState <> 4
            DoEvents
        Loop
        
        'GET THE NEXT PAGE AS A DOCUMENT OBJECT
        Set HTMLDoc = appIE.document

        'GET LINKS FROM PAGE AND PUT IN SPREADSHEET
'*** PROBLEM HERE???  - THIS IS STILL RETURNS THE LINKS FROM THE FIRST PAGE
'DUPLICATE CODE FROM ABOVE BUT I EXPECT THE DOCUMENT TO HAVE THE NEXT PAGE OF RESULTS FOLLOWING THE NextButton.Click
    Set Publinks = HTMLDoc.Links
    Debug.Print "Links length  = " & Publinks.Length
    PubLinksLength = Publinks.Length
    For i = 3 To PubLinksLength - 1
        'THE NEXT 4 LINES ARE REQUIRED TO DORT OUR THE LINKS I WANT FROM THE REST
        Test1 = Publinks.Item(i).href
        Test2 = Publinks.Item(i - 2).href
        If Right(Publinks.Item(i).href, 4) = "trez" And InStr(1, Publinks.Item(i), "pdf") = 0 And Test1 <> Test2 Then
        Debug.Print "number " & i & " " & Publinks.Item(i).href
        'PUT THEM IN THE SPREASHEET
        ActiveCell.Value = Publinks.Item(i).href
        ActiveCell.Offset(1, 0).Select
        
        End If
    Next i
 
End Sub

Open in new window

0
Comment
Question by:Ben210
2 Comments
 
LVL 16

Accepted Solution

by:
carsRST earned 500 total points
ID: 36524789
My guess is the page is still loading, even though you have the "appIE.readyState <> 4"


You might try this...

Do While appIE.Busy Or appIE.readyState <> 4
   DoEvents
Loop



If that doesn't work, then you might start testing for something unique on the 21-40 page that you know is there.  When you see that the value show up, then proceed with 21-40.



Third option would just be to kill enough time through "do while" loops that you know for a fact the page has loaded.  Then proceed on with your 21-40 code.
0
 

Author Comment

by:Ben210
ID: 36524860
Adding 'Do while appIE.Busy Or ' worked!!!

I love a simple solution!

Thanks so much.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Vb.net dynamic formulas in runtime 11 63
What is format f12.8 for a CSV file 6 41
MS Access 03, TransferText, decimal places 8 46
Auto Adjust Percent rate 5 30
I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

896 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now