?
Solved

Scrape Data from a list of hyperlinks

Posted on 2016-11-10
3
Medium Priority
?
93 Views
Last Modified: 2016-11-11
Hello Everyone,
I have the following routine.  The purpose is to scrape data from a list of web pages.  The links to these pages are listed in a column in the worksheet.  The code loops through each cell containing a URL, imports data from the associated web page to the worksheet, deletes some unwanted information, transposes the remaining data from rows to columns, and moves on to the next cell.  

The problem is the code keeps hanging and I'm not sure why.  The web links are valid.  I can manually import the data from the website but the code seems the go slower and slower until it finally grinds to a halt.

In the example file there are about 150 rows already done.  To test the code, select cell B150 and run the ScrapeCompanyInfo macro.  It will continue down the list until it gets stuck again.

I'm not experienced with querying web data using Excel.  Maybe there's better way to do it.  Any help would be appreciated.  

Thanks,
Kyle

Sub ScrapeCompanyInfo()
Dim rng As Range, rFind As Range, rEnd As Range
Dim xConnect As Object
Set rng = Selection
Do Until rng = ""
    With ActiveSheet.QueryTables.Add(Connection:="URL;" & rng, _
        Destination:=rng.Offset(0, 1))
        .Name = Cells(rng.Row, 1)
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Set rFind = rng.Offset(0, 1).EntireColumn.Find(Cells(rng.Row, 1), , xlValues, xlWhole)
    Range(rng.Offset(0, 1), rFind.Offset(-1)).Delete xlShiftUp
    Set rEnd = Cells(Rows.Count, rng.Offset(0, 1).Column).End(xlUp)
    Range(rng.Offset(12, 1), rEnd).ClearContents
    Set rEnd = Cells(Rows.Count, rng.Offset(0, 1).Column).End(xlUp)
    With Range(rng.Offset(0, 1), rEnd)
        .Copy
        .Resize(1).Offset(0, 1).PasteSpecial Transpose:=True
        .ClearContents
    End With
    Set rng = rng.Offset(1)
    For Each xConnect In ActiveWorkbook.Connections
        If xConnect.Name <> "ThisWorkbookDataModel" Then xConnect.Delete
    Next xConnect
    ThisWorkbook.Save
Loop
End Sub

Open in new window

Equipment-Directory.xlsm
0
Comment
Question by:kgerb
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 14

Accepted Solution

by:
Pierre Cornelius earned 2000 total points
ID: 41883530
I have made some changes to your code and added some comments. The biggest change is that you were creating new connections during each loop whereas my code just creates one and reuses it to fetch the data. Also you were saving during each loop, rather just save at the end or even leave saving up to the user.

Also note that it will run until the first blank cell found and you have some blank rows in your spreadsheet so it won't run to the end. But I guess you know this already.

Try it out and let me know how it goes.
Sub ScrapeCompanyInfo()
    Dim rng As Range, rFind As Range, rEnd As Range
    Dim xConnect As Object
    Set rng = Selection
    
    Dim UrlFetcher As QueryTable
    Set UrlFetcher = ActiveSheet.QueryTables.Add(Connection:="URL;", Destination:=rng.Offset(0, 1))
    
    With UrlFetcher
        .Name = "scraper"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False 'True - I chagned this to false, we don't want asynchronous queries
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = False 'True - I changed this to false, we merely fetch and extract what we need so no need to save with spreadsheet
        .AdjustColumnWidth = False 'True - no need for this, we merely fetch and extract what we need so no need to adjust
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        '.Refresh BackgroundQuery:=False This will happen in the loop...
    End With
    
    Do Until rng = ""
        With UrlFetcher
            .Connection = "URL;" & rng.Value
            .Destination = rng.Offset(0, 1)
            If .Refresh Then
                Set rFind = rng.Offset(0, 1).EntireColumn.Find(Cells(rng.Row, 1), , xlValues, xlWhole)
                Range(rng.Offset(0, 1), rFind.Offset(-1)).Delete xlShiftUp
                Set rEnd = Cells(Rows.Count, rng.Offset(0, 1).Column).End(xlUp)
                Range(rng.Offset(12, 1), rEnd).ClearContents
                Set rEnd = Cells(Rows.Count, rng.Offset(0, 1).Column).End(xlUp)
                With Range(rng.Offset(0, 1), rEnd)
                    .Copy
                    .Resize(1).Offset(0, 1).PasteSpecial Transpose:=True
                    .ClearContents
                End With
                
            End If
            Set rng = rng.Offset(1)
        End With
        
        'For Each xConnect In ActiveWorkbook.Connections
        '    If xConnect.Name <> "ThisWorkbookDataModel" Then xConnect.Delete
        'Next xConnect
        'ThisWorkbook.Save Don't save in the loop, save after looping or leave it up to the user...
    Loop
    For Each xConnect In ActiveWorkbook.Connections
        If xConnect.Name <> "ThisWorkbookDataModel" Then xConnect.Delete
    Next xConnect
End Sub

Open in new window

0
 
LVL 12

Author Comment

by:kgerb
ID: 41883775
Thanks Pierre,
It seems to be working better with your modifications.  It's still super slow though.  I'm going to try a different approach but you answered my question so thank you!

regards,
Kyle
0
 
LVL 14

Expert Comment

by:Pierre Cornelius
ID: 41883777
You're welcome.

The slow part is connecting to the web and loading the info. You can see this when stepping through the code in debug mode. The call to .Refresh takes some time...
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Excel can be a tricky bit of software to get your head around. Whilst you’ll be able to eventually get to grips with the basic understanding of how to get by, there are a few Excel tips that not everybody will even know about let alone know how to d…
This article describes a serious pitfall that can happen when deleting shapes using VBA.
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question