Solved

Screen scraper - getting data out of web pages and into spreadsheets

Posted on 2014-02-26
7
243 Views
Last Modified: 2014-03-06
Hi,

I'm trying to find a way to extract multiple addresses from attached yellowpage (example) to Excel spreadsheet.
The challenge I'm facing is, I need to get these addresses from 100+ pages, and there is no 'View All' option on the site.
I need to put all of these business names and addresses in Excel spreadsheet format.
Do you know any way to do this without copying & pasting 1000+ times?

Thanks!
Yellowpage.docx
0
Comment
Question by:iamnamja
  • 3
  • 2
7 Comments
 
LVL 38

Expert Comment

by:Aaron Tomosky
Comment Utility
Do you know any programming language?
0
 
LVL 23

Expert Comment

by:David
Comment Utility
My approach might be to view and save the underlying HTML to a text document.  Then, there are many text editors or screen scrapers that could search by the element's tag.

Curious, how do you plan to address the data owner's copyrights?
0
 
LVL 23

Expert Comment

by:David
Comment Utility
Ping to the author, asking for any update or closure.

dvz
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 38

Expert Comment

by:Aaron Tomosky
Comment Utility
Pretty sure I mentioned using an excel macro and then pasted a link. Well I'm sure you can search for "address on website to excel macro" yourself
0
 

Author Comment

by:iamnamja
Comment Utility
I've requested that this question be deleted for the following reason:

I believe my question was not fully explained correctly because I see none of the answers address my question. I cannot accept any of the answers as solution because they didn't provide any actual solutions.
0
 
LVL 23

Accepted Solution

by:
David earned 500 total points
Comment Utility
Okay, what if we tell you that in our collective, professional opinion, your stated goal cannot be done as stated.  With the criteria that there are "hundreds of pages", instead of one long scrollable region, something has to advance the focus from one URL to the next -- robot or spider fashion.

Any workaround would require some repetitive process, not necessarily a manual "cut and paste" as stated.  To my earlier comment, you or your code would have to examine the source code (Firefox is rt clk | view page source).  Locate the HTML tag that wraps the desired field string.

With programming, this step and capture could be automated, as Aaron suggested.  In line with padas' administrative comment above, the following link is offered as a suggestion that demonstrates several techniques of extracting HTML data by means of Excel:  http://msdn.microsoft.com/en-us/library/aa203721%28v=office.11%29.aspx

My regrets if I have still failed to meet your expectations -- but at least I wanted to make the extra effort to help you find your answer.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
CDC audit 17 84
Move SQL 2005 Express to Server 2012R2 19 68
Exchange 2010 Mailbox Database offline 2 48
report returning null 21 49
Microsoft Office Picture Manager was included in Office 2003, 2007, and 2010, but not in Office 2013. Users had hopes that it would be in Office 2016/Office 365, but it is not. Fortunately, the same zero-cost technique that works to install it with …
This article explains all about SQL Server Piecemeal Restore with examples in step by step manner.
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …
Microsoft Office Picture Manager is not included in Office 2013. This comes as quite a surprise to users upgrading from earlier versions of Office, such as 2007 and 2010, where Picture Manager was included as a standard application. This video expla…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now