Solved

Screen scraper - getting data out of web pages and into spreadsheets

Posted on 2014-02-26
Last Modified: 2014-03-06
Hi,

I'm trying to find a way to extract multiple addresses from the attached Yellow Pages example into an Excel spreadsheet.
The challenge is that I need to get these addresses from 100+ pages, and the site has no 'View All' option.
I need to put all of the business names and addresses into Excel spreadsheet format.
Do you know of any way to do this without copying and pasting 1000+ times?

Thanks!
Yellowpage.docx
Question by:iamnamja
7 Comments
 
LVL 38

Expert Comment

by:Aaron Tomosky
ID: 39891055
Do you know any programming language?
 
LVL 23

Expert Comment

by:David
ID: 39892730
My approach might be to view the underlying HTML and save it to a text document. Many text editors and screen scrapers can then search it by the element's tag.
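
As a rough illustration of searching saved HTML by an element's tag, here is a minimal Python sketch that uses only the standard library. The class names business-name and address and the sample markup are assumptions made up for this example; the real page will use whatever names its own source shows.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FieldExtractor(HTMLParser):
    """Collect the text inside every element carrying one of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.current = None  # class currently being captured, if any
        self.depth = 0       # nesting depth inside the captured element
        self.buffer = []     # text chunks seen inside the captured element
        self.found = []      # (class_name, text) pairs in page order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.current is not None:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                self.depth = 1
                self.buffer = []
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.current is None:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join(" ".join(self.buffer).split())
            self.found.append((self.current, text))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)


def extract_fields(html_text, wanted_classes):
    """Return (class_name, text) pairs for every matching element."""
    parser = FieldExtractor(wanted_classes)
    parser.feed(html_text)
    parser.close()
    return parser.found


# Hypothetical markup standing in for one saved results page.
SAMPLE_HTML = """
<div class="listing">
  <span class="business-name">Acme Plumbing</span>
  <span class="address">12 Main St<br>Springfield</span>
</div>
<div class="listing">
  <span class="business-name">Best Bakery</span>
  <span class="address">99 Oak Ave, Shelbyville</span>
</div>
"""

if __name__ == "__main__":
    for kind, text in extract_fields(SAMPLE_HTML, ["business-name", "address"]):
        print(kind, "->", text)
```

To use it on a real page, save the source to a file, read it in, and pass the class names found in that source instead of the made-up ones.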

Curious, how do you plan to address the data owner's copyrights?
 
LVL 23

Expert Comment

by:David
ID: 39903453
Ping to the author, asking for any update or closure.

dvz

 
LVL 38

Expert Comment

by:Aaron Tomosky
ID: 39903503
Pretty sure I mentioned using an Excel macro and then pasted a link. Well, I'm sure you can search for "address on website to excel macro" yourself.
 

Author Comment

by:iamnamja
ID: 39910266
I've requested that this question be deleted for the following reason:

I believe my question was not explained clearly, because none of the answers address it. I cannot accept any of the answers as the solution because they didn't provide an actual solution.
 
LVL 23

Accepted Solution

by:
David earned 500 total points
ID: 39909929
Okay, what if we tell you that, in our collective professional opinion, your stated goal cannot be done as stated? Because the listings span "hundreds of pages" instead of one long scrollable region, something has to advance the focus from one URL to the next, robot or spider fashion.
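
To make the "advance from one URL to the next" idea concrete, here is a minimal Python sketch using only the standard library. It assumes the site numbers its result pages with a query parameter such as ?page=2; the base URL, the parameter name, and the helper names are all hypothetical, so check the real pagination links in the page source first.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def page_urls(base_url, last_page, first_page=1, param="page"):
    """Yield one URL per results page, assuming a numeric page parameter."""
    for number in range(first_page, last_page + 1):
        yield f"{base_url}?{urlencode({param: number})}"


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(urls, fetch_page=fetch, delay_seconds=1.0):
    """Fetch each URL in turn, pausing between requests to go easy on the site."""
    pages = []
    for url in urls:
        pages.append(fetch_page(url))
        time.sleep(delay_seconds)
    return pages


if __name__ == "__main__":
    # Hypothetical address; replace with the real results URL.
    for url in page_urls("https://example.com/listings", 3):
        print(url)
```

The fetch function is passed in as a parameter so the loop can be tried with a stand-in before pointing it at a live site.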

Any workaround would require some repetitive process, though not necessarily a manual "cut and paste" as stated. To my earlier comment, you or your code would have to examine the page source (in Firefox: right-click | View Page Source) and locate the HTML tag that wraps the desired field string.

With programming, this step-and-capture could be automated, as Aaron suggested. In line with padas' administrative comment above, the following link is offered as a suggestion that demonstrates several techniques for extracting HTML data by means of Excel: http://msdn.microsoft.com/en-us/library/aa203721%28v=office.11%29.aspx
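
If the capture is done outside Excel, one common route into a spreadsheet is a CSV file, which Excel opens directly. This is a different technique from the ones in the linked article. The sketch below is Python, standard library only; the function name, column headings, and file name are assumptions for illustration.

```python
import csv


def write_listings(rows, path):
    """Write (business name, address) pairs to a CSV file that Excel can open."""
    # utf-8-sig adds a byte-order mark so Excel recognises the file as UTF-8;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Business name", "Address"])
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample rows standing in for scraped data.
    sample = [
        ("Acme Plumbing", "12 Main St, Springfield"),
        ("Best Bakery", "99 Oak Ave, Shelbyville"),
    ]
    write_listings(sample, "listings.csv")
```

Addresses that contain commas are quoted automatically by the csv module, so each one stays in a single cell.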

My regrets if I have still failed to meet your expectations, but I at least wanted to make the extra effort to help you find your answer.
