Solved

Screen scraper - getting data out of web pages and into spreadsheets

Posted on 2014-02-26
Last Modified: 2014-03-06
Hi,

I'm trying to find a way to extract multiple addresses from the attached Yellow Pages example into an Excel spreadsheet.
The challenge is that I need to get these addresses from 100+ pages, and the site has no 'View All' option.
I need to put all of these business names and addresses into an Excel spreadsheet.
Do you know any way to do this without copying & pasting 1000+ times?

Thanks!
Yellowpage.docx
Question by:iamnamja
7 Comments

Expert Comment

by:Aaron Tomosky
ID: 39891055
Do you know any programming languages?

Expert Comment

by:David
ID: 39892730
My approach might be to view the underlying HTML and save it to a text document. Then there are many text editors or screen scrapers that could search for the data by the element's tag.
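As a rough illustration of searching the saved HTML by tag, here is a minimal Python sketch using the standard library's html.parser. It assumes each business name and address is wrapped in an element with a distinctive class attribute; the class names "business-name" and "street-address" are hypothetical placeholders, so check the real page source and adjust FIELD_CLASSES to match.

```python
from html.parser import HTMLParser

# Hypothetical class names: check the real page source for the tags and
# attributes that actually wrap each field, then edit this mapping.
FIELD_CLASSES = {"business-name": "name", "street-address": "address"}
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class ListingParser(HTMLParser):
    """Collect the text of elements whose class marks a field of interest."""

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per business listing
        self._field = None  # field currently being captured, if any
        self._depth = 0     # open-element depth inside that field

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements such as <br> never get a closing tag
        if self._field is not None:
            self._depth += 1  # nested tag inside the field, e.g. <a>
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._field = FIELD_CLASSES[cls]
                self._depth = 1
                # A name (or any field seen before a listing exists) starts a new record.
                if self._field == "name" or not self.records:
                    self.records.append({})
                self.records[-1].setdefault(self._field, "")
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None  # left the field's element

    def handle_data(self, data):
        if self._field is not None:
            # Separate text nodes with a space so "St<br>Springfield" stays readable.
            self.records[-1][self._field] += " " + data


def extract_listings(html):
    """Return a list of {'name': ..., 'address': ...} dicts from one page."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and line breaks inside each field.
    return [{k: " ".join(v.split()) for k, v in rec.items()}
            for rec in parser.records]
```

For a saved page, something like extract_listings(open("page1.html", encoding="utf-8").read()) would return one dict per listing; the file name is only an example.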

Curious, how do you plan to address the data owner's copyrights?

Expert Comment

by:David
ID: 39903453
Pinging the author to ask for an update or closure.

dvz

Expert Comment

by:Aaron Tomosky
ID: 39903503
Pretty sure I mentioned using an Excel macro and then pasted a link. Either way, I'm sure you can search for "address on website to excel macro" yourself.

Author Comment

by:iamnamja
ID: 39910266
I've requested that this question be deleted for the following reason:

I believe I did not explain my question clearly, because none of the answers address it. I cannot accept any of them as the solution because they don't provide an actual solution.

Accepted Solution

by:
David earned 500 total points
ID: 39909929
Okay, what if we tell you that, in our collective professional opinion, your goal cannot be achieved as stated? Because there are "hundreds of pages" rather than one long scrollable region, something has to advance the focus from one URL to the next, robot or spider fashion.
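The robot or spider step could look something like the following Python sketch. It assumes the results pages differ only by a page number in the URL; URL_TEMPLATE and its page parameter are hypothetical, so copy the real pattern from the address bar while clicking through the pages. parse_page stands for whatever function pulls the records out of one page's HTML, and the loop stops at the first page that yields nothing or after max_pages.

```python
import time
import urllib.request

# Hypothetical URL pattern: find the real one by watching the address bar
# while moving from page 1 to page 2 of the results.
URL_TEMPLATE = "https://www.example.com/search?terms=plumbers&page={page}"


def fetch(url):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(parse_page, fetch_page=fetch, max_pages=200, delay=1.0):
    """Visit pages 1, 2, 3, ... and collect what parse_page returns.

    Stops at the first page that yields no records (assumed to mean the
    results have run out) or after max_pages pages, whichever comes first.
    """
    records = []
    for page in range(1, max_pages + 1):
        html = fetch_page(URL_TEMPLATE.format(page=page))
        found = parse_page(html)
        if not found:
            break
        records.extend(found)
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return records
```

The one-second delay is an arbitrary, conservative choice. If the site repeats its last page instead of returning an empty one for out-of-range page numbers, the stopping rule would need to change, for example to stop when a page matches the previous one.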

Any workaround would require some repetitive process, though not necessarily a manual "cut and paste" as you described. As I said in my earlier comment, you or your code would have to examine the page's source code (in Firefox, right-click | View Page Source) and locate the HTML tag that wraps the desired field.
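Locating the wrapping tag can also be done with a small script instead of reading the page source by eye. This Python sketch is an assumption rather than anything from the thread: it takes a value you can see in the browser (for example, one listing's street address) and reports the innermost tag and attributes around each text node containing it. It only matches text that sits in a single text node, so a value split across tags such as <b> will not be found.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class WrapperFinder(HTMLParser):
    """Record the innermost open tag around each text node containing target."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []    # (tag, attrs) for every currently open element
        self.matches = []  # (tag, attrs) wrapping each matching text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements are never closed
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.target in data and self.stack:
            self.matches.append(self.stack[-1])


def find_wrapping_tags(html, target):
    """Return (tag, attributes) pairs for elements whose text contains target."""
    finder = WrapperFinder(target)
    finder.feed(html)
    finder.close()
    return finder.matches
```

A result such as ("p", {"class": "street-address"}) would suggest targeting p elements with that class when extracting every listing.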

With programming, this step and the capture could be automated, as Aaron suggested. In line with padas' administrative comment above, the following link is offered as a suggestion; it demonstrates several techniques for extracting HTML data by means of Excel:  http://msdn.microsoft.com/en-us/library/aa203721%28v=office.11%29.aspx
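The linked article describes Excel-side techniques. If the capture is scripted in another language instead, the simplest hand-off to Excel is a CSV file. The Python sketch below is a swapped-in approach, not the method from the linked article; it assumes the records are dicts with hypothetical "name" and "address" keys.

```python
import csv
import io


def records_to_csv(records, out, fields=("name", "address")):
    """Write dict records to a file-like object as CSV, one row per record."""
    # Missing keys become empty cells (restval); unknown keys are skipped.
    writer = csv.DictWriter(out, fieldnames=list(fields), restval="",
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


def save_for_excel(records, path):
    """Save records to a .csv file that Excel can open directly."""
    # newline="" lets the csv module control line endings (no blank rows);
    # utf-8-sig writes a byte-order mark so Excel recognizes UTF-8 text.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        records_to_csv(records, f)


if __name__ == "__main__":
    demo = [{"name": "Acme Plumbing", "address": "12 Main St, Springfield"}]
    buf = io.StringIO()
    records_to_csv(demo, buf)
    print(buf.getvalue())
```

Addresses containing commas are quoted automatically by the csv module, so each business stays on one row with its name and address in separate columns.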

My regrets if I have still failed to meet your expectations, but I at least wanted to make the extra effort to help you find your answer.