Solved

Screen scraper - getting data out of web pages and into spreadsheets

Posted on 2014-02-26
Last Modified: 2014-03-06
Hi,

I'm trying to find a way to extract multiple addresses from the attached Yellow Pages example into an Excel spreadsheet.
The challenge is that I need to get these addresses from 100+ pages, and the site has no 'View All' option.
I need all of the business names and addresses in Excel spreadsheet format.
Do you know of any way to do this without copying and pasting 1000+ times?

Thanks!
Yellowpage.docx
Question by:iamnamja
7 Comments
 
LVL 39

Expert Comment

by:Aaron Tomosky
ID: 39891055
Do you know any programming language?
 
LVL 23

Expert Comment

by:David
ID: 39892730
My approach would be to view the underlying HTML and save it to a text document.  Many text editors and screen scrapers can then search it by the element's tag.
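As a rough illustration of searching saved HTML by element, here is a minimal Python sketch that uses only the standard library. The class names "business-name" and "address" are assumptions made up for the example; the real page source will have its own tags and class names, so check it first.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class ListingParser(HTMLParser):
    """Collect the text inside elements whose class matches one of `fields`.

    The default field names are hypothetical placeholders, not taken
    from any real site.
    """

    def __init__(self, fields=("business-name", "address")):
        super().__init__()
        self.fields = fields
        self.current = None  # field being captured, if any
        self.depth = 0       # nesting depth inside the captured element
        self.row = {}
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.current is not None:
                self.row[self.current] += " "  # treat <br> as a space
            return
        if self.current is not None:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field in self.fields:
            if field in classes:
                self.current = field
                self.depth = 1
                self.row[field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.current is None:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace in the captured text.
            self.row[self.current] = " ".join(self.row[self.current].split())
            self.current = None
            # A row is complete once every field has been seen.
            if all(f in self.row for f in self.fields):
                self.rows.append(self.row)
                self.row = {}

    def handle_data(self, data):
        if self.current is not None:
            self.row[self.current] += data


def extract_listings(html):
    """Return a list of dicts, one per listing found in the HTML text."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

To try it on a saved page, read the file into a string and pass it to `extract_listings`. The sketch assumes every listing has both fields; a listing with a missing address would need extra handling.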

Curious, how do you plan to address the data owner's copyrights?
 
LVL 23

Expert Comment

by:David
ID: 39903453
Ping to the author, asking for any update or closure.

dvz
LVL 39

Expert Comment

by:Aaron Tomosky
ID: 39903503
Pretty sure I mentioned using an Excel macro and then pasted a link. Well, I'm sure you can search for "address on website to excel macro" yourself.
 

Author Comment

by:iamnamja
ID: 39910266
I've requested that this question be deleted for the following reason:

I believe my question was not explained clearly, because none of the answers address it. I cannot accept any of the answers as a solution because they didn't provide an actual solution.
 
LVL 23

Accepted Solution

by:
David earned 500 total points
ID: 39909929
Okay, what if we tell you that in our collective, professional opinion, your stated goal cannot be done as stated?  Given that there are "hundreds of pages" instead of one long scrollable region, something has to advance the focus from one URL to the next, in robot or spider fashion.

Any workaround would require some repetitive process, though not necessarily a manual "cut and paste" as stated.  As in my earlier comment, you or your code would have to examine the page source (in Firefox: right-click | View Page Source) and locate the HTML tag that wraps the desired field string.

With programming, this step-and-capture process could be automated, as Aaron suggested.  In line with padas' administrative comment above, the following link is offered as a suggestion that demonstrates several techniques of extracting HTML data by means of Excel:  http://msdn.microsoft.com/en-us/library/aa203721%28v=office.11%29.aspx
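For anyone who would rather script it outside Excel, the step-and-capture loop might look roughly like the Python sketch below. It is only a sketch under assumptions: the URL pattern with a `{page}` placeholder is hypothetical, `extract` stands for whatever function pulls rows out of one page's HTML, and the loop assumes that a page with no rows means the listing has ended. The output is a CSV file, which Excel opens directly.

```python
import csv
import time
import urllib.request


def fetch(url):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_pages(url_template, extract, fetch_page=fetch,
                 max_pages=200, delay=1.0):
    """Visit page 1, 2, 3, ... and gather the rows from each page.

    url_template: hypothetical pattern such as ".../results?page={page}".
    extract:      function taking HTML text and returning a list of dicts.
    Stops at the first page that yields no rows, or after max_pages.
    """
    all_rows = []
    for page in range(1, max_pages + 1):
        html = fetch_page(url_template.format(page=page))
        rows = extract(html)
        if not rows:
            break
        all_rows.extend(rows)
        time.sleep(delay)  # pause between requests to go easy on the site
    return all_rows


def write_csv(rows, out, fieldnames):
    """Write the rows to an open text file object as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

When saving to disk, opening the output with `open("listings.csv", "w", newline="", encoding="utf-8-sig")` helps Excel recognize the text as UTF-8. Whether automated retrieval is permitted is a separate matter; check the site's terms first.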

My regrets if I have still failed to meet your expectations -- but at least I wanted to make the extra effort to help you find your answer.