• Status: Solved
  • Priority: Medium
  • Security: Public

Screen scraper - getting data out of web pages and into spreadsheets

Hi,

I'm trying to find a way to extract multiple addresses from the attached Yellow Pages example into an Excel spreadsheet.
The challenge is that I need to get these addresses from 100+ pages, and there is no 'View All' option on the site.
I need all of the business names and addresses in Excel spreadsheet format.
Do you know of any way to do this without copying and pasting 1000+ times?

Thanks!
Yellowpage.docx
Asked by: iamnamja
 
Aaron Tomosky (Technology Consultant) commented:
Do you know any programming language?
 
David (Senior Oracle Database Administrator) commented:
My approach might be to view the underlying HTML and save it to a text document. Many text editors and screen scrapers can then search it by element tag.

Curious, how do you plan to address the data owner's copyrights?
 
David (Senior Oracle Database Administrator) commented:
Ping to the author, asking for any update or closure.

dvz
 
Aaron Tomosky (Technology Consultant) commented:
Pretty sure I mentioned using an Excel macro and then pasted a link. Well, I'm sure you can search for "address on website to excel macro" yourself.
 
iamnamja (Author) commented:
I've requested that this question be deleted for the following reason:

I believe my question was not explained clearly, because none of the answers address it. I cannot accept any of them as the solution because they did not provide an actual solution.
 
David (Senior Oracle Database Administrator) commented:
Okay, what if we tell you that in our collective, professional opinion, your goal cannot be done as stated? Given the criterion that there are "hundreds of pages" instead of one long scrollable region, something has to advance the focus from one URL to the next, robot or spider fashion.
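
To make "advance from one URL to the next" concrete, here is a minimal Python sketch. It assumes the site puts the page number in the URL; the `page` query parameter and the example.com address are hypothetical placeholders, not taken from the actual Yellow Pages site, and the one-second pause is just a courtesy default.

```python
import time
from typing import Callable, Iterator, Tuple
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def iter_result_pages(
    url_template: str,
    last_page: int,
    fetch: Callable[[str], str] = fetch_html,
    delay_seconds: float = 1.0,
) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, html) for pages 1..last_page.

    url_template must contain a {page} placeholder (an assumption about
    how the site numbers its result pages).
    """
    for page in range(1, last_page + 1):
        yield page, fetch(url_template.format(page=page))
        if delay_seconds and page < last_page:
            time.sleep(delay_seconds)  # pause between requests
```

Usage would look like `for page, html in iter_result_pages("http://example.com/search?page={page}", 100): ...`, with the real URL pattern copied from the browser's address bar while paging through the results.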

Any workaround would require some repetitive process, though not necessarily a manual "cut and paste" as stated. To my earlier comment, you or your code would have to examine the page's source code (in Firefox, right-click and choose View Page Source) and locate the HTML tag that wraps the desired field string.
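
Once you know which tag wraps each field, pulling the text out can be sketched with Python's standard `html.parser`. The class names used in the usage line (`business-name`, `street-address`) are made up for illustration; substitute whatever you find in the page source.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FieldExtractor(HTMLParser):
    """Collect the text inside every element that carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.values = []    # finished field values

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth and tag == "br":
                self.buffer.append(" ")  # treat a line break as a space
            return
        if self.depth:
            self.depth += 1
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.values.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_fields(html, class_name):
    """Return the text of every element whose class list has class_name."""
    parser = FieldExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.values
```

For example, `extract_fields(html, "business-name")` and `extract_fields(html, "street-address")` would give two parallel lists that can be paired up with `zip`. Real listing pages are often messier than this, so treat it as a starting point.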

With programming, this step and capture could be automated, as Aaron suggested.  In line with padas' administrative comment above, the following link is offered as a suggestion that demonstrates several techniques of extracting HTML data by means of Excel:  http://msdn.microsoft.com/en-us/library/aa203721%28v=office.11%29.aspx
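
If you go the scripting route instead of an Excel macro, the last step is just writing the rows somewhere Excel can open. This is a rough sketch using Python's `csv` module; the column headings and function names are my own choice, not anything from the linked article.

```python
import csv
import io


def listings_to_csv(rows):
    """Turn (business_name, address) pairs into CSV text with a header row."""
    output = io.StringIO()
    writer = csv.writer(output)  # quotes fields that contain commas
    writer.writerow(["Business name", "Address"])
    writer.writerows(rows)
    return output.getvalue()


def save_for_excel(path, rows):
    """Write the CSV to disk; the BOM from utf-8-sig helps Excel detect UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write(listings_to_csv(rows))
```

Calling `save_for_excel("listings.csv", rows)` produces a file that opens directly in Excel, one listing per row.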

My regrets if I have still failed to meet your expectations -- but at least I wanted to make the extra effort to help you find your answer.