  • Status: Solved
  • Priority: Medium
  • Security: Public

Reading Google source code (View Source), but it does not match the data displayed on the page

I have been looking at the source code for a Google search, say for "dogs", via View Source, but it does not look like the page data when it is dropped into, say, Dreamweaver. So if you scrape the site, how do you make heads or tails of the data, or how do you extract all the page data with all the links attached?
Asked by: sydneyguy
1 Solution
 
COBOLdinosaur Commented:
Most of what you see on Google pages is the result of scripting. Actually scraping the page would require some seriously detailed parsing code, an analysis of the scripting, and some knowledge of the Google backend that generates the scripts.

If such a parser existed, it would be rendered worthless every time Google changed the scripts or the backend processing. Google provides all sorts of widgets for accessing the information it has stored, like the Google search widgets you see all over the place. That is how they want the data accessed, because it keeps them in control. So they make it as difficult as possible for anyone to grab stuff off their pages without using their tools.

 
sydneyguy (Author) Commented:
OK, so how would I grab a dump of the data that is on the screen via C#? At present I can put the URL into my code and extract the source behind the page. What I really need is code that does a "select all" and then grabs the data on the page, not the code. Any ideas?
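
For illustration only, here is a minimal sketch of the "grab the data, not the code" idea. It is written in Python with the standard library rather than C#, and the names (VisibleTextExtractor, extract, SAMPLE_HTML) are made up for this example. It assumes the HTML has already been fetched, and it only sees static markup: anything a page builds with scripts after loading, which is most of a Google results page, will not show up.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text found outside script/style blocks, plus (link text, href) pairs."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []   # text chunks in document order
        self.links = []        # (link text, href) tuples
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._href = None      # href of the <a> currently open, if any
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((" ".join(self._link_text), self._href))
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # script or style content, not page text
        chunk = " ".join(data.split())  # collapse runs of whitespace
        if not chunk:
            return
        self.text_parts.append(chunk)
        if self._href is not None:
            self._link_text.append(chunk)


def extract(html):
    """Return (text, links) for an HTML string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.links


# Hypothetical sample page standing in for a fetched document.
SAMPLE_HTML = """<html><head><title>Dogs</title>
<style>body { color: black; }</style>
<script>var data = "not visible";</script></head>
<body><h1>Dog breeds</h1>
<p>See the <a href="/breeds/labrador">Labrador</a> page.</p>
</body></html>"""

text, links = extract(SAMPLE_HTML)
print(text)
print(links)
```

The same approach should carry over to any HTML parser available in C#. To capture what a script-heavy page actually displays, the page would first have to be rendered by something that runs its scripts, which this sketch does not attempt.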
 
COBOLdinosaur Commented:
I don't use or know C#. If I were trying to do this, I would fetch the page on the server side with PHP and then figure out how to parse out what I need. From the look of the source, it is probably pretty labor intensive, and you would not be able to write the parser based on one page, because there is no guarantee that all pages have the same structural components. I do that with some RSS feeds that are loaded up with junk, but that is a walk in the park compared to what I see in Google pages. The RSS feeds at least have a consistent tag structure, but the Google pages? Not so much.
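
To show why a consistent tag structure makes the RSS case so much easier, here is a rough sketch in Python (not PHP) using only the standard library. The feed content, the example.com URLs, and the function name rss_items are invented for the example, and it assumes a plain RSS 2.0 layout with title and link inside each item.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; a real one would be fetched from the feed URL first.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def rss_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> element."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


items = rss_items(SAMPLE_RSS)
for title, link in items:
    print(title, link)
```

Because every entry sits in the same item/title/link layout, one short loop covers the whole feed. A script-generated page offers no such guarantee, so a parser written against one page may not work on the next.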