Reading Google source code (View Source) does not match the data displayed on the page

Posted on 2011-10-21
Last Modified: 2013-11-19
I have been looking at the Google source code for a search on, say, dogs, using View Source, but it does not look like the page data when it is dropped into, say, Dreamweaver. So if you scrape the site, how do you make heads or tails of the data, or how do you extract all the page data with all the links attached?
Question by:sydneyguy

    Expert Comment

    Most of what you see on Google pages is the result of scripting. Actually scraping the page would require some serious, detailed parsing code; an analysis of the scripting; and some knowledge of the Google backend that is generating the scripts.

    If such a parser existed, it would be rendered worthless every time Google changed the scripts or the backend processing. Google provides all sorts of widgets to access the information it has stored, like the Google search widgets you see all over the place. That is how they want the data accessed, because it keeps them in control. So they make it as difficult as possible for anyone to grab stuff off their pages without using their tools.


    Author Comment

    OK, so how would I grab a dump of the data shown on the page via C#? At present I can place the URL into my code and extract the code behind the page. What I really need is code that does a "select all" and then grabs the data on the page, not the code. Any ideas?
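The "grab the data on the page, not the code" idea can be sketched roughly as below. This is an illustration in Python rather than C#, using only the standard library, and the names `VisibleTextExtractor` and `visible_text` are made up for this example. It only sees text that is already present in the HTML the server sends; anything a script builds later in the browser will not show up, which is the limitation described in the expert comment above.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes while ignoring script, style and noscript content."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside the skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def visible_text(html):
    """Return the text content of an HTML string, without markup or scripts."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        "<html><head><title>Dogs</title><script>var x = 1;</script></head>"
        "<body><p>Hello <a href='/a'>world</a></p></body></html>"
    )
    print(visible_text(sample))
```

The same approach should carry over to C# with any HTML parser that exposes text nodes; the point is to walk the parsed document and keep the text, instead of reading the raw source by eye.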

    Accepted Solution

    I don't use or know C#. If I were trying to do this, I would fetch the page on my server side with PHP and then figure out how to parse out what I need. From the look of the source code, it is probably pretty labor intensive, and you would not be able to write the parser based on one page, because there is no guarantee that all pages have the same structural components. I do that with some RSS feeds that are loaded up with junk, but that is a walk in the park compared to what I see in Google pages. The RSS feeds at least have a consistent tag structure, but the Google pages? Not so much.
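A rough sketch of the "fetch it server side, then parse out what you need" approach, here pulling out the links the original question asked about. It is written in Python instead of PHP or C#, uses only the standard library, and the names `LinkExtractor`, `extract_links` and `fetch_html` are hypothetical, as is the `https://example.com/` URL. As noted above, a parser like this is tied to whatever structure the page happens to have, so treat it as a starting point only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect (absolute_url, link_text) pairs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links = []
        self._href = None  # URL of the <a> currently open, if any
        self._text = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split())
            self.links.append((self._href, label))
            self._href = None


def extract_links(html, base_url):
    """Return every link in the HTML string as (absolute_url, link_text)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_html(url):
    """Download a page and decode it (assumes the server allows the request)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    page_url = "https://example.com/"  # placeholder URL for illustration
    for url, label in extract_links(fetch_html(page_url), page_url):
        print(url, label)
```

Keeping the fetch and the parse in separate functions means the parsing step can be tried against a saved copy of a page before any live requests are made.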
