Solved

How to get dynamic web page content in C#?

Posted on 2011-03-24
Last Modified: 2012-05-11
I'm programming a Windows Forms-based web crawler that should do the following:

1) Start from a URL defined by the user (for example www.microsoft.com)
2) Download the content of that page and scrape specific data (strings)
3) After going through the page content, add all the data found to an existing database
4) Find all the links on the page, select 5 of them and create new crawlers to crawl each of them.
5) Each crawler starts again from stage 1 with its own URL.

Now the problem is that when I download a web page using the WebClient, HttpWebRequest, or WebResponse classes (all in System.Net), I only get the static content of the page. Most websites contain scripts, PHP code, and other dynamic content, and I can't see any of that with these classes.

Simply:
Let's say I have a PHP page, www.example.com/page.php, which gets 100 names from a database and prints them on the page when shown in a web browser. I want to be able to read that dynamic content in my Windows Forms application using C#.

This is just an example; the page could just as well be an ASP.NET page containing news headlines or something similar. I can't control which URLs the users will scrape, so I really have to be able to read static and dynamic content from any URL.

NOTICE! The problem is clearly stated above; I don't need help implementing any of the other features listed at the top of this question :)

Thanks!
 
LVL 83

Accepted Solution

by: Dave Baldwin (earned 500 total points)
ID: 35211654
Your example actually serves 'static' content, in that the names will usually be included in the original page as HTML: the PHP runs on the server, and only its output is sent to the client. What you will have a problem with is content loaded by JavaScript/AJAX methods. You will have to find a way to read the JavaScript and make the same requests that it does.
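As a rough first step toward "reading the JavaScript", the sketch below lists the external script files a downloaded page refers to, so they can be fetched and inspected for the requests they make. This is only an illustration under assumptions: the class name `ScriptFinder`, the method `ExtractScriptSources`, and the sample markup are made up for this example, and a regular expression is a crude stand-in for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptFinder
{
    // Captures the src value of <script ... src="..."> tags.
    // A regex only approximates HTML parsing; it is used here for brevity.
    private static readonly Regex ScriptSrc = new Regex(
        @"<script\b[^>]*\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URLs of all external scripts referenced in html.
    public static List<string> ExtractScriptSources(string html, Uri pageUri)
    {
        var result = new List<string>();
        foreach (Match m in ScriptSrc.Matches(html))
        {
            // Resolve relative paths against the URL of the page itself.
            result.Add(new Uri(pageUri, m.Groups[1].Value).AbsoluteUri);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical markup standing in for a page downloaded with WebClient.
        string html =
            "<html><head><script src=\"js/names.js\"></script>" +
            "<script type='text/javascript' src='http://cdn.example.com/lib.js'></script>" +
            "</head><body></body></html>";

        var pageUri = new Uri("http://www.example.com/page.php");
        foreach (string url in ExtractScriptSources(html, pageUri))
        {
            Console.WriteLine(url);
        }
    }
}
```

Each URL found this way can be downloaded like any other resource. Working out which requests a script issues still has to be done by reading the script, and inline scripts (those without a src attribute) are not covered by this sketch.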
 

Author Closing Comment

by: SubsonicDesignOfficial
ID: 35211687
Thanks for your answer! Now that I recall, I only saw some JavaScript regions unformatted (they were shown as code). I also remember now that PHP and ASP.NET are translated to HTML on the server side (?). I should have thought of that!
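The server-side point can be checked with a small self-contained experiment. In the sketch below, a local HttpListener plays the role of page.php and builds HTML from a list of names for each request, and an HttpClient downloads it (WebClient.DownloadString would receive the same body). Everything here is an assumption for illustration: the class `ServerSideDemo`, the method names, the two sample names, and port 8089, which must be free on the local machine.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class ServerSideDemo
{
    // Stands in for page.php: the "dynamic" work happens here, on the server.
    public static string RenderPage(string[] names)
    {
        var sb = new StringBuilder("<html><body><ul>");
        foreach (string name in names)
        {
            sb.Append("<li>").Append(WebUtility.HtmlEncode(name)).Append("</li>");
        }
        return sb.Append("</ul></body></html>").ToString();
    }

    // Serves one request on the given prefix and returns what the client received.
    public static async Task<string> FetchFromLocalServerAsync(string prefix, string[] names)
    {
        using (var listener = new HttpListener())
        using (var client = new HttpClient())
        {
            listener.Prefixes.Add(prefix);
            listener.Start();

            // Start the download first, then answer it from the listener.
            Task<string> download = client.GetStringAsync(prefix);
            HttpListenerContext ctx = await listener.GetContextAsync();

            byte[] body = Encoding.UTF8.GetBytes(RenderPage(names));
            ctx.Response.ContentType = "text/html; charset=utf-8";
            ctx.Response.ContentLength64 = body.Length;
            await ctx.Response.OutputStream.WriteAsync(body, 0, body.Length);
            ctx.Response.Close();

            return await download;
        }
    }

    public static void Main()
    {
        string html = FetchFromLocalServerAsync(
            "http://localhost:8089/", new[] { "Alice", "Bob" }).GetAwaiter().GetResult();

        // The client sees finished HTML, not the code that produced it.
        Console.WriteLine(html);
    }
}
```

The downloaded string contains the generated list items and nothing of the generating code, which is why a PHP or ASP.NET page looks like plain HTML to the crawler.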
 
LVL 83

Expert Comment

by: Dave Baldwin
ID: 35211707
Thanks for the points. The other thing that will be a problem is Flash, of course, since it often loads its own content.