
How to get dynamic web page content in C#?

Posted on 2011-03-24
Last Modified: 2012-05-11
I'm programming a Windows Forms based web crawler which should do the following:

1) Start from a URL defined by the user (for example www.microsoft.com)
2) Download the content of that page and scrape specific data (strings)
3) After going through page content, add all the data found into an existing database
4) Find all the links on the page, select 5 of them and create new crawlers to crawl each of them.
5) Each crawler starts again from stage 1 with their unique URLs.

Now the problem is that when I download a web page using the WebClient, HttpWebRequest, or WebResponse classes (all in System.Net), I only get the static content of the page. Most websites contain scripts, PHP code, and other dynamic content, and I can't see any of it with these classes.

Simply:
Let's say I have a PHP page www.example.com/page.php, and when it is shown in a web browser it gets 100 names from a database and prints them on the page. I want to be able to read that dynamic content in my Windows Forms application using C#.

This is just an example; the page could just as well be ASP.NET and contain news headlines or something like that. I can't control which URLs the users will scrape, so I really have to be able to read static and dynamic content from any URL.

NOTICE! The problem is clearly stated above. I don't need help implementing any of the other features listed at the top of this question :)

Thanks!
3 Comments
 

Accepted Solution

by:
Dave Baldwin earned 1500 total points
ID: 35211654
Your example actually produces 'static' content, in that the names will usually be included in the original page as HTML.  What you will have a problem with is content loaded by JavaScript/AJAX methods.  You will have to find a way to read the JavaScript and make the same requests that it does.
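As a starting point for "reading the JavaScript", here is a minimal sketch that lists the external script files a downloaded page refers to, so the crawler knows which files might be issuing further requests. The class name ScriptFinder and the method FindScriptUrls are made up for illustration, and the regular expression is a rough assumption about how script tags are written; it is not a full HTML parser and it does not run any JavaScript.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of System.Net): finds <script src="..."> URLs
// in HTML that has already been downloaded as a string.
public static class ScriptFinder
{
    // Rough pattern for a script tag with a quoted src attribute.
    // Assumption: the src value is wrapped in single or double quotes.
    private static readonly Regex ScriptSrc = new Regex(
        "<script\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the absolute URLs of all external scripts referenced by the page.
    // Inline <script> blocks without a src attribute are ignored.
    public static List<string> FindScriptUrls(string html, Uri pageUri)
    {
        var urls = new List<string>();
        foreach (Match m in ScriptSrc.Matches(html))
        {
            Uri absolute;
            // Resolve relative paths such as "/js/app.js" against the page URL.
            if (Uri.TryCreate(pageUri, m.Groups[1].Value, out absolute))
            {
                urls.Add(absolute.AbsoluteUri);
            }
        }
        return urls;
    }
}
```

You would pass in the string you already get from WebClient.DownloadString together with the page's Uri. Each returned script can then be downloaded the same way and searched for the URLs it requests; that part depends entirely on the site, so it is left out here.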
 

Author Closing Comment

by:SubsonicDesignOfficial
ID: 35211687
Thanks for your answer! Now that I think about it, I only saw some JavaScript regions unformatted (they were shown as code). I also remember now that PHP and ASP.NET are translated to HTML on the server side (?); I should have thought of that!
 

Expert Comment

by:Dave Baldwin
ID: 35211707
Thanks for the points.  The other thing that will be a problem is Flash, of course, since it often loads its own content.
