• Status: Solved

Get rendered HTML source in .NET C#

Hello,

I'm building a little web crawler, and for each web page, I extract all links to navigate further.

Today I download the HTML source with the WebClient class and then search it for <a> tags.

I have realized that this way I don't get all the links I want.
Some pages render their HTML with JavaScript and AJAX after the page has loaded.
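A minimal sketch of the WebClient approach described above, assuming a simple regular expression is good enough to find quoted `href` attributes (a real crawler would more likely use a proper HTML parser). The class and method names are hypothetical. It also shows why script-inserted links are missed: WebClient returns only the raw source, so an anchor that a script creates with `document.createElement` at run time never appears in the string being scanned.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating "download the source, then scan for <a> tags".
// Assumption: the regex only handles quoted href values; it is not a full HTML parser.
public static class StaticLinkExtractor
{
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the href values found in the raw HTML text, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }

    // Downloads the unrendered source; no JavaScript on the page is executed,
    // so links added by scripts or AJAX after load are not in the result.
    public static List<string> DownloadLinks(string url)
    {
        using (var client = new WebClient())
        {
            return ExtractLinks(client.DownloadString(url));
        }
    }
}
```

For example, on a page whose only static anchor is `<a href="/static">` and whose inline script appends a second anchor after load, `ExtractLinks` returns just `/static`.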

If I navigate to a web page in Firefox and open Firebug, I can find the HTML source I'm looking for. Is there any component that does this in .NET?

It's also important that my console app can run multithreaded.
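For the multithreading requirement, one common pattern is to start the downloads as tasks and cap how many run at once with a `SemaphoreSlim`. This is a sketch under assumptions: the `ConcurrentFetcher` name and the injected `fetch` delegate are hypothetical, and a real crawler might pass something like `url => httpClient.GetStringAsync(url)`. It covers plain HTTP downloads only; a WebBrowser control is a COM-based control that needs its own STA thread with a message loop, so it cannot simply be shared across worker threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sketch: download many URLs concurrently, never running more than
// maxConcurrency fetches at the same time. The fetch delegate is a
// hypothetical stand-in for the real download call.
public static class ConcurrentFetcher
{
    public static async Task<Dictionary<string, string>> FetchAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrency)
    {
        var results = new Dictionary<string, string>();
        var resultLock = new object();

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // Distinct() avoids downloading the same URL twice in one batch.
            var tasks = urls.Distinct().Select(async url =>
            {
                await gate.WaitAsync();   // wait for a free slot
                try
                {
                    string html = await fetch(url);
                    lock (resultLock)     // Dictionary is not thread-safe
                    {
                        results[url] = html;
                    }
                }
                finally
                {
                    gate.Release();       // free the slot even if fetch throws
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }

        return results;
    }
}
```

Injecting the download function keeps the concurrency logic separate from the HTTP code, so it can be exercised with a fake fetcher before pointing it at real pages.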


Thanks
Asked by: jimmieandersson
1 Solution
 
BlueYonder commented:
Try this method:

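        using System.IO;
        using System.Web.UI;
        public static class ControlExtensions // illustrative name; extension methods need a static class
        {
            // Renders the control's server-side markup only; no client-side JavaScript is executed.
            public static string RenderHtml<T>(this T control) where T : Control
            {
                var controlString = new StringWriter();
                control.RenderControl(new HtmlTextWriter(controlString));
                return controlString.ToString();
            }
        }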
 
jimmieandersson (author) commented:
Thanks for your reply.
I'm not sure I used it correctly, but I tried this:

                var control = new LiteralControl(webClientData.Source);
                var html = control.RenderHtml<LiteralControl>();

The returned HTML didn't change anything; it was exactly the same as the input.
 
Ron Malmstead (Information Services Manager) commented:
You might consider switching from WebClient in a console app to an invisible WebBrowser control hosted in a hidden Windows Forms project.

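That lets you read the DOM after the browser has loaded the page and run its scripts, instead of the raw source that WebClient returns (content fetched by later AJAX calls may still arrive after the load has completed).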

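// Navigate() returns immediately, so read the DOM once DocumentCompleted fires.
WebBrowser1.DocumentCompleted += (sender, e) => {
    HtmlElementCollection Collection = WebBrowser1.Document.GetElementsByTagName("a"); };
WebBrowser1.Navigate("http://www.yahoo.com");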
 
jimmieandersson (author) commented:
It works, but unfortunately it's not as fast as I was hoping.
 
Ron Malmstead (Information Services Manager) commented:
That's probably because it waits until the page is fully rendered, down to every little bit and piece.
