Get rendered HTML source in .NET (C#)

Posted on 2012-08-16
Last Modified: 2012-09-06

I'm building a little web crawler, and for each web page, I extract all links to navigate further.

Currently I download the HTML source with the WebClient class and then search it for <a> tags.

I have realized that I don't get all the links I want this way.
Some pages generate HTML with JavaScript and AJAX after the page has loaded.
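To illustrate the limitation, here is a minimal sketch of the static approach described above. It assumes the HTML has already been downloaded as a string (for example with WebClient.DownloadString). The class name StaticLinkExtractor, the simplified regular expression, and the sample page are all hypothetical and are not taken from the original crawler.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StaticLinkExtractor
{
    // Simplified pattern for href="..." or href='...' inside an <a> tag.
    // This is an illustration, not a full HTML parser.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the href value of every <a> tag present in the raw markup.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in AnchorHref.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Hypothetical page: one link in the markup, one link created by script at runtime.
        string html =
            "<html><body><a href=\"/static\">Static</a>" +
            "<script>var a = document.createElement('a'); a.href = '/dynamic'; " +
            "document.body.appendChild(a);</script>" +
            "</body></html>";

        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link); // only "/static" is found
        }
    }
}
```

The "/dynamic" link exists only after a browser has executed the script, so no amount of parsing the downloaded string will find it.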

If I navigate to a web page with Firefox and open Firebug, I can see the HTML source that I'm looking for. Is there a component that can do the same in .NET?

It's also important that my console app can run multithreaded.

Question by: jimmieandersson

    Expert Comment

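    Try this method:

            // Extension method; it must be declared inside a static class.
            // Requires: using System.IO; using System.Web.UI;
            public static String RenderHtml<T>(this Control control) where T : Control
            {
                StringWriter controlString = new StringWriter();
                control.RenderControl(new HtmlTextWriter(controlString));
                return controlString.ToString();
            }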

    Author Comment

    Thanks for your reply.
    I'm not sure I used it correctly, but I tried this:

                    var control = new LiteralControl(webClientData.Source);
                    var html = control.RenderHtml<LiteralControl>();


    The returned HTML was unchanged, exactly the same as the input.

    Accepted Solution

    You might consider switching from WebClient in a console app to a hidden WebBrowser control hosted in a Windows Forms project with no visible window.

    That will allow you to access the DOM document after the page has fully rendered.

    HtmlElementCollection Collection = WebBrowser1.Document.GetElementsByTagName("a");
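For a crawler that must stay a multithreaded console app, one possible arrangement is sketched below. It is an assumption-laden sketch, not the answerer's code: the class name RenderedLinkFetcher and the method GetRenderedLinks are hypothetical, there is no timeout, and content that scripts add after DocumentCompleted fires may still be missing. It relies on the facts that WebBrowser must run on an STA thread and needs a message loop to raise its events.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Windows.Forms;

public static class RenderedLinkFetcher
{
    // Loads the URL in an off-screen WebBrowser on a dedicated STA thread and
    // returns the href of every <a> element present when DocumentCompleted fires.
    public static List<string> GetRenderedLinks(string url)
    {
        var links = new List<string>();

        var worker = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true; // no script error dialogs
                browser.DocumentCompleted += (sender, e) =>
                {
                    // The event also fires for frames; wait for the top-level document.
                    if (e.Url != browser.Url) return;

                    foreach (HtmlElement anchor in browser.Document.GetElementsByTagName("a"))
                    {
                        string href = anchor.GetAttribute("href");
                        if (!string.IsNullOrEmpty(href)) links.Add(href);
                    }
                    Application.ExitThread(); // ends the message loop started below
                };
                browser.Navigate(url);
                Application.Run(); // message loop; WebBrowser raises no events without one
            }
        });

        // WebBrowser wraps an ActiveX control and requires a single-threaded apartment.
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join(); // block the caller until the page has been processed
        return links;
    }
}
```

Each call creates its own thread and its own browser instance, so several crawler threads could call GetRenderedLinks concurrently, at the cost of one browser control per page in flight.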

    Author Closing Comment

    It works, but unfortunately not as fast as I was hoping.

    Expert Comment

    by: Ron M
    That's probably because it waits until the page is fully rendered, including all the little bits and pieces.
