WebClient request excluding images and other elements

I would like to read a web page using WebClient.OpenRead, but to save bandwidth I don't want to download any images or other elements (Flash, JS, ...). In short, I just want the bare page, like when you disable images in IE.

How can I do this?
Asked by jiiins2
käµfm³d 👽 Commented:
That's exactly what you get! When your browser downloads a web page, it parses the HTML for tags such as <img>, <embed>, and <object>, and for every one that references a resource external to the page, it sends a separate, subsequent request for that resource. You can see this easily by running a tool like Fiddler and examining the captured traffic. Calling OpenRead on a web page downloads only the page's own HTML, which is exactly what you are after.
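
To make that concrete, here is a rough sketch (not part of the original answer) that scans an HTML string for the tags a browser would follow up on. The module name ResourceScanner, the helper ExtractResourceUrls, and the sample HTML are hypothetical, and a regular expression is only an approximation of real HTML parsing. Each value it returns corresponds to an extra request a browser would make after receiving the page; WebClient.OpenRead makes none of them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ResourceScanner

    ' Hypothetical helper: returns the src values of <img>, <script>, <embed>
    ' and <iframe> tags and the data value of <object> tags in an HTML string.
    ' A browser would request each of these separately; OpenRead never does.
    ' NOTE: a regex is a rough approximation; unquoted attributes, CSS url(...)
    ' references and other unusual markup are not detected.
    Public Function ExtractResourceUrls(html As String) As List(Of String)
        Dim pattern As String =
            "<(?:img|script|embed|iframe)\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']" &
            "|<object\b[^>]*?\bdata\s*=\s*[""']([^""']+)[""']"
        Dim urls As New List(Of String)()
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            ' Exactly one of the two capture groups succeeds for each match
            urls.Add(If(m.Groups(1).Success, m.Groups(1).Value, m.Groups(2).Value))
        Next
        Return urls
    End Function

    Sub Main()
        ' Hypothetical sample page
        Dim html As String =
            "<html><body><p>Hello</p>" &
            "<img src=""/images/logo.png"" alt=""logo"">" &
            "<script src='http://cdn.example.com/app.js'></script>" &
            "<object data=""movie.swf""></object></body></html>"

        ' Each line printed is a request a browser would make after the HTML arrives
        For Each url As String In ExtractResourceUrls(html)
            Console.WriteLine(url)
        Next
    End Sub

End Module
```

For the sample page this prints /images/logo.png, http://cdn.example.com/app.js and movie.swf, one per line: three follow-up requests a browser would send, and which your OpenRead call skips.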

However, note that if you save this data to a file and open that file in a web browser, you are letting the browser parse the page again: any resources referenced by absolute URLs will be requested as the browser reads the page, while relatively referenced resources will fail to load because they most likely do not exist at that location on your machine. You will see placeholders where those relatively linked resources should be.
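
A small sketch of why that happens, assuming the browser resolves links the way the .NET Uri class does (the standard base-plus-relative rule). The module name RelativeLinkDemo, the Resolve helper, and the example.com and C:/temp locations are all hypothetical. The same relative link points at the web server when the page is loaded from the server, but into a local folder once the page is loaded from disk; an absolute link is unaffected.

```vbnet
Imports System

Module RelativeLinkDemo

    ' Hypothetical helper: resolves a link found inside a page against the
    ' location the page itself was loaded from.
    Public Function Resolve(pageLocation As String, link As String) As String
        Return New Uri(New Uri(pageLocation), link).AbsoluteUri
    End Function

    Sub Main()
        ' Page loaded from the web server: the relative link points back to the server
        Console.WriteLine(Resolve("http://www.example.com/questions/page.html", "images/logo.png"))

        ' Same HTML saved to disk: the relative link now points into the local
        ' folder, where the image most likely does not exist (placeholder shown)
        Console.WriteLine(Resolve("file:///C:/temp/page.html", "images/logo.png"))

        ' An absolute link ignores where the page came from, so the browser
        ' still requests it from the web
        Console.WriteLine(Resolve("file:///C:/temp/page.html", "http://www.example.com/images/logo.png"))
    End Sub

End Module
```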
käµfm³d 👽 Commented:
As an example, using this code:

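Imports System
Imports System.IO
Imports System.Net

Module Module1

    Sub Main()
        ' OpenRead downloads only the HTML of this one URL; the images, scripts,
        ' etc. that the HTML references are never requested.
        Using client As New WebClient()
            Using webStream As Stream = client.OpenRead("http://www.experts-exchange.com/Programming/Languages/.NET/Visual_Basic.NET/Q_27417692.html")
                ' StreamReader decodes the bytes (UTF-8 by default, or per the BOM)
                Using reader As New StreamReader(webStream)
                    Dim buffer(4095) As Char
                    Dim charsRead As Integer

                    Do
                        charsRead = reader.Read(buffer, 0, buffer.Length)
                        ' Write only the characters actually read on this pass
                        If charsRead > 0 Then Console.Write(buffer, 0, charsRead)
                    Loop While charsRead > 0
                End Using
            End Using
        End Using

        Console.WriteLine(vbNewLine & vbNewLine & "**** DONE ****")
        Console.ReadKey()
    End Sub

End Module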



...against the URL to your question:

http://www.experts-exchange.com/Programming/Languages/.NET/Visual_Basic.NET/Q_27417692.html



...this is what Fiddler sees:

[Screenshot: Fiddler and Code]
...and this is what it sees when I go through the browser:

[Screenshot: Fiddler and Browser]
I've highlighted the images that were requested by my browser. As you can see, each one is a request of its own. (A few of the images were already cached on my computer; had they not been cached, they would have been separate requests to the web server as well.)
jiiins2 (Author) Commented:
Perfect, thanks!