WebClient request excluding images and other elements

Posted on 2011-10-26
Medium Priority
Last Modified: 2012-05-12
I would like to read a web page using WebClient.OpenRead, but to save bandwidth I don't want to download any images or other elements (Flash, JavaScript, ...). In short, I just want the bare page, like when you disable images in IE.

How can I do this?
Question by:jiiins2

Accepted Solution

käµfm³d   👽 earned 2000 total points
ID: 37036049
That's exactly what you get! When your browser downloads a web page, it parses the HTML, reads all the <img>, <embed>, <object>, etc. tags, and, for each one that references a resource external to the page, sends a separate follow-up request for that resource. This is easily visible if you run a tool like Fiddler and examine the traffic. WebClient does none of that parsing: calling OpenRead on a web page makes a single request and returns only the page's markup, which is exactly what you are after.
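To make the point concrete, here is a minimal sketch that downloads a page with WebClient and merely lists the src references a browser would have gone on to request. The URL http://www.example.com/ and the helper name FindSrcAttributes are placeholders invented for this illustration, and the regular expression is a rough heuristic rather than a real HTML parser.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module ListReferencedResources

    ' Hypothetical helper: returns the quoted src attribute values found in the HTML text.
    ' A regex is only a rough heuristic here, not a full HTML parser.
    Function FindSrcAttributes(ByVal html As String) As List(Of String)
        Dim results As New List(Of String)()
        Dim pattern As String = "\bsrc\s*=\s*[""']([^""']+)[""']"
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            results.Add(m.Groups(1).Value)
        Next
        Return results
    End Function

    Sub Main()
        Using client As New WebClient()
            ' One HTTP request: only the HTML document itself is downloaded.
            ' The URL is a placeholder (assumption), not taken from the question.
            Dim html As String = client.DownloadString("http://www.example.com/")

            ' These are references a browser would request next; WebClient never does.
            For Each src As String In FindSrcAttributes(html)
                Console.WriteLine(src)
            Next
        End Using
    End Sub

End Module
```

Running this under Fiddler should show a single request for the page, no matter how many references are printed.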

However, one thing to note: if you save this data to a file and open that file in a web browser, you are letting the browser parse the page. Any resources referenced by absolute URLs will then be requested, while relatively-linked resources will fail to load because they (most likely) do not exist at that location on your machine. You will see placeholders where the relatively-linked resources should be.
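As a rough illustration of the absolute-versus-relative distinction, the sketch below resolves a few references against the address of the page they came from, using the Uri(baseUri, relativeString) constructor. The page address and the three reference strings are hypothetical values chosen for the example.

```vbnet
Imports System

Module ResolveRelativeUrls

    Sub Main()
        ' Hypothetical address of the downloaded page; it serves as the base URL.
        Dim pageUri As New Uri("http://www.example.com/articles/page.html")

        ' Hypothetical references as they might appear in the page's markup.
        Dim references() As String = {
            "images/logo.png",
            "/css/site.css",
            "http://cdn.example.com/lib.js"
        }

        For Each reference As String In references
            ' Relative references are resolved against the base;
            ' references that are already absolute are left unchanged.
            Dim resolved As New Uri(pageUri, reference)
            Console.WriteLine(reference & " -> " & resolved.AbsoluteUri)
        Next
    End Sub

End Module
```

Here "images/logo.png" resolves to http://www.example.com/articles/images/logo.png. A saved copy opened from disk has a file location as its base instead, which is why the relative references no longer point at anything that exists.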

Expert Comment

by:käµfm³d 👽
ID: 37036080
As an example, using this code:

Imports System

Module Module1

    Sub Main()
        Dim client As New Net.WebClient()
        ' A single HTTP request: only the HTML document itself is fetched.
        Dim webStream As IO.Stream = client.OpenRead("http://www.experts-exchange.com/Programming/Languages/.NET/Visual_Basic.NET/Q_27417692.html")
        Dim buffer(4096) As Byte
        Dim bytesRead As Integer

        ' Read the response in chunks until the stream is exhausted.
        Do
            bytesRead = webStream.Read(buffer, 0, buffer.Length)
        Loop While bytesRead > 0

        webStream.Close()
        Console.WriteLine(vbNewLine & vbNewLine & "**** DONE ****")
    End Sub

End Module


...against the URL to your question, this is what Fiddler sees:

[Screenshot: Fiddler and Code]
...and this is what it sees when I go through the browser:

[Screenshot: Fiddler and Browser]
I've highlighted the images that were requested by my browser. As you can see, each is a request of its own. (I actually highlighted a few images that were cached on my computer. Had they not been cached, those images would have been separate requests to the web server as well.)

Author Closing Comment

ID: 37036154
Perfect, thanks!
