How to track ALL web browser http requests of a certain url?

Posted on 2011-02-16
Medium Priority
Last Modified: 2013-11-05
Hi folks,

I'm trying to develop a quick software tool which (primary) purpose is to monitor the performance of the advertising placed on our website by an external ad server (by javascript).

This tool should run contineously and check a variety of pages on our website.

My aim is to simulate webbrowser behaviour in order to be able to track ALL http requests following to the first request which only gets the html source of the page.
Most of you might know tools like "Fiddler" or "Tamper Data", they're working in a way I need. I'd like to automate this and write the results into a log file or a database.

Do any of you guys have experience in developing such a tool in .NET? I tried the embedded IE webbrowser control (SHDocVw.WebBrowser) and skybound gecko, but I didn't succeed so far. Are there any tutorials about that topic in the web?
Unfortunately, any approach to google this didn't get me to a proper help or a usable documentation.

Thank you in advance,
Question by:kickeronline
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
LVL 23

Expert Comment

ID: 34905778
I think the System.Net.HttpWebRequest class provides what you need.  You can craft a http request with it (including cookies, etc) and then get the corresponding response (headers, content, etc).

Here is a sample code:
var req = System.Net.HttpWebRequest.Create("http://www.yahoo.com/");
var res = req.GetResponse();
var sr = new System.IO.StreamReader(res.GetResponseStream());
var html = sr.ReadToEnd();

Open in new window

More details about System.Net.HttpWebRequest:

I hope this helps.


Author Comment

ID: 34905849
Thanks but... no, I'm sorry. That's too simple. I certainly know how to get the source code for a url.

Regarding your example, the source code of http://www.yahoo.com/ contains lots of references to images, text files (.js, .css), videos (.swf) and banners (.jpg, .swf). It's all that kind of stuff your browser will load right AFTER it got the source code for the address.

What I want to do is to gather information about all those resources, however they get loaded: Some resources are embedded directly into the source code (pictures, stylesheets, scripts), but others get included by e.g. javascript. That's what your browser does when you open a website.

I could try to parse the source code, put that would mean on the one hand a tremendous effort and on the other hand reinventing the wheel. I'm sure there's a much better solution out there...
LVL 23

Expert Comment

ID: 34907183
I think I got it.  

The following sample code retrieves the URL of the resources used on the page (LINK, SCRIPT, and IMG tags only).  It uses the Html Agility Pack to parse the HTML.  You should be able to easily add other tags if needed.  

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace ReadHtml
    class Program
        static void Main(string[] args)
            var tags = new Dictionary<string, string>()
                {"IMG",     "SRC"},
                {"LINK",    "HREF"},
                {"SCRIPT",  "SRC"},

            var resources = new List<string>();

            var req = System.Net.HttpWebRequest.Create("http://www.yahoo.com/");
            var res = req.GetResponse();
            //var sr = new System.IO.StreamReader(res.GetResponseStream());
            //var html = sr.ReadToEnd();
            HtmlDocument doc = new HtmlDocument();


            foreach (var node in doc.DocumentNode.Descendants())
                string attrName;

                if (tags.TryGetValue(node.Name.ToUpper(), out attrName))
                    var attr = node.Attributes[attrName];

                    if (attr != null)

            foreach (var url in resources)
                // You can add your code to download and save the resource here.

Open in new window

AWS Certified Solutions Architect - Associate

This course has been developed to provide you with the requisite knowledge to not only pass the AWS CSA certification exam but also gain the hands-on experience required to become a qualified AWS Solutions architect working in a real-world environment.


Author Comment

ID: 34908936
Okay, that brings me to all resources which are embedded directly into the source code of that page.

But what about the execution of javascript?
For instance, if there is a script which is executed right after the source code has been completed, which puts additional <img> tags by using

document.write('<img src=\'/somepath/someimage.jpg\'>');

Open in new window

That's a very common way to place advertising on a web page.
I wouldn't dare to even think about trying to solve all these cases programmatically, that's why I tried to use existing browser technologies.

Maybe there's a way to derive a web proxy class in order to attach it to some browser control...?
LVL 23

Accepted Solution

wdosanjos earned 1000 total points
ID: 34909466
Humm... in that case you do need to execute the page to get them and somehow capture the requests.

On IE or FF, if you just select 'File / Save' you don't get resources referenced by the JavaScript code.

If you can write a proxy and update the Web Browser Control configuration to use the proxy, you might be able to achieve what you need.  You'll need to come up with some controls to identify what page is being loaded, because the proxy normally don't get that information. (maybe through the referrer header you can infer it).

I found this Mini C# Proxy Server, maybe you could leverage it.

Sorry, but I don't have any other suggestions.

Assisted Solution

kickeronline earned 0 total points
ID: 34916027
This tool is a good approach to solve that problem.

I started the mini proxy server at localhost:8090 and managed to assign it as proxy address of the gecko browser, which I embedded into my Windows Form (VB.NET):

Public Class Form1
    Public Sub New()
        setSetting("network.proxy.http", "")
        setSetting("network.proxy.http_port", 8090)
        setSetting("network.proxy.share_proxy_settings", True)
        setSetting("network.proxy.type", 1)
        setSetting("network.proxy.no_proxies_on", "")
    End Sub
    Private Sub setSetting(ByVal key As String, ByVal value As Object)
        Skybound.Gecko.GeckoPreferences.Default.Item(key) = value
    End Sub
End Class

Open in new window

What I now got to do is to log the requests into a database, which should be no problem as I got the source codes for hat project. Apart from that, I'm receiving several 404/400 errors, so I assume that the software is still a bit buggy. Maybe I'm able to fix that.

I noticed that the sql lite library was already added to that project. Unfortunately, there's no documentation about it.

If you got any expirence in logging into a database with the mini proxy server, or anybody knows some well-working alternative solution, I'd appreciate any feedback.

Thanks a lot!

Author Closing Comment

ID: 34949670
The solution is to automate/script a browser, which can be accomplished by adding a browser control to some windows form and navigate to web pages repeatedly.

To trace all ressources that are getting requested by a browser, one has to use a proxy server which logs the entire traffic into a log file or a database.

wdosanjos gives a good example for an open source proxy server. Anyway, it's got to be modified to work accurate.

Featured Post

 [eBook] Windows Nano Server

Download this FREE eBook and learn all you need to get started with Windows Nano Server, including deployment options, remote management
and troubleshooting tips and tricks

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This document covers how to connect to SQL Server and browse its contents.  It is meant for those new to Visual Studio and/or working with Microsoft SQL Server.  It is not a guide to building SQL Server database connections in your code.  This is mo…
For those of you who don't follow the news, or just happen to live under rocks, Microsoft Research released a beta SDK (http://www.microsoft.com/en-us/download/details.aspx?id=27876) for the Xbox 360 Kinect. If you don't know what a Kinect is (http:…
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question