Looking at the HTML source code doesn't show all the downloaded files; a proxy does

AID: 5349
  • Status: Published

  • By: pfrancois
  • Type: Tips/Tricks
  • Posted on: 2011-05-03 at 12:40:42

When a page does not download correctly and you don't know why, the first thing to do is look at its HTML source code. However, the source does not always show clearly every file the page downloads. If the page includes JavaScript that computes the name of a file to download, it can be very hard, or even impossible, to tell from the source which file was supposed to be fetched. Some websites make this deliberately cryptic to prevent you from downloading part of their content outside of a browser.

Fortunately, there is a trick for finding out every single file you download: if you have access to the log files of an HTTP proxy server, you can see exactly which files have been downloaded.

Normally, you won't have access to such a proxy server, but it is very easy to configure your own logging proxy server.
In this article, I will configure the squid proxy server under GNU/Linux. Windows users can adapt the steps to their case, since squid is open source software and also runs on MS platforms.

Installing squid


Installing squid under Ubuntu is as easy as opening a terminal (Applications > Accessories > Terminal) and issuing:
sudo aptitude install squid
                                    
You will have to provide your password.

Configuring squid


Now, the trick is to configure squid to run in user mode, not as root. Create a directory in your home directory; I chose to create a directory named squid inside my home directory. Then create a file called squid.conf in that directory by opening a terminal and typing:
mkdir ~/squid
cd ~/squid
gedit squid.conf
                                    
In that file, enter the content below:
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/32
http_access allow localhost 
http_access deny all
http_port 3128
cache_log /home/_myname_/squid/cache.log
cache_store_log none
pid_filename /home/_myname_/squid/pid
cache deny all
cache_dir null /tmp
logformat custom %mt %<st %ru
access_log /home/_myname_/squid/access.log custom
                                    
Attention: wherever you see _myname_ in this configuration file, substitute the name of your home directory.
This makes squid listen on port 3128 of the local host and write an entry for every file that passes through the proxy server to the log file /home/_myname_/squid/access.log. Only the MIME type, the size and the URL of each file are logged.
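
Replacing _myname_ by hand in three places is easy to get wrong, so here is a minimal Python sketch that fills in the paths for you. It assumes the same directives as the configuration above; the names TEMPLATE, render_squid_conf and write_squid_conf are hypothetical helpers invented for this illustration and are not part of squid.

```python
from pathlib import Path

# Same directives as the configuration shown above; {base} stands for the
# directory that holds squid.conf, e.g. /home/_myname_/squid.
TEMPLATE = """\
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/32
http_access allow localhost
http_access deny all
http_port 3128
cache_log {base}/cache.log
cache_store_log none
pid_filename {base}/pid
cache deny all
cache_dir null /tmp
logformat custom %mt %<st %ru
access_log {base}/access.log custom
"""


def render_squid_conf(base_dir: Path) -> str:
    """Return the squid.conf text with every path rooted at base_dir."""
    return TEMPLATE.format(base=base_dir)


def write_squid_conf(base_dir: Path) -> Path:
    """Create base_dir if needed and write squid.conf into it."""
    base_dir.mkdir(parents=True, exist_ok=True)
    target = base_dir / "squid.conf"
    target.write_text(render_squid_conf(base_dir))
    return target
```

Calling write_squid_conf(Path.home() / "squid") should produce the same file as the manual steps, with your own home directory in the three path entries.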

Starting and stopping squid


To start the proxy server, issue squid -f squid.conf -N -d 1 in the directory ~/squid (-f names the configuration file, -N keeps squid in the foreground instead of running it as a daemon, and -d 1 also writes debugging output to the terminal). To stop it, interrupt the proxy by hitting Control-C in the terminal where you launched squid. You can also stop it by issuing squid -k shutdown -f ~/squid/squid.conf in any other terminal.
When something goes wrong, the log file cache.log will give you more information. In some cases, you will need to stop squid by killing it explicitly using its PID. If you don't know how to do this, just issue:
sudo killall squid
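
Note that killall stops every squid process on the machine, not only the one you started. As a narrower alternative, here is a small Python sketch that reads the PID from the file named by pid_filename in the configuration and signals only that process. The helper names parse_pid and stop_squid are hypothetical, and the sketch assumes the PID file holds just the process ID as text and that squid treats SIGTERM as a shutdown request.

```python
import os
import signal
from pathlib import Path


def parse_pid(text: str) -> int:
    """Convert the content of a PID file (a number plus a newline) to an int."""
    return int(text.strip())


def stop_squid(pid_file: Path) -> None:
    """Send SIGTERM to the process whose ID is stored in pid_file.

    Assumption: pid_file is the path given as pid_filename in squid.conf.
    """
    pid = parse_pid(pid_file.read_text())
    os.kill(pid, signal.SIGTERM)
```

With the configuration from this article, the call would be stop_squid(Path.home() / "squid" / "pid"). Since squid was started under your own account, no sudo is needed.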
                                    
Enable your browser to pass through the proxy server


To log the activity of your browser, you now need to tell it to use your local proxy by setting 127.0.0.1 as the IP address of the proxy and 3128 as the port.
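
If you want to check that the proxy accepts requests before changing your browser settings, the same address and port can be used from a script. The sketch below uses Python's standard urllib.request module; build_proxy_opener is a hypothetical helper name, and the host and port are simply the values from the configuration in this article.

```python
import urllib.request

PROXY_HOST = "127.0.0.1"  # the local host, as in the browser settings
PROXY_PORT = 3128         # must match http_port in squid.conf


def build_proxy_opener(
    host: str = PROXY_HOST, port: int = PROXY_PORT
) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

With squid running, something like build_proxy_opener().open("http://example.com/").read() should fetch the page through the proxy and leave a line in access.log (example.com is only a placeholder URL).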

Start surfing


Once the proxy is running and your browser points to it, you will see many lines appear in the access.log file, each with the MIME type of the file, its size and its complete URL. Now you can scan that file to find the information you need. Enjoy.
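
Scanning the log by eye becomes tedious on busy pages. Here is a minimal Python sketch for filtering it, assuming each line follows the custom format defined in the configuration (MIME type, size, URL, separated by spaces). LogEntry, parse_line and entries_of_type are hypothetical names chosen for this illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class LogEntry(NamedTuple):
    mime_type: str
    size: Optional[int]  # None when the size field is not a number
    url: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one access.log line into its three fields.

    Returns None for lines that do not have exactly three fields.
    """
    parts = line.split(None, 2)
    if len(parts) != 3:
        return None
    mime_type, size_text, url = parts
    size = int(size_text) if size_text.isdigit() else None
    return LogEntry(mime_type, size, url.strip())


def entries_of_type(lines: Iterable[str], prefix: str) -> Iterator[LogEntry]:
    """Yield the entries whose MIME type starts with prefix, e.g. 'video/'."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.mime_type.startswith(prefix):
            yield entry
```

For example, opening ~/squid/access.log and passing the file object to entries_of_type(log_file, "video/") would list only the video files the browser fetched.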