
HTTP Access

Hi there,

I'm writing software that in many ways behaves like a browser (Netscape, Internet Explorer and co.): it retrieves HTML pages from HTTP servers.
Mostly it works just fine, yet a number of sites elude me, for example:
http://www.applelinks.com
http://www.expectingrain.com
When I access these sites from standard browsers such as those I've mentioned, things work fine.
Yet when my software accesses them, it doesn't get the same HTML content those browsers get.

More info:
1. I use the standard HTTP 1.0 GET method (and it works fine in 99% of cases).
2. I tested this with the Internet Sample that PowerPlant provides, with the same results.
3. If you try these sites with the Internet Sample, you'll note that Expecting Rain returns an HTTP Location header. I honor that, but still don't get the content that standard browsers get.

The question:
What extra information do Netscape/IE send (and should I send as well) in order to get the correct HTML back from these sites?

Thanks,
 Roov

boonstra commented:
The www.expectingrain.com site is returning a redirection message (302) that the Internet Example doesn't handle.
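For anyone who wants to handle the 302 themselves, here is a minimal sketch in Python (not the PowerPlant code from the question). It assumes you already have the raw response header block as text; the function name `redirect_target` is made up for illustration. It checks the status line for a redirect code and, if a Location header is present, returns the URL to request next.

```python
from urllib.parse import urljoin


def redirect_target(request_url, response_head):
    """Return the absolute URL to follow for a redirect response, else None.

    request_url   -- the URL that was just requested
    response_head -- status line plus headers, lines separated by CRLF
    """
    lines = response_head.split("\r\n")

    # Status line looks like: HTTP/1.1 302 Moved Temporarily
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    if int(parts[1]) not in (301, 302, 303, 307, 308):
        return None

    for line in lines[1:]:
        if not line:  # blank line marks the end of the headers
            break
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "location":
            # urljoin also copes with a relative Location value.
            return urljoin(request_url, value.strip())
    return None
```

A real client would call this in a loop and stop after a small number of hops, so that two servers redirecting to each other cannot trap it.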

The best way to see what is going on is to use a program like OTSessionWatcher, shareware from Peter Lewis, available at <ftp://ftp.stairways.com/stairways/otsessionwatcher-101.sit.bin>.  I've attached an extract of what it returns when going to the expectingrain site.  One of the GIFs is pulled from <http://www.linksynergy.com>, which returns a redirection message to <http://w20.hitbox.com/>.

Part of what OTSessionWatcher returns is as follows (look for the 302):

Send 307 bytes on stream 2.
<00000000< GET http://www.expectingrain.com/ HTTP/1.0  
<0000002C< Proxy-Connection: Keep-Alive  
<0000004A< User-Agent: Mozilla/4.5 (Macintosh; U; PPC)  
<00000077< Host: www.expectingrain.com 
<00000094< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
<000000D1< image/png, */*  
<000000E1< Accept-Encoding: gzip  
<000000F8< Accept-Language: en  
<0000010D< Accept-Charset: iso-8859-1,*,utf-8  
<00000131<  

Receive 350 bytes on stream 2.
>00000000> HTTP/1.1 200 OK  
>00000011> Date: Mon, 11 Jan 1999 15:38:25 GMT  
>00000036> Server: Apache/1.2.5 FrontPage/3.0.4  
>0000005C> Last-modified: Sun, 10 Jan 1999 21:44:38 GMT  
>0000008A> Etag: "68b27-4b7a-36991f46"  
>000000A7> Content-length: 19322  
>000000BE> Accept-ranges: bytes  
>000000D4> Keep-alive: timeout=15, max=100  
>000000F5> Connection: Keep-Alive  
>0000010D> Content-type: text/html  
>00000126> Connection: keep-alive  
>0000013E> Proxy-connection: keep-alive  
>0000015C>  

Receive 1164 bytes on stream 2.
>0000015E> <HTML>
>00000165> <HEAD>
>0000016C> <TITLE>Bob Dylan - Expecting Rain</TITLE>
>00000196> <META NAME="Author" CONTENT="Karl Erik Andersen">
[snip]
>000053F5> </BODY>
>000053FD> </HTML>

Send 369 bytes on stream 2.
<00000599< GET http://www.expectingrain.com/dok/gif/NewOneByJimGuide.gif 
<000005D7< HTTP/1.0  
<000005E1< Referer: http://www.expectingrain.com/ 
<00000609< Proxy-Connection: Keep-Alive  
<00000627< User-Agent: Mozilla/4.5 (Macintosh; U; PPC)  
<00000654< Host: www.expectingrain.com 
<00000671< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg
<000006AD< image/png  
<000006B8< Accept-Encoding: gzip  
<000006CF< Accept-Language: en  
<000006E4< Accept-Charset: iso-8859-1,*,utf-8  
<00000708<  

[snip]

Send 378 bytes on stream 7.
<000012B4< GET http://www.linksynergy.com/fs-bin/show?id=y70uhGRE7vg&bids=2
<000012F4< 161.2239 HTTP/1.0  
<00001307< Referer: http://www.expectingrain.com/ 
<0000132F< Proxy-Connection: Keep-Alive  
<0000134D< User-Agent: Mozilla/4.5 (Macintosh; U; PPC)  
<0000137A< Host: www.linksynergy.com 
<00001395< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg
<000013D1< image/png  
<000013DC< Accept-Encoding: gzip  
<000013F3< Accept-Language: en  
<00001408< Accept-Charset: iso-8859-1,*,utf-8  
<0000142C<  

Receive 219 bytes on stream 2.
>0000C0BC> HTTP/1.1 302 Moved Temporarily  
>0000C0DC> Date: Mon, 11 Jan 1999 15:35:20 GMT  
>0000C101> Server: Apache/1.2.6  
>0000C117> Set-cookie: I27837082=916068920;  
>0000C13A> Location: http://w20.hitbox.com/world1000.gif 
>0000C169> Connection: close  
>0000C17C> Content-type: text/html  
>0000C195>  

Receive 188 bytes on stream 2.
>0000C197> <HTML><HEAD>
>0000C1A4> <TITLE>302 Moved Temporarily</TITLE>
>0000C1C9> </HEAD><BODY>
>0000C1D7> <H1>Moved Temporarily</H1>
>0000C1F2> The document has moved <A HREF="http://w20.hitbox.com/world1000.
>0000C232> gif">here</A>.<P>
>0000C244> </BODY></HTML>

Send 340 bytes on stream 8.
<0000142E< GET http://w20.hitbox.com/world1000.gif HTTP/1.0  
<00001460< Referer: http://www.expectingrain.com/ 
<00001488< Proxy-Connection: Keep-Alive  
<000014A6< User-Agent: Mozilla/4.5 (Macintosh; U; PPC)  
<000014D3< Host: w20.hitbox.com  
<000014E9< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg
<00001525< image/png  
<00001530< Accept-Encoding: gzip  
<00001547< Accept-Language: en  
<0000155C< Accept-Charset: iso-8859-1,*,utf-8  
<00001580<  

Receive 295 bytes on stream 8.
>0000C253> HTTP/1.1 200 OK  
>0000C264> Date: Mon, 11 Jan 1999 15:35:20 GMT  
>0000C289> Server: Apache/1.2.6  
>0000C29F> Last-modified: Tue, 05 Jan 1999 19:16:28 GMT  
>0000C2CD> Etag: "4b281-132d-3692650c"  
>0000C2EA> Content-length: 4909  
>0000C300> Accept-ranges: bytes  
>0000C316> Connection: close  
>0000C329> Content-type: image/gif  
>0000C342> Connection: keep-alive  
>0000C35A> Proxy-connection: keep-alive  
>0000C378>  

Receive 1807 bytes on stream 8.
>0000C37A> 47 49 46 38  39 61 58 00  3E 00 F7 FF  00 2C 6B 2C  GIF89aX.>....,k,
[snip]

Receive 340 bytes on stream 7.
>0000D6A7> HTTP/1.1 302 Found  
>0000D6BB> Date: Mon, 11 Jan 1999 15:36:45 GMT  
>0000D6E0> Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)  
>0000D70E> Set-cookie: linkshare_cookie2161=2239; path=/; expires=Saturday,
>0000D74E>  09-Nov-2002 23:12:40 GMT; path=/; domain=.linksynergy.com  
>0000D78A> Location: http://banner.linksynergy.com/fs/banners/520_2239.gif?
>0000D7CA> 0  
>0000D7CD> Connection: close  
>0000D7E0> Content-type: text/html  
>0000D7F9>  

Send 403 bytes on stream 8.
<00001582< GET http://banner.linksynergy.com/fs/banners/520_2239.gif?0 
<000015BE< HTTP/1.0  
<000015C8< Referer: http://www.expectingrain.com/ 
<000015F0< Proxy-Connection: Keep-Alive  
<0000160E< User-Agent: Mozilla/4.5 (Macintosh; U; PPC)  
<0000163B< Host: banner.linksynergy.com  
<00001659< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg
<00001695< image/png  
<000016A0< Accept-Encoding: gzip  
<000016B7< Accept-Language: en  
<000016CC< Accept-Charset: iso-8859-1,*,utf-8  
<000016F0< Cookie: linkshare_cookie2161=2239  
<00001713<  

Receive 235 bytes on stream 7.
>0000D7FB> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>0000D82E> <HTML><HEAD>
>0000D83B> <TITLE>302 Found</TITLE>
>0000D854> </HEAD><BODY>
>0000D862> <H1>Found</H1>
>0000D871> The document has moved <A HREF="http://banner.linksynergy.com/fs
>0000D8B1> /banners/520_2239.gif?0">here</A>.<P>
>0000D8D7> </BODY></HTML>

Receive 347 bytes on stream 5.
>0000D8E6> HTTP/1.1 200 OK  
>0000D8F7> Date: Mon, 11 Jan 1999 15:38:39 GMT  
>0000D91C> Server: Apache/1.2.5 FrontPage/3.0.4  
>0000D942> Last-modified: Sun, 06 Dec 1998 21:04:16 GMT  
>0000D970> Etag: "35e9f-858-366af150"  
>0000D98C> Content-length: 2136  
>0000D9A2> Accept-ranges: bytes  
>0000D9B8> Keep-alive: timeout=15, max=97  
>0000D9D8> Connection: Keep-Alive  
>0000D9F0> Content-type: image/gif  
>0000DA09> Connection: keep-alive  
>0000DA21> Proxy-connection: keep-alive  
>0000DA3F>  

Receive 1167 bytes on stream 5.
>0000DA41> 47 49 46 38  39 61 58 00  1F 00 F7 FF  00 FF FF FF  GIF89aX.........
[snip]

Receive 331 bytes on stream 8.
>0000E299> HTTP/1.1 200 OK  
>0000E2AA> Date: Mon, 11 Jan 1999 15:35:20 GMT  
>0000E2CF> Server: Apache/1.2.5  
>0000E2E5> Last-modified: Thu, 10 Dec 1998 01:01:03 GMT  
>0000E313> Etag: "f129-2626-366f1d4f"  
>0000E32F> Content-length: 9766  
>0000E345> Accept-ranges: bytes  
>0000E35B> Keep-alive: timeout=5, max=200  
>0000E37B> Connection: Keep-Alive  
>0000E393> Content-type: image/gif  
>0000E3AC> Connection: keep-alive  
>0000E3C4> Proxy-connection: keep-alive  
>0000E3E2>  

Receive 1183 bytes on stream 8.
>0000E3E4> 47 49 46 38  39 61 D4 01  3C 00 B3 00  00 FF FF FF  GIF89a..<.......
[snip]

Receive 340 bytes on stream 4.
>00010A0A> HTTP/1.1 302 Found  
>00010A1E> Date: Mon, 11 Jan 1999 15:36:53 GMT  
>00010A43> Server: Apache/1.3.3 (Unix)  (Red Hat/Linux)  
>00010A71> Set-cookie: linkshare_cookie2161=2220; path=/; expires=Saturday,
>00010AB1>  09-Nov-2002 23:12:40 GMT; path=/; domain=.linksynergy.com  
>00010AED> Location: http://banner.linksynergy.com/fs/banners/520_2220.gif?
>00010B2D> 0  
>00010B30> Connection: close  
>00010B43> Content-type: text/html  
>00010B5C>  

Receive 235 bytes on stream 4.
>00010B5E> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>00010B91> <HTML><HEAD>
>00010B9E> <TITLE>302 Found</TITLE>
>00010BB7> </HEAD><BODY>
>00010BC5> <H1>Found</H1>
>00010BD4> The document has moved <A HREF="http://banner.linksynergy.com/fs
>00010C14> /banners/520_2220.gif?0">here</A>.<P>
>00010C3A> </BODY></HTML>

[snip]

roov (author) commented:
The answer contained the information I needed. I didn't need the logs: I tried out OTSessionWatcher myself and simply compared my logs with those of standard browsers.

The obvious problem showed its face immediately: some sites do not send back all the requested data if you don't specify the (optional) "Host" field in the HTTP request header.

Once I realized that, things worked fine...
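
To make the fix concrete, here is a minimal sketch in Python (not the original PowerPlant code) of an HTTP/1.0 GET request that carries a Host header. The function name `build_get_request` and the `ExampleClient/0.1` User-Agent string are placeholders for illustration. A server that hosts several sites on one IP address uses the Host value to decide which site's pages to return, which is probably why leaving it out gave different content.

```python
def build_get_request(host, path="/", user_agent="ExampleClient/0.1"):
    """Build the bytes of an HTTP/1.0 GET request that includes Host."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,            # the header that was missing
        "User-Agent: %s" % user_agent,
        "Accept: */*",
        "",                           # blank line ends the header block
        "",
    ]
    # HTTP header lines are terminated by CRLF.
    return "\r\n".join(lines).encode("ascii")
```

The returned bytes can be written as-is to a TCP connection opened to the server on port 80, for example `build_get_request("www.expectingrain.com")`.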