MikeThelwall

Why do Winsock and the Internet Transfer control give different pages sometimes?

On most web pages (other than those returning an HTTP redirection header), Winsock and the Internet Transfer Control (ITC) give the same results. But on some they fetch entirely different HTML pages. Why is this? It happens even when the ITC header is cloned in Winsock. ITC gets the same page that the browsers do, and therefore the "correct" page.
Winsock returns what is, to all intents and purposes, a valid page, just not the same one. Here is the first section of it (including the ITC clone request).
----------------

GET / HTTP/1.0
Accept: image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,*/*
User-Agent: Microsoft URL Control - 5.01.4511
Host: 134.220.198.68

----------
HTTP/1.1 200 OK
Date: Fri, 07 Jan 2000 13:31:41 GMT
Server: Apache/1.3.10-dev (Unix) ApacheJServ/1.0 PHP/3.0.6
Content-Location: index.html
Vary: negotiate
TCN: choice
Last-Modified: Tue, 05 Oct 1999 16:43:47 GMT
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
 <HEAD>
  <TITLE>Apache Project Development Site</TITLE> </HEAD>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
 <BODY  BGCOLOR="#FFFFFF"  TEXT="#000000" LINK="#0000FF"  VLINK="#000080"  ALINK="#FF0000" ><DIV ALIGN="CENTER"> <IMG SRC="images/apache_logo.gif" ALT="[APACHE]"></DIV>
  <H1 ALIGN="CENTER"><SAMP>Dev.Apache.Org</SAMP><BR>Developer Resources</H1>  <P>  <BLOCKQUOTE>  This site includes many of the reference materials used by the Apache  Project.   Note that a lot of these documents may not be entirely up-to-date.  Some may have been superseded by other references, and some may just  be waiting for someone to get around to updating them.
--------------------

Here is the same request and response for ITC, giving a different page (again, only the first part of the page is shown).

--------------
GET / HTTP/1.0
Accept: image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,*/*
User-Agent: Microsoft URL Control - 5.01.4511
Host: 134.220.198.68

----------
HTTP/1.1 200 OK
Date: Fri, 07 Jan 2000 13:37:10 GMT
Server: Apache/1.3.10-dev (Unix) ApacheJServ/1.0 PHP/3.0.6
Content-Location: index.html
Vary: negotiate
TCN: choice
Cache-Control: max-age=86400
Expires: Sat, 08 Jan 2000 13:37:10 GMT
Connection: close
Content-Type: text/html

<html>
<head>
<title>The Apache Software Foundation</title>
</head>
<body bgcolor="#FFFFFF" text="#000000"
      link="#0000FF" vlink="#000080" alink="#FF0000"
>
<table width="100%" border="0">
<tr><td valign="top">
  <p><a href="http://www.apache.org/"><img src="foundation/images/asf_logo.gif"
          alt="The Apache Software Foundation"
          border="0" width="387" height="100"></a>
  </p>
  <p>&nbsp; </p>
</td><td width="150" valign="top" align="center">
  <p>&nbsp; </p>

-------------
The same is true for all pages on the www.apache.org site: ITC and winsock give different results.

Why does this happen? Is there any way round this to ensure that Winsock gets the "correct" page? Is this a problem with my proxy cache or a generic, network-wide problem?
BigRat

At first I thought it was the port on which you attached. But I have tried several ports on www.apache.org to no avail. I believe it must be the name translation which is "incorrect". Can you get the IP addresses of the responders? If these are the same then it is the port.
MikeThelwall

ASKER

Hey you spotted what is going wrong, thanks!

Winsock translates www.apache.org to the IP address 209.133.83.18, which is the correct address according to a DNS lookup. But the Internet Transfer Control and browsers can't be using the same IP, because I tried them with this IP number and they get a different page. In other words,
http://www.apache.org and
http://209.133.83.18
give different pages to each other in browsers and ITC, but not in winsock (I can't trace the IP number that the browsers/ITC are using).

Trying http://www.apache.org/info.html gives a valid page in browsers but an error page in winsock, so there seems to be some intelligent redirection of the IP address going on with ITC & browsers - do you know what exactly is happening and whether it is possible to ensure that Winsock gets the same redirection information and therefore the "correct" pages? I really need a generic solution that will work if this problem happens to be on any other site.

Thanks again.
The browser starts on the standard port of 80 and if that fails tries various other ports like 8000 and 8080 and also the secure socket port 443 (I think). If the port is specified it does not try the others but just returns the error. That's the "port" case.
   If the addresses are the difference I'm not sure how this occurs, since both use the DNS lookup via the SP wire. However you might have a local "hosts" file and this might be used in one case and not in the other. So looking for that might be a clue.
   Finally I tried searching for the "Apache Project Development Site" (since this is in the title line) via AltaVista but got too many hits, since that's a word search. You might try the same but add more words which you see on the page to the search. That might find the IP address for you.
Good idea! I tried it and got http://dev.apache.org. Looking up the IP addresses of www.apache.org and dev.apache.org gives 209.133.83.18 in both cases, so the situation seems to be a port problem. I'm running a program to test all the port numbers to find the correct one and verify this. It might take a couple of days.

Requesting http://www.apache.org:
  with an ITC header in winsock gets the http://dev.apache.org page 209.133.83.18:80
  with a browser or ITC directly gets the http://www.apache.org page on 209.133.83.18:?

Requesting http://dev.apache.org or http://209.133.83.18:
  with an ITC header in winsock gets the http://dev.apache.org page 209.133.83.18:80
  with a browser or ITC directly gets the http://dev.apache.org page, presumably on 209.133.83.18:80

So a port seems to be assigned at the DNS lookup stage, giving dev.apache.org and www.apache.org the same IP but a different Port number - is this possible? And if so, how does it know the difference between winsock and ITC since they are sending the same header?
If you take a host name of the form xxx.yyy.zzz to a DNS server you'll only get back 123.123.123.123 or similar. Normally you bind with this address and your own port number (which is obtained from /etc/services on Unix by looking up the transfer protocol). E.g., telnet gives port 23.

Strictly speaking the "http" should be looked up in "services" to find the port number (80). But since people put their servers on different ports (I have seen 80, 8000, and 8080 quite often) various "browsers" implement their own schemes, and bind to one port after another until they get a connection. Then the header is sent.

What exactly do you mean by winsock? I understand this term only as the socket API on Windows (similar yet deliberately different from that on Unix).
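
A side note on the DNS and port discussion above: a DNS lookup yields only an address, and HTTP clients normally take the port from the URL itself, falling back to the scheme's well-known default when none is given. The following is a minimal sketch in Python (not the VB used in this thread); the helper `host_and_port` and the `DEFAULT_PORTS` table are hypothetical names introduced only for illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports per scheme; DNS plays no part in choosing them.
DEFAULT_PORTS = {"http": 80, "https": 443}

def host_and_port(url):
    """Return (hostname, port), using the scheme's default port if the URL names none."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

print(host_and_port("http://www.apache.org/"))       # ('www.apache.org', 80)
print(host_and_port("http://dev.apache.org:8080/"))  # ('dev.apache.org', 8080)
```

Under this reading, two names that resolve to the same address and are both requested with plain `http://` URLs end up on the same address and the same port, so something other than the port must distinguish them.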
I haven't found the port number yet, I'm up to 4361.

By winsock I mean the Microsoft Winsock Control 5.0 (SP2) that comes with Visual Basic, the file being mswinsck.ocx. It is basic enough to be used for a web client or server program, and you can send your own raw HTTP requests with it, for example cloning the header of web browsers. As far as I can tell, winsock cloning the header of ITC or a browser is sending exactly the same information out. Here is a sample ITC clone command sent after the connection has been established:

' Send the cloned ITC request; the final blank line ends the header block.
wskSend.SendData "GET / HTTP/1.0" + vbCrLf _
    + "Accept: image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,*/*" + vbCrLf _
    + "User-Agent: Microsoft URL Control - 5.01.4511" + vbCrLf _
    + "Host: " + wskSend.LocalIP + vbCrLf + vbCrLf

With winsock you set your own port number, in this case 80. The Internet Transfer Control also allows the remote port to be set (also to 80), yet the two give different results when a DNS lookup is involved, but not when it isn't.

I've just spotted something interesting: When I do an NS lookup for dev.apache.org and www.apache.org I get the same IP address, but a disclaimer on the second. Perhaps winsock ignores the disclaimer, but ITC doesn't and goes to a different name server to check, getting a different IP? It doesn't save the IP anywhere so it is impossible to check. Does this sound like a plausible answer to you?

Name:    dev.apache.org
Address:  209.133.83.18

Non-authoritative answer:
Name:    www.apache.org
Address:  209.133.83.18
I've just spotted the answer: I've been sending an incorrect header. The Host value should always be the domain name of the requested server, which uses this value to operate virtual hosting. I'd been sending an IP address and getting the default page. It is explained in http://www.apache.org/docs/vhosts/host.html,
including the following:
"The HTTP/1.1 protocol contains a method for the server to identify what name it is being addressed as."
This method is the Host HTTP header line.
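
To make the Host header behaviour concrete, here is a minimal sketch in Python (not the VB used in this thread). It does not contact any server: `build_request` shows a request whose Host header carries the site name, and `pick_site` only imitates, in a simplified way, how a name-based virtual host setup might choose a site. Both function names, the `VHOSTS` table, and the choice of the development site as the default are assumptions for illustration, based on the page titles and behaviour reported above.

```python
def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET whose Host header names the requested site."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "",  # the empty line that ends the header block
        "",
    ]
    return "\r\n".join(lines)

def pick_site(request, vhosts, default_site):
    """Imitate name-based virtual hosting: choose a site from the Host header."""
    for line in request.split("\r\n")[1:]:
        if line == "":
            break  # end of headers, no Host header found
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            hostname = value.strip().lower().split(":")[0]  # drop any ":port"
            return vhosts.get(hostname, default_site)
    return default_site

# Hypothetical table: both names share one IP address and one port.
VHOSTS = {
    "www.apache.org": "The Apache Software Foundation",
    "dev.apache.org": "Apache Project Development Site",
}
DEFAULT = "Apache Project Development Site"

print(pick_site(build_request("www.apache.org"), VHOSTS, DEFAULT))
print(pick_site(build_request("134.220.198.68"), VHOSTS, DEFAULT))
```

The first call selects the Foundation page because the Host value matches a configured name; the second falls through to the default site because an IP address matches none of them. A generic client would therefore take the Host value from the host part of the requested URL rather than from any IP address.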

Please post an answer and I'll give you the points: I could still be looking down other dead ends if you had not helped, thanks.
ASKER CERTIFIED SOLUTION
BigRat
I do some of my best thinking on the bus - nothing else much to do there!
Thanks again.