jgore

Non-HTML Info from Get?

I would like my Web parser to get some less common kinds of data from the web. I have seen all of the below working in other scripts, but I have yet to find code for it. This is stuff that is NOT in the HTML code that gets returned.

The information I want is:
1. Server type (IIS, Apache, etc.) that the Get went to for the page.
2. Last Modified - sometimes this appears in the text, but I know they're
    getting it elsewhere because what appears in the text is not always
    what appears in the script page.
3. Content size - i.e. how big in kbytes the file is. I would like to do
    this without saving it to a file.
4. Anything else that isn't in the HTML that I can grab!

There must be some kind of attributes it returns during the Get that
have all this information. How do I access them?

How do they do it???
Where can I find out about this?
monas

jgore,

      On the web, documents are transmitted using the HTTP protocol. When you request a document, you get a response consisting of:
1) response header (a status line followed by header fields);
2) blank line;
3) document text.

All the information you need is in the header. However, the HTTP standards do NOT make all of the fields you asked about mandatory, so some responses may be missing them.
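As a rough illustration of that layout, here is a minimal Python sketch (Python is my choice for the example, it is not something from this thread). It splits a raw response at the blank line and looks up the fields asked about. The sample response and the helper name `split_response` are made up for illustration; a field the server did not send simply comes back as None.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so store them lower-cased.
            headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Made-up sample response, for illustration only.
SAMPLE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: Apache/1.3.0\r\n"
    b"Last-Modified: Mon, 01 Jun 1998 12:00:00 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>hello</body></html>"
)

status, headers, body = split_response(SAMPLE)
print(status)
print(headers.get("server"))          # server type
print(headers.get("last-modified"))   # last modified date
print(headers.get("content-length"))  # None: optional header, absent here
print(len(body))                      # body size in bytes, no file needed
```

When Content-Length is missing, as in this sample, the size can still be taken from the length of the body once it has been read into memory.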

To see this in practice, telnet to port 80 of a www server of your choice (telnet server 80), and enter:

HEAD http://www.server.of.your.choice/index.html HTTP/1.0

and hit <enter> twice.
You will be given the header information (if you want the document text as well, use GET instead of HEAD).
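The same telnet experiment can be sketched in Python with a plain socket. This is only a sketch under a few assumptions: the function names are hypothetical, the request line uses a path plus a Host header rather than the full URL typed above (the Host line is my addition, since many servers that host several sites need it), and it reads until the server closes the connection, which is what an HTTP/1.0 server normally does.

```python
import socket


def build_head_request(host: str, path: str = "/") -> bytes:
    """Build a HEAD request; the trailing blank line ends the request."""
    return (
        f"HEAD {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_header_text(host: str, path: str = "/", port: int = 80) -> str:
    """Send the HEAD request and return whatever the server sends back."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_head_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

With network access, something like `print(fetch_header_text("example.com"))` (example.com is just a placeholder host) should print the status line and header fields, the same text telnet would show.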

Documentation on what can appear in the header can be found in the standard. For the most widely used version of the protocol (HTTP/1.0), see http://www.cis.ohio-state.edu/htbin/rfc/rfc1945.html . You may want to check out the RFC for HTTP/1.1 as well.
ASKER CERTIFIED SOLUTION
guadalupe

jgore

ASKER

To guadalupe:
You will not be able to examine the response code or response headers (like 'Content-Type') when you are accessing the web using this function. If you need that information you should use the full OO interface (see LWP::UserAgent).

At least you pointed me in the right direction.
I didn't know where to look.

Hmmmm...me thinks this just got a lot harder!

To monas:
Thanks! I downloaded them. I'll read those.


Thanks to you both!  Cya'z