

If no "head" response, will robots find a page?

Posted on 1997-06-02
Medium Priority
Last Modified: 2013-12-25
The server does not support 'HEAD' requests. When robots
send one, it is simply ignored. The question:
how common is it for robots to make a query for
'HEAD' only and not for the page content?
E.g., how big will a user's loss be on such a server?
Do Infoseek, WebCrawler, and other common robots make
queries for HEAD only, or for page content as well?

Question by:elena515

Accepted Solution

julio011597 earned 200 total points
ID: 1854272
There's no robot around which takes just the head part of a page.
Actually, there is no 'head request' in the http protocol.

What a crawler usually does is get the page (the whole page is sent by the web server), then parse a given amount of it; how much of the page is kept depends on the settings, and may vary from the first few lines to the whole document.
Usually, the international search engines keep just part of a document for reasons of performance and resources, while a national search engine may keep the whole page, since the total count of pages is far smaller.

So don't worry too much about headers; they are usually useful just for giving a document title and some extra keywords to the indexing engine.
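As a sketch of the kind of extraction described above (hypothetical code, not any particular engine's; the `HeadIndexer` class name and the sample page are made up for illustration), an indexer pulling the title and meta keywords from a document's head section might look like:

```python
# Hypothetical sketch: extract the <title> and meta keywords that an
# indexing engine might keep from a page's head section.
from html.parser import HTMLParser

class HeadIndexer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "keywords":
                self.keywords = a.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Made-up sample page for illustration.
page = ('<html><head><title>Example Page</title>'
        '<meta name="keywords" content="http, robots, head"></head>'
        '<body>...</body></html>')
p = HeadIndexer()
p.feed(page)
```

After `feed()`, `p.title` and `p.keywords` hold the two pieces of head data the answer mentions.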

Rgds, julio

Author Comment

ID: 1854273
Dear Julio,

I wouldn't rate the answer, because the answers I've gotten from
the w3 and robots mailing lists are quite different ;)

here are some of them:

> The server software we developed does not support
> 'HEAD' requests from robots; it simply ignores
> the 'HEAD' query.

> Is it an appropriate approach within current w3 standards?

No. HTTP servers MUST support the HEAD method. Those are IETF standards, BTW.
See RFC 1945 and RFC 2068.
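To illustrate what those RFCs require (a minimal sketch, not real server code; the `respond` function and its default body are invented for this example), HEAD must be answered exactly like GET, with the same headers but no message body:

```python
# Hypothetical sketch: HEAD is answered like GET, minus the body.
def respond(method, body=b"<html>...</html>"):
    headers = {
        "Content-Type": "text/html",
        # Content-Length reflects what GET *would* send, even for HEAD.
        "Content-Length": str(len(body)),
    }
    if method == "HEAD":
        return headers, b""      # same headers, empty body
    if method == "GET":
        return headers, body
    return {"Allow": "GET, HEAD"}, b""  # error-status handling elided
```

Ignoring the request entirely, as the server in question does, is what the standards forbid; the fix is just this one extra branch.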

> How common are robots that make a query for HEAD only
> and ignore the content of the document?

Common enough. Besides, there are browsers which generate them if the user
requests it.

> In other words, how much will our users suffer,
> and how much are we out of accordance with the standards,
> if our server is released with this deficiency?

Very much. Implement that, please. It can't be too much work.


On Jun 2, 12:51pm, elena danielyan wrote:
> Subject: 'head' request and robots?
> I apologise if this request is inappropriate here.
> The server software we developed does not support
> 'HEAD' requests from robots; it simply ignores
> the 'HEAD' query.
You mean it ignores HEAD requests.  Unless you are looking at the
User-Agent header, you do not know whether the request came from a robot
or a real person.

> Is it an appropriate approach within current w3 standards?

Servers are required to support head.

> How common are robots that make a query for HEAD only
> and ignore the content of the document?
> In other words, how much will our users suffer,
> and how much are we out of accordance with the standards,
> if our server is released with this deficiency?
Robots, caching proxies, and caching browsers will generally make a HEAD
request to see if the document they have is older than the one currently on
the server.  In addition, robots will retrieve any <META> containers to use
as additional search criteria.


HEAD is used primarily as a tool to check whether a URL has changed
since the last time it was retrieved.

Netscape Navigator and other browsers use HEAD to check whether
(and when) to reload pages, images, etc.  For example, when you
use the Reload button of Navigator, it will issue a HEAD request
to the server for that page in order to determine whether it has
to issue a new GET request, or whether it can re-use the data
in its cache.  Navigator also does that for the images on a page
when the user requests a Reload.  [I believe other browsers do
this as well, but I haven't checked them to be sure.]
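The decision described above can be sketched as follows (a hypothetical illustration, assuming a cache that keys off the Last-Modified header; the `must_refetch` function name is invented):

```python
# Hypothetical sketch: use the Last-Modified date from a HEAD response
# to decide whether a cached copy can be reused or a new GET is needed.
from email.utils import parsedate_to_datetime

def must_refetch(cached_last_modified, head_last_modified):
    """True if the server's copy is newer than the cached one."""
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(head_last_modified)
    return current > cached

# Cached copy from a week earlier; the server reports a newer date.
stale = must_refetch("Mon, 26 May 1997 09:00:00 GMT",
                     "Mon, 02 Jun 1997 09:00:00 GMT")
```

If the server never answers the HEAD at all, the browser has neither date to compare and must guess, which is exactly the failure mode described below.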

By not providing HEAD data for browsers, several things will happen:
people who view the pages on your server will experience longer
time-outs when they go to Reload a page (under some circumstances,
the same thing happens simply when they go back into their history
list to return to a page as well).  And the browser software then
has to decide on its own whether or not to reload the page, and/or
the other elements (images, etc.).  I haven't experimented to find
out what each different version of browser actually does, but
whether they load from cache or do another GET to the server, each
will be wrong under some circumstances.  If the browser doesn't
GET the page again from the server, the user could well be shown
an old version of the page; if the browser does an unnecessary GET,
that will unnecessarily increase the traffic for your server.

However, the primary use of HEAD is probably not by browsers, but by
search and index sites.  At least some, probably most of them (certainly
the regional index site that I run) use HEAD to check:  1) whether a
linked-to page still exists; 2) whether it has changed (if so, the
search engine spider should, and usually does, do a GET in order
to check the page's possibly-new title and content).
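That two-step recheck might be sketched like this (hypothetical spider logic, not any real index site's code; the `recheck` function and its return labels are invented):

```python
# Hypothetical sketch: what an index site might do with the status and
# Last-Modified value returned by a periodic HEAD request.
def recheck(head_status, head_last_modified, indexed_last_modified):
    if head_status is None or head_status in (404, 410):
        return "drop"    # page gone, or the server never answered HEAD
    if head_last_modified != indexed_last_modified:
        return "reget"   # changed: GET again to re-index title and content
    return "keep"        # unchanged: leave the index entry alone
```

Note the first branch: a server that silently ignores HEAD looks, to the spider, just like a dead page.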

If your server does not respond to a HEAD request, there is a fairly
high probability that pages on the server will end up being deleted
from at least some of the indexing and cataloging sites
within--typically--about a month or so.  I.e., when the cataloging
site re-checks the page via HEAD to see if it's still there.

There are a few other HTTP server software packages which do not respond
properly to HEAD requests (mostly old Macintosh HTTP server implementations).
I know people who used to run servers like that, who found out the
hard way what the effects were when the index and catalog sites kept
dropping their pages...

Nonetheless, there are situations where you might want to have some
specialized kinds of server software that doesn't respond to HEAD
requests, or responds in special ways.  For example, if ALL the
content you're serving out is dynamic, you arguably might always
want to respond to a HEAD request for a potential page with an
expiration date that is in the past.  Alternatively, if you are
serving out pages which should never be catalogued anywhere, in
addition to using an appropriate robots.txt file in your server
root, you might also always want to respond to HEAD requests with
an error code saying the requested page doesn't exist.  That makes
it fairly likely that a page accidentally indexed
will (whenever it's re-checked) end up being removed from
most search sites.
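The two special cases just described might look like this (a hypothetical sketch; the handler functions and their (status, headers) return shape are invented for illustration):

```python
# Hypothetical sketch of the two special-case HEAD responses:
# an always-expired answer for fully dynamic content, and a 404
# for pages that must never be catalogued.
from email.utils import formatdate

def head_dynamic_page():
    # All content is dynamic: give HEAD an Expires date in the past
    # so no cache or spider ever treats a stored copy as fresh.
    return 200, {"Expires": formatdate(0, usegmt=True)}  # epoch, 1970

def head_private_page():
    # Never-to-be-indexed pages: claim they don't exist,
    # in addition to the robots.txt exclusion.
    return 404, {}
```

Both are deliberate policy responses, which is quite different from simply ignoring the request.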




Expert Comment

ID: 1854274
Dear Elena,

I didn't realize you were developing a web server, because you didn't mention it in your question.

Yours reads as a 'user' question, mainly concerned with "how should I make my pages visible to search engines, while my (ISP's) web server doesn't support the HEAD request".
The answer you got is consistent with that question.

Please, next time, spend a few more seconds formulating your question, so that neither you nor anybody else wastes their time.

Cheers, julio

P.S. thanks for the enlightenments.
