

If no "head" response, will robots find a page?

elena515 asked
Medium Priority
Last Modified: 2013-12-25
The server does not support the HEAD request. When robots
send it, it is simply ignored. The question:
how common is it for robots to make a query for
HEAD only and not for the page content?
E.g., how big a loss would users of such a server suffer?
Do Infoseek, WebCrawler, and other common robots make
queries for HEAD only, or for page content as well?


There's no robot around which takes just the head part of a page.
Actually, there is no 'head request' in the http protocol.

What a crawler usually does is get the page (the whole page is sent by the web server), then parse a given amount of it; how much of the page is kept depends on the settings, and may vary from the first few lines to the whole document.
Usually, the international search engines keep just part of a document for reasons of performance and resources, while a national search engine may keep the whole page, since its total count of pages is far smaller.

So don't worry too much about headers; they are usually useful just for giving a document title and some extra keywords to the indexing engine.

Rgds, julio



Dear Julio,

I wouldn't rate the answer, because the answers I've got from
the w3 and robots mailing lists are quite different ;)

here are some of them:

> The server software we developed does not support
> 'head' request from the robots, it simply ignores
> the 'head' query.

> Is it an appropriate approach within current w3 standards?

No. HTTP servers MUST support HEAD method. Those are IETF standards, BTW.
See RFC 1945 and RFC 2068.
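The HEAD method those RFCs require is simply GET without the body: the server sends the same status line and headers, then stops. A minimal sketch of a compliant handler using Python's standard library (the demo page, port choice, and class names are all illustrative, not the asker's server code):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import http.client
import threading

PAGE = b"<html><head><title>Demo</title></head><body>hello</body></html>"

class Handler(BaseHTTPRequestHandler):
    def _send_head(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):              # GET: headers followed by the body
        self._send_head()
        self.wfile.write(PAGE)

    def do_HEAD(self):             # HEAD: the same headers, no body
        self._send_head()

    def log_message(self, *args):  # silence request logging for the demo
        pass

# Serve on an ephemeral port and issue one HEAD and one GET against it.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

c1 = http.client.HTTPConnection("127.0.0.1", server.server_port)
c1.request("HEAD", "/")
head_resp = c1.getresponse()
head_body = head_resp.read()       # empty: HEAD carries headers only

c2 = http.client.HTTPConnection("127.0.0.1", server.server_port)
c2.request("GET", "/")
get_resp = c2.getresponse()
get_body = get_resp.read()         # the full document
server.shutdown()
```

A correct HEAD response lets a client learn a page's Content-Length, type, and freshness without transferring the document itself, which is exactly what robots and caches rely on.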

> How common are robots that make query for head only
> and ignore the content of the document?

Common enough. Besides, there are browsers which generate them if the user
requests it.

> In other words, how much we have our users suffer,
> and how much we are in disaccordance with the standards,
> if our server will be released with this deficiency?

Very much. Implement that, please. It can't be too much work.


On Jun 2, 12:51pm, elena danielyan wrote:
> Subject: 'head' request and robots?
> I apologise if this request is inappropriate here.
> The server software we developed does not support
> 'head' request from the robots, it simply ignores
> the 'head' query.
You mean it ignores HEAD requests.  Unless you are looking at the
browser type (the User-Agent header), you do not know whether the request
came from a robot or a real person.

> Is it an appropriate approach within current w3 standards?

Servers are required to support head.

> How common are robots that make query for head only
> and ignore the content of the document?
> In other words, how much we have our users suffer,
> and how much we are in disaccordance with the standards,
> if our server will be released with this deficiency?
Robots, caching proxies, and caching browsers will generally make a HEAD
request to see if the document they have is older than the one currently on
the server.  In addition, robots will retrieve any <META> containers to use
for additional search criteria.


HEAD is used primarily as a tool to check whether a URL has changed
since the last time it was retrieved.
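That check usually boils down to comparing the Last-Modified header returned by HEAD against the timestamp saved at the previous fetch. A sketch of the decision (the function name and fall-back policy are illustrative, not any particular robot's code):

```python
from email.utils import parsedate_to_datetime

def needs_refetch(stored_last_modified, head_last_modified):
    """Decide whether a full GET is needed, given the Last-Modified value
    saved at the previous fetch and the one a HEAD request just returned.
    When either side is missing, refetch to be safe."""
    if stored_last_modified is None or head_last_modified is None:
        return True
    return (parsedate_to_datetime(head_last_modified)
            > parsedate_to_datetime(stored_last_modified))

old = "Mon, 01 Jun 1998 10:00:00 GMT"
new = "Tue, 02 Jun 1998 12:51:00 GMT"
```

With `needs_refetch(old, new)` the page changed and a GET follows; with identical stamps the cached copy (or the existing index entry) is kept.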

Netscape Navigator and other browsers use HEAD to check whether
(and when) to reload pages, images, etc.  For example, when you
use the Reload button of Navigator, it will issue a HEAD request
to the server for that page in order to determine whether it has
to issue a new GET request, or whether it can re-use the data
in its cache.  Navigator also does that for the images on a page
when the user requests a Reload.  [I believe other browsers do
this as well, but I haven't checked them to be sure.]

By not providing HEAD data for browsers, several things will happen:
people who view the pages on your server will experience longer
time-outs when they go to Reload a page (under some circumstances,
the same thing happens simply when they go back into their history
list to return to a page as well).  And the browser software then
has to decide on its own whether or not to reload the page, and/or
the other elements (images, etc.).  I haven't experimented to find
out what each different version of browser actually does, but
whether they load from cache or do another GET to the server, each
will be wrong under some circumstances.  If the browser doesn't
GET the page again from the server, the user could well be shown
an old version of the page; if the browser does an unnecessary GET,
that will unnecessarily increase the traffic for your server.
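The revalidation described above can be done either with HEAD or with a conditional GET; the conditional-GET form is sketched here because it shows the "reuse your cache" signal (a 304 response) explicitly. Everything in the sketch (the page, its fixed timestamp, the handler) is illustrative:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from email.utils import formatdate, parsedate_to_datetime
import http.client
import threading

DOC_MTIME = 896000000              # illustrative fixed "last changed" time
PAGE = b"cached page body"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ims = self.headers.get("If-Modified-Since")
        if ims and parsedate_to_datetime(ims).timestamp() >= DOC_MTIME:
            self.send_response(304)    # unchanged: client may reuse its cache
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Last-Modified", formatdate(DOC_MTIME, usegmt=True))
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# First fetch: a full 200 response, whose Last-Modified we remember.
c1 = http.client.HTTPConnection("127.0.0.1", server.server_port)
c1.request("GET", "/")
r1 = c1.getresponse()
body1 = r1.read()
stamp = r1.getheader("Last-Modified")

# Revalidation: the conditional GET comes back 304 with no body at all.
c2 = http.client.HTTPConnection("127.0.0.1", server.server_port)
c2.request("GET", "/", headers={"If-Modified-Since": stamp})
r2 = c2.getresponse()
body2 = r2.read()
server.shutdown()
```

When a server cannot answer such checks, the browser is left to guess, and as noted above it will either show a stale page or repeat full GETs unnecessarily.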

However, the primary use of HEAD is probably not by browsers, but by
search and index sites.  At least some, probably most of them (certainly
the regional index site that I run) use HEAD to check:  1) whether a
linked-to page still exists; 2) whether it has changed (if so, the
search engine spider should do, and usually does, a GET in order
to check the page's possibly-new title and content).
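The spider's policy after such a re-check can be summarized as a tiny decision function (an illustrative simplification, not any particular engine's code):

```python
def spider_action(head_status):
    """What an index site does after its periodic HEAD re-check of a
    catalogued URL (a simplification of the policy described above)."""
    if head_status is None:          # no answer: HEAD ignored or timed out
        return "drop"                # the page eventually leaves the index
    if head_status == 200:
        return "keep"                # still there; GET follows if it changed
    if 300 <= head_status < 400:
        return "follow-redirect"
    return "drop"                    # 404 and friends: remove from the index
```

Note the first branch: a server that silently ignores HEAD looks, to the spider, exactly like a server that is gone.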

If your server does not respond to a HEAD request, there is a fairly
high probability that pages on the server will end up being deleted
from at least some of the indexing and cataloging sites within,
typically, about a month or so, i.e., when the cataloging
site re-checks the page via HEAD to see if it's still there.

There are a few other HTTP server software packages which do not respond
properly to HEAD requests (mostly old Macintosh HTTP server implementations).
I know people who used to run servers like that, and who found out the
hard way what the effects were when the index and catalog sites kept
dropping their pages...

Nonetheless, there are situations where you might want to have some
specialized kind of server software that doesn't respond to HEAD
requests, or responds in special ways.  For example, if ALL the
content you're serving out is dynamic, you arguably might always
want to respond to a HEAD request for a potential page with an
expiration date that is in the past.  Alternatively, if you are
serving out pages which should never be catalogued anywhere, in
addition to using an appropriate robots.txt file in your server
root, you might also always want to respond to HEAD requests with
an error code saying the requested page doesn't exist.  That makes
it fairly likely that a page accidentally indexed will (whenever
it's re-checked) end up being removed from most search sites.
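Both special behaviours, an already-expired answer for dynamic pages and a 404 for pages that must never be indexed, fit in one small HEAD handler. A sketch with made-up paths (`/private/` is an illustrative no-index prefix, not a convention):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from email.utils import formatdate
import http.client
import threading

NOINDEX_PREFIX = "/private/"     # made-up prefix for never-catalogue pages

class Handler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        if self.path.startswith(NOINDEX_PREFIX):
            self.send_response(404)    # re-checking spiders drop the page
            self.end_headers()
            return
        self.send_response(200)        # dynamic content: expired already
        self.send_header("Expires", formatdate(0, usegmt=True))
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def head(path):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    resp.read()
    return resp

public = head("/index.html")             # 200 with a past Expires date
hidden = head("/private/report.html")    # 404: "not here", so de-index it
server.shutdown()
```

The past Expires date tells caches and spiders never to treat the dynamic page as fresh, while the 404 on the no-index prefix exploits exactly the deletion behaviour described above.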



Dear Elena,

I didn't realize you were developing a web server, because you didn't mention it in your question.

Yours showed up as a 'user' question, mainly concerned with "how should I make my pages visible to search engines, while my (ISP) web server doesn't support the HEAD request".
The answers you got were consistent with the question as asked.

Please, next time, spend a few more seconds formulating your question, so that neither you nor anybody else wastes their time.

Cheers, julio

P.S. thanks for the enlightenments.