I was given a specific list of URLs that I need to spider with a VB.NET program. The problem is that many of the sites are in a foreign language.
My question is: is there a way for my spider to detect a page's language, maybe through the server's response headers? Also, what is a good option for automatic translation? I thought about Systran, but I really don't know whether it is a good, cost-effective option.

Some web pages contain META tags that specify the language being used. An example of a META tag making this declaration would be:

    <meta http-equiv="Content-Language" content="en-US" />

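This means English as used in the United States. The same value can also be sent by the server as a Content-Language HTTP response header, so the headers are worth checking too.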
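
As a rough sketch of that check, the snippet below pulls the declared language out of a page's markup. It is written in Python for brevity and uses only the standard library; the same logic should port to VB.NET with any HTML parser. The names `DeclaredLanguageParser` and `declared_language` are made up for this illustration, and the sketch only looks at the META tag, not at the HTTP headers.

```python
from html.parser import HTMLParser


class DeclaredLanguageParser(HTMLParser):
    """Records the value of the first Content-Language META tag, if any."""

    def __init__(self):
        super().__init__()
        self.language = None

    def handle_starttag(self, tag, attrs):
        # Only META tags matter, and only the first declaration is kept.
        if tag != "meta" or self.language is not None:
            return
        # Attribute values can be None (e.g. <meta content>), so normalise them.
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-language":
            content = attributes.get("content", "").strip()
            if content:
                self.language = content


def declared_language(html_text):
    """Return the declared language code (e.g. 'en-US'), or None if absent."""
    parser = DeclaredLanguageParser()
    parser.feed(html_text)
    return parser.language
```

Calling `declared_language` on the downloaded page source returns the declared code when the tag is present and `None` otherwise, which is the signal to fall back to guessing.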

However, most pages don't contain these tags, in which case there is no way to determine the language with 100% certainty.

What you could do is build a database of the 5 (or 10, for more accuracy) most common words in each language. Then, for each page that you spider, scan its text for these words and keep a tally per language. At the end, the language with the highest tally is the one the page is most likely written in.
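
The tally idea above could look roughly like this. It is a minimal sketch in Python (again, the logic carries over to VB.NET), and it assumes the visible text has already been extracted from the HTML. The word lists in `COMMON_WORDS` are small hand-picked placeholders, not a vetted frequency list, and `guess_language` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real spider would load
# a larger list per language from a database or file.
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "is"},
    "fr": {"le", "la", "les", "et", "est"},
    "de": {"der", "die", "und", "das", "ist"},
    "es": {"el", "los", "que", "y", "en"},
}


def guess_language(text):
    """Return the language whose common words occur most often, or None."""
    # Split into lowercase words made of letters only (Unicode-aware).
    words = re.findall(r"[^\W\d_]+", text.lower())
    tally = Counter()
    for word in words:
        for language, common in COMMON_WORDS.items():
            if word in common:
                tally[language] += 1
    if not tally:
        return None  # none of the known words appeared
    return tally.most_common(1)[0][0]
```

A guess based on so few words is only a heuristic: short pages, or pages mixing languages, can easily be misclassified, so it makes sense to use the declared language whenever one is available and fall back to the tally otherwise.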

