Language detection and translation.

Posted on 2005-04-08
Last Modified: 2010-04-17
Hi everyone!

I was given a specific list of URLs that I need to spider with a VB.NET program. The problem is that many of the sites are in a foreign language.
My question: is there a way for my spider to detect a page's language, maybe through the server's response headers? Also, what is a good option for automatic translation? I thought about Systran, but I really don't know whether it is a good, cost-effective option.

Thanks a lot!


Question by:glopezz

    Accepted Solution

    Some web pages contain META tags that specify the language being used. An example of a META tag making this declaration would be:

        <meta http-equiv="Content-Language" content="en-US" />

    This means English as used in the United States.
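
    Reading that declaration out of a downloaded page could look roughly like the sketch below. It is written in Python rather than VB, purely as an illustration to port; the names `ContentLanguageParser` and `declared_language` are made up for this example, and it assumes the page source is already available as a string.

```python
from html.parser import HTMLParser
from typing import Optional


class ContentLanguageParser(HTMLParser):
    """Remembers the first <meta http-equiv="Content-Language"> value seen."""

    def __init__(self) -> None:
        super().__init__()
        self.language: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only until a declaration is found.
        if tag != "meta" or self.language is not None:
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() == "content-language":
            value = attr.get("content", "").strip()
            if value:
                self.language = value


def declared_language(html: str) -> Optional[str]:
    """Return the declared language tag (e.g. 'en-US'), or None if absent."""
    parser = ContentLanguageParser()
    parser.feed(html)
    return parser.language
```

    A result of `None` only means the page makes no such declaration; it says nothing about the language the page is actually written in.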

    However, most pages don't actually contain these tags, and in those cases there is no way of determining the language with 100% certainty.

    What you could do is create a database with the 5 (or 10, for more accuracy) most common words for each language. Then, for each page that you spider, scan its text for these words and keep a tally per language. At the end, the language with the highest tally is the one the page most likely uses.
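
    The tally idea can be sketched as follows, again in Python as an illustration to port rather than as finished VB. The word lists here are small hand-picked samples, not a vetted database, and `COMMON_WORDS` and `guess_language` are hypothetical names. The function expects the visible text of the page, with the markup already stripped.

```python
import re
from collections import Counter
from typing import Optional

# Illustrative samples only; a real list should come from word-frequency data.
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it", "for", "with"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "un", "por", "con"},
    "fr": {"le", "la", "de", "et", "les", "des", "un", "une", "est", "que"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ein", "mit", "den", "zu"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language code with the most common-word hits, or None."""
    # Split into runs of letters (Unicode-aware), ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    tally = Counter()
    for word in words:
        for lang, common in COMMON_WORDS.items():
            if word in common:
                tally[lang] += 1
    if not tally:
        return None  # no known word found, so no guess
    return tally.most_common(1)[0][0]
```

    Some words appear in several lists ("la" and "de", for instance), so short texts can produce close or tied tallies. Longer word lists and a minimum margin between the top two languages would make the guess more trustworthy.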

    Other than that, I don't do VB, so I can't really contribute at the coding level.

