
Program to determine language of text

esak2000
Is there a program that can be used to determine the language of a text programmatically?
I would like to be able to pass some text as a parameter and get back the language of the text, similar to the way that Google's language detect works on the web.

I sent an email to Likasoft for their Polyglot 3000 software, but didn't get a response.

I'm using MS Visual FoxPro as my programming language.

So, why don't you use http://www.google.com/uds/samples/language/detect.html from FoxPro? It should be feasible to start browser or browser OLE control, propagate the text, and read the result...

Then you may continue on many links here: http://tnlessone.wordpress.com/2007/05/13/how-to-detect-which-language-a-text-is-written-in-or-when-science-meets-human/ 

Author

Commented:
"So, why don't you use http://www.google.com/uds/samples/language/detect.html from FoxPro? It should be feasible to start browser or browser OLE control, propagate the text, and read the result..."

I tried that (and using Google's tool is my preference), but the HTML result is not viewable in IE, so FoxPro can't 'see' the result. Can you view it?

I'll take a look at your links

Thank you!
CaptainCyril, Founder, Software Engineer, Data Scientist

Commented:

Author

Commented:
Hi CaptainCyril,

I tried using the API, but the html results of the page show me:

 var container = document.getElementById("detection");
  container.innerHTML = text + " is: <b>" + language + "</b>";

so I can view the actual language on the web page that is generated, but the web code only shows me the variable 'language' and not the actual variable value. Do you know how I can view the variable language in the html of the response?

Thank you,

Daniel
CaptainCyril, Founder, Software Engineer, Data Scientist
Commented:
Oh boy! Not sure about that since it gets calculated on rendering the page.

How about getting the contents of the HTML document as a whole and then parsing it?

? ie.Document.Body.innerText

Author

Commented:
I tried that, but it doesn't work. Is there some way to force the variable to a file?
CaptainCyril, Founder, Software Engineer, Data Scientist

Commented:
You can also fill the variable in a hidden variable or textbox and then read the value of the textbox from VFP.

Author

Commented:
"You can also fill the variable in a hidden variable or textbox and then read the value of the textbox from VFP."

Can you write the basic code for that?
CaptainCyril, Founder, Software Engineer, Data Scientist

Commented:
Are you able to change the JavaScript?

document.getElementById('txtLanguage').value = language
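
On the VFP side, reading that field back could look roughly like the sketch below. It assumes the page contains an input element with the id txtLanguage (the id used in the line above) and that the page is already loaded in an InternetExplorer.Application object. ReadFieldValue is a hypothetical helper name, not part of any library.

```foxpro
* Hypothetical helper: returns the value of the element with the given id,
* or an empty string if the page has no such element.
* toIE is assumed to be an InternetExplorer.Application object
* whose page has finished loading.
* Usage (assuming loIE already holds the loaded page):
*   ? ReadFieldValue(loIE, "txtLanguage")
FUNCTION ReadFieldValue(toIE, tcId)
   LOCAL loField
   loField = toIE.Document.getElementById(tcId)
   IF VARTYPE(loField) = "O"
      RETURN NVL(loField.value, "")
   ENDIF
   RETURN ""
ENDFUNC
```

Returning an empty string for a missing element keeps the caller from hitting a COM error when the script has not created or filled the field yet.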

Author

Commented:
I found the HTML in the innerText, as you suggested earlier!
The problem was that the page reported 'complete' before the JavaScript had finished running, so the result wasn't there yet when my code read it. I adjusted the code to wait, and now it works.

Thanks so much for your help!

Daniel
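
For anyone reading later, the kind of wait described above could be sketched as follows. This is not the exact code used in this thread: lcUrl is a placeholder for whichever page runs the detection script, the 10-second limits are arbitrary, and the " is: " marker is taken from the script line quoted earlier.

```foxpro
* Sketch only: lcUrl and the timeouts are illustrative assumptions.
LOCAL loIE, lcUrl, lcText, lnStart
lcUrl = "http://www.google.com/uds/samples/language/detect.html"
loIE = CREATEOBJECT("InternetExplorer.Application")
loIE.Navigate(lcUrl)

* Step 1: wait until the browser reports the document as complete (ReadyState 4).
lnStart = SECONDS()
DO WHILE (loIE.Busy OR loIE.ReadyState <> 4) AND SECONDS() - lnStart < 10
   DOEVENTS
ENDDO

* Step 2: "complete" does not mean the script has run, so keep reading the
* body text until the " is: " marker written by the script shows up.
lcText = ""
lnStart = SECONDS()
DO WHILE loIE.ReadyState = 4 AND NOT (" is: " $ lcText) AND SECONDS() - lnStart < 10
   DOEVENTS
   lcText = NVL(loIE.Document.Body.innerText, "")
ENDDO

? lcText
loIE.Quit()
```

Polling for a marker that only the script writes avoids relying on the browser's own notion of "complete".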
CaptainCyril, Founder, Software Engineer, Data Scientist

Commented:
You are welcome.
Olaf Doschke, Software Developer

Commented:
It is rather inelegant to drive IE and JavaScript and then extract the result from an HTML page, since there is a version of the Google API for non-web, non-JavaScript users: http://code.google.com/intl/en/apis/ajaxlanguage/documentation/reference.html#_intro_fonje

You can call it, for example, as in the code at the end of this comment. The text must be UTF-8 encoded, e.g. with tcUTF=STRCONV("the text string",9), and then URL-encoded.

If you want to stay with the javascript version also take a look at http://west-wind.com/weblog/posts/493536.aspx 
This shows in general how to call JavaScript from VFP and even get a result back. You could use it to read the value of the language JavaScript variable directly, instead of first writing it to the HTML document and then extracting it from there.

It doesn't matter much for performance, since the main bottleneck is the web request, but it may come in handy.

Bye, Olaf.
o = CREATEOBJECT("Microsoft.XMLHTTP")
o.open("GET","http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=This%20is%20a%20test")
o.send()
? o.responseText


Olaf Doschke, Software Developer

Commented:
Something is missing: you also need to wait for o.readystate=4 before you can access o.responsetext.

Bye, Olaf.

o = CREATEOBJECT("Microsoft.XMLHTTP")
o.open("GET","http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=This%20is%20a%20test")
o.send()
do while o.readystate<>4
 doevents force
enddo
? o.responseText
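
The response text is JSON rather than a bare language code. A minimal sketch for pulling the code out of it follows. It assumes a response shaped like the sample string in the code, with "language" followed directly by a quoted value; that shape is not shown in this thread, so check it against what o.responseText actually returns. ExtractLanguage is a hypothetical helper name.

```foxpro
* Assumed sample response (illustrative only, not captured from a real call).
lcJson = '{"responseData": {"language":"es","isReliable":false,"confidence":0.1}, ' + ;
         '"responseDetails": null, "responseStatus": 200}'
? ExtractLanguage(lcJson)

* Hypothetical helper: returns the text between "language":" and the next
* double quote, or an empty string if that pattern is not present.
FUNCTION ExtractLanguage(tcJson)
   RETURN STREXTRACT(tcJson, '"language":"', '"')
ENDFUNC
```

STREXTRACT is a plain string search, not a JSON parser, so it will miss the value if the service puts whitespace after the colon or returns null for the language.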


Great example Olaf!

Author

Commented:
Thank you for the posting! I will try it.

Author

Commented:
It worked, and it saves a lot of time. Thanks for taking the time!
Daniel
Olaf Doschke, Software Developer

Commented:
Thanks, pcelba and Daniel,

especially because this could come in handy one day.

As a bonus, here is a simple UrlEncode function and an example of using it with some Spanish text.

Bye, Olaf.


lcText = "Hablamos Español"
lcText = UrlEncode(Strconv(lcText,9))

o = CREATEOBJECT("Microsoft.XMLHTTP")
o.open("GET","http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q="+lcText)
o.send()
do while o.readystate<>4
 doevents force
enddo
? o.responseText

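* Characters that never need escaping in a URL (the RFC 3986 "unreserved" set)
#Define ccUrlChars '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_.~'

* Percent-encodes every byte of tcData that is not in ccUrlChars.
* Pass a UTF-8 string, e.g. the result of Strconv(text,9).
Function UrlEncode(tcData)
      Local lcUrlEncoded, lnPos, lcChar
      lcUrlEncoded = ''

      For lnPos=1 To Len(tcData)
         lcChar = Substr(tcData,lnPos,1)
         If lcChar $ ccUrlChars
            lcUrlEncoded = lcUrlEncoded + lcChar
         Else
            * Transform(n,'@0') yields e.g. 0x000000F1; the last two digits are the byte in hex
            lcUrlEncoded = lcUrlEncoded + '%'+Upper(Right(Transform(Asc(lcChar),'@0'),2))
         Endif
      Endfor

      Return lcUrlEncoded
EndFunc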
