bsharath
asked on
Detect a websites language
Hi,
Detect a websites language
I have a excel with 100's of website names and any way to check each and find if the website is English or what other language it is?
Detect a websites language
I have a excel with 100's of website names and any way to check each and find if the website is English or what other language it is?
ASKER
Any straight answer please?
Hi bsharath,
Here is tool developed by microsoft . Type/copy the characther (eg. chinse or japanese ) from the left side of box and click translate. it will automatically show which language it is in Translate from drop down. Please let me know if you have any issue.
http://www.bing.com/translator/?ref=SALL&br=ro&mkt=en-US
Here is tool developed by microsoft . Type/copy the characther (eg. chinse or japanese ) from the left side of box and click translate. it will automatically show which language it is in Translate from drop down. Please let me know if you have any issue.
http://www.bing.com/translator/?ref=SALL&br=ro&mkt=en-US
ASKER
Thanks but as mentioned i want to input website names and need the results on what language are those sites
No other volunteers, so here is my tuppence-worth
In principle, this could be done using a WebBrowser object to navigate to each URL, extracting the text and placing it in a blank Word document.
I can get the Word part working, but not the browser part. For some reason, it never finishes loading the web page, so cannot go on to test the text. In case the problem is an obscure setting on my system, here is the code anyway. It is in an Excel User form. References are set to the Microsoft Word library, the HTML object library and the Internet controls library
In principle, this could be done using a WebBrowser object to navigate to each URL, extracting the text and placing it in a blank Word document.
I can get the Word part working, but not the browser part. For some reason, it never finishes loading the web page, so cannot go on to test the text. In case the problem is an obscure setting on my system, here is the code anyway. It is in an Excel User form. References are set to the Microsoft Word library, the HTML object library and the Internet controls library
Option Explicit
Dim WithEvents Web1 As InternetExplorer
Dim wdApp As Word.Application
Private Sub CommandButton1_Click()
Dim strHTML As String
Dim strURL As String
Set wdApp = CreateObject("Word.Application")
wdApp.Visible = True
Set Web1 = CreateObject("InternetExplorer.Application")
Web1.Visible = True
strURL = "http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Accueil_principal"
Web1.navigate strURL
End Sub
Function GetLanguage(strText As String) As String
Dim doc As Word.Document
wdApp.CheckLanguage = True
Set doc = Documents.Add
doc.Range.Paste
Select Case doc.Range.LanguageID
Case 2057
GetLanguage = "English"
Case 1036
GetLanguage = "French"
'Case 1033 ...
End Select
doc.Close wdDoNotSaveChanges
End Function
Private Sub Web1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
WriteLog "DocumentComplete"
If (pDisp Is Web1) Then
MsgBox GetLanguage(Web1.Document.body.innerText)
End If
End Sub
Private Sub Web1_StatusTextChange(ByVal Text As String)
WriteLog "StatusTextChange: " & Text
End Sub
Sub WriteLog(ByVal Text As String)
Dim f As Integer
Dim strFileName As String
strFileName = "web" & Format$(Now, "MMy") & ".log"
Text = Format$(Now, "HH:nn:ss") & " " & Text
Debug.Print Text
f = FreeFile
Open ActiveWorkbook.Path & "\" & strFileName For Append As #f
Print #f, Text
Close #f
End Sub
ASKER
Thanks
is this a Word macro?
I tried as word macro and this line shows red
Dim WithEvents Web1 As InternetExplorer
Should i have the website names in word?
is this a Word macro?
I tried as word macro and this line shows red
Dim WithEvents Web1 As InternetExplorer
Should i have the website names in word?
ASKER
Thanks for the reply
I have a website and a lot of activity from China who place all sorts of links for Nike/Clothes etc
Rather than me checking 100's of links i have
I have extracted them from my DB and if i find a NON English language URl then i can manually check it to find spam content
its basically help you make the job easier and i am not involved in any hacking :-)
I have a website and a lot of activity from China who place all sorts of links for Nike/Clothes etc
Rather than me checking 100's of links i have
I have extracted them from my DB and if i find a NON English language URl then i can manually check it to find spam content
its basically help you make the job easier and i am not involved in any hacking :-)
The code is hosted in an Excel Userform. You also need to set the references that I mentioned in my last comment.
It only tests one site at the moment. Walking through a column of URLs on an Excel Sheet is the easy bit.
It only tests one site at the moment. Walking through a column of URLs on an Excel Sheet is the easy bit.
ASKER
I added the refrence and inserted a new user form and when i run the macro i get this
-------------------------- -
Microsoft Visual Basic
-------------------------- -
Compile error:
User-defined type not defined
-------------------------- -
OK Help
-------------------------- -
Please see attached file
Language.xlsm
--------------------------
Microsoft Visual Basic
--------------------------
Compile error:
User-defined type not defined
--------------------------
OK Help
--------------------------
Please see attached file
Language.xlsm
You need to set references to the he Microsoft Word library, the Microsoft HTML object library and the Microsoft Internet controls library.
Here is an updated version that works a bit better. It still needs the same references
Here is an updated version that works a bit better. It still needs the same references
Private Sub CommandButton1_Click()
Dim strURL As String
If wdApp Is Nothing Then
On Error Resume Next
Set wdApp = GetObject(, "Word.Application")
On Error GoTo 0
If wdApp Is Nothing Then
Set wdApp = CreateObject("Word.Application")
wdApp.Visible = True
End If
End If
Set Web1 = CreateObject("InternetExplorer.Application")
Web1.Visible = True
strURL = "http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Accueil_principal"
Web1.navigate strURL
End Sub
Function GetLanguage(strText As String) As String
Dim doc As Word.Document
Dim p As Integer
Dim para As Paragraph
Dim i As Integer
Dim L As Integer
Dim LID As WdLanguageID
Dim Languages() As WdLanguageID
Dim Paracounts() As Integer
Dim iMax As Integer
Dim iMaxID As WdLanguageID
L = -1
wdApp.CheckLanguage = True
WriteLog "Length: " & Len(strText)
Set doc = Documents.Add
doc.Range.Text = strText
For Each para In doc.Paragraphs
LID = para.Range.LanguageID
For i = 0 To L
If LID = Languages(i) Then
Paracounts(i) = Paracounts(i) + 1
Exit For
End If
Next i
If i > L Then
ReDim Preserve Languages(i)
ReDim Preserve Paracounts(i)
Paracounts(i) = 1
Languages(i) = LID
L = i
End If
Next para
For i = 0 To L
If iMax < Paracounts(i) Then
iMax = Paracounts(i)
iMaxID = Languages(i)
End If
Next i
WriteLog "Count: " & iMax & ", ID: " & iMaxID
Select Case iMaxID
Case 2057
GetLanguage = "English"
Case 1036
GetLanguage = "French"
'Case 1033 ...
End Select
doc.Close wdDoNotSaveChanges
End Function
Private Sub Web1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
WriteLog "DocumentComplete"
If (pDisp Is Web1) Then
MsgBox GetLanguage(Web1.Document.body.innerText)
End If
End Sub
Private Sub Web1_StatusTextChange(ByVal Text As String)
WriteLog "StatusTextChange: " & Text
End Sub
Sub WriteLog(ByVal Text As String)
Dim f As Integer
Dim strFileName As String
strFileName = "web" & Format$(Now, "MMy") & ".log"
Text = Format$(Now, "HH:nn:ss") & " " & Text
Debug.Print Text
f = FreeFile
Open ActiveWorkbook.Path & "\" & strFileName For Append As #f
Print #f, Text
Close #f
End Sub
ASKER
Column "B" i get -1's and 0's
Is that right?
Is that right?
In that version, there is no attempt to read or write from the sheet, unless you have coded it yourself, which I am sure that you able to do. As I posted it, the code should merely report the result from a single hard-coded URL via a message box.
There still seem to be some timing issues, which I am currently trying to resolve.
There still seem to be some timing issues, which I am currently trying to resolve.
This version seems to work every time.
The apparent timing issues have been resolved, but you might think that the performance is not very good. Most non-English sites have a bit of English somewhere, so that the language is returned as 'mixed' (language ID = 999999). For this reason the code examines every paragraph, keeping a count of the languages used. The returned language is the one with the maximum count of paragraphs in that language. This slows the application down.
It now uses a list of sites taken from column1 of Sheet 3 of the current workbook, the results being entered into column 2. Note that Word recognises over 200 languages and variations, so only a few are in the code (see wdlanguageID Enumeration in the Word VBA help.)
The apparent timing issues have been resolved, but you might think that the performance is not very good. Most non-English sites have a bit of English somewhere, so that the language is returned as 'mixed' (language ID = 999999). For this reason the code examines every paragraph, keeping a count of the languages used. The returned language is the one with the maximum count of paragraphs in that language. This slows the application down.
It now uses a list of sites taken from column1 of Sheet 3 of the current workbook, the results being entered into column 2. Note that Word recognises over 200 languages and variations, so only a few are in the code (see wdlanguageID Enumeration in the Word VBA help.)
Option Explicit
Dim WithEvents Web1 As InternetExplorer
Dim wdApp As Word.Application
Dim r As Integer
Dim sh As Worksheet
Dim bNewWordApp As Boolean
Private Sub CommandButton1_Click()
Dim strURL As String
If wdApp Is Nothing Then
On Error Resume Next
Set wdApp = GetObject(, "Word.Application")
On Error GoTo 0
If wdApp Is Nothing Then
Set wdApp = CreateObject("Word.Application")
bNewWordApp = True
End If
End If
wdApp.Visible = True
Set Web1 = CreateObject("InternetExplorer.Application")
Web1.Visible = True
Set sh = ActiveWorkbook.Worksheets(3)
r = 2
strURL = sh.Cells(r, 1).value
Web1.navigate strURL
WriteLog "Navigating to " & strURL
End Sub
Function GetLanguage(strText As String) As String
Dim doc As Word.Document
Dim p As Integer
Dim para As Paragraph
Dim i As Integer
Dim L As Integer
Dim LID As WdLanguageID
Dim Languages() As WdLanguageID
Dim Paracounts() As Integer
Dim iMax As Integer
Dim iMaxID As WdLanguageID
WriteLog "GetLanguage. Text length: " & Len(strText)
L = -1
wdApp.CheckLanguage = True
Set doc = Documents.Add
doc.Range.Text = strText
doc.Range.CheckSpelling
doc.Range.Select
WriteLog "GetLanguage. Paragraph count: " & doc.Paragraphs.Count
For Each para In doc.Paragraphs
para.Range.Select 'seems necessary to provoke automatic language detecion
DoEvents
LID = para.Range.LanguageID
For i = 0 To L
If LID = Languages(i) Then
Paracounts(i) = Paracounts(i) + 1
Exit For
End If
Next i
If i > L Then
ReDim Preserve Languages(i)
ReDim Preserve Paracounts(i)
Paracounts(i) = 1
Languages(i) = LID
L = i
End If
Next para
For i = 0 To L
If iMax < Paracounts(i) Then
iMax = Paracounts(i)
iMaxID = Languages(i)
End If
Next i
WriteLog "Max language count: " & iMax & ", ID: " & iMaxID
'End If
Select Case iMaxID
'Select Case doc.Range.LanguageID
Case 2057, 1033, 3081
GetLanguage = "English"
Case 1036, 3084
GetLanguage = "French"
Case 1034, 3082
GetLanguage = "Spanish"
Case Else
GetLanguage = "Language ID: " & iMaxID & " not yet encoded in this macro"
End Select
doc.Close wdDoNotSaveChanges
End Function
Private Sub Web1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
Dim strURL As String
WriteLog "DocumentComplete: " & URL
If (pDisp Is Web1) Then
sh.Cells(r, 2).value = GetLanguage(Web1.Document.body.innerText)
r = r + 1
If sh.Cells(r, 1).value <> "" Then
strURL = sh.Cells(r, 1).value
WriteLog "Navigating to " & strURL
Web1.navigate strURL
Else
If bNewWordApp Then
wdApp.Quit
End If
End If
End If
End Sub
Private Sub Web1_StatusTextChange(ByVal Text As String)
WriteLog "StatusTextChange: " & Text
End Sub
Sub WriteLog(ByVal Text As String)
Dim f As Integer
Dim strFileName As String
strFileName = "web" & Format$(Now, "MMy") & ".log"
Text = Format$(Now, "HH:nn:ss") & " " & Text
Debug.Print Text
f = FreeFile
Open ActiveWorkbook.Path & "\" & strFileName For Append As #f
Print #f, Text
Close #f
End Sub
ASKER
In excel i create a userform and paste this code?
If you have a working file can you please attach
If you have a working file can you please attach
My file has other things in it.
You need a Userform with one command button with a default name. Copy and paste the code into the Userform's code. Set the references, add your URLs to the sheets, and there you go.
You need a Userform with one command button with a default name. Copy and paste the code into the Userform's code. Set the references, add your URLs to the sheets, and there you go.
ASKER
I get system call failed
sh.Cells(r, 2).Value = GetLanguage(Web1.Document. body.inner Text)
sh.Cells(r, 2).Value = GetLanguage(Web1.Document.
That's a new one on me. Does the error message have a number?
ASKER
First a word blank is opened and then IE and site is loaded and then i get
Run time error '-2147417856 (80010100);: Automation error System Call failed
Run time error '-2147417856 (80010100);: Automation error System Call failed
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I got this
-------------------------- -
Microsoft Visual Basic
-------------------------- -
Compile error:
Sub or Function not defined
-------------------------- -
OK Help
-------------------------- -
log has
16:36:06 Navigating to https://www.experts-exchange.com
16:36:06 StatusTextChange: Looking up https://www.experts-exchange.com
16:36:06 StatusTextChange: Connecting to site 64.156.132.150
16:36:07 StatusTextChange: Website found. Waiting for reply...
16:36:20 StatusTextChange: Start downloading from site: https://www.experts-exchange.com/
16:36:20 StatusTextChange: Downloading from site: https://www.experts-exchange.com/
16:36:20 StatusTextChange:
16:36:20 StatusTextChange:
16:36:20 StatusTextChange:
16:36:20 StatusTextChange: Waiting for https://www.experts-exchange.com/...
16:36:20 StatusTextChange: Waiting for https://www.experts-exchange.com/...
16:38:00 StatusTextChange: Done
16:38:04 DocumentComplete: https://www.experts-exchange.com/
17:41:19 Navigating to https://www.experts-exchange.com
17:41:19 StatusTextChange: Looking up https://www.experts-exchange.com
17:41:23 StatusTextChange: Connecting to site 64.156.132.150
17:41:23 StatusTextChange: Website found. Waiting for reply...
17:41:23 StatusTextChange: Start downloading from site: https://www.experts-exchange.com/
17:41:23 StatusTextChange: Downloading from site: https://www.experts-exchange.com/
17:41:24 StatusTextChange:
17:41:24 StatusTextChange:
17:41:24 StatusTextChange:
17:41:24 StatusTextChange: Waiting for https://www.experts-exchange.com/...
17:41:28 StatusTextChange: Waiting for https://www.experts-exchange.com/...
17:41:28 StatusTextChange: Waiting for https://www.experts-exchange.com/...
17:43:09 StatusTextChange: Done
17:43:13 DocumentComplete: https://www.experts-exchange.com/
--------------------------
Microsoft Visual Basic
--------------------------
Compile error:
Sub or Function not defined
--------------------------
OK Help
--------------------------
log has
16:36:06 Navigating to https://www.experts-exchange.com
16:36:06 StatusTextChange: Looking up https://www.experts-exchange.com
16:36:06 StatusTextChange: Connecting to site 64.156.132.150
16:36:07 StatusTextChange: Website found. Waiting for reply...
16:36:20 StatusTextChange: Start downloading from site: https://www.experts-exchange.com/
16:36:20 StatusTextChange: Downloading from site: https://www.experts-exchange.com/
16:36:20 StatusTextChange:
16:36:20 StatusTextChange:
16:36:20 StatusTextChange:
16:36:20 StatusTextChange: Waiting for https://www.experts-exchange.com/...
16:36:20 StatusTextChange: Waiting for https://www.experts-exchange.com/...
16:38:00 StatusTextChange: Done
16:38:04 DocumentComplete: https://www.experts-exchange.com/
17:41:19 Navigating to https://www.experts-exchange.com
17:41:19 StatusTextChange: Looking up https://www.experts-exchange.com
17:41:23 StatusTextChange: Connecting to site 64.156.132.150
17:41:23 StatusTextChange: Website found. Waiting for reply...
17:41:23 StatusTextChange: Start downloading from site: https://www.experts-exchange.com/
17:41:23 StatusTextChange: Downloading from site: https://www.experts-exchange.com/
17:41:24 StatusTextChange:
17:41:24 StatusTextChange:
17:41:24 StatusTextChange:
17:41:24 StatusTextChange: Waiting for https://www.experts-exchange.com/...
17:41:28 StatusTextChange: Waiting for https://www.experts-exchange.com/...
17:41:28 StatusTextChange: Waiting for https://www.experts-exchange.com/...
17:43:09 StatusTextChange: Done
17:43:13 DocumentComplete: https://www.experts-exchange.com/
You guys are mad.... And I mean that in a happy positive way, but this is not the right way to go about checking for spammers.
Sure, there are a few countries out there that are well known, but, spammers can happen from anywhere. Just as legitimate users can happen from anywhere.
What you should be doing is checking IP addresses and checking for multiple occurances against known members (known members being OK). Some free code I have used before is : http://www.nirsoft.net/utils/ipnetinfo.html
That whole messaging of "connecting to site" then "starting download" looks very very suspiscous and not just checking the country of origin.
You indicated that you were "manually" checking, but now you have a bot to log in and download and check - you will want to be very careful because detection and use of "bots" to access sites can be considered hacking or using techniques to avoid legitimite user access.
Be very careful where this thread leads - it will get shut down in a heart beat - already running very close to infringing Terms Of Use. Specifically :
The pity is you are using GrahamSkan to help build such a system, and one could argue for the commercial benefit of another (competitive) site.
Exactly what is your relationship with techyv ?
Sure, there are a few countries out there that are well known, but, spammers can happen from anywhere. Just as legitimate users can happen from anywhere.
What you should be doing is checking IP addresses and checking for multiple occurances against known members (known members being OK). Some free code I have used before is : http://www.nirsoft.net/utils/ipnetinfo.html
That whole messaging of "connecting to site" then "starting download" looks very very suspiscous and not just checking the country of origin.
You indicated that you were "manually" checking, but now you have a bot to log in and download and check - you will want to be very careful because detection and use of "bots" to access sites can be considered hacking or using techniques to avoid legitimite user access.
Be very careful where this thread leads - it will get shut down in a heart beat - already running very close to infringing Terms Of Use. Specifically :
5.2.The Services and Site may not be used by scripts, machines or automated Services without express written consent of the Site ownership.Which is exactly what you are doing above...
The pity is you are using GrahamSkan to help build such a system, and one could argue for the commercial benefit of another (competitive) site.
Exactly what is your relationship with techyv ?
ASKER
Hi Mark,
Thanks for the message. I definitely agree with your comments but it so difficult to find such spammers and banning ip ranges and ip's have been done before by me but it does not stop them.
I have a spam module that blocks 90% of such comments and the 10% is what i end up missing few from deleting them while we do manual moderation. 99% is from China and Kenya and 99% are from the same people with 100's of same URL's from Nike etc.
My intentions are clear and no way to hacking etc as i am against the same as well.
Finding a websites PR or finding a websites rank with alexa is also something that you may feel is like hacking as we retrieve similar kind of information from such sites. Or something close.
I just need help with a script that can find if a website is English Yes/NO
As my website is English a NON English link will make no sense in my site and this is one way a helpful way to find if that comment that has this link is a spammers comment.
This will identify a way for me to delete them.
I am no way close to argument with you Experts and would love to hear a better approach if any
I have tried different ways
1. Banning ip's and a huge range from China
2. Different spam modules
3. Manual Moderation
4. Admin moderation for each comment before approval
5. Admin moderation if a link or email id if found etc
But still some escapes such techniques and get posted
Thanks for the message. I definitely agree with your comments but it so difficult to find such spammers and banning ip ranges and ip's have been done before by me but it does not stop them.
I have a spam module that blocks 90% of such comments and the 10% is what i end up missing few from deleting them while we do manual moderation. 99% is from China and Kenya and 99% are from the same people with 100's of same URL's from Nike etc.
My intentions are clear and no way to hacking etc as i am against the same as well.
Finding a websites PR or finding a websites rank with alexa is also something that you may feel is like hacking as we retrieve similar kind of information from such sites. Or something close.
I just need help with a script that can find if a website is English Yes/NO
As my website is English a NON English link will make no sense in my site and this is one way a helpful way to find if that comment that has this link is a spammers comment.
This will identify a way for me to delete them.
I am no way close to argument with you Experts and would love to hear a better approach if any
I have tried different ways
1. Banning ip's and a huge range from China
2. Different spam modules
3. Manual Moderation
4. Admin moderation for each comment before approval
5. Admin moderation if a link or email id if found etc
But still some escapes such techniques and get posted
Sorry, been tied up with other things.
What were the messages that you posted before the log text?
If you are reporting a new error, can you say where it occurs, please?
What were the messages that you posted before the log text?
If you are reporting a new error, can you say where it occurs, please?
ASKER
Thanks Graham for getting back
I am getting the error as in
ID: 38734016
Can you please tell me how i can check where it errors
I am getting the error as in
ID: 38734016
Can you please tell me how i can check where it errors
When there is a compile error, the failing code should be highlighted in dark blue.
ASKER
Attached file
can you tell me if i set it up right?
The references and the button click etc
As for me now nothing changes
can you tell me if i set it up right?
The references and the button click etc
As for me now nothing changes
bSharath,
You have asked more that 7000 questions on this site. Many of them had VBA macro code as an answer. It is hard to know what difficulty someone with that level of experience can be having.
The file that you posted seems OK except that the references to the Microsoft Internet Controls and the Microsoft HTML Object Library are not set.
You have asked more that 7000 questions on this site. Many of them had VBA macro code as an answer. It is hard to know what difficulty someone with that level of experience can be having.
The file that you posted seems OK except that the references to the Microsoft Internet Controls and the Microsoft HTML Object Library are not set.
Web Page Language Identification Based on URLs:
http://www.vldb.org/pvldb/1/1453880.pdf