How to Identify multiple languages by unicode in a Microsoft Word document and change the font

LouisLyle
LouisLyle used Ask the Experts™
on
I have a large 1500 page MS Word document which contains three different languages: English, polytonic Greek (ancient) and Hebrew. The Hebrew and Greek use the appropriate Unicode ranges. I had kept them separate by using separate font for each, but after changing some of the headings, some of the fonts got changed to Word's defaults, and do not display correctly. What I need is a function to walk through the document and identify each character (or range of characters) depending on the ASCW value, and then change the font of that character or strings of characters.

The values for these ranges are as follows:
Hebrew   U+0590 - U+05FF   (1424-1535) FB00-FB4F  64256-64335
Greek: U+0370  U+03FF   (8801023);  U+1F00  U+1FFF   (79368191)

While it would be neat to identify each word and or string of words using smart tags(?) etc., I want to force the font of each Greek word to be "Cardo" and each Hebrew word "Ezra Sil". Some unicode fonts such as Arial MS Unicode would work fine for all text. but I would like to be able to search just the Hebrew words, Greek words, if I want. Even perhaps index just the Hebrew and Greek words.

The reason I want to have a different font long term is that I am going to convert the document into an html document, and want to use CSS styles for the Greek and Hebrew respectively. Otherwise browsers do not always show the text correctly.

I'm a little concerned about memory usage. When I had previously walked through the document character by character, Word crawled. I ended up using Selection.Find.Font and that was very fast.

Here are three lines from the document (it was a book)

añ½½q." The words in the N. T. known to be Hebrew and not
Aramaic are as follows: ²±´´}½=+вѷӼպß,; »»·»¿Å¹q=+Ô·Ü°ÜÕ¼¾Ù¸Ô¼,; ¼u½
 = +и޵ß,; Á¼±³µ´´}½=+Ô·è Þ°Ò´Ó¼Õºß,; ÁÁ±²}½=+âµè¸ÛÕºß,; ²qÄ¿Â=+Ѽ·ê,; ²µµ»¶µ²¿{²


Louis S.
Dim schar as string
.........
'Function to identify a font's character 
'????
 
Select Case ascw(schar)
Case 1424 to 1535
    objSelection.font.name="Ezra Sil"
Case 64256 to 64335
  objSelection.font.name="Ezra Sil"
Case 880 to 1023
  objSelection.font.name="Cardo"
Case 7936-8191
  objSelection.font.name="Cardo"
Case Else
       'Must be punctation, English or something else
      'Do nothing
End Select

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
hi

You have installed the Arial Unicode MS font is intended for use when you open a document that is formatted with a different language, and you do not have the specific language font(s) installed on your computer system. If you work primarily with documents that were created in different languages, you should install the specific fonts and proofing tools for those languages

And also refer this Microsoft KB Articel : http://support.microsoft.com/kb/287247

Author

Commented:
The question I have is how to treat different language alphabets as separate entities, so that one can search by "font name" for text in a specific language.. I have many Greek polytonic fonts (Palatino Linotype, Segoi UI, Arial MS Unicode, et al.). Also, many fonts support the Hebrew characters. The point of my question is how to separate these in a document. I used a "font" separation approach. In the document (A 1922 Book by AT Robertson "A Greek Grammar of the New Testament") I chose Cardo as the Greek font (It's a unicode font that supports both Greek and Hebrew), Ezra SIL as the Hebrew font (It only supports English and Hebrew), and allowed MS Word to select any English font.

I had to write a conversion utility because the Hebrew and Greek in the document were originally ascii characters in the 33-128 range. I converted the fonts into unicode. Getting the Hebrew to show correctly was a pain because MS Word inserted a 202B character in front of every word and a 202A (Left to Right) character after every character that had more than a single Hebrew Unicode consonant. My document shows correctly in all fonts now. Word ends up only showing a 202A character before the Hebrew word and a 202B character after a Hebrew word. This caused problems when cutting and pasting the text into Notepad or HTML (the Hebrew always showed up at the end of the line.). I replaced the 202A (8234) character with a 202C (8236) character, and that problem was fixed.

I have created a solution to my problem. But it is still extremely slow. I think it is either because of a memory issue, screen updating etc.

Here is my code.  I have programmed a lot in MS Access, but MSWord is a different creature to me.  Two functions are included in my code snippet: ChangeGreekFont and ChangeHebrewFont.

I DO NOT UNDERSTAND WHY THE CONVERSION FUNCTION IS SO SLOW. Perhaps it's a memory issue. My machine should be able to handle it with no problem.  After about 2000 words, the function slows down -- Word "whites out" but I can still key in CTL+Break and interrupt the function. I have not used a Word .find function -- which I have found to be fast.
Thanks for your reply.

Louis S.


Public Function ChangeGreekFont(Optional GreekFontName As String = "Cardo") As Boolean
'Greek: U+0370  U+03FF   (880-023);  U+1F00 || U+1FFF   (7936-8191)
Dim rWords As Word.Words, sWord As String, iMidChar As Integer, iWordLen As Integer, objWord As Word.Application
 
Dim dblWord As Double, objRng As Range, dblASCW As Double, dblGreekWordCount As Double
Dim doc As Document, bIsGreekWord As Boolean, objsel As Selection, bChanged As Boolean
 
Set objWord = Word.Application
objWord.ScreenUpdating = False
Set doc = ActiveDocument
doc.Range(0, 0).Select 'move to beginning of document or
'objWord.Selection.HomeKey wdStory
 
Set rWords = doc.Words
For dblWord = 1 To rWords.Count
Set objRng = rWords.Item(dblWord)
 
sWord = objRng.Text
iWordLen = Len(sWord)
If iWordLen > 2 Then
'Get the middle character and find it's code range.
iMidChar = iWordLen \ 2
Else
 iMidChar = 1
End If
 
dblASCW = AscW(Mid(sWord, iMidChar, 1))
 
 
Select Case dblASCW
Case 880 To 1093
  bIsGreekWord = True
Case 7936 To 8191
  bIsGreekWord = True
Case Else
    bIsGreekWord = False
       'Must be punctation, English or something else
      'Do nothing
End Select
 
objRng.Select
bChanged = objRng.Font.Name <> GreekFontName
If bIsGreekWord = True Then
dblGreekWordCount = dblGreekWordCount + 1
Debug.Print objRng.Start, objRng.End, objRng.End - objRng.Start, dblGreekWordCount, objRng.Text, bChanged
objRng.Font.Name = GreekFontName
End If
Next
 
doc.Save
objWord.ScreenUpdating = True
Set objRng = Nothing
Set rWords = Nothing
Set doc = Nothing
 
End Function
 
 
Public Function ChangeHebrewFont(Optional HebrewFontName As String = "Ezra SIL") As Boolean
'Hebrew   U+0590 - U+05FF   (1424-1535) FB00-FB4F  64256-64335
 
Dim rWords As Word.Words, sWord As String, iMidChar As Integer, iWordLen As Integer, objWord As Word.Application
Dim dblWord As Double, objRng As Range, dblASCW As Double, dblHebrewWordCount As Double
Dim doc As Document, bIsHebrewWord As Boolean, objsel As Selection, bChanged As Boolean
Dim iCounter As Double
 
 
Set doc = ActiveDocument
Set objWord = Word.Application
objWord.ScreenUpdating = False
doc.Range(0, 0).Select 'move to beginning of document or
'objWord.Selection.HomeKey wdStory
 
Set rWords = doc.Words
For dblWord = 1 To rWords.Count
Set objRng = rWords.Item(dblWord)
 
sWord = objRng.Text
iWordLen = Len(sWord)
If iWordLen > 2 Then
'Get the middle character and find it's code range.
iMidChar = iWordLen \ 2
Else
 iMidChar = 1
End If
 
dblASCW = AscW(Mid(sWord, iMidChar, 1))
iCounter = iCounter + 1
If iCounter Mod 100 = 0 Then
    DoEvents
End If
 
Select Case dblASCW
Case 1424 To 1535
  bIsHebrewWord = True
Case 64256 To 64335
  bIsHebrewWord = True
Case Else
    bIsHebrewWord = False
       'Must be punctation, English or something else
      'Do nothing
End Select
 
 
If bIsHebrewWord = True Then
dblHebrewWordCount = dblHebrewWordCount + 1
 
objRng.Select
bChanged = objRng.Font.Name <> HebrewFontName
Debug.Print objRng.Start, objRng.End, objRng.End - objRng.Start, dblHebrewWordCount, objRng.Text, bChanged
objRng.Font.Name = HebrewFontName
If Err = 0 Then
ChangeHebrewFont = True
Else
ChangeHebrewFont = False
End If
End If
Next
 
doc.Save
objWord.ScreenUpdating = True
Set objRng = Nothing
Set rWords = Nothing
Set doc = Nothing
 
 
End Function

Open in new window

GrahamSkanRetired
Top Expert 2012

Commented:
If you step through the Words collection by index it will get slower as the index number increases.
I think this is because Word finds the index by counting from the start for each word.

Use the For Each method instead.


for each objRange in doc.range.words

Instead of:

For dblWord = 1 To rWords.Count


Also, have you tried letting Word detect the language with objRange.LanguageID?





Starting with Angular 5

Learn the essential features and functions of the popular JavaScript framework for building mobile, desktop and web applications.

Author

Commented:
I have never used the objRange.LanguageID. Not sure how to do it. Doesn't Word use a list of words for the languageID?  It may work for Hebrew but I don't know about Ancient Greek. I would like to try it sometime.

Anyways, the For Each method on the doc.words.range worked great and flew through the document much faster (4.5 million characters, 1565 pages, 77775 Greek words check). It did in about 10 minutes with the screen updating off, but the debug.print still running. Thanks.

The code below works OK. If someone uses it they may want to comment out the debug.print lines. Error handling is not really dealt with either. There are a few unused variables also.  I should also add a counter for the number of words which fonts have been changed.

Thanks again.
Louis S.

Public Function ChangeHebrewFont(Optional HebrewFontName As String = "Ezra SIL") As Boolean
'Hebrew   U+0590 - U+05FF   (1424-1535) FB00-FB4F  64256-64335
 
Dim rWords As Word.Words, sWord As String, iMidChar As Integer, iWordLen As Integer, objWord As Word.Application
Dim dblWord As Double, objRng As Range, dblASCW As Double, dblHebrewWordCount As Double
Dim doc As Document, bIsHebrewWord As Boolean, objsel As Selection, bChanged As Boolean
Dim iCounter As Double, objRange As Range
 
 
Set doc = ActiveDocument
Set objWord = Word.Application
objWord.ScreenUpdating = False
doc.Range(0, 0).Select 'move to beginning of document or
'objWord.Selection.HomeKey wdStory
'Set rWords = doc.Words
'For dblWord = 1 To rWords.Count
For Each objRange In doc.Range.Words
'Set objRange = rWords.Item(dblWord)
 
sWord = objRange.Text
iWordLen = Len(sWord)
If iWordLen > 2 Then
'Get the middle character and find it's code range. The word is all 'Greek, Hebrew, etc. (an assumption)
iMidChar = iWordLen \ 2
Else
 iMidChar = 1
End If
 
dblASCW = AscW(Mid(sWord, iMidChar, 1))
iCounter = iCounter + 1
If iCounter Mod 100 = 0 Then
    DoEvents
End If
 
Select Case dblASCW
Case 1424 To 1535
  bIsHebrewWord = True
Case 64256 To 64335
  bIsHebrewWord = True
Case Else
    bIsHebrewWord = False
       'Must be punctation, English or something else
      'Do nothing
End Select
 
 
If bIsHebrewWord = True Then
dblHebrewWordCount = dblHebrewWordCount + 1
objRange.Select
bChanged = objRange.Font.Name <> HebrewFontName
Debug.Print objRange.Start, objRange.End, objRange.End - objRange.Start, dblHebrewWordCount, objRange.Text, bChanged
If bChanged Then
objRange.Font.Name = HebrewFontName
End If
If Err = 0 Then
ChangeHebrewFont = True
Else
ChangeHebrewFont = False
End If
End If
Next
 
doc.Save
objWord.ScreenUpdating = True
Set objRange = Nothing
Set rWords = Nothing
Set doc = Nothing
 
 
End Function
 
Public Function ChangeGreekFont(Optional GreekFontName As String = "Cardo") As Boolean
'Greek: U+0370  U+03FF   (880-023);  U+1F00 || U+1FFF   (7936-8191)
Dim rWords As Word.Words, sWord As String, iMidChar As Integer, iWordLen As Integer, objWord As Word.Application
 
Dim dblWord As Double, dblASCW As Double, dblGreekWordCount As Double
Dim doc As Document, bIsGreekWord As Boolean, objsel As Selection, bChanged As Boolean, objRange As Range
 
Set objWord = Word.Application
objWord.ScreenUpdating = False
Set doc = ActiveDocument
doc.Range(0, 0).Select 'move to beginning of document or
'objWord.Selection.HomeKey wdStory
 
Set rWords = doc.Words
For Each objRange In doc.Range.Words
'For dblWord = 1 To rWords.Count
'Set objRng = rWords.Item(dblWord)
 
sWord = objRange.Text
iWordLen = Len(sWord)
If iWordLen > 2 Then
'Get the middle character and find it's code range.
iMidChar = iWordLen \ 2
Else
 iMidChar = 1
End If
 
dblASCW = AscW(Mid(sWord, iMidChar, 1))
 
 
Select Case dblASCW
Case 880 To 1093
  bIsGreekWord = True
Case 7936 To 8191
  bIsGreekWord = True
Case Else
    bIsGreekWord = False
       'Must be punctation, English or something else
      'Do nothing
End Select
 
If bIsGreekWord = True Then
objRange.Select
bChanged = objRange.Font.Name <> GreekFontName
dblGreekWordCount = dblGreekWordCount + 1
Debug.Print objRange.Start, objRange.End, objRange.End - objRange.Start, dblGreekWordCount, objRange.Text, bChanged
If bChanged Then
On Error Resume Next
objRange.Font.Name = GreekFontName
'If Err = 0 Then
'    ChangeGreekFont = True
'End If
End If
End If
Next
 
doc.Save
objWord.ScreenUpdating = True
Set objRng = Nothing
Set rWords = Nothing
Set doc = Nothing
 
End Function

Open in new window

GrahamSkanRetired
Top Expert 2012

Commented:
I'm not sure that you were in a position to use it. If each part (range) of the document was originally set to its particular language, then you should be able to recall the setting using the LanguageID.

Now that you have solved your problem, you can, if you want, select the language portions and set them to the correct language.

2007: Review tab, proofing group, Set Language button.
<2007: Tools/Language

Author

Commented:
No, the Greek is not picked up by Word as a distinct language. The Hebrew was. So I would have to script it. The words are interspersed. There are perhaps only a couple of solid blocks of Greek text, but about 40000 Greek words.
Commented:
Question PAQ'd, 125 points refunded, and stored in the solution database.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial