APD Toronto

Replacing Unicode Characters

Hi Experts,

I have a 100+ page book in MS Word that was written over 10 years ago in a custom Cyrillic font. The font is nothing fancy: it simply maps the keys of a standard US keyboard to Cyrillic glyphs. Nowadays every operating system ships with Cyrillic and other world scripts built in, so it is just a matter of enabling them.

My question is, how would I convert my book to real Unicode Cyrillic characters?

I used to code in VBA, but haven't done so in years. Now I code in PHP, and if I were to do this in PHP, I would build a 2-D array whose first level has 64 elements (the Macedonian alphabet has 32 letters, in upper and lower case), and it would look something like this:

[0]
    [letter] => A
    [Ascii] => 65
    [Uni] => 1040
[1]
    [letter] => B
    [Ascii] => 66
    [Uni] => 1041

Then I would loop through the array and call str_replace for each entry.

Would this be possible in VBA, and could it maybe even be driven from Access? Other files used other fonts with slightly different mappings.
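
The array-plus-str_replace idea carries over to VBA fairly directly. A minimal sketch, assuming the mapping is supplied as two parallel arrays of character codes (the function name and the sample codes are illustrative, not taken from the actual font):

```vba
Option Explicit

' Hypothetical helper: swaps every legacy character for its Unicode
' counterpart. legacyCodes(i) is replaced by unicodeCodes(i).
Function MapLegacyToCyrillic(ByVal source As String, _
        ByVal legacyCodes As Variant, ByVal unicodeCodes As Variant) As String
    Dim i As Long
    For i = LBound(legacyCodes) To UBound(legacyCodes)
        ' vbBinaryCompare keeps the replacement case-sensitive,
        ' so "A" and "a" can map to different letters.
        source = Replace(source, ChrW(legacyCodes(i)), _
                ChrW(unicodeCodes(i)), 1, -1, vbBinaryCompare)
    Next i
    MapLegacyToCyrillic = source
End Function

Sub DemoMapLegacyToCyrillic()
    ' Assumed sample mapping: "A" (65) -> U+0410, "B" (66) -> U+0411.
    Dim converted As String
    converted = MapLegacyToCyrillic("ABBA", Array(65, 66), Array(1040, 1041))
    Debug.Print Len(converted), AscW(Left$(converted, 1))
End Sub
```

This works on plain strings only, so it ignores which font each character is set in. Inside Word, a font-aware Find is probably the safer route, because any genuine Latin text in the same document would otherwise be converted as well.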

By the way, I am using the following Unicode Table

https://www.rapidtables.com/code/text/unicode-characters.html

Thank you
gr8gonzo

So just to confirm, you're saying the characters themselves aren't truly Cyrillic, but the font simply makes them look that way.

So if you're reading the document with the font installed on your computer, you might see "яблоко" but if you didn't have the font installed, you would see "abxded", because the font displays the letter "a" as "я". Is that correct?
Can you provide a sample of your Word document? I would like to run some tests before answering.
APD Toronto

ASKER

gr8gonzo - exactly
It might be good to provide both your document and a copy of your font.
I suppose in that case, you could probably use this VBA code:
Sub ReplaceAllDeltaSymbolsWithBetaSymbols()
    'Call the main "ReplaceAllSymbols" macro (below), 
    'and tell it which character code  and font to search for, and which to replace with
    Call ReplaceAllSymbols(FindChar:= ChrW(-3996), FindFont:= "Symbol", _
            ReplaceChar:=-3998, ReplaceFont:="Symbol")
End Sub

'ReplaceChar is a numeric character code (passed on to InsertSymbol), so it is declared As Long
Sub ReplaceAllSymbols(FindChar As String, FindFont As String, _
        ReplaceChar As Long, ReplaceFont As String)

Dim FoundFont As String, OriginalRange As Range, strFound As Boolean
Application.ScreenUpdating = False

Set OriginalRange = Selection.Range
'start at beginning of document
ActiveDocument.Range(0, 0).Select

strFound = False
With Selection.Find
    .ClearFormatting
    .Text = FindChar
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchCase = True  'case-sensitive, so upper- and lowercase letters can be mapped separately
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
   
    Do While .Execute
        'keep searching until nothing found
        If Dialogs(wdDialogInsertSymbol).Font = FindFont Then
            'Insert the replacement symbol where the found symbol was 
            Selection.InsertSymbol Font:=ReplaceFont, _
           CharacterNumber:=ReplaceChar, Unicode:=True
        Else
            Selection.Collapse wdCollapseEnd
        End If
    Loop
  
End With

OriginalRange.Select

Set OriginalRange = Nothing
Application.ScreenUpdating = True

End Sub

Original source:
https://wordmvp.com/FAQs/MacrosVBA/FindReplaceSymbols.htm

The ReplaceAllSymbols procedure performs the replacement throughout the document for a single character, and ReplaceAllDeltaSymbolsWithBetaSymbols is just an example of how to call it.

You would come up with your own version of ReplaceAllDeltaSymbolsWithBetaSymbols (e.g. "ConvertToTrueCyrillic" or something), and then inside, you would copy the "Call" statement for each character that needs to be re-mapped.
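
One way such a wrapper might look, as a rough sketch: it assumes the ReplaceAllSymbols procedure above is in the same module, and the font names and the two sample mappings are placeholders rather than the real values for this font.

```vba
Sub ConvertToTrueCyrillic()
    ' Placeholder font names: substitute the legacy font actually used.
    Const LEGACY_FONT As String = "MyLegacyCyrillic"
    Const TARGET_FONT As String = "Arial"

    ' Parallel arrays: legacy character code -> Unicode code point.
    ' Only two illustrative entries; the real table needs one per letter.
    Dim legacyCodes As Variant, unicodeCodes As Variant
    legacyCodes = Array(65, 66)        ' "A", "B"
    unicodeCodes = Array(1040, 1041)   ' U+0410, U+0411

    Dim i As Long
    For i = LBound(legacyCodes) To UBound(legacyCodes)
        ' One call per character, in the same style as the Delta/Beta example.
        Call ReplaceAllSymbols(FindChar:=ChrW(legacyCodes(i)), _
                FindFont:=LEGACY_FONT, _
                ReplaceChar:=CLng(unicodeCodes(i)), _
                ReplaceFont:=TARGET_FONT)
    Next i
End Sub
```

Upper and lower case are only kept apart if the search inside ReplaceAllSymbols is case-sensitive (.MatchCase = True). It is also worth checking on a copy of the document whether the font test in ReplaceAllSymbols recognizes ordinary typed text in the legacy font, since that routine was written with inserted symbols in mind.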
Here is a little excerpt and the font used.

Basically, I need to be able to change the font to Arial and still see the same Cyrillic characters, not `, ~, \

I'm not exactly sure how to use the above code and set up the mappings.

Ideally, as mentioned, I would like to set up an Access table for the different font mappings.

EE won't let me attach, but here is a link: http://test.aces-project.com/EE-Cyrillic.zip
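
For the Access idea, the mappings could live in a table and be read from Word at run time. The following is only a sketch under several assumptions: a table named FontMap with columns FontName, LegacyCode and UnicodeCode, a database path of your choosing, and the Access database engine (DAO) being installed. None of these names come from the thread, and the helper uses Word's own font-filtered Find rather than the symbol routine posted earlier.

```vba
Option Explicit

' Replaces findText with replaceText, but only where the text is
' formatted in legacyFont; the result is switched to targetFont.
Sub ReplaceInFont(ByVal legacyFont As String, ByVal findText As String, _
        ByVal replaceText As String, ByVal targetFont As String)
    ' A literal caret has to be written as ^^ in Word's Find text.
    If findText = "^" Then findText = "^^"
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Name = legacyFont
        .Replacement.Font.Name = targetFont
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Reads the (assumed) FontMap table and applies every mapping row
' stored for legacyFont to the active document.
Sub ConvertUsingAccessMap(ByVal dbPath As String, ByVal legacyFont As String)
    Dim engine As Object, db As Object, rs As Object
    Dim sql As String

    Set engine = CreateObject("DAO.DBEngine.120")   ' late-bound DAO
    Set db = engine.OpenDatabase(dbPath)
    sql = "SELECT LegacyCode, UnicodeCode FROM FontMap " & _
          "WHERE FontName = '" & Replace(legacyFont, "'", "''") & "'"
    Set rs = db.OpenRecordset(sql)

    Do Until rs.EOF
        ReplaceInFont legacyFont, ChrW(rs.Fields("LegacyCode").Value), _
                ChrW(rs.Fields("UnicodeCode").Value), "Arial"
        rs.MoveNext
    Loop

    rs.Close
    db.Close
End Sub
```

A call might look like ConvertUsingAccessMap "C:\path\to\FontMaps.accdb", "MyLegacyCyrillic" (both values hypothetical). Because each replaced character ends up in Arial, it no longer matches the legacy-font filter, so later rows cannot convert it a second time.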
ASKER CERTIFIED SOLUTION
gr8gonzo

THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
Thank you. I also added the Access layer.