Problems with foreign languages when converting rich text to HTML

Graff
My client deals in three languages: English, Russian, and Ukrainian. I am developing a program for them that will let them write up various documents (news, events, etc.) in all three languages and store them in a MySQL database; the documents will then be displayed on the website using PHP.

My problem is that when I convert my rich text to HTML, the Russian and Ukrainian text is turned into question marks. Any help would be appreciated.

Commented:
Hi,

1. How do you convert your rich text to HTML?
2. In the HTML, do you declare a charset?
   Did you try using Unicode (UTF-8)?

Yaniv

Author

Commented:
1. I use a function that basically examines the text character by character to determine what its properties are. (I got the module here: http://pscode.com/vb/scripts/ShowCode.asp?txtCodeId=5267&lngWId=1&txtForceRefresh=1231200220564197825)

2. I haven't even gotten to the part where it is used by the website... as soon as I try to convert it, it treats any non-English character as a ?


Commented:
If you change this line in the downloaded code:
strHTML$ = strHTML$ & rtbRichTextBox.SelText
to:
     ch = Mid(rtbRichTextBox.SelRTF, InStrRev(rtbRichTextBox.SelRTF, "\") + 2, 4)
     If IsNumeric(ch) Then
       strHTML$ = strHTML$ & "&#" & ch & ";"
     Else
       strHTML$ = strHTML$ & rtbRichTextBox.SelText
     End If

then you'll get HTML that shows the Russian (and, I believe, the Ukrainian) text correctly in IE.
It will be hard to edit the text afterwards, though, because all you'll see in the code is Т&#1077... instead of the characters.
(You can write code that takes these numbers and puts them back into the RTF for editing.)

It is not the best way to deal with it, but I don't know another one, so...
Here is a site you might want to look at about writing Cyrillic HTML pages:
http://ourworld.compuserve.com/homepages/paulgor/cpage_e.htm

Anyway, I hope it helps.
Yaniv
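
For the parenthetical idea above (putting the numbers back into the RTF for editing), a minimal sketch might look like the following. EntitiesToRtf is a hypothetical helper name, not part of the downloaded module, and the sketch assumes the rich text control accepts RTF \uN? escapes, which depends on the RichEdit version behind it:

```vb
' Sketch: turn "&#NNNN;" references back into RTF "\uNNNN?" escapes.
' EntitiesToRtf is a hypothetical helper name (an assumption, not from the module).
Public Function EntitiesToRtf(ByVal html As String) As String
    Dim out As String, num As String
    Dim p As Long, q As Long, startAt As Long, code As Long
    Dim ok As Boolean

    startAt = 1
    Do
        p = InStr(startAt, html, "&#")
        If p = 0 Then Exit Do
        q = InStr(p, html, ";")
        If q = 0 Then Exit Do

        ' Accept only 1 to 5 decimal digits that fit in 16 bits.
        num = Mid$(html, p + 2, q - p - 2)
        ok = (Len(num) > 0 And Len(num) <= 5 And Not (num Like "*[!0-9]*"))
        If ok Then ok = (CLng(num) <= 65535)

        If ok Then
            code = CLng(num)
            If code > 32767 Then code = code - 65536   ' RTF \u takes a signed 16-bit value
            out = out & Mid$(html, startAt, p - startAt) & "\u" & CStr(code) & "?"
            startAt = q + 1
        Else
            ' Not a numeric reference: copy the "&#" literally and keep scanning.
            out = out & Mid$(html, startAt, p - startAt + 2)
            startAt = p + 2
        End If
    Loop
    EntitiesToRtf = out & Mid$(html, startAt)
End Function
```

For example, EntitiesToRtf("a&#1092;b") returns a\u1092?b. The "?" is the fallback character an RTF reader shows if it cannot handle the \u escape.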

Commented:
Hi again,

A better way (in addition to the first one):
First, replace the txtHTML text box with the TextBox from Microsoft Forms 2.0 (it gives you Unicode support),
and don't forget to declare in your HTML that you are using charset UTF-8 (Unicode).
After you get the string back from the function (RichToHTML), you can call a method that replaces the numbers with letters:

Private Sub fixcyr()
  Dim i As Long

  ' Replace each numeric reference with its character.
  ' 1000 to 1200 is a rough range; check the range that you actually need.
  For i = 1000 To 1200
    txtHTML.Text = Replace(txtHTML.Text, "&#" & CStr(i) & ";", ChrW(i))
  Next i
End Sub

Yaniv
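
If the HTML ends up in a file, the UTF-8 declaration and the file's actual encoding have to agree. A rough sketch of one way to do that from VB6, assuming ADO (MDAC) is installed so that ADODB.Stream is available; SaveHtmlAsUtf8 and its parameters are hypothetical names:

```vb
' Sketch: wrap an HTML fragment in a page that declares UTF-8 and save it as UTF-8.
' SaveHtmlAsUtf8, bodyHtml and filePath are hypothetical names (assumptions).
Public Sub SaveHtmlAsUtf8(ByVal bodyHtml As String, ByVal filePath As String)
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2

    Dim page As String
    Dim stm As Object

    page = "<html><head>" & _
           "<meta http-equiv=""Content-Type"" content=""text/html; charset=utf-8"">" & _
           "</head><body>" & bodyHtml & "</body></html>"

    ' ADODB.Stream encodes the Unicode VB string as UTF-8 when it writes the file.
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "utf-8"
    stm.Open
    stm.WriteText page
    stm.SaveToFile filePath, adSaveCreateOverWrite
    stm.Close
End Sub
```

Note that the stream writes a byte order mark at the start of the file. If the text only contains &#NNNN; references, the charset matters less, since the references are plain ASCII.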

Author

Commented:
Sorry about the late response; I've been away. The code you supplied to convert the characters does not work: ch never meets the condition.

Commented:
The code I wrote works on my computer, so I can't tell what's wrong on yours...

When you put some Russian words in the rich text box, doesn't it convert each letter to a number?
Try debugging and tell me what ch gets instead.

Yaniv

Author

Commented:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset204 Tahoma;}{\f1\fnil\fcharset0 MS Sans Serif;}}
\viewkind4\uc1\pard\lang1049\f0\fs20\'f4\'fb\'e2\'e0 \'f4\'fb\'e2\'e0\lang1033\f1\fs17
\par }

Here's my saved file.

I wrote фыва фыва (asdf asdf on the Russian keyboard layout), so \'f4\'fb\'e2\'e0 must be the characters.

Commented:
\'f4\'fb\'e2\'e0 is what you need.
If you do ChrW(&HF4), you'll get ô (hexadecimal F4),
so extract this from rtbRichTextBox.SelRTF and replace it later in the second function...

Yaniv

Author

Commented:
OK, but it doesn't show up as Russian; it shows up as ô.

Commented:
Change the ch part to:
    If (Len(rtbRichTextBox.SelRTF) - InStrRev(rtbRichTextBox.SelRTF, "'")) <= 5 Then
      ch = Mid(rtbRichTextBox.SelRTF, InStrRev(rtbRichTextBox.SelRTF, "\") + 2, 2)
      strHTML$ = strHTML$ & "&H" & ch & ";"
    Else
      ch = " "
      strHTML$ = strHTML$ & rtbRichTextBox.SelText
    End If

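Change fixcyr to:
  ' 848 is the offset from Windows-1251 codes &HC0-&HFF to Unicode U+0410-U+044F.
  ' vbTextCompare matches "&Hf4;" in either case without lower-casing the whole text.
  For i = 200 To 500
    txtHTML.Text = Replace(txtHTML.Text, "&H" & Hex(i) & ";", ChrW(i + 848), 1, -1, vbTextCompare)
  Next i

Again, check the exact range for i,
and check whether it works in Ukrainian.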

Yaniv
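
On the Ukrainian question: the fixed +848 offset only covers the basic Russian alphabet (А to я). In Windows-1251, which is what \fcharset204 in the RTF points to, letters such as Ё, Ґ, Є, І and Ї sit outside that block. A minimal sketch of a per-byte mapping follows; Cp1251ToCodePoint and HexEscapeToEntity are hypothetical helper names, not part of the downloaded module, and only the Russian and Ukrainian letters are handled:

```vb
' Sketch: map one Windows-1251 byte value to its Unicode code point.
' Both function names are hypothetical (assumptions, not from the module).
Public Function Cp1251ToCodePoint(ByVal b As Long) As Long
    Select Case b
        Case &HC0 To &HFF: Cp1251ToCodePoint = b + 848   ' А..я -> U+0410..U+044F
        Case &HA8: Cp1251ToCodePoint = &H401             ' Ё
        Case &HB8: Cp1251ToCodePoint = &H451             ' ё
        Case &HA5: Cp1251ToCodePoint = &H490             ' Ґ
        Case &HB4: Cp1251ToCodePoint = &H491             ' ґ
        Case &HAA: Cp1251ToCodePoint = &H404             ' Є
        Case &HBA: Cp1251ToCodePoint = &H454             ' є
        Case &HAF: Cp1251ToCodePoint = &H407             ' Ї
        Case &HBF: Cp1251ToCodePoint = &H457             ' ї
        Case &HB2: Cp1251ToCodePoint = &H406             ' І
        Case &HB3: Cp1251ToCodePoint = &H456             ' і
        Case Else: Cp1251ToCodePoint = b                 ' passed through; only right for ASCII
    End Select
End Function

' Convert a two-digit RTF hex escape such as "f4" to an HTML numeric reference.
Public Function HexEscapeToEntity(ByVal hexPair As String) As String
    HexEscapeToEntity = "&#" & CStr(Cp1251ToCodePoint(CLng("&H" & hexPair))) & ";"
End Function
```

With this, HexEscapeToEntity("f4") gives &#1092; (ф), and HexEscapeToEntity("b3") gives &#1110; (і), which the plain offset would get wrong.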

Author

Commented:
Welp, I think I finally got the encoding working. This is the final code:

        lngForText = InStrRev(rtbRichTextBox.SelRTF, "\'")
        lngForTextU = InStrRev(rtbRichTextBox.SelRTF, "\u")
        lngForTextQ = InStr(lngForTextU, rtbRichTextBox.SelRTF, "?")
        If lngForText > 0 Then
            ch = Mid(rtbRichTextBox.SelRTF, lngForText + 2, 2)
            'ch = Mid(rtbRichTextBox.SelRTF, InStrRev(rtbRichTextBox.SelRTF, "\'") + 2, 2)
            strHTML$ = strHTML$ & HexToDec(ch)
        ElseIf lngForTextU > 0 And lngForTextQ > 0 Then
            If Mid(rtbRichTextBox.SelRTF, lngForTextU + 6, 1) = "?" Then
           
                ch = Mid(rtbRichTextBox.SelRTF, lngForTextU + 2, 4)
                strHTML$ = strHTML$ & "&#" & ch & ";"
            End If
        Else
            strHTML$ = strHTML$ & rtbRichTextBox.SelText
        End If

I found that sometimes my code was hex and sometimes it was Unicode, so I modified it to handle both. Thanks for your help.

Author

Commented:
Kalsky was instrumental in helping me determine the code needed to solve my problem; without that help I would still be scratching my head wondering where to start.

Author

Commented:
erm wrong code :)

        lngForText = InStrRev(rtbRichTextBox.SelRTF, "\'")
        lngForTextU = InStrRev(rtbRichTextBox.SelRTF, "\u")

        If lngForText > 0 Then
            ch = Mid(rtbRichTextBox.SelRTF, lngForText + 2, 2)
            'ch = Mid(rtbRichTextBox.SelRTF, InStrRev(rtbRichTextBox.SelRTF, "\'") + 2, 2)
            strHTML$ = strHTML$ & HexToDec(ch)
        ElseIf lngForTextU > 0 And Mid(rtbRichTextBox.SelRTF, lngForTextU + 6, 1) = "?" Then
            ch = Mid(rtbRichTextBox.SelRTF, lngForTextU + 2, 4)
            strHTML$ = strHTML$ & "&#" & ch & ";"
        Else
            ch = " "
            strHTML$ = strHTML$ & rtbRichTextBox.SelText
        End If
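
The \u branch above reads exactly four digits and expects the "?" right after them. In RTF the number after \u is a signed 16-bit decimal of variable length, so other lengths (and negative values, used for code points above 32767) are possible. A hedged sketch of a more tolerant reader follows; RtfUnicodeToEntity is a hypothetical name, and it only looks at the number, not at the fallback character that follows it:

```vb
' Sketch: read the number after "\u" at position pos in an RTF string and
' return it as an HTML numeric reference. Returns "" if no number follows
' (for example for control words such as \uc1 or \ul).
' RtfUnicodeToEntity is a hypothetical name (an assumption, not from the thread).
Public Function RtfUnicodeToEntity(ByVal rtf As String, ByVal pos As Long) As String
    Dim i As Long, code As Long
    Dim digits As String, c As String

    i = pos + 2                                  ' skip the "\u"
    If Mid$(rtf, i, 1) = "-" Then                ' optional minus sign
        digits = "-"
        i = i + 1
    End If

    ' Collect the decimal digits that follow.
    Do While i <= Len(rtf)
        c = Mid$(rtf, i, 1)
        If c < "0" Or c > "9" Then Exit Do
        digits = digits & c
        i = i + 1
    Loop
    If Len(digits) = 0 Or digits = "-" Then Exit Function

    code = CLng(digits)
    If code < 0 Then code = code + 65536         ' negative values stand for 32768..65535
    RtfUnicodeToEntity = "&#" & CStr(code) & ";"
End Function
```

For example, RtfUnicodeToEntity("\u1092?", 1) returns &#1092;, and RtfUnicodeToEntity("\uc1", 1) returns an empty string.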
