asked on

Html encoding problem in powerpoint

Today i'm facing a problem with this code as its returns html values to symbols like <,> etc, which are shown as < & gt; etc.,
After googling, i came to know that its a html to text encoding problem
can you help me what function and where to implement that code in my code.

Thanks a lot in advance.

Jamie Garroch (MVP)

OK, so this is a continuation of this earlier question:

https://www.experts-exchange.com/questions/29070031/How-to-maintain-indentation-in-powerpoint-using-VBA-REGEX-is-working.html

Could you give an example of what is not working as expected. For example, if I put the following into a text box on a slide:

<여보세요>

and run your code, I get this:

[Hello]

That's because the < and > characters are being replaced by the ConvertToGet function to [ and ] respectively, before passing to Google Translate.

jocker panda

ASKER

Yes, you are right..
Because i added that replace function..
But as you know as there are many symbols like ¬~¢¥¿[{<> etc which i cannot replace using replace function for each one of them before translation.
So, i request you to suggest a way to replace the html text with equivalent powerpoint text.

Jamie Garroch (MVP)

You either have to manage al possible symbol conversions (swap them before and after translation) or prevent those symbols from being passed to Google Translate in the first place. What you could do is write a pre-processing function to extract everything that is 'real' Korean text and prevent everything else, such as special symbols, from being passed to Google. Something like this perhaps:

' Return string with only Korean characters
Function StripSymbols(sText As String) As String
  Dim x As Long
  Dim ch As String
  Dim chCode As Long
  Dim sOut As String
  ' Code space range http://memory.loc.gov/diglib/codetables/9.3.html
  Const KoreanStart = &H3131& ' Dec 12593
  Const KoreanEnd = &HCB4C& ' Dec 52044
  ' Recurse all characters in the string
  For x = 1 To Len(sText)
    ' get the next character
    ch = Mid(sText, x, 1)
    ' Convert integer value of the character to the signed character code
    chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
    If chCode >= KoreanStart And chCode <= KoreanEnd Then sOut = sOut & ch
  Next
  StripSymbols = sOut
End Function

Open in new window

I also made some other changes to make the code more modular and handle all 4 selection cases. I also noticed that you're double-checking for a table using .HasTable and .Type = 19. This will fail if the table is in a placeholder so I suggest just using .HasTable.

Option Explicit

Sub TranslateKorToEng()
  Dim oSelShp As Shape
  Dim r As Long, c As Long, x As Long, y As Long
  Dim lGrpItem As Long
  
  Select Case ActiveWindow.Selection.Type
    Case ppSelectionText
      With ActiveWindow.Selection.TextRange
        .Text = GetGoogleTranslation(.Text)
      End With
    Case ppSelectionShapes
      For Each oSelShp In ActiveWindow.Selection.ShapeRange ' checks multiple shapes
        If oSelShp.Type = msoGroup Then 'Check Whether selected Shape is Group or not
          For lGrpItem = 1 To oSelShp.GroupItems.Count 'using for group through each text box in a group
            With oSelShp.GroupItems(lGrpItem)
              If .HasTextFrame Then 'check for textframe
                If .TextFrame.HasText Then ' check for textframe. text
                  With .TextFrame2
                    .WordWrap = msoCTrue ' word wrap the text box
                    .AutoSize = msoAutoSizeTextToFitShape   ' this 2 line for text fit to TextBox (Font size is changed)
                    With .TextRange
                      .Text = GetGoogleTranslation(.Text)
                    End With
                  End With
                End If
              End If
            End With
          Next
          
        '¿©±ä µµÇü, ÅØ½ºÆ® "óÀÚ Ã¼Å·
        ElseIf oSelShp.HasTextFrame Then 'for only shape
          If oSelShp.TextFrame.HasText Then 'and it has text
            With oSelShp.TextFrame2
              .WordWrap = msoCTrue 'Set WordWrap
              .AutoSize = msoAutoSizeTextToFitShape
              With .TextRange
                .Text = GetGoogleTranslation(.Text)
              End With
            End With
          End If
           
        '¿©±ä Å×ÀÌºí checking
        ElseIf oSelShp.HasTable Then
          With oSelShp
            r = 0
            For y = 1 To .Table.Columns.Count
              If r = 0 Then
                For x = 1 To .Table.Rows.Count
                  With .Table.Cell(x, y)
                    If .Selected Then 'check if the column.row is selected or not
                      If .Shape.HasTextFrame Then 'checks for text frame
                        With .Shape.TextFrame.TextRange
                          .Text = GetGoogleTranslation(.Text)
                        End With
                      End If
                    End If
                  End With
                Next
              Else
                GoTo Column
              End If
            Next
Column:
            c = 0
            For x = 1 To .Table.Rows.Count
              If c = 0 Then
                For y = 1 To .Table.Columns.Count
                  With .Table.Cell(x, y)
                    If .Selected Then 'ÀüÃ¼ Å×ÀÌºíÀ" ÇÑ ¼¿¾¿ °ËÅäÇÏ¸é ¼±ÅÃµÈ°Í¸¸ Ã³¸®ÇÑ´Ù
                      With .Shape.TextFrame.TextRange
                        .Text = GetGoogleTranslation(.Text)
                      End With
                    End If
                  End With
                Next
              End If
            Next
          End With ' oSelShp
        End If
      Next oSelShp
    Case Else
      MsgBox "Invalid selection", vbCritical + vbOKOnly, "Cannot Translate"
  End Select
End Sub

' Translate the src text from Korean to English using Google Translate
Function GetGoogleTranslation(src As String) As String
  Dim oHTTP As Object
  Dim URL As String
  Dim getParam As String, trans As String
  
  Const translateFrom = "ko"  ' source language parameter
  Const translateTo = "en"    ' result language parameter
  Const RegExpPat = "div[^""]*?""ltr"".*?>(.+?)</div>" ' pattern to find the translated text in the Google Translate HTML page
  
  Set oHTTP = CreateObject("MSXML2.ServerXMLHTTP")  ' create object(shape,textframe,table) using HTTP format
  
  ' replace special symbols and apply text to convert to get function
  getParam = ConvertToGet(src)
  
  ' Creates and modifies the URL of UTF-8 translator with getParam
  URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
  oHTTP.Open "GET", URL, False 'open the http object
  oHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" ' set browser header
  oHTTP.send ("") ' send command to browser
         
  If InStr(oHTTP.responseText, "div dir=""ltr""") > 0 Then
    ' Find the translated text from the returned HTML
    trans = RegexExecute(oHTTP.responseText, RegExpPat)
    ' Clean up the text
    trans = ReplaceEnter(trans)
    trans = Clean(trans)
    ' Return the translated text
    GetGoogleTranslation = trans
  End If
  
  Set oHTTP = Nothing
End Function

' SB Functions for ++ sign to Enter
' new line symbol while entering data from ppt
Function ReplaceEnter(txt As String)
  txt = Replace(txt, "++", vbCrLf) 'new line
  ReplaceEnter = txt
End Function

' Convert special characters to HTML-compliant hex codes prior to sending in web page query
Function ConvertToGet(val As String)
    val = Replace(val, " ", "+")
    'val = Replace(val, vbNewLine, "+")
    val = Replace(val, "(", "%28")
    val = Replace(val, ")", "%29")
    
    val = Replace(val, "&", "and")
    'val = Replace(val, "<", "[")
    'val = Replace(val, ">", "]")
    val = Replace(val, "<", "%3C") ' was previously being replaced by "["
    val = Replace(val, ">", "%3E") ' was previously being replaced by "]"
    
    ' two kinds of new line in ppt : shift enter and enter (replace with &H2B = Dec 43 = "+")
    val = Replace(val, Chr$(11), "+%2B%2B")  ' LineFeed vbLf Chr$(11)
    val = Replace(val, Chr$(13), "+%2B%2B")  ' LineFeed vbCrLf Chr$(13)
    
    ConvertToGet = val
End Function

' Original from http://analystcave.com/excel-google-translate-functionality/
'Function Clean(val As String)
'    val = Replace(val, "&quot;", """")
'    val = Replace(val, "%2C", ",")
'    val = Replace(val, "&#39;", "'")
'    Clean = val
'End Function

Function Clean(val As String) 'to clean symbol code to symbol
    val = Replace(val, """", """")
    val = Replace(val, "%2C", ",")
    val = Replace(val, "'", "'")
    val = Replace(val, "&lt;", "<")
    val = Replace(val, "&gt;", ">")
    Clean = val
End Function

' Find the plain text from div><br><div dir="ltr" class="t0">Hello</div>
' str = html content from the page returned by the Google translate query
' reg = regular expression search pattern string e.g. div[^"]*?"ltr".*?>(.+?)</div>
' match div
' 12 special metacharacters (escape with a backslash) \^$.|?*+()[{
' [..] = character class
' [^ = NOT
' ^ = position matches at the start of the string
' $ = position matches at the end of the string
' . = any character
' ? = optional preceding token
' * = attempt to match the preceding token zero or more times
' + =
' | = OR
' () = create a numbered capturing group
' [^"] = sequence, ^ negates, so anything that is not "
' *? = optional anything preceeding "ltr"
' "ltr" = match ltr
' . = match any character
' *? = optionally anything preceeding ">"
' (.+?) = create a numbered capturing group
' e.g. looking for this HTML part : <div dir="ltr" class="t0">Hello World</div>
Public Function RegexExecute(str As String, reg As String, _
                             Optional matchIndex As Long, _
                             Optional subMatchIndex As Long) As String
    Dim regex As Object
    Dim matches As Object
    Dim xlErrValue
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = reg
    regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
    If regex.Test(str) Then
        ' Return a collection of string matches from the source HTML based on the pattern
        ' e.g. matches(0) = div><br><div dir="ltr" class="t0">Hello World</div>
        Set matches = regex.Execute(str)
        ' Return the translated plain text from the match via the (.+?) part of the expression pattern
        ' e.g. matches(0).SubMatches(0) = Hello World
        RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
        Exit Function
    End If
ErrHandl:
    RegexExecute = CVErr(xlErrValue)
End Function

Open in new window

jocker panda

ASKER

Dear Jamie,

First of All, thanks for your prompt response and detailed comments for the code.
i had gone through and tested your code and below are my observations

1. I was not able to use strip symbol function properly i think .. i used it as below, correct me if im wrong
.Text = stripsymbols(GetGoogleTranslation(.Text))
The problem with above line is, it is removing all the text from the text box not only symbols.
Moreover, often i have to convert not only korean text but also english sentences along with the korean text like "<Hello 여보세요>.
How to do for such kind of sentences?

2. I was looking for something like below

Function HtmlToText(sHTML) As String
  Dim oDoc As HTMLDocument
  Set oDoc = New HTMLDocument
  oDoc.body.innerHTML = sHTML
  HtmlToText = oDoc.body.innerText
End Function

Open in new window

which is from this website [url=https://stackoverflow.com/questions/5327512/convert-html-to-plain-text-in-vba[/url]
can you make similar function for powerpoint ?
can i apply such function for my cause?

3. About the Modular translation code which you suggested, it is working fine with text boxes,shapes and grouped items but, the problem is with tables as it is taking lot of time to translate tables than my code.
Can you pls check and let me know if you are able to find solution for above issues.

4.Another issue i discovered is with maintaining lines as you know there are two kinds of new lines a)Enter b)Shift+Enter
My previous code is able to solve 'Enter' by replacing "++" with VbCrLf but i'm trying to separate with Shift+Enter
My solution is to use
val = Replace(val, " chr$(13)", "%2D%2D")
which will return after translation ", ,"
Then again i'm replacing that with val=Replace(val, ", " "Chr$(13)")

The problem with above replace function is, it is giving new line to words with commas in text format like "rat, bat, cat," to
rat
bat
cat

Can you help me on this too?

Thanks in advance.

Jamie Garroch (MVP)

Ah. OK. You have a mix of English and Korean. There are two ways to do this. Note that the code example you provided above does not convert HTML characters but merely returns the text from within an HTML element.

The two methods you could user are (a) filter characters you don't want Google to translate or (b) convert them to HTML codes prior to passing them for translation.

As you have a mix of English, Korean and numerous punctuation and other symbols, there is no shortcut way of achieving what you want without examining every character, one by one. I think you need to revert to the previous method where you transform the "illegal" symbols prior to passing to Google. Here is a new function to do that:

' ====================================================================
' Function to create an HTML-safe string from a given string of text.
' Author  : Jamie Garroch of http://youpresent.co.uk/
' Inputs  : sText is a plain text string which may contain English,
'           Korean, punctuation, line feed and other symbols
' Outputs : returns an HTML-safe string where any non-alphanumeric
'           characters (in English or Korean) are replaced with their
'           HTML-formatted hexadecimal equivalents.
' Example :
'           input = <Hello World!>
'           output = %3CHello%20World%21%3E
' ====================================================================
Function MakeSafeForHTML(sText As String) As String
  Dim newText As String
  Dim chCode As Long
  Dim x As Long
  Dim bKeepChar As Boolean
  Const KoreanStart = &H3131& ' Dec 12593
  Const KoreanEnd = &HCB4C& ' Dec 52044
  For x = 1 To Len(sText)
    ' Convert integer value of the character to the signed character code
    chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
    ' Check the character and set a flag if it's ok to keep it as it is
    Select Case chCode
      Case KoreanStart To KoreanEnd: bKeepChar = True
      Case Asc("a") To Asc("z"): bKeepChar = True
      Case Asc("A") To Asc("Z"): bKeepChar = True
      Case Asc("0") To Asc("9"): bKeepChar = True
      Case Else: bKeepChar = False
    End Select
    ' Add the character to a new string or convert it to its hexadecimal equivalent
    If bKeepChar Then
      newText = newText & Mid(sText, x, 1)
    Else
      newText = newText & "%" & Hex(chCode)
    End If
  Next
  ' Return the HTML-safe string
  MakeSafeForHTML = newText
End Function

Open in new window

And it's called like this, using the first instance in your code:

Sub TranslateKorToEng()
  Dim oSelShp As Shape
  Dim r As Long, c As Long, x As Long, y As Long
  Dim lGrpItem As Long
  
  Select Case ActiveWindow.Selection.Type
    Case ppSelectionText
      With ActiveWindow.Selection.TextRange
        .Text = GetGoogleTranslation(MakeSafeForHTML(.Text))
      End With
    Case ppSelectionShapes
    ' rest of procedure...

Open in new window

I tested it with the following selected text on a slide:

hello <여보세요>

And the text was changed to this:

hello <Hello>

I'm not sure what you're trying to achieve with tables. Are you looking to translate only selected cells or all sells of a selected table?

I'll look at the line feed bit next.

jocker panda

ASKER

i'm trying to do both
1. to translate complete table when i select the table
2. to translate only text in a cell when i select only text.

but for some weird reason when i press ctrl+A the table is getting selected but it is not getting translated with my code .

jocker panda

ASKER

jamie, i think the problem with your function 'MakesafeForHTML' Is.. it is converting symbols before they are passed on to translator
i think we should decode the symbols which are generated in html to normal text.

jocker panda

ASKER

and about the new line (shift+enter) , i think i'm able to solve that with

val = Replace(val, Chr$(11), "%2D%2D") using this replace with text before translation

val = Replace(val, "- ", Chr$(11)) using this replace with text after translation.

Jamie Garroch (MVP)

This version addresses the table comment. It still needs some work with HTML character transcoding but I think this is looking pretty good now.

Option Explicit

Private oHTTP As Object

Sub TranslateKorToEng()
  Dim oSelShp As Shape, oGrpItem As Shape
  Dim lRow As Long, lCol As Long, bWholeTable As Boolean
  
  Set oHTTP = CreateObject("MSXML2.ServerXMLHTTP")  ' create object(shape,textframe,table) using HTTP format
    
  Select Case ActiveWindow.Selection.Type
    Case ppSelectionText
      ' Translate seletced text within a shape or text within a single selected cell of a table
      With ActiveWindow.Selection.TextRange
        .Text = GetGoogleTranslation(.Text)
      End With
    Case ppSelectionShapes
      For Each oSelShp In ActiveWindow.Selection.ShapeRange ' checks multiple shapes
        If oSelShp.Type = msoGroup Then ' Check whether selected Shape is Group or not
          For Each oGrpItem In oSelShp.GroupItems ' using for group through each text box in a group
            With oGrpItem
              If .HasTextFrame Then 'check for textframe
                If .TextFrame.HasText Then ' check for textframe. text
                  With .TextFrame2
                    .WordWrap = msoCTrue ' word wrap the text box
                    .AutoSize = msoAutoSizeTextToFitShape   ' this 2 line for text fit to TextBox (Font size is changed)
                    With .TextRange
                      .Text = GetGoogleTranslation(.Text)
                    End With
                  End With
                End If
              End If
            End With
          Next
          
        '¿©±ä µµÇü, ÅØ½ºÆ® "óÀÚ Ã¼Å·
        ElseIf oSelShp.HasTextFrame Then 'for only shape
          If oSelShp.TextFrame.HasText Then 'and it has text
            With oSelShp.TextFrame2
              .WordWrap = msoCTrue 'Set WordWrap
              .AutoSize = msoAutoSizeTextToFitShape
              With .TextRange
                .Text = GetGoogleTranslation(.Text)
              End With
            End With
          End If
           
        '¿©±ä Å×ÀÌºí checking
        ElseIf oSelShp.HasTable Then
          ' If the full table is selected (or all cells), set a flag to translate all cells containing text else only translate those cells that are selected
          If WholeTableIsSelected(oSelShp) Then bWholeTable = True
          With oSelShp.Table
            For lCol = 1 To .Columns.Count
              For lRow = 1 To .Rows.Count
                With .Cell(lRow, lCol)
                  If bWholeTable Then
                    With .Shape.TextFrame
                      If .HasText Then ' if the cell contains any text, translate it
                        With .TextRange
                          .Text = GetGoogleTranslation(.Text)
                        End With
                      End If
                    End With
                  ElseIf .Selected Then ' check if the cell is selected or not
                    With .Shape.TextFrame
                      If .HasText Then ' if the cell contains any text, translate it
                        With .TextRange
                          .Text = GetGoogleTranslation(.Text)
                        End With
                      End If
                    End With
                  End If
                End With ' .Cell
              Next
            Next
          End With ' oSelShp.Table
        End If
      Next oSelShp
    Case Else
      MsgBox "Invalid selection", vbCritical + vbOKOnly, "Cannot Perform Translation"
  End Select
  
  ' Clean up
  Set oHTTP = Nothing
End Sub

' Detect if the full table is selected as opposed to one or more cells within it
' If the full table is selected, all cells are selected (but not highlighted)
' If some [highlighted] cells, but not all are selected, this is detectable
Function WholeTableIsSelected(oShp As Shape) As Boolean
  Dim lRow As Long, lCol As Long
  If oShp.HasTable Then
    With oShp.Table
      For lRow = 1 To .Rows.Count
        For lCol = 1 To .Columns.Count
          If .Cell(lRow, lCol).Selected = False Then Exit Function
        Next
      Next
    End With
  End If
  ' If we got here, it means that no cells are not selected so the full table must be selected,
  ' either as the full table object or as a selection of all the cells.
  WholeTableIsSelected = True
End Function


' Translate the src text from Korean to English using Google Translate
Function GetGoogleTranslation(src As String) As String
  Dim URL As String
  Dim getParam As String, trans As String
  
  Const translateFrom = "ko"  ' source language parameter
  Const translateTo = "en"    ' result language parameter
  Const RegExpPat = "div[^""]*?""ltr"".*?>(.+?)</div>" ' pattern to find the translated text in the Google Translate HTML page
  
  ' replace special symbols and apply text to convert to get function
  getParam = MakeSafeForHTML(src)
  'getParam = ConvertToGet(src)
  
  ' Creates and modifies the URL of UTF-8 translator with getParam
  URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
  oHTTP.Open "GET", URL, False 'open the http object
  oHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" ' set browser header
  oHTTP.send ("") ' send command to browser
         
  If InStr(oHTTP.responseText, "div dir=""ltr""") > 0 Then
    ' Find the translated text from the returned HTML
    trans = RegexExecute(oHTTP.responseText, RegExpPat)
    ' Clean up the text
    trans = ReplaceEnter(trans)
    trans = Clean(trans)
    ' Return the translated text
    GetGoogleTranslation = trans
  End If
  
End Function

' SB Functions for ++ sign to Enter
' new line symbol while entering data from ppt
Function ReplaceEnter(txt As String)
  txt = Replace(txt, " ++ ", vbCrLf)        ' Enter
  txt = Replace(txt, " + ", vbVerticalTab)  ' Shift+Enter or Enter in Title placeholder
  ReplaceEnter = txt
End Function

' Original from http://analystcave.com/excel-google-translate-functionality/
'Function Clean(val As String)
'    val = Replace(val, "&quot;", """")
'    val = Replace(val, "%2C", ",")
'    val = Replace(val, "&#39;", "'")
'    Clean = val
'End Function

Function Clean(val As String) 'to clean symbol code to symbol
    val = Replace(val, """", """")
    val = Replace(val, "%2C", ",")
    val = Replace(val, "'", "'")
    val = Replace(val, "&lt;", "<")
    val = Replace(val, "&gt;", ">")
    Clean = val
End Function

' Find the plain text from div><br><div dir="ltr" class="t0">Hello</div>
' str = html content from the page returned by the Google translate query
' reg = regular expression search pattern string e.g. div[^"]*?"ltr".*?>(.+?)</div>
' match div
' 12 special metacharacters (escape with a backslash) \^$.|?*+()[{
' [..] = character class
' [^ = NOT
' ^ = position matches at the start of the string
' $ = position matches at the end of the string
' . = any character
' ? = optional preceding token
' * = attempt to match the preceding token zero or more times
' + =
' | = OR
' () = create a numbered capturing group
' [^"] = sequence, ^ negates, so anything that is not "
' *? = optional anything preceeding "ltr"
' "ltr" = match ltr
' . = match any character
' *? = optionally anything preceeding ">"
' (.+?) = create a numbered capturing group
' e.g. looking for this HTML part : <div dir="ltr" class="t0">Hello World</div>
Public Function RegexExecute(str As String, reg As String, _
                             Optional matchIndex As Long, _
                             Optional subMatchIndex As Long) As String
    Dim regex As Object
    Dim matches As Object
    Dim xlErrValue
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = reg
    regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
    If regex.Test(str) Then
        ' Return a collection of string matches from the source HTML based on the pattern
        ' e.g. matches(0) = div><br><div dir="ltr" class="t0">Hello World</div>
        Set matches = regex.Execute(str)
        ' Return the translated plain text from the match via the (.+?) part of the expression pattern
        ' e.g. matches(0).SubMatches(0) = Hello World
        RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
        Exit Function
    End If
ErrHandl:
    RegexExecute = CVErr(xlErrValue)
End Function

' ====================================================================
' Function to create an HTML-safe string from a given string of text.
' Author  : Jamie Garroch of http://youpresent.co.uk/
' Inputs  : sText is a plain text string which may contain English,
'           Korean, punctuation, line feed and other symbols
' Outputs : returns an HTML-safe string where any non-alphanumeric
'           characters (in English or Korean) are replaced with their
'           HTML-formatted hexadecimal equivalents.
' Example :
'           input = <Hello World!>
'           output = %3CHello%20World%21%3E
' ====================================================================
Function MakeSafeForHTML(sText As String) As String
  Dim newText As String, newChar As String
  Dim chCode As Long
  Dim x As Long
  Dim bKeepChar As Boolean
  Const KoreanStart = &H3131& ' Dec 12593
  Const KoreanEnd = &HCB4C& ' Dec 52044
    
  For x = 1 To Len(sText)
    ' Convert integer value of the character to the signed character code
    chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
    ' Check the character and set a flag if it's ok to keep it as it is
    Select Case chCode
      Case KoreanStart To KoreanEnd: newChar = Mid(sText, x, 1)
      Case Asc("a") To Asc("z"): newChar = Mid(sText, x, 1)
      Case Asc("A") To Asc("Z"): newChar = Mid(sText, x, 1)
      Case Asc("0") To Asc("9"): newChar = Mid(sText, x, 1)
      Case Asc("&"): newChar = "and"
      ' Replace line feed characters with plain text '+' or '++' so they can be restored after translation
      Case 11: newChar = "%2B"      ' Vertical Tab (vbVerticalTab : Shift+Enter or Enter in title placeholders)
      Case 13: newChar = "%2B%2B"   ' Carriage Return (vbCrLf : Enter)
    End Select
    ' Add the character to a new string or convert it to its hexadecimal equivalent
    If newChar = "" Then newText = newText & "%" & Hex(chCode) Else newText = newText & newChar
    newChar = ""
  Next
  ' Return the HTML-safe string
  MakeSafeForHTML = newText
End Function

Open in new window

jocker panda

ASKER

can you look into HTML transcoding
thank you :)

Jamie Garroch (MVP)

I updated the URL and HTML transcoding pre and post processing functions as follows:

' Reverse the transcoding for PowerPoint line feed characters
Function ReplaceEnter(txt As String)
  txt = Replace(txt, " ++ ", vbCrLf)        ' Enter
  txt = Replace(txt, " + ", vbVerticalTab)  ' Shift+Enter or Enter in Title placeholder
  ReplaceEnter = txt
End Function

' Clean HTML codes returned from Google in the translated string
Function Clean(sText As String)
   ' HTML predefined character entity references
   sText = Replace(sText, "&quot;", """")
   sText = Replace(sText, "&apos;", "'")
   sText = Replace(sText, "&lt;", "<")
   sText = Replace(sText, "&gt;", ">")
   sText = Replace(sText, "&amp;", "&")
  
   ' HTML Numerical (decimal)
   sText = Replace(sText, "&#39;", "'")
   sText = Replace(sText, "&#44;", ",")
   
   Clean = sText
End Function

' ====================================================================
' Function to create an HTML-safe string from a given string of text.
' Author  : Jamie Garroch of http://youpresent.co.uk/
' Inputs  : sText is a plain text string which may contain English,
'           Korean, punctuation, line feed and other symbols
' Outputs : returns an HTML-safe string where any non-alphanumeric
'           characters (in English or Korean) are replaced with their
'           HTML-formatted hexadecimal equivalents.
' Example :
'           input = <Hello World!>
'           output = %3CHello%20World%21%3E
' ====================================================================
Function MakeSafeForHTML(sText As String) As String
  Dim newText As String, newChar As String
  Dim chCode As Long
  Dim x As Long
  Dim bKeepChar As Boolean
  Const KoreanStart = &H3131& ' Dec 12593
  Const KoreanEnd = &HCB4C& ' Dec 52044
    
  For x = 1 To Len(sText)
    ' If AscW returns negative then convert integer value of the UTF-8 character to a signed character code
    If AscW(Mid(sText, x, 1)) < 0 Then
      chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
    Else
      chCode = Asc(Mid(sText, x, 1))
    End If
    ' Check the character and set a flag if it's ok to keep it as it is
    Select Case chCode
      Case KoreanStart To KoreanEnd: newChar = Mid(sText, x, 1)
      Case Asc(" "): newChar = "%20"
      Case Asc("a") To Asc("z"): newChar = Mid(sText, x, 1)
      Case Asc("A") To Asc("Z"): newChar = Mid(sText, x, 1)
      Case Asc("0") To Asc("9"): newChar = Mid(sText, x, 1)
      Case Asc("&"): newChar = "%26" ' &amp; in HTML, dec 38
      Case Asc("‘"): newChar = "%27" ' opening single quote char 145
      Case Asc("’"): newChar = "%27" ' closing single quote char 146
      Case Asc("“"): newChar = "%22" ' opening double quote char 147
      Case Asc("”"): newChar = "%22" ' closing double quote char 148
      ' Replace line feed characters with plain text '+' or '++' so they can be restored after translation
      Case 11: newChar = "%2B"       ' Transpose to "+" for Vertical Tab (vbVerticalTab : Shift+Enter or Enter in title placeholders)
      Case 13: newChar = "%2B%2B"    ' Transpose to "++" for Carriage Return (vbCrLf : Enter)
    End Select
    
    ' Add the character to a new string or convert it to its hexadecimal equivalent
    If newChar = "" Then
      newText = newText & "%" & Hex(chCode)
    Else
      newText = newText & newChar
    End If
    
    newChar = ""
  Next
  
  If DebugFlag Then Debug.Print "Sending: " & newText
  
  ' Return the HTML-safe string
  MakeSafeForHTML = newText
End Function

Open in new window

If you need any further help, please consider starting a Gig as this question has become very involved now.

jocker panda

ASKER

Hi jamie,
thanks a lot for your help .
but i'm still unable to use your function efficiently.

if i translate below lines which are in korean

▷ 김허남(1998), 바람직한 한국 : 홍익인간 이념 실천 방안, 서울 : 삼보사▷ 김현선(1999), 비교신화학적 관점에서 본 홍익인간의 이념▷ 권성아(1998), 홍익인간사상과 민족통일교육, 강원대 박사학위논문▷ 권성아·신창호(1999), 홍익인간사상과 통일교육, 서울 : 집문당▷ 이석호(2001), 인간의 이해 - 철학적 인간학 입문, 철학과 현실사▷ 정영훈(1999), 홍익인간 이념의 유래와 현대적 의의, 한국정신문화연구원, 홍익인간 이념 연구▷ 정영훈 외(1999), 홍익인간 이념 연구, 성남 : 정신문화연구원

i'm getting the translated English text with lot of codes likes 9C,4D,+% B7 as below

++ B7 Kim C8 M (1998), 5C 5C country: 4D Human resource idea 9C plan, Seoul: Sambo companyB7 Kim 04 line (1999) From the standpoint of viewpoint, 4D is the ideology of the human being +% B7 Kwon Sung-ah (1998), 4D Iwo-Man thought and national B5 day education, Dr. Gangwon University, 59th papers +% B7 Kwon Sung- ), 4D Iwo-Man Thought and? B5day Education, Seoul: Gwangmun-dong +% B7 Lee, 38 (2001)4% B7 Presided by C8 (1999), 4D is the origin of the human ideology, 4 is the significance of the opposition, 5C is the national newspaper, 54 is the researcher, A Study on Human Ideology, Seongnam: Jung Shin-hyuk 54 Researcher.

In addition to that i think, stripping each korean word and passing it to google translator will change the meaning of the word and returning different meanings as a result.

I'm attaching a file which will help you to test.

By the way, what is a gig ?
Presentation1.pptx

Jamie Garroch (MVP)

Thanks for the additional info. FYI, a Gig is a project rather than a question and is paid for. They are used when questions become more than just a question, as is the case here because there is a lot of research and testing to be done to create a fully working code rather than just correct one or two lines of code. You can learn more here https://www.experts-exchange.com/gigs/

Jamie Garroch (MVP)

Some of the above I can fix (illegal URL characters in the ASCII code space) but others I have not understood yet. For example, if I take the first character ▷ this is Unicode character &H25B7 but if you paste that into a Google Translate web page and translate from Korean, you can see from the URL that is created that it transposes to a URL-safe format of %E2%96%B7

Until I understand this, I cannot progress any further.

jocker panda

ASKER

ok, thanks a lot for your effort

before we close this question i have 2 more requests
1. how to maintain indentation with bullet points after translation at different levels like below

a) hello
b)hello

presently, i'm using textrange.paragraphs.text in the text selection but it seems its not working with second line bullet point onwards
and returning from translator like
a) hello
b) hello

2. Is it possible to share code to translate objects,text boxes,tables of selected slide/presentation like your in the ppam file which you have shared with me.

ASKER CERTIFIED SOLUTION

Jamie Garroch (MVP)

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

jocker panda

ASKER

thanks a lot jimie
i will apply your code to translator.

have a nice day.

Jamie Garroch (MVP)

FYI, I reduced the number of errors in your example Korean tex above by change the code space constant in MakeSafeForHTML:

Const KoreanEnd = &HD5C8&

Open in new window

Do you know what the correct lower and upper limits are for all Korean characters? I don't think this part of the transcoder is 100% correct yet and needs more UTF-8 knowledge than I have.

Jamie Garroch (MVP)

I think I'm making progress. Instead of looking for KoreanStart and KoreanEnd, I think I can just make the transcoding decision based on the number of bytes used to encode the UTF-8 character as all Hangul and CJK characters have 4 bytes:

Jamie Garroch (MVP)

After a LOT of reading and experimentation, this is real output from the modified code with your original Korean text as the input above:

▷ Kim, HJ (1998), A desirable Korea: Hongik Human Ideology Practice, Seoul: Sambo
▷ Kim, Hyun-seon (1999), The Idea of Humanism
▷ Kwon Sung-ah (1998), Hongik Human Thought and National Unification Education, Doctoral Thesis of Kangwon National University
▷ Kwon Sung-ah, Shin Chang-ho (1999), Hongik Human Thought and Unification Education, Seoul:
▷ Lee Suk Ho (2001), Human Understanding - Introduction to Philosophical Anthropology, Philosophy and the Reality
▷ Jung Young-hoon (1999), The Origin of Hongik Human Ideology and Contemporary Implications, Korea Mental Culture Institute, Hongik Human Ideology Research
▷ Jung Young-Hoon et al. (1999), Hongik Human Ideology Research, Seongnam:

How does that look?

jocker panda

ASKER

wow !
how do you do it?

Jamie Garroch (MVP)

Here's the latest MakeSafeForHTML (I renamed the function "EncodeForURL" since what it is actually doing is making a percent-encoding as per RFC3986:

' ====================================================================
' Function to create a URL string from a given string of text.
' Author  : Jamie Garroch of http://youpresent.co.uk/
' Inputs  : sText is a plain text string which may contain English,
'           Korean, punctuation, line feed and other symbols
' Outputs : returns a URL-safe string where any non-alphanumeric
'           characters (in English or Korean) are replaced with their
'           URL-safe hexadecimal equivalents.
' Example :
'           input = <Hello World!>
'           output = %3CHello%20World%21%3E
' ====================================================================
Function EncodeForURL(sText As String) As String
  Dim newText As String, newChar As String
  Dim chCode As Long
  Dim x As Long
  Dim bKeepChar As Boolean
  Dim lCount As Long
  
  ' Replace PowerPoint line feed characters with special plain text sequences so they can be restored after translation
  'sText = Replace(sText, Chr(13), sCR)
  'sText = Replace(sText, Chr(11), sVT)

  ' URL encoding references
  ' https://www.w3schools.com/tags/ref_urlencode.asp
  ' https://www.tutorialspoint.com/html/html_url_encoding.htm
  ' RFC3986 https://tools.ietf.org/html/rfc3986#section-2.1
  For x = 1 To Len(sText)
    ' Convert integer value of the UTF-8 character to a signed Long (32-bit) character code
    chCode = AscW(Mid(sText, x, 1)) And &HFFFF&

    ' https://en.wikipedia.org/wiki/UTF-8
    ' UTF-8 uses 1 to 4 bytes to represent each character
    Select Case chCode
      Case 8: Stop
      ' Character is 4-bytes and hence in the CJK code space area so leave it unchanged
      Case Is > &HFFF&: newChar = ChrW(chCode)
      'Case Is > &HFFF&: newChar = UnicodeToUTF(chCode)
      
      'Case 11, 13: newChar = "%0A" ' LF = BEST RESULT SO FAR (keeps bullets but no CR)
      Case 11, 13: newChar = "%0A%25%25%0A"
      'Case 11, 13: newChar = "%0B" ' VT
      'Case 11, 13: newChar = "%0D" ' CR
      'Case 11, 13: newChar = Chr(chCode)
    
      ' Unreserved characters in the ASCII code space can be sent in plain text
      ' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
      ' a b c d e f g h i j k l m n o p q r s t u v w x y z
      ' 0 1 2 3 4 5 6 7 8 9 - _ . ~
      Case 48 To 57: newChar = Chr(chCode)  ' 0 to 9
      Case 65 To 90: newChar = Chr(chCode)  ' A to Z
      Case 97 To 122: newChar = Chr(chCode) ' a to z
      Case 45: newChar = Chr(chCode)        ' - (hyphen)
      Case 95: newChar = Chr(chCode)        ' _ (underscore)
      Case 46: newChar = Chr(chCode)        ' . (full stop)
      Case 126: newChar = Chr(chCode)       ' ~ (approximate)
      
      ' URI escape character
      Case 37: newChar = "%25"  ' %
      
      ' Reserved Characters Encoding
      ' ! * ' ( ) ; : @ & = + $ , / ? % # [ ]
      Case 36: newChar = "%24"  ' $
      Case 38: newChar = "%26"  ' &
      Case 43: newChar = "%2b"  ' +
      Case 44: newChar = "%2c"  ' ,
      Case 47: newChar = "%2f"  ' /
      Case 58: newChar = "%3a"  ' :
      Case 59: newChar = "%3b"  ' ;
      Case 61: newChar = "%3d"  ' =
      Case 63: newChar = "%3f"  ' ?
      Case 64: newChar = "%40"  ' @

      ' Unsafe Characters Encoding
      'Case &H20& To &H7F&: newChar = "%" & CStr(Hex(chCode))
      Case 32: newChar = "%20"  ' space
      Case 34: newChar = "%22"  ' "
      Case 60: newChar = "%3c"  ' <
      Case 62: newChar = "%3e"  ' >
      Case 35: newChar = "%23"  ' #
      Case 37: newChar = "%25"  ' %
      Case 123: newChar = "%7b" ' {
      Case 125: newChar = "%7d" ' }
      Case 124: newChar = "%7c" ' |
      Case 92: newChar = "%5c"  ' \
      Case 94: newChar = "%5e"  ' ^
      Case 126: newChar = "%7e" ' ~
      Case 91: newChar = "%5b"  ' [
      Case 93: newChar = "%5d"  ' ]
      Case 96: newChar = "%60"  ' `
      
      ' ASCII Control Characters Encoding
      ' 00-&H1F hex (0-31 decimal) and &H7F (127 decimal)
      Case 0 To 31: newChar = "%" & CStr(Hex(chCode)) ' 32 x Control Codes
      Case 127: newChar = "%" & CStr(Hex(chCode))     ' DEL
      
      ' Non-ASCII control characters encoding
      ' (encoding for the entire "top half" of the ISO-Latin set &H80 - &HFF)
      Case 128 To 255: newChar = "%" & CStr(Hex(chCode))
      
      Case Else: newChar = Chr(chCode)
      'Case Else: newChar = "%" & CStr(Hex(chCode))
    End Select
    
    ' Add the plain/encoded character to the new string
    newText = newText & newChar
    
    newChar = ""
  Next
  
  'newText = Replace(newText, Chr(13), "%20ENTER%20") '"%5bCR%5d"
  'newText = Replace(newText, Chr(11), "%20ENTER%20") '"%5bVT%5d"
  
  If DebugFlag Then Debug.Print "Sending: " & newText
  
  ' Return the URL-safe string
  EncodeForURL = newText
End Function

Open in new window

And here is the corresponding post-processor function to transpose the linefeed charatcers:

' Reverse the transcoding for PowerPoint line feed characters
Function ReplaceEnter(txt As String)
  ' Note: Google Translate is returning additional spaces after escaped characters
  txt = Replace(txt, "%%" & " ", vbCrLf)          ' Enter
  txt = Replace(txt, "%%" & " ", vbVerticalTab)   ' Shift+Enter or Enter in Title placeholder
  ReplaceEnter = txt
End Function

Open in new window