jocker panda
asked on
Html encoding problem in powerpoint
Today i'm facing a problem with this code as its returns html values to symbols like <,> etc, which are shown as < & gt; etc.,
After googling, i came to know that its a html to text encoding problem
can you help me what function and where to implement that code in my code.
Thanks a lot in advance.
After googling, i came to know that its a html to text encoding problem
can you help me what function and where to implement that code in my code.
Thanks a lot in advance.
ASKER
Yes, you are right..
Because i added that replace function..
But as you know as there are many symbols like ¬~¢¥¿[{<> etc which i cannot replace using replace function for each one of them before translation.
So, i request you to suggest a way to replace the html text with equivalent powerpoint text.
Because i added that replace function..
But as you know as there are many symbols like ¬~¢¥¿[{<> etc which i cannot replace using replace function for each one of them before translation.
So, i request you to suggest a way to replace the html text with equivalent powerpoint text.
You either have to manage al possible symbol conversions (swap them before and after translation) or prevent those symbols from being passed to Google Translate in the first place. What you could do is write a pre-processing function to extract everything that is 'real' Korean text and prevent everything else, such as special symbols, from being passed to Google. Something like this perhaps:
I also made some other changes to make the code more modular and handle all 4 selection cases. I also noticed that you're double-checking for a table using .HasTable and .Type = 19. This will fail if the table is in a placeholder so I suggest just using .HasTable.
' Return string with only Korean characters
Function StripSymbols(sText As String) As String
Dim x As Long
Dim ch As String
Dim chCode As Long
Dim sOut As String
' Code space range http://memory.loc.gov/diglib/codetables/9.3.html
Const KoreanStart = &H3131& ' Dec 12593
Const KoreanEnd = &HCB4C& ' Dec 52044
' Recurse all characters in the string
For x = 1 To Len(sText)
' get the next character
ch = Mid(sText, x, 1)
' Convert integer value of the character to the signed character code
chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
If chCode >= KoreanStart And chCode <= KoreanEnd Then sOut = sOut & ch
Next
StripSymbols = sOut
End Function
I also made some other changes to make the code more modular and handle all 4 selection cases. I also noticed that you're double-checking for a table using .HasTable and .Type = 19. This will fail if the table is in a placeholder so I suggest just using .HasTable.
Option Explicit
Sub TranslateKorToEng()
Dim oSelShp As Shape
Dim r As Long, c As Long, x As Long, y As Long
Dim lGrpItem As Long
Select Case ActiveWindow.Selection.Type
Case ppSelectionText
With ActiveWindow.Selection.TextRange
.Text = GetGoogleTranslation(.Text)
End With
Case ppSelectionShapes
For Each oSelShp In ActiveWindow.Selection.ShapeRange ' checks multiple shapes
If oSelShp.Type = msoGroup Then 'Check Whether selected Shape is Group or not
For lGrpItem = 1 To oSelShp.GroupItems.Count 'using for group through each text box in a group
With oSelShp.GroupItems(lGrpItem)
If .HasTextFrame Then 'check for textframe
If .TextFrame.HasText Then ' check for textframe. text
With .TextFrame2
.WordWrap = msoCTrue ' word wrap the text box
.AutoSize = msoAutoSizeTextToFitShape ' this 2 line for text fit to TextBox (Font size is changed)
With .TextRange
.Text = GetGoogleTranslation(.Text)
End With
End With
End If
End If
End With
Next
'¿©±ä µµÇü, ÅؽºÆ® "óÀÚ Ã¼Å·
ElseIf oSelShp.HasTextFrame Then 'for only shape
If oSelShp.TextFrame.HasText Then 'and it has text
With oSelShp.TextFrame2
.WordWrap = msoCTrue 'Set WordWrap
.AutoSize = msoAutoSizeTextToFitShape
With .TextRange
.Text = GetGoogleTranslation(.Text)
End With
End With
End If
'¿©±ä Å×À̺í checking
ElseIf oSelShp.HasTable Then
With oSelShp
r = 0
For y = 1 To .Table.Columns.Count
If r = 0 Then
For x = 1 To .Table.Rows.Count
With .Table.Cell(x, y)
If .Selected Then 'check if the column.row is selected or not
If .Shape.HasTextFrame Then 'checks for text frame
With .Shape.TextFrame.TextRange
.Text = GetGoogleTranslation(.Text)
End With
End If
End If
End With
Next
Else
GoTo Column
End If
Next
Column:
c = 0
For x = 1 To .Table.Rows.Count
If c = 0 Then
For y = 1 To .Table.Columns.Count
With .Table.Cell(x, y)
If .Selected Then 'Àüü Å×À̺íÀ" ÇÑ ¼¿¾¿ °ËÅäÇÏ¸é ¼±ÅõȰ͸¸ ó¸®ÇÑ´Ù
With .Shape.TextFrame.TextRange
.Text = GetGoogleTranslation(.Text)
End With
End If
End With
Next
End If
Next
End With ' oSelShp
End If
Next oSelShp
Case Else
MsgBox "Invalid selection", vbCritical + vbOKOnly, "Cannot Translate"
End Select
End Sub
' Translate the src text from Korean to English using Google Translate
Function GetGoogleTranslation(src As String) As String
Dim oHTTP As Object
Dim URL As String
Dim getParam As String, trans As String
Const translateFrom = "ko" ' source language parameter
Const translateTo = "en" ' result language parameter
Const RegExpPat = "div[^""]*?""ltr"".*?>(.+?)</div>" ' pattern to find the translated text in the Google Translate HTML page
Set oHTTP = CreateObject("MSXML2.ServerXMLHTTP") ' create object(shape,textframe,table) using HTTP format
' replace special symbols and apply text to convert to get function
getParam = ConvertToGet(src)
' Creates and modifies the URL of UTF-8 translator with getParam
URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
oHTTP.Open "GET", URL, False 'open the http object
oHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" ' set browser header
oHTTP.send ("") ' send command to browser
If InStr(oHTTP.responseText, "div dir=""ltr""") > 0 Then
' Find the translated text from the returned HTML
trans = RegexExecute(oHTTP.responseText, RegExpPat)
' Clean up the text
trans = ReplaceEnter(trans)
trans = Clean(trans)
' Return the translated text
GetGoogleTranslation = trans
End If
Set oHTTP = Nothing
End Function
' SB Functions for ++ sign to Enter
' new line symbol while entering data from ppt
Function ReplaceEnter(txt As String)
txt = Replace(txt, "++", vbCrLf) 'new line
ReplaceEnter = txt
End Function
' Convert special characters to HTML-compliant hex codes prior to sending in web page query
Function ConvertToGet(val As String)
val = Replace(val, " ", "+")
'val = Replace(val, vbNewLine, "+")
val = Replace(val, "(", "%28")
val = Replace(val, ")", "%29")
val = Replace(val, "&", "and")
'val = Replace(val, "<", "[")
'val = Replace(val, ">", "]")
val = Replace(val, "<", "%3C") ' was previously being replaced by "["
val = Replace(val, ">", "%3E") ' was previously being replaced by "]"
' two kinds of new line in ppt : shift enter and enter (replace with &H2B = Dec 43 = "+")
val = Replace(val, Chr$(11), "+%2B%2B") ' LineFeed vbLf Chr$(11)
val = Replace(val, Chr$(13), "+%2B%2B") ' LineFeed vbCrLf Chr$(13)
ConvertToGet = val
End Function
' Original from http://analystcave.com/excel-google-translate-functionality/
'Function Clean(val As String)
' val = Replace(val, """, """")
' val = Replace(val, "%2C", ",")
' val = Replace(val, "'", "'")
' Clean = val
'End Function
Function Clean(val As String) 'to clean symbol code to symbol
val = Replace(val, """", """")
val = Replace(val, "%2C", ",")
val = Replace(val, "'", "'")
val = Replace(val, "<", "<")
val = Replace(val, ">", ">")
Clean = val
End Function
' Find the plain text from div><br><div dir="ltr" class="t0">Hello</div>
' str = html content from the page returned by the Google translate query
' reg = regular expression search pattern string e.g. div[^"]*?"ltr".*?>(.+?)</div>
' match div
' 12 special metacharacters (escape with a backslash) \^$.|?*+()[{
' [..] = character class
' [^ = NOT
' ^ = position matches at the start of the string
' $ = position matches at the end of the string
' . = any character
' ? = optional preceding token
' * = attempt to match the preceding token zero or more times
' + =
' | = OR
' () = create a numbered capturing group
' [^"] = sequence, ^ negates, so anything that is not "
' *? = optional anything preceeding "ltr"
' "ltr" = match ltr
' . = match any character
' *? = optionally anything preceeding ">"
' (.+?) = create a numbered capturing group
' e.g. looking for this HTML part : <div dir="ltr" class="t0">Hello World</div>
Public Function RegexExecute(str As String, reg As String, _
Optional matchIndex As Long, _
Optional subMatchIndex As Long) As String
Dim regex As Object
Dim matches As Object
Dim xlErrValue
On Error GoTo ErrHandl
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = reg
regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
If regex.Test(str) Then
' Return a collection of string matches from the source HTML based on the pattern
' e.g. matches(0) = div><br><div dir="ltr" class="t0">Hello World</div>
Set matches = regex.Execute(str)
' Return the translated plain text from the match via the (.+?) part of the expression pattern
' e.g. matches(0).SubMatches(0) = Hello World
RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
Exit Function
End If
ErrHandl:
RegexExecute = CVErr(xlErrValue)
End Function
ASKER
Dear Jamie,
First of All, thanks for your prompt response and detailed comments for the code.
i had gone through and tested your code and below are my observations
1. I was not able to use strip symbol function properly i think .. i used it as below, correct me if im wrong
.Text = stripsymbols(GetGoogleTran slation(.T ext))
The problem with above line is, it is removing all the text from the text box not only symbols.
Moreover, often i have to convert not only korean text but also english sentences along with the korean text like "<Hello 여보세요>.
How to do for such kind of sentences?
2. I was looking for something like below
which is from this website [url=https://stackoverflow.com/questions/5327512/convert-html-to-plain-text-in-vba[/url]
can you make similar function for powerpoint ?
can i apply such function for my cause?
3. About the Modular translation code which you suggested, it is working fine with text boxes,shapes and grouped items but, the problem is with tables as it is taking lot of time to translate tables than my code.
Can you pls check and let me know if you are able to find solution for above issues.
4.Another issue i discovered is with maintaining lines as you know there are two kinds of new lines a)Enter b)Shift+Enter
My previous code is able to solve 'Enter' by replacing "++" with VbCrLf but i'm trying to separate with Shift+Enter
My solution is to use
val = Replace(val, " chr$(13)", "%2D%2D")
which will return after translation ", ,"
Then again i'm replacing that with val=Replace(val, ", " "Chr$(13)")
The problem with above replace function is, it is giving new line to words with commas in text format like "rat, bat, cat," to
rat
bat
cat
Can you help me on this too?
Thanks in advance.
First of All, thanks for your prompt response and detailed comments for the code.
i had gone through and tested your code and below are my observations
1. I was not able to use strip symbol function properly i think .. i used it as below, correct me if im wrong
.Text = stripsymbols(GetGoogleTran
The problem with above line is, it is removing all the text from the text box not only symbols.
Moreover, often i have to convert not only korean text but also english sentences along with the korean text like "<Hello 여보세요>.
How to do for such kind of sentences?
2. I was looking for something like below
Function HtmlToText(sHTML) As String
Dim oDoc As HTMLDocument
Set oDoc = New HTMLDocument
oDoc.body.innerHTML = sHTML
HtmlToText = oDoc.body.innerText
End Function
which is from this website [url=https://stackoverflow.com/questions/5327512/convert-html-to-plain-text-in-vba[/url]
can you make similar function for powerpoint ?
can i apply such function for my cause?
3. About the Modular translation code which you suggested, it is working fine with text boxes,shapes and grouped items but, the problem is with tables as it is taking lot of time to translate tables than my code.
Can you pls check and let me know if you are able to find solution for above issues.
4.Another issue i discovered is with maintaining lines as you know there are two kinds of new lines a)Enter b)Shift+Enter
My previous code is able to solve 'Enter' by replacing "++" with VbCrLf but i'm trying to separate with Shift+Enter
My solution is to use
val = Replace(val, " chr$(13)", "%2D%2D")
which will return after translation ", ,"
Then again i'm replacing that with val=Replace(val, ", " "Chr$(13)")
The problem with above replace function is, it is giving new line to words with commas in text format like "rat, bat, cat," to
rat
bat
cat
Can you help me on this too?
Thanks in advance.
Ah. OK. You have a mix of English and Korean. There are two ways to do this. Note that the code example you provided above does not convert HTML characters but merely returns the text from within an HTML element.
The two methods you could user are (a) filter characters you don't want Google to translate or (b) convert them to HTML codes prior to passing them for translation.
As you have a mix of English, Korean and numerous punctuation and other symbols, there is no shortcut way of achieving what you want without examining every character, one by one. I think you need to revert to the previous method where you transform the "illegal" symbols prior to passing to Google. Here is a new function to do that:
And it's called like this, using the first instance in your code:
I tested it with the following selected text on a slide:
hello <여보세요>
And the text was changed to this:
hello <Hello>
I'm not sure what you're trying to achieve with tables. Are you looking to translate only selected cells or all sells of a selected table?
I'll look at the line feed bit next.
The two methods you could user are (a) filter characters you don't want Google to translate or (b) convert them to HTML codes prior to passing them for translation.
As you have a mix of English, Korean and numerous punctuation and other symbols, there is no shortcut way of achieving what you want without examining every character, one by one. I think you need to revert to the previous method where you transform the "illegal" symbols prior to passing to Google. Here is a new function to do that:
' ====================================================================
' Function to create an HTML-safe string from a given string of text.
' Author : Jamie Garroch of http://youpresent.co.uk/
' Inputs : sText is a plain text string which may contain English,
' Korean, punctuation, line feed and other symbols
' Outputs : returns an HTML-safe string where any non-alphanumeric
' characters (in English or Korean) are replaced with their
' HTML-formatted hexadecimal equivalents.
' Example :
' input = <Hello World!>
' output = %3CHello%20World%21%3E
' ====================================================================
Function MakeSafeForHTML(sText As String) As String
Dim newText As String
Dim chCode As Long
Dim x As Long
Dim bKeepChar As Boolean
Const KoreanStart = &H3131& ' Dec 12593
Const KoreanEnd = &HCB4C& ' Dec 52044
For x = 1 To Len(sText)
' Convert integer value of the character to the signed character code
chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
' Check the character and set a flag if it's ok to keep it as it is
Select Case chCode
Case KoreanStart To KoreanEnd: bKeepChar = True
Case Asc("a") To Asc("z"): bKeepChar = True
Case Asc("A") To Asc("Z"): bKeepChar = True
Case Asc("0") To Asc("9"): bKeepChar = True
Case Else: bKeepChar = False
End Select
' Add the character to a new string or convert it to its hexadecimal equivalent
If bKeepChar Then
newText = newText & Mid(sText, x, 1)
Else
newText = newText & "%" & Hex(chCode)
End If
Next
' Return the HTML-safe string
MakeSafeForHTML = newText
End Function
And it's called like this, using the first instance in your code:
Sub TranslateKorToEng()
Dim oSelShp As Shape
Dim r As Long, c As Long, x As Long, y As Long
Dim lGrpItem As Long
Select Case ActiveWindow.Selection.Type
Case ppSelectionText
With ActiveWindow.Selection.TextRange
.Text = GetGoogleTranslation(MakeSafeForHTML(.Text))
End With
Case ppSelectionShapes
' rest of procedure...
I tested it with the following selected text on a slide:
hello <여보세요>
And the text was changed to this:
hello <Hello>
I'm not sure what you're trying to achieve with tables. Are you looking to translate only selected cells or all sells of a selected table?
I'll look at the line feed bit next.
ASKER
i'm trying to do both
1. to translate complete table when i select the table
2. to translate only text in a cell when i select only text.
but for some weird reason when i press ctrl+A the table is getting selected but it is not getting translated with my code .
1. to translate complete table when i select the table
2. to translate only text in a cell when i select only text.
but for some weird reason when i press ctrl+A the table is getting selected but it is not getting translated with my code .
ASKER
jamie, i think the problem with your function 'MakesafeForHTML' Is.. it is converting symbols before they are passed on to translator
i think we should decode the symbols which are generated in html to normal text.
i think we should decode the symbols which are generated in html to normal text.
ASKER
and about the new line (shift+enter) , i think i'm able to solve that with
val = Replace(val, Chr$(11), "%2D%2D") using this replace with text before translation
val = Replace(val, "- ", Chr$(11)) using this replace with text after translation.
val = Replace(val, Chr$(11), "%2D%2D") using this replace with text before translation
val = Replace(val, "- ", Chr$(11)) using this replace with text after translation.
This version addresses the table comment. It still needs some work with HTML character transcoding but I think this is looking pretty good now.
Option Explicit
Private oHTTP As Object
Sub TranslateKorToEng()
Dim oSelShp As Shape, oGrpItem As Shape
Dim lRow As Long, lCol As Long, bWholeTable As Boolean
Set oHTTP = CreateObject("MSXML2.ServerXMLHTTP") ' create object(shape,textframe,table) using HTTP format
Select Case ActiveWindow.Selection.Type
Case ppSelectionText
' Translate seletced text within a shape or text within a single selected cell of a table
With ActiveWindow.Selection.TextRange
.Text = GetGoogleTranslation(.Text)
End With
Case ppSelectionShapes
For Each oSelShp In ActiveWindow.Selection.ShapeRange ' checks multiple shapes
If oSelShp.Type = msoGroup Then ' Check whether selected Shape is Group or not
For Each oGrpItem In oSelShp.GroupItems ' using for group through each text box in a group
With oGrpItem
If .HasTextFrame Then 'check for textframe
If .TextFrame.HasText Then ' check for textframe. text
With .TextFrame2
.WordWrap = msoCTrue ' word wrap the text box
.AutoSize = msoAutoSizeTextToFitShape ' this 2 line for text fit to TextBox (Font size is changed)
With .TextRange
.Text = GetGoogleTranslation(.Text)
End With
End With
End If
End If
End With
Next
'¿©±ä µµÇü, ÅؽºÆ® "óÀÚ Ã¼Å·
ElseIf oSelShp.HasTextFrame Then 'for only shape
If oSelShp.TextFrame.HasText Then 'and it has text
With oSelShp.TextFrame2
.WordWrap = msoCTrue 'Set WordWrap
.AutoSize = msoAutoSizeTextToFitShape
With .TextRange
.Text = GetGoogleTranslation(.Text)
End With
End With
End If
'¿©±ä Å×À̺í checking
ElseIf oSelShp.HasTable Then
' If the full table is selected (or all cells), set a flag to translate all cells containing text else only translate those cells that are selected
If WholeTableIsSelected(oSelShp) Then bWholeTable = True
With oSelShp.Table
For lCol = 1 To .Columns.Count
For lRow = 1 To .Rows.Count
With .Cell(lRow, lCol)
If bWholeTable Then
With .Shape.TextFrame
If .HasText Then ' if the cell contains any text, translate it
With .TextRange
.Text = GetGoogleTranslation(.Text)
End With
End If
End With
ElseIf .Selected Then ' check if the cell is selected or not
With .Shape.TextFrame
If .HasText Then ' if the cell contains any text, translate it
With .TextRange
.Text = GetGoogleTranslation(.Text)
End With
End If
End With
End If
End With ' .Cell
Next
Next
End With ' oSelShp.Table
End If
Next oSelShp
Case Else
MsgBox "Invalid selection", vbCritical + vbOKOnly, "Cannot Perform Translation"
End Select
' Clean up
Set oHTTP = Nothing
End Sub
' Detect if the full table is selected as opposed to one or more cells within it
' If the full table is selected, all cells are selected (but not highlighted)
' If some [highlighted] cells, but not all are selected, this is detectable
Function WholeTableIsSelected(oShp As Shape) As Boolean
Dim lRow As Long, lCol As Long
If oShp.HasTable Then
With oShp.Table
For lRow = 1 To .Rows.Count
For lCol = 1 To .Columns.Count
If .Cell(lRow, lCol).Selected = False Then Exit Function
Next
Next
End With
End If
' If we got here, it means that no cells are not selected so the full table must be selected,
' either as the full table object or as a selection of all the cells.
WholeTableIsSelected = True
End Function
' Translate the src text from Korean to English using Google Translate
Function GetGoogleTranslation(src As String) As String
Dim URL As String
Dim getParam As String, trans As String
Const translateFrom = "ko" ' source language parameter
Const translateTo = "en" ' result language parameter
Const RegExpPat = "div[^""]*?""ltr"".*?>(.+?)</div>" ' pattern to find the translated text in the Google Translate HTML page
' replace special symbols and apply text to convert to get function
getParam = MakeSafeForHTML(src)
'getParam = ConvertToGet(src)
' Creates and modifies the URL of UTF-8 translator with getParam
URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
oHTTP.Open "GET", URL, False 'open the http object
oHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" ' set browser header
oHTTP.send ("") ' send command to browser
If InStr(oHTTP.responseText, "div dir=""ltr""") > 0 Then
' Find the translated text from the returned HTML
trans = RegexExecute(oHTTP.responseText, RegExpPat)
' Clean up the text
trans = ReplaceEnter(trans)
trans = Clean(trans)
' Return the translated text
GetGoogleTranslation = trans
End If
End Function
' SB Functions for ++ sign to Enter
' new line symbol while entering data from ppt
Function ReplaceEnter(txt As String)
txt = Replace(txt, " ++ ", vbCrLf) ' Enter
txt = Replace(txt, " + ", vbVerticalTab) ' Shift+Enter or Enter in Title placeholder
ReplaceEnter = txt
End Function
' Original from http://analystcave.com/excel-google-translate-functionality/
'Function Clean(val As String)
' val = Replace(val, """, """")
' val = Replace(val, "%2C", ",")
' val = Replace(val, "'", "'")
' Clean = val
'End Function
Function Clean(val As String) 'to clean symbol code to symbol
val = Replace(val, """", """")
val = Replace(val, "%2C", ",")
val = Replace(val, "'", "'")
val = Replace(val, "<", "<")
val = Replace(val, ">", ">")
Clean = val
End Function
' Find the plain text from div><br><div dir="ltr" class="t0">Hello</div>
' str = html content from the page returned by the Google translate query
' reg = regular expression search pattern string e.g. div[^"]*?"ltr".*?>(.+?)</div>
' match div
' 12 special metacharacters (escape with a backslash) \^$.|?*+()[{
' [..] = character class
' [^ = NOT
' ^ = position matches at the start of the string
' $ = position matches at the end of the string
' . = any character
' ? = optional preceding token
' * = attempt to match the preceding token zero or more times
' + =
' | = OR
' () = create a numbered capturing group
' [^"] = sequence, ^ negates, so anything that is not "
' *? = optional anything preceeding "ltr"
' "ltr" = match ltr
' . = match any character
' *? = optionally anything preceeding ">"
' (.+?) = create a numbered capturing group
' e.g. looking for this HTML part : <div dir="ltr" class="t0">Hello World</div>
Public Function RegexExecute(str As String, reg As String, _
Optional matchIndex As Long, _
Optional subMatchIndex As Long) As String
Dim regex As Object
Dim matches As Object
Dim xlErrValue
On Error GoTo ErrHandl
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = reg
regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
If regex.Test(str) Then
' Return a collection of string matches from the source HTML based on the pattern
' e.g. matches(0) = div><br><div dir="ltr" class="t0">Hello World</div>
Set matches = regex.Execute(str)
' Return the translated plain text from the match via the (.+?) part of the expression pattern
' e.g. matches(0).SubMatches(0) = Hello World
RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
Exit Function
End If
ErrHandl:
RegexExecute = CVErr(xlErrValue)
End Function
' ====================================================================
' Function to create an HTML-safe string from a given string of text.
' Author : Jamie Garroch of http://youpresent.co.uk/
' Inputs : sText is a plain text string which may contain English,
' Korean, punctuation, line feed and other symbols
' Outputs : returns an HTML-safe string where any non-alphanumeric
' characters (in English or Korean) are replaced with their
' HTML-formatted hexadecimal equivalents.
' Example :
' input = <Hello World!>
' output = %3CHello%20World%21%3E
' ====================================================================
Function MakeSafeForHTML(sText As String) As String
Dim newText As String, newChar As String
Dim chCode As Long
Dim x As Long
Dim bKeepChar As Boolean
Const KoreanStart = &H3131& ' Dec 12593
Const KoreanEnd = &HCB4C& ' Dec 52044
For x = 1 To Len(sText)
' Convert integer value of the character to the signed character code
chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
' Check the character and set a flag if it's ok to keep it as it is
Select Case chCode
Case KoreanStart To KoreanEnd: newChar = Mid(sText, x, 1)
Case Asc("a") To Asc("z"): newChar = Mid(sText, x, 1)
Case Asc("A") To Asc("Z"): newChar = Mid(sText, x, 1)
Case Asc("0") To Asc("9"): newChar = Mid(sText, x, 1)
Case Asc("&"): newChar = "and"
' Replace line feed characters with plain text '+' or '++' so they can be restored after translation
Case 11: newChar = "%2B" ' Vertical Tab (vbVerticalTab : Shift+Enter or Enter in title placeholders)
Case 13: newChar = "%2B%2B" ' Carriage Return (vbCrLf : Enter)
End Select
' Add the character to a new string or convert it to its hexadecimal equivalent
If newChar = "" Then newText = newText & "%" & Hex(chCode) Else newText = newText & newChar
newChar = ""
Next
' Return the HTML-safe string
MakeSafeForHTML = newText
End Function
ASKER
can you look into HTML transcoding
thank you :)
thank you :)
I updated the URL and HTML transcoding pre and post processing functions as follows:
If you need any further help, please consider starting a Gig as this question has become very involved now.
' Reverse the transcoding for PowerPoint line feed characters
Function ReplaceEnter(txt As String)
txt = Replace(txt, " ++ ", vbCrLf) ' Enter
txt = Replace(txt, " + ", vbVerticalTab) ' Shift+Enter or Enter in Title placeholder
ReplaceEnter = txt
End Function
' Clean HTML codes returned from Google in the translated string
Function Clean(sText As String)
' HTML predefined character entity references
sText = Replace(sText, """, """")
sText = Replace(sText, "'", "'")
sText = Replace(sText, "<", "<")
sText = Replace(sText, ">", ">")
sText = Replace(sText, "&", "&")
' HTML Numerical (decimal)
sText = Replace(sText, "'", "'")
sText = Replace(sText, ",", ",")
Clean = sText
End Function
' ====================================================================
' Function to create an HTML-safe string from a given string of text.
' Author : Jamie Garroch of http://youpresent.co.uk/
' Inputs : sText is a plain text string which may contain English,
' Korean, punctuation, line feed and other symbols
' Outputs : returns an HTML-safe string where any non-alphanumeric
' characters (in English or Korean) are replaced with their
' HTML-formatted hexadecimal equivalents.
' Example :
' input = <Hello World!>
' output = %3CHello%20World%21%3E
' ====================================================================
Function MakeSafeForHTML(sText As String) As String
Dim newText As String, newChar As String
Dim chCode As Long
Dim x As Long
Dim bKeepChar As Boolean
Const KoreanStart = &H3131& ' Dec 12593
Const KoreanEnd = &HCB4C& ' Dec 52044
For x = 1 To Len(sText)
' If AscW returns negative then convert integer value of the UTF-8 character to a signed character code
If AscW(Mid(sText, x, 1)) < 0 Then
chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
Else
chCode = Asc(Mid(sText, x, 1))
End If
' Check the character and set a flag if it's ok to keep it as it is
Select Case chCode
Case KoreanStart To KoreanEnd: newChar = Mid(sText, x, 1)
Case Asc(" "): newChar = "%20"
Case Asc("a") To Asc("z"): newChar = Mid(sText, x, 1)
Case Asc("A") To Asc("Z"): newChar = Mid(sText, x, 1)
Case Asc("0") To Asc("9"): newChar = Mid(sText, x, 1)
Case Asc("&"): newChar = "%26" ' & in HTML, dec 38
Case Asc("‘"): newChar = "%27" ' opening single quote char 145
Case Asc("’"): newChar = "%27" ' closing single quote char 146
Case Asc("“"): newChar = "%22" ' opening double quote char 147
Case Asc("”"): newChar = "%22" ' closing double quote char 148
' Replace line feed characters with plain text '+' or '++' so they can be restored after translation
Case 11: newChar = "%2B" ' Transpose to "+" for Vertical Tab (vbVerticalTab : Shift+Enter or Enter in title placeholders)
Case 13: newChar = "%2B%2B" ' Transpose to "++" for Carriage Return (vbCrLf : Enter)
End Select
' Add the character to a new string or convert it to its hexadecimal equivalent
If newChar = "" Then
newText = newText & "%" & Hex(chCode)
Else
newText = newText & newChar
End If
newChar = ""
Next
If DebugFlag Then Debug.Print "Sending: " & newText
' Return the HTML-safe string
MakeSafeForHTML = newText
End Function
If you need any further help, please consider starting a Gig as this question has become very involved now.
ASKER
Hi jamie,
thanks a lot for your help .
but i'm still unable to use your function efficiently.
if i translate below lines which are in korean
▷ 김허남(1998), 바람직한 한국 : 홍익인간 이념 실천 방안, 서울 : 삼보사▷ 김현선(1999), 비교신화학적 관점에서 본 홍익인간의 이념▷ 권성아(1998), 홍익인간사상과 민족통일교육, 강원대 박사학위논문▷ 권성아·신창호(1999), 홍익인간사상과 통일교육, 서울 : 집문당▷ 이석호(2001), 인간의 이해 - 철학적 인간학 입문, 철학과 현실사▷ 정영훈(1999), 홍익인간 이념의 유래와 현대적 의의, 한국정신문화연구원, 홍익인간 이념 연구▷ 정영훈 외(1999), 홍익인간 이념 연구, 성남 : 정신문화연구원
i'm getting the translated English text with lot of codes likes 9C,4D,+% B7 as below
++ B7 Kim C8 M (1998), 5C 5C country: 4D Human resource idea 9C plan, Seoul: Sambo companyB7 Kim 04 line (1999) From the standpoint of viewpoint, 4D is the ideology of the human being +% B7 Kwon Sung-ah (1998), 4D Iwo-Man thought and national B5 day education, Dr. Gangwon University, 59th papers +% B7 Kwon Sung- ), 4D Iwo-Man Thought and? B5day Education, Seoul: Gwangmun-dong +% B7 Lee, 38 (2001)4% B7 Presided by C8 (1999), 4D is the origin of the human ideology, 4 is the significance of the opposition, 5C is the national newspaper, 54 is the researcher, A Study on Human Ideology, Seongnam: Jung Shin-hyuk 54 Researcher.
In addition to that i think, stripping each korean word and passing it to google translator will change the meaning of the word and returning different meanings as a result.
I'm attaching a file which will help you to test.
By the way, what is a gig ?
Presentation1.pptx
thanks a lot for your help .
but i'm still unable to use your function efficiently.
if i translate below lines which are in korean
▷ 김허남(1998), 바람직한 한국 : 홍익인간 이념 실천 방안, 서울 : 삼보사▷ 김현선(1999), 비교신화학적 관점에서 본 홍익인간의 이념▷ 권성아(1998), 홍익인간사상과 민족통일교육, 강원대 박사학위논문▷ 권성아·신창호(1999), 홍익인간사상과 통일교육, 서울 : 집문당▷ 이석호(2001), 인간의 이해 - 철학적 인간학 입문, 철학과 현실사▷ 정영훈(1999), 홍익인간 이념의 유래와 현대적 의의, 한국정신문화연구원, 홍익인간 이념 연구▷ 정영훈 외(1999), 홍익인간 이념 연구, 성남 : 정신문화연구원
i'm getting the translated English text with lot of codes likes 9C,4D,+% B7 as below
++ B7 Kim C8 M (1998), 5C 5C country: 4D Human resource idea 9C plan, Seoul: Sambo companyB7 Kim 04 line (1999) From the standpoint of viewpoint, 4D is the ideology of the human being +% B7 Kwon Sung-ah (1998), 4D Iwo-Man thought and national B5 day education, Dr. Gangwon University, 59th papers +% B7 Kwon Sung- ), 4D Iwo-Man Thought and? B5day Education, Seoul: Gwangmun-dong +% B7 Lee, 38 (2001)4% B7 Presided by C8 (1999), 4D is the origin of the human ideology, 4 is the significance of the opposition, 5C is the national newspaper, 54 is the researcher, A Study on Human Ideology, Seongnam: Jung Shin-hyuk 54 Researcher.
In addition to that i think, stripping each korean word and passing it to google translator will change the meaning of the word and returning different meanings as a result.
I'm attaching a file which will help you to test.
By the way, what is a gig ?
Presentation1.pptx
Thanks for the additional info. FYI, a Gig is a project rather than a question and is paid for. They are used when questions become more than just a question, as is the case here because there is a lot of research and testing to be done to create a fully working code rather than just correct one or two lines of code. You can learn more here https://www.experts-exchange.com/gigs/
Some of the above I can fix (illegal URL characters in the ASCII code space) but others I have not understood yet. For example, if I take the first character ▷ this is Unicode character &H25B7 but if you paste that into a Google Translate web page and translate from Korean, you can see from the URL that is created that it transposes to a URL-safe format of %E2%96%B7
Until I understand this, I cannot progress any further.
Until I understand this, I cannot progress any further.
ASKER
ok, thanks a lot for your effort
before we close this question i have 2 more requests
1. how to maintain indentation with bullet points after translation at different levels like below
a) hello
b)hello
presently, i'm using textrange.paragraphs.text in the text selection but it seems its not working with second line bullet point onwards
and returning from translator like
a) hello
b) hello
2. Is it possible to share code to translate objects,text boxes,tables of selected slide/presentation like your in the ppam file which you have shared with me.
before we close this question i have 2 more requests
1. how to maintain indentation with bullet points after translation at different levels like below
a) hello
b)hello
presently, i'm using textrange.paragraphs.text in the text selection but it seems its not working with second line bullet point onwards
and returning from translator like
a) hello
b) hello
2. Is it possible to share code to translate objects,text boxes,tables of selected slide/presentation like your in the ppam file which you have shared with me.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
thanks a lot jimie
i will apply your code to translator.
have a nice day.
i will apply your code to translator.
have a nice day.
FYI, I reduced the number of errors in your example Korean tex above by change the code space constant in MakeSafeForHTML:
Do you know what the correct lower and upper limits are for all Korean characters? I don't think this part of the transcoder is 100% correct yet and needs more UTF-8 knowledge than I have.
Const KoreanEnd = &HD5C8&
Do you know what the correct lower and upper limits are for all Korean characters? I don't think this part of the transcoder is 100% correct yet and needs more UTF-8 knowledge than I have.
After a LOT of reading and experimentation, this is real output from the modified code with your original Korean text as the input above:
▷ Kim, HJ (1998), A desirable Korea: Hongik Human Ideology Practice, Seoul: Sambo
▷ Kim, Hyun-seon (1999), The Idea of Humanism
▷ Kwon Sung-ah (1998), Hongik Human Thought and National Unification Education, Doctoral Thesis of Kangwon National University
▷ Kwon Sung-ah, Shin Chang-ho (1999), Hongik Human Thought and Unification Education, Seoul:
▷ Lee Suk Ho (2001), Human Understanding - Introduction to Philosophical Anthropology, Philosophy and the Reality
▷ Jung Young-hoon (1999), The Origin of Hongik Human Ideology and Contemporary Implications, Korea Mental Culture Institute, Hongik Human Ideology Research
▷ Jung Young-Hoon et al. (1999), Hongik Human Ideology Research, Seongnam:
How does that look?
▷ Kim, HJ (1998), A desirable Korea: Hongik Human Ideology Practice, Seoul: Sambo
▷ Kim, Hyun-seon (1999), The Idea of Humanism
▷ Kwon Sung-ah (1998), Hongik Human Thought and National Unification Education, Doctoral Thesis of Kangwon National University
▷ Kwon Sung-ah, Shin Chang-ho (1999), Hongik Human Thought and Unification Education, Seoul:
▷ Lee Suk Ho (2001), Human Understanding - Introduction to Philosophical Anthropology, Philosophy and the Reality
▷ Jung Young-hoon (1999), The Origin of Hongik Human Ideology and Contemporary Implications, Korea Mental Culture Institute, Hongik Human Ideology Research
▷ Jung Young-Hoon et al. (1999), Hongik Human Ideology Research, Seongnam:
How does that look?
ASKER
wow !
how do you do it?
how do you do it?
Here's the latest MakeSafeForHTML (I renamed the function "EncodeForURL" since what it is actually doing is making a percent-encoding as per RFC3986:
And here is the corresponding post-processor function to transpose the linefeed charatcers:
' ====================================================================
' Function to create a URL string from a given string of text.
' Author : Jamie Garroch of http://youpresent.co.uk/
' Inputs : sText is a plain text string which may contain English,
' Korean, punctuation, line feed and other symbols
' Outputs : returns a URL-safe string where any non-alphanumeric
' characters (in English or Korean) are replaced with their
' URL-safe hexadecimal equivalents.
' Example :
' input = <Hello World!>
' output = %3CHello%20World%21%3E
' ====================================================================
Function EncodeForURL(sText As String) As String
Dim newText As String, newChar As String
Dim chCode As Long
Dim x As Long
Dim bKeepChar As Boolean
Dim lCount As Long
' Replace PowerPoint line feed characters with special plain text sequences so they can be restored after translation
'sText = Replace(sText, Chr(13), sCR)
'sText = Replace(sText, Chr(11), sVT)
' URL encoding references
' https://www.w3schools.com/tags/ref_urlencode.asp
' https://www.tutorialspoint.com/html/html_url_encoding.htm
' RFC3986 https://tools.ietf.org/html/rfc3986#section-2.1
For x = 1 To Len(sText)
' Convert integer value of the UTF-8 character to a signed Long (32-bit) character code
chCode = AscW(Mid(sText, x, 1)) And &HFFFF&
' https://en.wikipedia.org/wiki/UTF-8
' UTF-8 uses 1 to 4 bytes to represent each character
Select Case chCode
Case 8: Stop
' Character is 4-bytes and hence in the CJK code space area so leave it unchanged
Case Is > &HFFF&: newChar = ChrW(chCode)
'Case Is > &HFFF&: newChar = UnicodeToUTF(chCode)
'Case 11, 13: newChar = "%0A" ' LF = BEST RESULT SO FAR (keeps bullets but no CR)
Case 11, 13: newChar = "%0A%25%25%0A"
'Case 11, 13: newChar = "%0B" ' VT
'Case 11, 13: newChar = "%0D" ' CR
'Case 11, 13: newChar = Chr(chCode)
' Unreserved characters in the ASCII code space can be sent in plain text
' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
' a b c d e f g h i j k l m n o p q r s t u v w x y z
' 0 1 2 3 4 5 6 7 8 9 - _ . ~
Case 48 To 57: newChar = Chr(chCode) ' 0 to 9
Case 65 To 90: newChar = Chr(chCode) ' A to Z
Case 97 To 122: newChar = Chr(chCode) ' a to z
Case 45: newChar = Chr(chCode) ' - (hyphen)
Case 95: newChar = Chr(chCode) ' _ (underscore)
Case 46: newChar = Chr(chCode) ' . (full stop)
Case 126: newChar = Chr(chCode) ' ~ (approximate)
' URI escape character
Case 37: newChar = "%25" ' %
' Reserved Characters Encoding
' ! * ' ( ) ; : @ & = + $ , / ? % # [ ]
Case 36: newChar = "%24" ' $
Case 38: newChar = "%26" ' &
Case 43: newChar = "%2b" ' +
Case 44: newChar = "%2c" ' ,
Case 47: newChar = "%2f" ' /
Case 58: newChar = "%3a" ' :
Case 59: newChar = "%3b" ' ;
Case 61: newChar = "%3d" ' =
Case 63: newChar = "%3f" ' ?
Case 64: newChar = "%40" ' @
' Unsafe Characters Encoding
'Case &H20& To &H7F&: newChar = "%" & CStr(Hex(chCode))
Case 32: newChar = "%20" ' space
Case 34: newChar = "%22" ' "
Case 60: newChar = "%3c" ' <
Case 62: newChar = "%3e" ' >
Case 35: newChar = "%23" ' #
Case 37: newChar = "%25" ' %
Case 123: newChar = "%7b" ' {
Case 125: newChar = "%7d" ' }
Case 124: newChar = "%7c" ' |
Case 92: newChar = "%5c" ' \
Case 94: newChar = "%5e" ' ^
Case 126: newChar = "%7e" ' ~
Case 91: newChar = "%5b" ' [
Case 93: newChar = "%5d" ' ]
Case 96: newChar = "%60" ' `
' ASCII Control Characters Encoding
' 00-&H1F hex (0-31 decimal) and &H7F (127 decimal)
Case 0 To 31: newChar = "%" & CStr(Hex(chCode)) ' 32 x Control Codes
Case 127: newChar = "%" & CStr(Hex(chCode)) ' DEL
' Non-ASCII control characters encoding
' (encoding for the entire "top half" of the ISO-Latin set &H80 - &HFF)
Case 128 To 255: newChar = "%" & CStr(Hex(chCode))
Case Else: newChar = Chr(chCode)
'Case Else: newChar = "%" & CStr(Hex(chCode))
End Select
' Add the plain/encoded character to the new string
newText = newText & newChar
newChar = ""
Next
'newText = Replace(newText, Chr(13), "%20ENTER%20") '"%5bCR%5d"
'newText = Replace(newText, Chr(11), "%20ENTER%20") '"%5bVT%5d"
If DebugFlag Then Debug.Print "Sending: " & newText
' Return the URL-safe string
EncodeForURL = newText
End Function
And here is the corresponding post-processor function to transpose the linefeed charatcers:
' Reverse the transcoding for PowerPoint line feed characters
Function ReplaceEnter(txt As String)
' Note: Google Translate is returning additional spaces after escaped characters
txt = Replace(txt, "%%" & " ", vbCrLf) ' Enter
txt = Replace(txt, "%%" & " ", vbVerticalTab) ' Shift+Enter or Enter in Title placeholder
ReplaceEnter = txt
End Function
https://www.experts-exchange.com/questions/29070031/How-to-maintain-indentation-in-powerpoint-using-VBA-REGEX-is-working.html
Could you give an example of what is not working as expected. For example, if I put the following into a text box on a slide:
<여보세요>
and run your code, I get this:
[Hello]
That's because the < and > characters are being replaced by the ConvertToGet function to [ and ] respectively, before passing to Google Translate.