alexspi
asked on
VBA: UTF-8 URLencode string in greek
In VBA, I need to URLencode a string written in greek, using the UTF-8 encoding. This means that the string "±²³" (in case this does not read right, it says "lowercase alpha, lowercase beta, lowercase gamma") should be encoded as "%CE%B1%CE%B2%CE%B3" (the same way http://coderstoolbox.net/string/ does the conversion if you select URL, Encode, UTF-8). Ideally, I would like a function where you input the string in greek and get back the encoded string.
From what i see in http://www.w3.org/International/O-URL-code.html the process requires two steps:
- Convert the character string into a sequence of bytes using the UTF-8 encoding
- Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte
You can also see the relationship between the greek characters and hex UTF-8 here: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128
The code found here (http://www.devx.com/vb2themax/Tip/19160) does not work, since -from what I understand- it is limited to ASCII characters, while greek requires two bytes per character, but it might serve as a starting point for someone who is willing to give it a try.
Thanks!
Alex
From what i see in http://www.w3.org/International/O-URL-code.html the process requires two steps:
- Convert the character string into a sequence of bytes using the UTF-8 encoding
- Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte
You can also see the relationship between the greek characters and hex UTF-8 here: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128
The code found here (http://www.devx.com/vb2themax/Tip/19160) does not work, since -from what I understand- it is limited to ASCII characters, while greek requires two bytes per character, but it might serve as a starting point for someone who is willing to give it a try.
Thanks!
Alex
Please try this code.
'lowercase alpha, beta, gamma
sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
Function URLEncode returns %CE%B1%CE%B2%CE%B3
'lowercase alpha, beta, gamma
sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
Function URLEncode returns %CE%B1%CE%B2%CE%B3
Option Explicit
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
Private Sub Form_Load()
Dim sUni As String
'lowercase alpha, beta, gamma
sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
Debug.Print URLEncode(sUni)
End Sub
Public Function URLEncode(ByVal StringToEncode As String) As String
Dim i As Integer
StringToEncode = ADO_EncodeUTF8(StringToEncode)
For i = 1 To Len(StringToEncode)
URLEncode = URLEncode & "%" & Hex(Asc(Mid(StringToEncode, i, 1)))
Next
End Function
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
Dim objStream As Object
Dim data() As Byte
Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8"
objStream.Mode = adModeReadWrite
objStream.Type = adTypeText
objStream.Open
objStream.WriteText strUTF16
objStream.Flush
objStream.Position = 0
objStream.Type = adTypeBinary
objStream.Read 3 ' skip BOM
data = objStream.Read()
objStream.Close
ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
End Function
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Danaseaman, thanks for your reply. I tried your code and got a
"Run-time error '3001': The application is using arguments that are of the wrong type, are out of acceptable range, or are in conflict with one another."
in line 46: objStream.Type = adTypeText
I'm new in VBA and have no clue about ADO, so any ideas about how to resolve this will be greatly appreciated!
"Run-time error '3001': The application is using arguments that are of the wrong type, are out of acceptable range, or are in conflict with one another."
in line 46: objStream.Type = adTypeText
I'm new in VBA and have no clue about ADO, so any ideas about how to resolve this will be greatly appreciated!
Try changing the constants to:
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adModeReadWrite = 3
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adModeReadWrite = 3
ASKER
Thanks again Danaseaman! Your code was correct; I had mistakenly pasted the constants at the wrong place...
The problem now is that the output is correct in about 50% of the greek characters... and I can't find any pattern or reasoning why it encodes some characters correctly and some not. I have attached a file with all greek characters, demonstrating this (the .doc has the incorrectly encoded characters highlighted; other than that it is the same as the .txt)
many thanks!
BTW, if there is a way I can increase the points beyond 500, let me know how and I will gladly do so.
encoding.txt
encoding.doc
The problem now is that the output is correct in about 50% of the greek characters... and I can't find any pattern or reasoning why it encodes some characters correctly and some not. I have attached a file with all greek characters, demonstrating this (the .doc has the incorrectly encoded characters highlighted; other than that it is the same as the .txt)
many thanks!
BTW, if there is a way I can increase the points beyond 500, let me know how and I will gladly do so.
encoding.txt
encoding.doc
I think I fixed it.
Option Explicit
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
Private Sub Form_Load()
Dim sUni As String
'lowercase alpha, beta, gamma
sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3) & ChrW(&H386) & ChrW(&H3CE)
Debug.Print URLEncode(sUni)
End Sub
Public Function URLEncode(ByVal StringToEncode As String) As String
Dim i As Integer
Dim iAsc As Long
Dim sAsc As String
Dim sTemp As String
StringToEncode = ADO_EncodeUTF8(StringToEncode)
For i = 1 To Len(StringToEncode)
sAsc = Mid(StringToEncode, i, 1)
iAsc = Asc(sAsc)
Select Case iAsc
Case 32 'space
sTemp = "+"
Case 48 To 57, 65 To 90, 97 To 122
sTemp = sAsc
Case Else
Debug.Print sAsc, iAsc
sTemp = "%" & Hex(iAsc)
End Select
URLEncode = URLEncode & sTemp
Next
End Function
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
Dim objStream As Object
Dim data() As Byte
Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8"
objStream.Mode = adModeReadWrite
objStream.Type = adTypeText
objStream.Open
objStream.WriteText strUTF16
objStream.Flush
objStream.Position = 0
objStream.Type = adTypeBinary
objStream.Read 3 ' skip BOM
data = objStream.Read()
objStream.Close
ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
End Function
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
... the string "±²³" should be encoded as %CE%B1%CE%B2%CE%B3 ...
should read
... the string "lowercase alpha, lowercase beta, lowercase gamma" should be encoded as %CE%B1%CE%B2%CE%B3 ...