VBA: UTF-8 URLencode string in greek

In VBA, I need to URLencode a string written in greek, using the UTF-8 encoding. This means that the string "±²³" (in case this does not read right, it says "lowercase alpha,  lowercase beta, lowercase gamma") should be encoded as "%CE%B1%CE%B2%CE%B3" (the same way http://coderstoolbox.net/string/ does the conversion if you select URL, Encode, UTF-8). Ideally, I would like a function where you input the string in greek and get back the encoded string.

From what i see in http://www.w3.org/International/O-URL-code.html the process requires two steps:
- Convert the character string into a sequence of bytes using the UTF-8 encoding
- Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte

You can also see the relationship between the greek characters and hex UTF-8 here: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128 

The code found here (http://www.devx.com/vb2themax/Tip/19160) does not work, since -from what I understand- it is limited to ASCII characters, while greek requires two bytes per character, but it might serve as a starting point for someone who is willing to give it a try.

Thanks!
Alex
alexspiAsked:
Who is Participating?
 
alexspiConnect With a Mentor Author Commented:
Well, the latest version was giving completely wrong results for me. I made some changes to the code, so that the ADO_EncodeUTF8 function returns a Byte array instead of a String, and then the URLEncode function works with these bytes directly. It seems to be working fine as far as I have tested it. Many thanks again for your time and effort!
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
   Dim iAsc             As Long
   Dim sTemp            As String
   
   Dim ByteArrayToEncode() As Byte
 
   ByteArrayToEncode = ADO_EncodeUTF8(StringToEncode)
   
   For i = 0 To UBound(ByteArrayToEncode)
      iAsc = ByteArrayToEncode(i)
      Select Case iAsc
         Case 32 'space
            sTemp = "+"
         Case 48 To 57, 65 To 90, 97 To 122
            sTemp = Chr(ByteArrayToEncode(i))
         Case Else
            Debug.Print iAsc
            sTemp = "%" & Hex(iAsc)
      End Select
      URLEncode = URLEncode & sTemp
   Next
 
End Function
 
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As Byte()
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = data
 
End Function

Open in new window

0
 
alexspiAuthor Commented:
I noticed that posting the question messed up some of the characters. I'll give it another try: when I say above
... the string "±²³" should be encoded as %CE%B1%CE%B2%CE%B3 ...
should read
... the string "lowercase alpha,  lowercase beta, lowercase gamma" should be encoded as %CE%B1%CE%B2%CE%B3 ...
0
 
danaseamanCommented:
Please try this code.
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)

Function URLEncode returns %CE%B1%CE%B2%CE%B3


Option Explicit
 
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
Private Sub Form_Load()
   Dim sUni             As String
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode(sUni)
End Sub
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
 
   StringToEncode = ADO_EncodeUTF8(StringToEncode)
   For i = 1 To Len(StringToEncode)
      URLEncode = URLEncode & "%" & Hex(Asc(Mid(StringToEncode, i, 1)))
   Next
 
End Function
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 
End Function

Open in new window

0
Cloud Class® Course: Ruby Fundamentals

This course will introduce you to Ruby, as well as teach you about classes, methods, variables, data structures, loops, enumerable methods, and finishing touches.

 
danaseamanConnect With a Mentor Commented:
Revised code to pass Ascii chars as is and replace space with "+".

   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode("Hello " & sUni)

Returns:
Hello+%CE%B1%CE%B2%CE%B3

Option Explicit
 
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
Private Sub Form_Load()
   Dim sUni             As String
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode("Hello " & sUni)
End Sub
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
   Dim iAsc             As Long
   Dim sAsc             As String
   Dim sTemp            As String
 
   StringToEncode = ADO_EncodeUTF8(StringToEncode)
   For i = 1 To Len(StringToEncode)
      sAsc = Mid(StringToEncode, i, 1)
      iAsc = AscW(sAsc)
      Select Case iAsc
         Case 32 'space
            sTemp = "+"
         Case 48 To 57, 65 To 90, 97 To 122
            sTemp = sAsc
         Case Else
            sTemp = "%" & Hex(iAsc)
      End Select
      URLEncode = URLEncode & sTemp
   Next
 
End Function
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 
End Function

Open in new window

0
 
alexspiAuthor Commented:
Danaseaman, thanks for your reply. I tried your code and got a
"Run-time error '3001': The application is using arguments that are of the wrong type, are out of acceptable range, or are in conflict with one another."
in line 46: objStream.Type = adTypeText

I'm new in VBA and have no clue about ADO, so any ideas about how to resolve this will be greatly appreciated!
0
 
danaseamanCommented:
Try changing the  constants to:

Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adModeReadWrite = 3

0
 
alexspiAuthor Commented:
Thanks again Danaseaman! Your code was correct; I had mistakenly pasted the constants at the wrong place...

The problem now is that the output is correct in about 50% of the greek characters... and I can't find any pattern or reasoning why it encodes some characters correctly and some not. I have attached a file with all greek characters, demonstrating this (the .doc has the incorrectly encoded characters highlighted; other than that it is the same as the .txt)

many thanks!
BTW, if there is a way I can increase the points beyond 500, let me know how and I will gladly do so.
encoding.txt
encoding.doc
0
 
danaseamanCommented:
I think I fixed it.

Option Explicit
 
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
Private Sub Form_Load()
   Dim sUni             As String
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3) & ChrW(&H386) & ChrW(&H3CE)
   Debug.Print URLEncode(sUni)
End Sub
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
   Dim iAsc             As Long
   Dim sAsc             As String
   Dim sTemp            As String
 
   StringToEncode = ADO_EncodeUTF8(StringToEncode)
   For i = 1 To Len(StringToEncode)
      sAsc = Mid(StringToEncode, i, 1)
      iAsc = Asc(sAsc)
      Select Case iAsc
         Case 32 'space
            sTemp = "+"
         Case 48 To 57, 65 To 90, 97 To 122
            sTemp = sAsc
         Case Else
            Debug.Print sAsc, iAsc
            sTemp = "%" & Hex(iAsc)
      End Select
      URLEncode = URLEncode & sTemp
   Next
 
End Function
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 
End Function

Open in new window

0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.