Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 6816
  • Last Modified:

VBA: UTF-8 URLencode string in greek

In VBA, I need to URLencode a string written in greek, using the UTF-8 encoding. This means that the string "±²³" (in case this does not read right, it says "lowercase alpha,  lowercase beta, lowercase gamma") should be encoded as "%CE%B1%CE%B2%CE%B3" (the same way http://coderstoolbox.net/string/ does the conversion if you select URL, Encode, UTF-8). Ideally, I would like a function where you input the string in greek and get back the encoded string.

From what i see in http://www.w3.org/International/O-URL-code.html the process requires two steps:
- Convert the character string into a sequence of bytes using the UTF-8 encoding
- Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte

You can also see the relationship between the greek characters and hex UTF-8 here: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128 

The code found here (http://www.devx.com/vb2themax/Tip/19160) does not work, since -from what I understand- it is limited to ASCII characters, while greek requires two bytes per character, but it might serve as a starting point for someone who is willing to give it a try.

Thanks!
Alex
0
alexspi
Asked:
alexspi
  • 4
  • 4
2 Solutions
 
alexspiAuthor Commented:
I noticed that posting the question messed up some of the characters. I'll give it another try: when I say above
... the string "±²³" should be encoded as %CE%B1%CE%B2%CE%B3 ...
should read
... the string "lowercase alpha,  lowercase beta, lowercase gamma" should be encoded as %CE%B1%CE%B2%CE%B3 ...
0
 
danaseamanCommented:
Please try this code.
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)

Function URLEncode returns %CE%B1%CE%B2%CE%B3


Option Explicit
 
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
Private Sub Form_Load()
   Dim sUni             As String
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode(sUni)
End Sub
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
 
   StringToEncode = ADO_EncodeUTF8(StringToEncode)
   For i = 1 To Len(StringToEncode)
      URLEncode = URLEncode & "%" & Hex(Asc(Mid(StringToEncode, i, 1)))
   Next
 
End Function
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 
End Function

Open in new window

0
 
danaseamanCommented:
Revised code to pass Ascii chars as is and replace space with "+".

   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode("Hello " & sUni)

Returns:
Hello+%CE%B1%CE%B2%CE%B3

Option Explicit
 
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
Private Sub Form_Load()
   Dim sUni             As String
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode("Hello " & sUni)
End Sub
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
   Dim iAsc             As Long
   Dim sAsc             As String
   Dim sTemp            As String
 
   StringToEncode = ADO_EncodeUTF8(StringToEncode)
   For i = 1 To Len(StringToEncode)
      sAsc = Mid(StringToEncode, i, 1)
      iAsc = AscW(sAsc)
      Select Case iAsc
         Case 32 'space
            sTemp = "+"
         Case 48 To 57, 65 To 90, 97 To 122
            sTemp = sAsc
         Case Else
            sTemp = "%" & Hex(iAsc)
      End Select
      URLEncode = URLEncode & sTemp
   Next
 
End Function
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 
End Function

Open in new window

0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
alexspiAuthor Commented:
Danaseaman, thanks for your reply. I tried your code and got a
"Run-time error '3001': The application is using arguments that are of the wrong type, are out of acceptable range, or are in conflict with one another."
in line 46: objStream.Type = adTypeText

I'm new in VBA and have no clue about ADO, so any ideas about how to resolve this will be greatly appreciated!
0
 
danaseamanCommented:
Try changing the  constants to:

Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adModeReadWrite = 3

0
 
alexspiAuthor Commented:
Thanks again Danaseaman! Your code was correct; I had mistakenly pasted the constants at the wrong place...

The problem now is that the output is correct in about 50% of the greek characters... and I can't find any pattern or reasoning why it encodes some characters correctly and some not. I have attached a file with all greek characters, demonstrating this (the .doc has the incorrectly encoded characters highlighted; other than that it is the same as the .txt)

many thanks!
BTW, if there is a way I can increase the points beyond 500, let me know how and I will gladly do so.
encoding.txt
encoding.doc
0
 
danaseamanCommented:
I think I fixed it.

Option Explicit
 
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
Private Sub Form_Load()
   Dim sUni             As String
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3) & ChrW(&H386) & ChrW(&H3CE)
   Debug.Print URLEncode(sUni)
End Sub
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
   Dim iAsc             As Long
   Dim sAsc             As String
   Dim sTemp            As String
 
   StringToEncode = ADO_EncodeUTF8(StringToEncode)
   For i = 1 To Len(StringToEncode)
      sAsc = Mid(StringToEncode, i, 1)
      iAsc = Asc(sAsc)
      Select Case iAsc
         Case 32 'space
            sTemp = "+"
         Case 48 To 57, 65 To 90, 97 To 122
            sTemp = sAsc
         Case Else
            Debug.Print sAsc, iAsc
            sTemp = "%" & Hex(iAsc)
      End Select
      URLEncode = URLEncode & sTemp
   Next
 
End Function
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 
End Function

Open in new window

0
 
alexspiAuthor Commented:
Well, the latest version was giving completely wrong results for me. I made some changes to the code, so that the ADO_EncodeUTF8 function returns a Byte array instead of a String, and then the URLEncode function works with these bytes directly. It seems to be working fine as far as I have tested it. Many thanks again for your time and effort!
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
 
 
Public Function URLEncode(ByVal StringToEncode As String) As String
   Dim i                As Integer
   Dim iAsc             As Long
   Dim sTemp            As String
   
   Dim ByteArrayToEncode() As Byte
 
   ByteArrayToEncode = ADO_EncodeUTF8(StringToEncode)
   
   For i = 0 To UBound(ByteArrayToEncode)
      iAsc = ByteArrayToEncode(i)
      Select Case iAsc
         Case 32 'space
            sTemp = "+"
         Case 48 To 57, 65 To 90, 97 To 122
            sTemp = Chr(ByteArrayToEncode(i))
         Case Else
            Debug.Print iAsc
            sTemp = "%" & Hex(iAsc)
      End Select
      URLEncode = URLEncode & sTemp
   Next
 
End Function
 
 
'Purpose: UTF16 to UTF8 using ADO
Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As Byte()
 
   Dim objStream        As Object
   Dim data()           As Byte
 
   Set objStream = CreateObject("ADODB.Stream")
   objStream.Charset = "utf-8"
   objStream.Mode = adModeReadWrite
   objStream.Type = adTypeText
   objStream.Open
   objStream.WriteText strUTF16
   objStream.Flush
   objStream.Position = 0
   objStream.Type = adTypeBinary
   objStream.Read 3 ' skip BOM
   data = objStream.Read()
   objStream.Close
   ADO_EncodeUTF8 = data
 
End Function

Open in new window

0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

  • 4
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now