Solved

VBA: UTF-8 URLencode string in greek

Posted on 2008-09-29
8
6,379 Views
Last Modified: 2012-06-27
In VBA, I need to URLencode a string written in greek, using the UTF-8 encoding. This means that the string "±²³" (in case this does not read right, it says "lowercase alpha,  lowercase beta, lowercase gamma") should be encoded as "%CE%B1%CE%B2%CE%B3" (the same way http://coderstoolbox.net/string/ does the conversion if you select URL, Encode, UTF-8). Ideally, I would like a function where you input the string in greek and get back the encoded string.

From what i see in http://www.w3.org/International/O-URL-code.html the process requires two steps:
- Convert the character string into a sequence of bytes using the UTF-8 encoding
- Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte

You can also see the relationship between the greek characters and hex UTF-8 here: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=896&number=128 

The code found here (http://www.devx.com/vb2themax/Tip/19160) does not work, since -from what I understand- it is limited to ASCII characters, while greek requires two bytes per character, but it might serve as a starting point for someone who is willing to give it a try.

Thanks!
Alex
0
Comment
Question by:alexspi
  • 4
  • 4
8 Comments
 

Author Comment

by:alexspi
ID: 22595018
I noticed that posting the question messed up some of the characters. I'll give it another try: when I say above
... the string "±²³" should be encoded as %CE%B1%CE%B2%CE%B3 ...
should read
... the string "lowercase alpha,  lowercase beta, lowercase gamma" should be encoded as %CE%B1%CE%B2%CE%B3 ...
0
 
LVL 22

Expert Comment

by:danaseaman
ID: 22642082
Please try this code.
   'lowercase alpha, beta, gamma
   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)

Function URLEncode returns %CE%B1%CE%B2%CE%B3


Option Explicit
 

Private Const adTypeBinary As Long = 1

Private Const adTypeText As Long = 2

Private Const adModeReadWrite As Long = 3
 

Private Sub Form_Load()

   Dim sUni             As String

   'lowercase alpha, beta, gamma

   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)

   Debug.Print URLEncode(sUni)

End Sub
 

Public Function URLEncode(ByVal StringToEncode As String) As String

   Dim i                As Integer
 

   StringToEncode = ADO_EncodeUTF8(StringToEncode)

   For i = 1 To Len(StringToEncode)

      URLEncode = URLEncode & "%" & Hex(Asc(Mid(StringToEncode, i, 1)))

   Next
 

End Function
 

'Purpose: UTF16 to UTF8 using ADO

Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 

   Dim objStream        As Object

   Dim data()           As Byte
 

   Set objStream = CreateObject("ADODB.Stream")

   objStream.Charset = "utf-8"

   objStream.Mode = adModeReadWrite

   objStream.Type = adTypeText

   objStream.Open

   objStream.WriteText strUTF16

   objStream.Flush

   objStream.Position = 0

   objStream.Type = adTypeBinary

   objStream.Read 3 ' skip BOM

   data = objStream.Read()

   objStream.Close

   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 

End Function

Open in new window

0
 
LVL 22

Assisted Solution

by:danaseaman
danaseaman earned 500 total points
ID: 22642202
Revised code to pass Ascii chars as is and replace space with "+".

   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)
   Debug.Print URLEncode("Hello " & sUni)

Returns:
Hello+%CE%B1%CE%B2%CE%B3

Option Explicit
 

Private Const adTypeBinary As Long = 1

Private Const adTypeText As Long = 2

Private Const adModeReadWrite As Long = 3
 

Private Sub Form_Load()

   Dim sUni             As String

   'lowercase alpha, beta, gamma

   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3)

   Debug.Print URLEncode("Hello " & sUni)

End Sub
 

Public Function URLEncode(ByVal StringToEncode As String) As String

   Dim i                As Integer

   Dim iAsc             As Long

   Dim sAsc             As String

   Dim sTemp            As String
 

   StringToEncode = ADO_EncodeUTF8(StringToEncode)

   For i = 1 To Len(StringToEncode)

      sAsc = Mid(StringToEncode, i, 1)

      iAsc = AscW(sAsc)

      Select Case iAsc

         Case 32 'space

            sTemp = "+"

         Case 48 To 57, 65 To 90, 97 To 122

            sTemp = sAsc

         Case Else

            sTemp = "%" & Hex(iAsc)

      End Select

      URLEncode = URLEncode & sTemp

   Next
 

End Function
 

'Purpose: UTF16 to UTF8 using ADO

Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 

   Dim objStream        As Object

   Dim data()           As Byte
 

   Set objStream = CreateObject("ADODB.Stream")

   objStream.Charset = "utf-8"

   objStream.Mode = adModeReadWrite

   objStream.Type = adTypeText

   objStream.Open

   objStream.WriteText strUTF16

   objStream.Flush

   objStream.Position = 0

   objStream.Type = adTypeBinary

   objStream.Read 3 ' skip BOM

   data = objStream.Read()

   objStream.Close

   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 

End Function

Open in new window

0
 

Author Comment

by:alexspi
ID: 22661883
Danaseaman, thanks for your reply. I tried your code and got a
"Run-time error '3001': The application is using arguments that are of the wrong type, are out of acceptable range, or are in conflict with one another."
in line 46: objStream.Type = adTypeText

I'm new in VBA and have no clue about ADO, so any ideas about how to resolve this will be greatly appreciated!
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 22

Expert Comment

by:danaseaman
ID: 22663344
Try changing the  constants to:

Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adModeReadWrite = 3

0
 

Author Comment

by:alexspi
ID: 22667647
Thanks again Danaseaman! Your code was correct; I had mistakenly pasted the constants at the wrong place...

The problem now is that the output is correct in about 50% of the greek characters... and I can't find any pattern or reasoning why it encodes some characters correctly and some not. I have attached a file with all greek characters, demonstrating this (the .doc has the incorrectly encoded characters highlighted; other than that it is the same as the .txt)

many thanks!
BTW, if there is a way I can increase the points beyond 500, let me know how and I will gladly do so.
encoding.txt
encoding.doc
0
 
LVL 22

Expert Comment

by:danaseaman
ID: 22670186
I think I fixed it.

Option Explicit
 

Private Const adTypeBinary As Long = 1

Private Const adTypeText As Long = 2

Private Const adModeReadWrite As Long = 3
 

Private Sub Form_Load()

   Dim sUni             As String

   'lowercase alpha, beta, gamma

   sUni = ChrW(&H3B1) & ChrW(&H3B2) & ChrW(&H3B3) & ChrW(&H386) & ChrW(&H3CE)

   Debug.Print URLEncode(sUni)

End Sub
 

Public Function URLEncode(ByVal StringToEncode As String) As String

   Dim i                As Integer

   Dim iAsc             As Long

   Dim sAsc             As String

   Dim sTemp            As String
 

   StringToEncode = ADO_EncodeUTF8(StringToEncode)

   For i = 1 To Len(StringToEncode)

      sAsc = Mid(StringToEncode, i, 1)

      iAsc = Asc(sAsc)

      Select Case iAsc

         Case 32 'space

            sTemp = "+"

         Case 48 To 57, 65 To 90, 97 To 122

            sTemp = sAsc

         Case Else

            Debug.Print sAsc, iAsc

            sTemp = "%" & Hex(iAsc)

      End Select

      URLEncode = URLEncode & sTemp

   Next
 

End Function
 

'Purpose: UTF16 to UTF8 using ADO

Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As String
 

   Dim objStream        As Object

   Dim data()           As Byte
 

   Set objStream = CreateObject("ADODB.Stream")

   objStream.Charset = "utf-8"

   objStream.Mode = adModeReadWrite

   objStream.Type = adTypeText

   objStream.Open

   objStream.WriteText strUTF16

   objStream.Flush

   objStream.Position = 0

   objStream.Type = adTypeBinary

   objStream.Read 3 ' skip BOM

   data = objStream.Read()

   objStream.Close

   ADO_EncodeUTF8 = StrConv(data, vbUnicode, 1033)
 

End Function

Open in new window

0
 

Accepted Solution

by:
alexspi earned 0 total points
ID: 22685792
Well, the latest version was giving completely wrong results for me. I made some changes to the code, so that the ADO_EncodeUTF8 function returns a Byte array instead of a String, and then the URLEncode function works with these bytes directly. It seems to be working fine as far as I have tested it. Many thanks again for your time and effort!
Private Const adTypeBinary As Long = 1

Private Const adTypeText As Long = 2

Private Const adModeReadWrite As Long = 3
 
 

Public Function URLEncode(ByVal StringToEncode As String) As String

   Dim i                As Integer

   Dim iAsc             As Long

   Dim sTemp            As String

   

   Dim ByteArrayToEncode() As Byte

 

   ByteArrayToEncode = ADO_EncodeUTF8(StringToEncode)

   

   For i = 0 To UBound(ByteArrayToEncode)

      iAsc = ByteArrayToEncode(i)

      Select Case iAsc

         Case 32 'space

            sTemp = "+"

         Case 48 To 57, 65 To 90, 97 To 122

            sTemp = Chr(ByteArrayToEncode(i))

         Case Else

            Debug.Print iAsc

            sTemp = "%" & Hex(iAsc)

      End Select

      URLEncode = URLEncode & sTemp

   Next

 

End Function

 
 

'Purpose: UTF16 to UTF8 using ADO

Public Function ADO_EncodeUTF8(ByVal strUTF16 As String) As Byte()

 

   Dim objStream        As Object

   Dim data()           As Byte

 

   Set objStream = CreateObject("ADODB.Stream")

   objStream.Charset = "utf-8"

   objStream.Mode = adModeReadWrite

   objStream.Type = adTypeText

   objStream.Open

   objStream.WriteText strUTF16

   objStream.Flush

   objStream.Position = 0

   objStream.Type = adTypeBinary

   objStream.Read 3 ' skip BOM

   data = objStream.Read()

   objStream.Close

   ADO_EncodeUTF8 = data

 

End Function

Open in new window

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
DO Loop not working 4 72
MS Date Picker 64 bit 32 bit issue 12 49
VBA: Personal Macro Retain/Highlight/Remove values in a selected column 4 24
Child Form in front 4 37
There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
Most everyone who has done any programming in VB6 knows that you can do something in code like Debug.Print MyVar and that when the program runs from the IDE, the value of MyVar will be displayed in the Immediate Window. Less well known is Debug.Asse…
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now