encoding a query string

I looked at http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20335762.html while trying to find a way to encode a query string containing # and + characters.

I used the hexcode function posted there by dirge, and it worked great until I tried it with Korean characters. Korean characters have character codes above 255, so they need 2 bytes, but the hexcode function only converts the low byte to hex. Just to see what would happen, I changed the hexcode function to:

function hexnib(d) {
  if(d<10) return d; else return String.fromCharCode(65+d-10);
}

function hexcode(url) {
     var result="";
     for(var i=0;i<url.length;i++) {
        var cc=url.charCodeAt(i);
        var hex= "00" + hexnib((cc&240)>>4)+""+hexnib(cc&15);
        result+="%"+hex;
     }
     return result;
}

The only change I made was adding "00" + to the line var hex= "00" + hexnib((cc&240)>>4)+""+hexnib(cc&15); to make each code 2 bytes.

I tried it on English characters first, since their 4-digit hex codes all start with 00, but it didn't work. The value is not decoded correctly on the server and comes out as an empty string (presumably the server only sees the 00?). Does this mean query strings cannot be encoded in the form %A492%B61A%AE53 etc.?
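To see why the padded version fails, here is a minimal sketch (not from the thread). It re-creates the modified hexcode() from the question under the hypothetical name hexcodePadded() and passes its output to the built-in decodeURIComponent(). A URL decoder reads exactly two hex digits after each %, so %0041 becomes a NUL character (%00) followed by the literal text 41. It is an assumption, but a plausible one, that a server treating NUL as a string terminator would then see an empty value.

```javascript
// Re-creation of the question's modified hexcode(); hexcodePadded is a hypothetical name.
function hexnib(d) {
  return d < 10 ? String(d) : String.fromCharCode(65 + d - 10);
}

function hexcodePadded(s) {
  let result = "";
  for (let i = 0; i < s.length; i++) {
    const cc = s.charCodeAt(i);
    // Only the low byte of cc is converted; "00" is simply prepended to it.
    result += "%" + "00" + hexnib((cc & 240) >> 4) + hexnib(cc & 15);
  }
  return result;
}

const encoded = hexcodePadded("AB");
// Each "%" consumes exactly two hex digits, so "%0041" decodes to
// U+0000 (NUL) followed by the plain characters "4" and "1".
const decoded = decodeURIComponent(encoded);

console.log(encoded);                 // %0041%0042
console.log(JSON.stringify(decoded)); // "\u000041\u000042"
```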

If not, then how can Korean characters be passed in a query string?

thanks for the help!
maltomeal8 Asked:

SquareHead Commented:
I had the same problem with double-byte chars and encoding HTML entities for the query string... I couldn't find a solution and ended up replacing the '#' char with something else before adding it to the query string, then doing another replace on the receiving end... not an elegant solution by any means, but it worked for me... :p
avner Commented:
Have you tried using the escape() function?


maltomeal8 (Author) Commented:
The fact that escape() does not encode + (it leaves it as-is, so the server can read it as a space) was why I used hexcode in the first place.
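The + problem is easy to reproduce. This is a small sketch that assumes a JavaScript engine still providing the legacy escape() function, which browsers and Node.js do. escape() leaves + unchanged. The built-in encodeURIComponent() escapes it as %2B. Both functions escape # as %23.

```javascript
// Legacy escape() leaves "+" alone; encodeURIComponent() escapes it as %2B.
// Both escape "#" as %23.
for (const s of ["a+b", "c#d"]) {
  console.log(s, escape(s), encodeURIComponent(s));
}
// a+b a+b a%2Bb
// c#d c%23d c%23d
```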

I just noticed something interesting.  On Google, they seem to take what the user types in and put it into a query string.  So, I tried searching for the word français and I noticed it puts this string in the address bar:

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=fran%C3%A7ais&btnG=Google+Search

It looks like the ç was converted to %C3%A7, but how is that possible? When I use JavaScript's charCodeAt function on ç, it gives me 231, which is 00E7 in hex (%00%E7 written as two bytes).
Also, they pass ie=UTF-8, which looks like a flag telling the server to decode Unicode characters?
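The numbers do line up. %C3%A7 is the UTF-8 encoding of character code 231 (0xE7), and ie=UTF-8 presumably tells Google which encoding the input uses. Below is a worked sketch of the two-byte UTF-8 rule, checked against the built-in encodeURIComponent(). Under that rule, codes 128 to 2047 are written as 110xxxxx 10xxxxxx.

```javascript
// "\u00E7" is ç; charCodeAt gives 231 (0xE7), as in the question.
const cc = "\u00E7".charCodeAt(0);

// Two-byte UTF-8 form for codes 128..2047:
//   byte 1 = 110xxxxx : 0xC0 | (231 >> 6)   -> 0xC0 | 3  = 0xC3
//   byte 2 = 10xxxxxx : 0x80 | (231 & 0x3F) -> 0x80 | 39 = 0xA7
const byte1 = 0xC0 | (cc >> 6);
const byte2 = 0x80 | (cc & 0x3F);

const toHex = (b) => "%" + b.toString(16).toUpperCase();
const manual = toHex(byte1) + toHex(byte2);

console.log(cc, manual, encodeURIComponent("\u00E7")); // 231 %C3%A7 %C3%A7
```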

maltomeal8 (Author) Commented:
I think I have answered my own question (so I guess I'll keep my points).  Apparently a query string can only handle single byte characters.

I found the following note at http://www.w3.org/TR/html4/interact/forms.html#h-17.13.1:
Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire [ISO10646] character set.
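Percent-encoded UTF-8 is made up entirely of ASCII characters. That is how dirge's reply below gets Korean text into a query string. This works as long as both ends agree on UTF-8, which is presumably what Google's ie=UTF-8 parameter signals. Here is a quick check that uses the two Hangul syllables for "Korea", written as escapes:

```javascript
// "\uD55C\uAD6D" is "Korea" in Korean (Hangul syllables U+D55C, U+AD6D).
const korea = "\uD55C\uAD6D";
const encoded = encodeURIComponent(korea);

// Every character of the encoded form is in the ASCII range 0..127.
const asciiOnly = /^[\x00-\x7F]*$/.test(encoded);
const roundTrips = decodeURIComponent(encoded) === korea;

console.log(encoded, asciiOnly, roundTrips); // %ED%95%9C%EA%B5%AD true true
```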
dirge Commented:
The following is an update of my old script. It works fine with "Korea" written in Korean, for instance: its output matches what Google generates.

You may want to check out http://www1.tip.nl/~t876506/utf8tbl.html and http://selfaktuell.teamone.de/artikel/javascript/utf8b64/utf8.htm (German)

<html>
<head>
<script language="javascript">
<!--

function hexnib(d) {
   if(d<10) return d; else return String.fromCharCode(65+d-10);
}

function hexbyte(d) {
        return "%"+hexnib((d&240)>>4)+""+hexnib(d&15);
}

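function hexcode(url) {
    var result="";
    for(var i=0;i<url.length; i++) {
        var cc=url.charCodeAt(i);
        if (cc<128) {
            // codes 0..127: one byte, 0xxxxxxx
            result+=hexbyte(cc);
        } else if(cc<2048) {
            // codes 128..2047: two bytes, 110xxxxx 10xxxxxx
            result+=  hexbyte((cc>>6)|192)
                    + hexbyte((cc&63)|128);
        } else {
            // codes 2048..65535: three bytes, 1110xxxx 10xxxxxx 10xxxxxx
            // (surrogate pairs for characters above U+FFFF are not combined)
            result+=  hexbyte((cc>>12)|224)
                    + hexbyte(((cc>>6)&63)|128)
                    + hexbyte((cc&63)|128);
        }
    }
    return result;
}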

function encoder() {
   document.forms.test.r.value=hexcode(document.forms.test.s.value);
}

// -->
</script>
</head>
<body>
   <form name="test">
      URL (without http://) <input type="text" name="s"><br>
      Result: <input type="text" name="r"><br>
      <input type="button" value="Encode" onClick="encoder()">
      <input type="reset" value="Clear">
   </form>
</body>
</html>
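dirge's hexcode() handles every character up to U+FFFF, which covers Korean. JavaScript strings store characters above U+FFFF (emoji, for example) as two UTF-16 surrogates. The script above encodes each surrogate separately, and the result is not valid UTF-8. Below is a minimal standalone sketch of the same idea that iterates by code point and adds the four-byte case. The name utf8PercentEncode is hypothetical. The sketch assumes well-formed input with no lone surrogates, and like the original, it escapes every byte, including ASCII.

```javascript
// Format one byte as "%XX" (uppercase, zero-padded).
function hexbyte(b) {
  return "%" + b.toString(16).toUpperCase().padStart(2, "0");
}

// Hypothetical helper: percent-encode every UTF-8 byte of s.
function utf8PercentEncode(s) {
  let result = "";
  for (const ch of s) {                 // for..of yields whole code points
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {                    // 1 byte
      result += hexbyte(cp);
    } else if (cp < 0x800) {            // 2 bytes
      result += hexbyte(0xC0 | (cp >> 6)) + hexbyte(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {          // 3 bytes
      result += hexbyte(0xE0 | (cp >> 12)) +
                hexbyte(0x80 | ((cp >> 6) & 0x3F)) +
                hexbyte(0x80 | (cp & 0x3F));
    } else {                            // 4 bytes (above U+FFFF)
      result += hexbyte(0xF0 | (cp >> 18)) +
                hexbyte(0x80 | ((cp >> 12) & 0x3F)) +
                hexbyte(0x80 | ((cp >> 6) & 0x3F)) +
                hexbyte(0x80 | (cp & 0x3F));
    }
  }
  return result;
}

console.log(utf8PercentEncode("\uD55C\uAD6D")); // %ED%95%9C%EA%B5%AD ("Korea" in Korean)
console.log(utf8PercentEncode("\u{1F600}"));    // %F0%9F%98%80
```

The output decodes with the standard decodeURIComponent(), which is a convenient way to check it.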

dirge Commented:
That's 'fine with "Korea" (in Korean)' -- not sure if you see it in your browser, but I don't. I just copied the characters from http://kr.yahoo.com/ 
dirge Commented:
And..... ;-D it's not Google which generates the codes -- it's the browser, once you press Submit.

'Nuff said. Good luck.
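In other words, the browser percent-encodes the form fields itself, using the page's character encoding. The standard URLSearchParams class is a rough modern analogue, not what a 2002 browser literally runs. It applies the same application/x-www-form-urlencoded rules with UTF-8, so spaces become + and ç becomes %C3%A7. With the fields from the question, it reproduces the Google query string:

```javascript
// URLSearchParams serializes with application/x-www-form-urlencoded rules (UTF-8).
const params = new URLSearchParams({
  hl: "en",
  ie: "UTF-8",
  oe: "UTF-8",
  q: "fran\u00E7ais",   // "français"
  btnG: "Google Search",
});

console.log(params.toString());
// hl=en&ie=UTF-8&oe=UTF-8&q=fran%C3%A7ais&btnG=Google+Search
```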

maltomeal8 (Author) Commented:
Thank you dirge!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
justkeys Commented:
In IE and Opera, (un)escape is NOT the same as URL encoding/decoding.

With URL encoding, each character is translated into 1 to 4 "%xx" sequences, which represent the character's UTF-8 bytes. The URL-encoding algorithm works like this:

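                // the_char: a String holding one character; buffer: a StringBuffer.
                // getBytes("UTF-8") throws the checked UnsupportedEncodingException.
                byte[] bytes = the_char.getBytes("UTF-8");
                for (int j = 0; j < bytes.length; j++)
                {
                    buffer.append("%");
                    // Java bytes are signed, so mask to 0..255 before converting.
                    String hex = Integer.toHexString(255 & bytes[j]);
                    // Left-pad single-digit values with "0" (e.g. "a" -> "0a").
                    buffer.append("00".substring(hex.length()));
                    buffer.append(hex);
                }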

In JavaScript, I don't know how to do this (I don't know how to get the Unicode index of a char in JavaScript), but the browser certainly does it when you submit a form containing "international" input (such as Chinese). That's what happens when you search Google for the euro sign.
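Modern JavaScript can do this directly: charCodeAt() and codePointAt() return a character's Unicode value, and the standard TextEncoder class returns its UTF-8 bytes. Below is a minimal sketch that mirrors the Java loop above. It assumes TextEncoder is available, as it is in current browsers and Node.js. The name urlEncodeChar is hypothetical. Like Integer.toHexString, the sketch emits lowercase hex.

```javascript
// Mirrors the Java loop: get the UTF-8 bytes, then emit "%" + two hex digits each.
function urlEncodeChar(ch) {
  const bytes = new TextEncoder().encode(ch); // Uint8Array of UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    // b is already 0..255, so the Java "255 &" mask is not needed here.
    out += "%" + b.toString(16).padStart(2, "0");
  }
  return out;
}

console.log(urlEncodeChar("\u20AC")); // %e2%82%ac  (the euro sign)
```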

Netscape's (un)escape IS URL encoding/decoding, while IE and Opera's (un)escape is NOT. In those browsers, escape translates "simple accented chars" (codes below 256) into one single "%xx" expression, which is the character's Latin-1/Unicode code rather than its UTF-8 bytes. For more complex characters, escape returns "%uxxxx", where xxxx is the character's Unicode value in hex.
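The difference is easy to see in a current engine, where escape() behaves the way described here for IE and Opera. Characters below 256 become %xx, which is their Latin-1 code, and anything higher becomes %uxxxx. encodeURIComponent(), by contrast, emits UTF-8 bytes. So a server that expects standard URL encoding may not decode escape() output as intended.

```javascript
// "\u00E9" is é (code 233); "\uD55C" is the Hangul syllable 한 (code 0xD55C).
for (const ch of ["\u00E9", "\uD55C"]) {
  console.log(escape(ch), encodeURIComponent(ch));
}
// %E9 %C3%A9
// %uD55C %ED%95%9C
```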
Question has a verified solution.
