Solved

Want to know more about urlencode and urldecode

Posted on 2004-11-01
5,539 Views
Last Modified: 2013-12-13
I know that the urlencode function converts some special characters, and also Chinese characters, into %XX form.

However, what I'd like to know more about is the encoding mechanism of this function.

For example, how does it know that "%2C" stands for the "," character? For Chinese characters it seems even more complicated! For instance, the Chinese character "我" is encoded as "%A7%DA". How is this done?!

Also, is there a mapping table for the conversion that I could refer to?

Many Thanks!!!
Question by:hellohelloworld
    7 Comments
     
    LVL 48

    Accepted Solution

    by:

    First represent each character as one or more bytes (the HTML 4 notes recommend UTF-8), then escape these bytes with the URI escaping mechanism (i.e., convert each byte to %HH, where HH is the hexadecimal notation of the byte value).
    See
    http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2
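A minimal PHP sketch of that %HH rule, assuming PHP 7 or later (for the "\u{...}" string escape) and a UTF-8 string. The function name percent_encode_every_byte is hypothetical and only illustrates the mechanism; in real code you would call urlencode() or rawurlencode(). It shows why "," becomes "%2C": ord(',') is 44, and 44 in hex is 2C.

```php
<?php
// Hypothetical helper: escape EVERY byte as %HH (uppercase hex digits).
// Unlike PHP's urlencode(), it also escapes letters and digits and
// never turns a space into '+'.
function percent_encode_every_byte(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        // ord() gives the byte value (0-255); %02X prints it as two hex digits.
        $out .= sprintf('%%%02X', ord($s[$i]));
    }
    return $out;
}

echo ord(','), "\n";                               // 44
echo percent_encode_every_byte(','), "\n";         // %2C
// "\u{6211}" is the UTF-8 string for U+6211, which is three bytes long.
echo percent_encode_every_byte("\u{6211}"), "\n";  // %E6%88%91
```

For characters that urlencode() does escape, the result is the same; the difference is only that this sketch escapes everything.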
     

    Author Comment

    by:hellohelloworld
    Is there a way I could find the mapping table on my PC?
     
    LVL 48

    Expert Comment

    by:hernst42
    The mapping is easy to build:

    // Map every byte value (0-255) to its urlencode() form, e.g. ',' => '%2C'.
    $table = array();
    for ($i = 0; $i <= 255; ++$i) {
       $table[chr($i)] = urlencode(chr($i));
    }

    As I said, each byte is just encoded as its hex value. To get the numeric value of a character, use ord():
    see http://de3.php.net/manual/en/function.ord.php
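Going the other way (from "%2C" back to ","), a small sketch using only core PHP functions: hexdec() turns the two hex digits into a number and chr() turns that number into a byte. The helper name decode_percent_bytes is hypothetical; it behaves like rawurldecode() (a '+' is left alone, unlike urldecode()).

```php
<?php
// Hypothetical helper: the reverse of the table above. Each %HH token
// is replaced by the byte it names: hexdec('2C') is 44, chr(44) is ','.
function decode_percent_bytes(string $s): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $s
    );
}

echo hexdec('2C'), "\n";                // 44
echo decode_percent_bytes('%2C'), "\n"; // ,
```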
     

    Author Comment

    by:hellohelloworld
    Yes, but what about the Chinese characters?
    Is there anything I could trace so that if I see "%xyz%A23", I know which Chinese character it represents without using urldecode?
     
    LVL 48

    Expert Comment

    by:hernst42
    No, you can't tell which character it is just from the format of the %xx%yy. Chinese characters are stored as multi-byte characters, and which bytes are used depends on the character encoding, so you would need a very long list (all Chinese characters, for that encoding) to know which character a given %xx%yy stands for.
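A sketch of why the encoding matters, assuming PHP 7+ and a UTF-8 setup. The claim that the question's %A7%DA is the Big5 encoding of the same character is an assumption for illustration; the code only checks that those two bytes are not valid UTF-8, so a UTF-8 table would not help with them.

```php
<?php
// In UTF-8, U+6211 is the three bytes E6 88 91.
$utf8 = "\u{6211}";                  // PHP 7+ escape; produces UTF-8 bytes
echo strlen($utf8), "\n";            // 3 (bytes, not characters)
echo urlencode($utf8), "\n";         // %E6%88%91

// Assumption: %A7%DA is the same character in Big5. These bytes are not
// valid UTF-8, so preg_match() with the /u modifier returns false.
$otherBytes = urldecode('%A7%DA');
var_dump(preg_match('//u', $otherBytes));   // bool(false)
```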
     
    LVL 1

    Expert Comment

    by:hallvors
    What you want is possible, but it is considerably more work than is practical. Just call urldecode() and let PHP do the calculations :)

    Anyway, thanks for an interesting question. Researching it taught me about both how UTF-8 works and about URL encoding in general.

    First, link to an explanation of URL encoding:
    http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
    (disclosure: it's written by someone I know ;)

    Secondly, here is how to find the character from a URL encoding - manually!

    Your character above, "我" (the HTML entity &#25105;), is actually encoded in UTF-8 as %E6%88%91. According to babelfish.altavista.com it means "I" in Chinese. If you can't see it in your browser, try pasting this into your address bar: javascript:'<html>&#25105;</html>'

    The first tool we'll use is the Windows Calculator: open it and switch to Scientific mode from the View menu. Then choose "Hex" and type the hex value from above with the % signs removed: e68891.

    Now click the "Bin" option to get the binary value of this hexadecimal number. Copy it and paste it in Notepad.

    111001101000100010010001
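The same calculator step in PHP, as a small sketch with core functions: hexdec() reads the hex digits, decbin() prints the number in binary.

```php
<?php
// Hex digits -> number -> binary digits, like the calculator's Hex/Bin buttons.
$hex = 'e68891';               // %E6%88%91 with the % signs removed
$bin = decbin(hexdec($hex));
echo $bin, "\n";               // 111001101000100010010001
```

decbin() drops leading zeros; that is fine here because the first byte starts with 1, but for a byte starting with 0 you would pad it, e.g. str_pad(decbin($b), 8, '0', STR_PAD_LEFT).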

    This is the binary, UTF-8 encoded string. We want to un-UTF-8 it to find the Unicode value. Here is the technical documentation for UTF-8 (RFC 2279, since superseded by RFC 3629):
    ftp://ftp.isi.edu/in-notes/rfc2279.txt

    First, starting from the end of the string, add a line break after every 8 digits:

    11100110
    10001000
    10010001
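A sketch of this splitting step in PHP. To count groups "from the end" as described, it pads the string on the left to a multiple of 8 before cutting; for this 24-digit string the padding happens to add nothing.

```php
<?php
$bin = '111001101000100010010001';   // binary string from the calculator

// Pad on the left so the last group is complete, then cut into 8-digit groups.
$width  = (int) ceil(strlen($bin) / 8) * 8;
$padded = str_pad($bin, $width, '0', STR_PAD_LEFT);
$bytes  = str_split($padded, 8);
echo implode("\n", $bytes), "\n";
// 11100110
// 10001000
// 10010001
```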

    From the first line, remove the leading 1 digits (the 0 after them can stay; it just becomes a leading zero). From each of the next lines, remove the initial "10". It will now look like this:

    00110
    001000
    010001

    Remove the line breaks and put it all on one line again:

    00110001000010001
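The strip-and-join steps above, sketched in PHP on the three lines from the split. It assumes well-formed UTF-8 (every line after the first really starts with "10") and does not check that.

```php
<?php
$bytes = ['11100110', '10001000', '10010001'];

// First line: strip the leading 1s; the 0 after them stays as a leading zero.
$parts = [ltrim($bytes[0], '1')];
// Following lines: strip the "10" marker at the start.
foreach (array_slice($bytes, 1) as $b) {
    $parts[] = substr($b, 2);
}
$joined = implode('', $parts);
echo $joined, "\n";   // 00110001000010001
```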

    Copy that whole string and go back to the calculator. It should still be in "Bin" mode, so just paste this new string.

    If you now click "Dec" (decimal, the "normal" format), you get 25105. That is the exact number that appeared in your first post, because your browser translated a character not supported by the encoding used to POST the form into the HTML entity &#25105;.
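In PHP, the "Dec" button corresponds to bindec(), which ignores the leading zeros. A minimal sketch:

```php
<?php
$bits = '00110001000010001';      // the joined binary string
$codePoint = bindec($bits);       // binary digits -> integer
echo $codePoint, "\n";            // 25105
echo "&#{$codePoint};", "\n";     // &#25105; (the HTML entity)
```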

    Next, click "Hex"; the calculator will say "6211". Now open the Windows Character Map utility and activate "Advanced view" if it doesn't show the "Go to Unicode" box. Then type 6211 into the "Go to Unicode" box. Voilà, it shows the character you are looking for.
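The last two manual steps in PHP, as a sketch: dechex() gives the value for the "Go to Unicode" box, and instead of Character Map the character itself can be produced by decoding the numeric entity with html_entity_decode(), assuming you want UTF-8 output.

```php
<?php
$codePoint = 25105;

// The value to type into Character Map's "Go to Unicode" box.
$hex = strtoupper(dechex($codePoint));
echo $hex, "\n";                  // 6211

// Build the character from the code point by decoding the numeric entity as UTF-8.
$char = html_entity_decode("&#{$codePoint};", ENT_QUOTES, 'UTF-8');
echo bin2hex($char), "\n";        // e68891 (back to the original bytes)
```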

    I'm sure you agree it is simpler to just type <?php echo urldecode('%E6%88%91'); ?> :-)
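For the curious, the whole manual procedure can be done with bit masks instead of string juggling. This is a sketch, not what urldecode() does internally: the function name utf8_code_point is hypothetical, it decodes exactly one percent-encoded UTF-8 character, and it assumes well-formed input (no error checking).

```php
<?php
// Hypothetical helper: decode ONE percent-encoded UTF-8 character to its code point.
function utf8_code_point(string $encoded): int
{
    // unpack('C*') gives the byte values keyed from 1; re-index from 0.
    $bytes = array_values(unpack('C*', rawurldecode($encoded)));
    $lead  = $bytes[0];

    if ($lead < 0x80) {                  // 0xxxxxxx: plain ASCII
        return $lead;
    } elseif (($lead & 0xE0) === 0xC0) { // 110xxxxx: 2-byte sequence
        $cp = $lead & 0x1F;
        $continuations = 1;
    } elseif (($lead & 0xF0) === 0xE0) { // 1110xxxx: 3-byte sequence
        $cp = $lead & 0x0F;
        $continuations = 2;
    } else {                             // 11110xxx: 4-byte sequence
        $cp = $lead & 0x07;
        $continuations = 3;
    }

    // Each continuation byte is 10xxxxxx: keep the low 6 bits and append them.
    for ($i = 1; $i <= $continuations; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

printf("%d = U+%04X\n", utf8_code_point('%E6%88%91'), utf8_code_point('%E6%88%91'));
// 25105 = U+6211
```

Masking with 0x0F keeps the same bits as "remove the leading 1s", and masking with 0x3F is "remove the initial 10".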
     
    LVL 1

    Expert Comment

    by:hallvord
    I spent a long time on that reply, though :(
    and posted it with my wrong (and now deleted) profile :((
    Oh well. It made me wiser, and I also posted the mini-tutorial on my website.