URL encoding a string for unescape() in JavaScript

My Perl program reads in an HTML file (a document header) line by line and concatenates the lines into one string, which is stored in the variable $string.  What I want to do is embed this string in an HTML hidden field so that it can be read and used in a JavaScript pop-up window.  It seems that the only way to keep the HTML string from overflowing out of the hidden field is to encode it (otherwise all of the special characters screw things up).  JavaScript has a special escape() function for this, with a matching unescape().

According to the Rhino JavaScript book,

"The only difference between [JavaScript escape() and URL encoding] is that in URL encoding, spaces are replaced with a '+' character, while escape() replaces spaces with the %20 sequence."

So, it seems like the best way to do this is to URL encode the string, and then replace the + with the %20 sequence. Then I can document.write (unescape(string)) to unfurl the HTML header.  Right?

Hope you can help me!!!

Thanks so much!!!!

Asked by: askrinsky

mkornell Commented:
The approach you suggested should work, but the only way to really know is to try it!

You don't say which Perl-CGI library you are using, but it sounds like you know how to URL-encode a Perl string.

Create the URL-encoded string for the entire file in your Perl script, then, as you suggested, replace the '+' signs with '%20', using the following line:

$URL_encoded_string =~ s/\+/\%20/gs;  #replace '+' with '%20'

Use that in your HIDDEN field, and JavaScript should be able to unescape it without a problem.

But, don't take my word for it.  Try it.
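
For anyone following along, here is a minimal sketch of that substitution wrapped in a sub. The name plus_to_percent20 and the sample string are made up for illustration. It assumes the input has already been form-encoded, so a literal plus sign is already %2B and only the '+' space markers get rewritten.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a form-encoded string (spaces as '+')
# into the style that JavaScript's unescape() understands (spaces as %20).
sub plus_to_percent20 {
    my ($encoded) = @_;
    $encoded =~ s/\+/%20/g;
    return $encoded;
}

# Made-up sample: the text 'A B 1+1' after form-encoding.
# The literal plus sign is already %2B, so it is left untouched.
print plus_to_percent20('A+B+1%2B1'), "\n";   # prints A%20B%201%2B1
```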

askrinsky (Author) Commented:
"The approach you suggested should work, but the only way to really know is to try it! You don't say which Perl-CGI library you are using, but it sounds like you know how to URL-encode a Perl string."

mkornell --

Thank you for your response.  I hope you can expand upon it!  I do know how to substitute %20 for the + characters, but I don't know how to URL encode a string from within Perl!  The books all discuss how to unencode a URL encoded string, but I guess I don't know enough about URL encoding to know how to do it.  Perhaps you do!  I'd appreciate your help, which is why I'm giving you an "A" :)  The sooner the better!

Sincerely,

Anthony Krinsky

mkornell Commented:
Allrightythen :-) Guess I'd better earn my pay...

Assuming you've slurped the file into $myfile,

$myfile =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/egs;

will do the job, and you don't even need a separate '+' to '%20' step.

I claim no credit for this code.  It was lifted from CGI.pm, which I highly recommend (along with the entire libwww bundle), at CPAN.
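
A rough sketch of how that regexp might be used for the hidden field, for illustration only. The sub names url_encode and hidden_field, the field name "header", and the sample HTML are all assumptions, not part of the original script. The sketch also assumes the text holds single-byte characters.

```perl
use strict;
use warnings;

# Percent-encode everything except letters, digits, '_', '-' and '.'.
# Assumes $text holds single-byte characters (ord() values below 256).
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x", ord($1))/egs;
    return $text;
}

# Hypothetical helper: wrap the encoded text in a hidden form field.
# Assumes $name is a plain identifier that needs no quoting of its own.
sub hidden_field {
    my ($name, $html) = @_;
    return sprintf('<INPUT TYPE="hidden" NAME="%s" VALUE="%s">',
                   $name, url_encode($html));
}

# Made-up sample input; quotes and angle brackets no longer appear raw.
print hidden_field('header', '<B>"hi"</B>'), "\n";
```

Because the encoded value contains no quotes, angle brackets or spaces, it should sit safely inside the VALUE attribute. On the JavaScript side the field's value would then be passed to unescape() before document.write(), as described in the question.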

askrinsky (Author) Commented:
mkornell --

You are wise and generous in your wisdom.  Thank you.  I'll check out libwww as well :)

Anthony

askrinsky (Author) Commented:
mkornell:

I've taken your suggestion and it seems like your regexp is doing some magic.  Here's a sample (each original line is followed by its encoded form):

<FRAMESET COLS="100%" ROWS="59,*" noborder border="0">
%3CFRAMESET%20COLS%3D%22100%25%22%20ROWS%3D%2259%2C%2A%22%20noborder%20border%3D%220%22%3E%0A

        <FRAME NAME="menu" SRC="" SCROLLING="NO" MARGINHEIGHT=0 MARGINWIDTH=0>
%09%3CFRAME%20NAME%3D%22menu%22%20SRC%3D%22%22%20SCROLLING%3D%22NO%22%20MARGINHEIGHT%3D0%20MARGINWIDTH%3D0%3E%0A

<FRAME NAME="main" SRC="" SCROLLING="AUTO" MARGINHEIGHT=0 MARGINWIDTH=0>
%09%3CFRAME%20NAME%3D%22main%22%20SRC%3D%22%22%20SCROLLING%3D%22AUTO%22%20MARGINHEIGHT%3D0%20MARGINWIDTH%3D0%3E%0A

</FRAMESET>
%3C%2FFRAMESET%3E%0A

I now have two follow-up questions for you:

Is this what a URL encoded string without + (for spaces) looks like?

If it is, then this is what the JavaScript can "unescape()."

However, this seems unlikely, since when this encoded string is buried in a hidden input field (as I plan to do), it still spills out all over the place.  I was hoping that the URL encoded string that JavaScript needs encoded the HTML in such a way that it could be embedded safely in a JavaScript page...

Perhaps what I'm asking for can't be done.... or perhaps the regexp you provided needs modification.

Or, maybe I'm asking the wrong questions...  I'll throw in another 10 points if you can help me through this one.

Many Thanks,

Anthony Krinsky

askrinsky (Author) Commented:
Hold on....  I think it's working :-)... I'll let you know in a moment....

mkornell Commented:
That is indeed what a URL-encoded string looks like. URL-encoding replaces "special" characters with their hex ASCII value preceded by a '%'. For instance, '=' is hex 3D.  Thus the string "a=b" would be URL-encoded as "a%3Db". The set of special characters is defined in the URI standard (sorry, I forget the RFC number).  However, it is usually assumed that "special" means anything but alpha, numeric, _ (underscore), - (dash) and . (dot). The two exceptions are the '%' and the ' ' (space).  The standard says that ' ' can be encoded as '+' (for readability), but as you've encountered, this isn't always understood. And, to encode a '%' itself, use '%25'. Since the unencoding algorithm is so simple (find a '%', use the next two characters as a hex ASCII value), it is usually safe to encode the non-special characters.  You could even encode every character in a string: $str =~ s/(.)/uc sprintf ("%%%02x", ord($1))/egs; but the result is less than readable.  (Unless you were born with a built-in wetware hex-ASCII translator :-)
How did things go?

mkornell Commented:
My comments seem quite unreadable.  How do you get a line break into your posts?

--mark;

mkornell Commented:
Just figured it out.  Here's the comment again, with appropriate breaks.

That is indeed what a URL-encoded string looks like.

URL-encoding replaces "special" characters with their hex ASCII value preceded by a '%'. For instance, '=' is hex 3D. Thus the string "a=b" would be URL-encoded as "a%3Db".

The set of special characters is defined in the URI standard (sorry, I forget the RFC number). However, it is usually assumed that "special" means anything but alpha, numeric, _ (underscore), - (dash) and . (dot).

The two exceptions are the '%' and the ' ' (space). The standard says that ' ' can be encoded as '+' (for readability), but as you've encountered, this isn't always understood.

And, to encode a '%' itself, use '%25'.

Since the unencoding algorithm is so simple (find a '%', use the next two characters as a hex ascii value), it is usually safe to encode the non-special characters. You could even encode every character in a string:

$str =~ s/(.)/uc sprintf ("%%%02x", ord($1))/egs;

but the result is less than readable. (Unless you were born with a built-in wetware hex-ASCII translator :-)
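
To make the decoding rule concrete, here is a small Perl sketch of both directions. The sub names encode_all and url_decode are made up for illustration, and the decoder is only an approximation of what unescape() does with %XX sequences, assuming single-byte characters.

```perl
use strict;
use warnings;

# Encode every character, special or not (the regexp from the comment above).
sub encode_all {
    my ($text) = @_;
    $text =~ s/(.)/uc sprintf("%%%02x", ord($1))/egs;
    return $text;
}

# Decode: each '%' followed by two hex digits becomes that character.
# Like unescape(), this leaves '+' alone rather than turning it into a space.
sub url_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $encoded = encode_all('a=b');
print "$encoded\n";                 # prints %61%3D%62
print url_decode($encoded), "\n";   # prints a=b
print url_decode('a%3Db'), "\n";    # prints a=b
```

Both encodings of "a=b" decode to the same text, which is why over-encoding is harmless apart from the extra length.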

How did things go?

askrinsky (Author) Commented:
Mark --

It seems to work like a charm... You are the man!

The only problem I'm having right now is that the encoded HTML strings are so long that they might be what's crashing my Netscape.  Maybe I need to chop them into smaller pieces.

By the way, how can I give you some more points?

Thanks for your help!!!!!

Anthony

mkornell Commented:
Don't worry about points.  A cheque in the mail would be more useful :-)

That Netscape crashes with long strings doesn't surprise me.  You might want to try asking a question in the JavaScript area for workarounds.
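
If chopping the string into smaller pieces is worth a try, one possible sketch is below. The sub name encode_in_chunks and the chunk size are assumptions, and whether shorter strings actually avoid the Netscape crash is untested. The idea is to cut the raw text first and encode each piece separately, so that no %XX sequence is ever split across two pieces.

```perl
use strict;
use warnings;

# Same encoding regexp as earlier in the thread (single-byte text assumed).
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x", ord($1))/egs;
    return $text;
}

# Hypothetical helper: split the raw text into pieces of at most $size
# characters, then encode each piece on its own. Each encoded piece is
# therefore at most 3 * $size characters long.
sub encode_in_chunks {
    my ($text, $size) = @_;
    my @pieces = $text =~ /(.{1,$size})/gs;
    return map { url_encode($_) } @pieces;
}

# Made-up sample: 10 characters cut into pieces of 4, 4 and 2.
my @chunks = encode_in_chunks("<B>hi</B>\n", 4);
print "$_\n" for @chunks;
```

Each piece could then go into its own hidden field, with the JavaScript side unescaping the pieces and concatenating them in order.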