URL encoding - Encode space as + or %20?

Richard2000
Hi,

I am writing a CGI program and need to ensure that the unsafe characters passed in parameters are encoded/decoded correctly.  Now I understand URL encoding overall, with unsafe characters being encoded using the % symbol and I know how to write code to encode/decode this.

What I'm unsure about is, how are spaces encoded?  What confuses me is that I've seen spaces often encoded as either a + or %20.  So suppose I have a CGI program that accepts a search query, the number of results to return and a country, how should this be encoded:

1) q=a+search+term&n=10&c=united+kingdom

or

2) q=a%20search%20term&n=10&c=united%20kingdom

Previously, I've always thought that spaces in URLs should be encoded as %20.  But the vast majority of search engines such as Google display spaces in a search term within the results page URL as +, not %20.  Also, when posting information in forms, I've seen spaces encoded as + rather than %20 too.

Also, if the parameter originally contained a + itself (e.g. C++), must these + characters *always* be encoded as %2B?  The reason I want to know is that I need to write a function to decode the parameters in my CGI program.  If it comes across a + in the encoded string I need to know whether to decode this back to a space or leave it alone as it really intended to be a + in the decoded string.

Thanks in Advance,

Richard
Commented:
You must always encode a literal + as %2B, because both + and %20 can represent a space. Forms usually submit spaces as +, but to be safe your decoder should turn both + and %20 back into spaces.
When encoding spaces you can pick whichever form you like, as long as you remember to encode any literal + characters as %2B.
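
To make that concrete, here is a minimal sketch in Python. The thread doesn't name a language, so this is only an illustration; `encode_param` and `decode_param` are hypothetical names, and the sketch assumes parameter values are UTF-8 text.

```python
import string

# Characters left unencoded (the RFC 3986 "unreserved" set).
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_HEX = set(string.hexdigits)


def encode_param(value: str) -> str:
    """Encode one parameter value: space -> '+', literal '+' -> '%2B'."""
    parts = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in _UNRESERVED:
            parts.append(ch)
        elif ch == " ":
            parts.append("+")
        else:
            # Everything else, including '+', becomes %XX.
            parts.append(f"%{byte:02X}")
    return "".join(parts)


def decode_param(value: str) -> str:
    """Decode one parameter value: both '+' and '%20' become a space."""
    out = bytearray()
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "+":
            out.append(0x20)
            i += 1
        elif ch == "%" and i + 2 < len(value) and set(value[i + 1:i + 3]) <= _HEX:
            out.append(int(value[i + 1:i + 3], 16))
            i += 3
        else:
            # Not a valid escape: keep the character as it is.
            out.extend(ch.encode("utf-8"))
            i += 1
    return out.decode("utf-8", errors="replace")


print(encode_param("C++"))                # C%2B%2B
print(decode_param("a+search+term"))      # a search term
print(decode_param("a%20search%20term"))  # a search term
```

Because the encoder never emits a bare + for anything but a space, the decoder can safely treat every + it sees as a space.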

Hope that doesn't sound too messy.
Hi,

Quote from CGI/1.1 specification:
"Scripts MUST be prepared to recognise both '+' and '%20' as an encoded space in a URL-encoded value."

Regards, Geo

Author

Commented:
Hi,

Thanks for your comments.  I thought that would be the case with CGI programs, but I just wanted to make sure.

One thing I've noticed though - why isn't the + allowed to represent a space in other parts of the URL that are *not* parameters?

Take the following example.  I entered this into the IE 6 address bar:

www.mysite.com/a%20test%20file.html - This works fine.
www.mysite.com/a test file.html - This works fine and IE automatically displays the spaces as %20 after hitting return.
www.mysite.com/a+test+file.html - But this returns a file not found error.  Why is it that a + can't be used here to represent a space?

Any ideas?

Thanks in Advance,

Richard

Because '+' is allowed in paths and filenames (on both Unix and Windows). 'a+test+file.html' is therefore a legal filename exactly as written, so the server looks for a file with that literal name and cannot find it on your system.
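
As an illustration only (Python's standard library, not anything specific to your server), `urllib.parse` makes the same distinction: `unquote` is the path-style decoder and leaves + alone, while `unquote_plus` is the form/query-style decoder and turns + into a space.

```python
from urllib.parse import unquote, unquote_plus

# Path-style decoding: '+' is an ordinary character, only %20 is a space.
print(unquote("a+test+file.html"))       # a+test+file.html
print(unquote("a%20test%20file.html"))   # a test file.html

# Form/query-style decoding: '+' also means a space.
print(unquote_plus("a+test+file.html"))  # a test file.html
```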

Regards, Geo

Author

Commented:
Hi,

Thanks for your comment.  So it seems that when decoding a parameter in a script, both the + and %20 should be decoded back to a space.  However, when decoding the parts of a URL that are *not* parameters (such as the path), only the %20 should be decoded back to a space, since the + is a valid character there.
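
For anyone reading later, a rough sketch of that conclusion using Python's standard library. `decode_url` is a hypothetical helper name, and the URL is just the example from this thread with a query string added.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit


def decode_url(url):
    """Split a URL and decode the path and the parameters by their own rules."""
    parts = urlsplit(url)
    # Path: only %XX escapes are decoded, '+' stays a literal plus.
    path = unquote(parts.path)
    # Parameters: both '+' and '%20' are decoded to a space.
    params = parse_qs(parts.query, keep_blank_values=True)
    return path, params


path, params = decode_url(
    "http://www.mysite.com/a+test+file.html?q=a+search+term&n=10&c=united%20kingdom"
)
print(path)    # /a+test+file.html
print(params)  # {'q': ['a search term'], 'n': ['10'], 'c': ['united kingdom']}

# Going the other way, a literal '+' in a value is written as %2B.
print(urlencode({"q": "C++", "c": "united kingdom"}))  # q=C%2B%2B&c=united+kingdom
```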

I have posted a message in the Community Support area to share the points and close the question.

Best Regards,

Richard
force-accepted and question closed

geobul,

please collect your points here:

http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_20411252.html

DigitalXtreme
CS Moderator
Experts-Exchange
