Does anyone know for sure which characters (say, ¢) or strings (say, %%%) cannot be part of a URL? I can't find a list of them online.
Please provide me with a list or a few examples.
Thanks!
Here's some background info that isn't necessary for answering the question.
I want to replace characters in a list of URLs that I store within a program. The program cannot store some of them, e.g. those containing 2 "=" signs.
e.g.
http://aleph.unibas.ch/F?c
would yield something like
http://aleph.unibas.ch/F?c
In another step I convert these replacement strings back to "=". I therefore need banned characters or strings, so that the back-conversion never alters something that is a genuine part of a URL.
Add to the limitations of the HTTP protocol the wider URI protocol: http://www.faqs.org/rfcs/r
The short answer is: yes. The longer answer is: see my previous comment. Depending on your programming environment, you can use functions like UrlEncode to automatically prevent wrong characters from ending up in your string.
A very basic rule: keep the characters in [A-Za-z0-9_-] as they are and escape everything else. That will keep you on the safe side.
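As a rough illustration of that allow-list rule, here is a minimal Python sketch. The function name escape_all_but_safe is made up for this example, and it assumes the text is encoded as UTF-8 before escaping; it is one possible way to do it, not a definitive implementation.

```python
from urllib.parse import unquote

# Allow-list from the rule above: ASCII letters, digits, underscore, hyphen.
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_-"
)

def escape_all_but_safe(text: str) -> str:
    """Percent-encode every UTF-8 byte of `text` that is not in SAFE.

    Hypothetical helper for illustration only.
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte in SAFE:
            out.append(chr(byte))        # allowed character, keep as-is
        else:
            out.append(f"%{byte:02X}")   # everything else becomes %XX
    return "".join(out)

escaped = escape_all_but_safe("a=b=c")
print(escaped)           # a%3Db%3Dc
print(unquote(escaped))  # a=b=c
```

Because the "%" sign itself is escaped (as %25), decoding with unquote restores the original text exactly, which is the round trip the question asks about.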
I can answer in more detail, but then I would like to know which part of the URL you want protected: the protocol part (only letters), the domain part (letters and digits, slightly depending on the host), the port part (digits only), the path part (letters, digits, underscore and hyphen; no unescaped slashes, plus signs, percent signs, spaces, etc.), the query part (it depends; question mark, equal sign and ampersand are prohibited unless escaped), or the anchor part (it depends).
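To see which part of a URL a given character falls into, the parts can be split apart first. A small sketch using Python's standard urllib.parse.urlsplit follows; the URL is a made-up example chosen to contain every part, not one from the question.

```python
from urllib.parse import urlsplit

# Hypothetical URL that contains every part listed above.
parts = urlsplit("http://example.com:8080/docs/page?key=value&x=1#top")

print(parts.scheme)    # http           (protocol part)
print(parts.hostname)  # example.com    (domain part)
print(parts.port)      # 8080           (port part)
print(parts.path)      # /docs/page     (path part)
print(parts.query)     # key=value&x=1  (query part)
print(parts.fragment)  # top            (anchor part)
```

Each part can then be escaped or checked separately, according to the rules that apply to it.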
by: abel. Posted on 2009-06-12 at 09:17:14. ID: 24613697
The list is in the HTTP specification (http://www.faqs.org/rfcs/rfc2616). It depends a bit on what you mean by URL. It can be a URI or an IRI, which are two different things (the second is the internationalized version). Also, the separate parts of a URI have different constraints, and some constraints are imposed by other parties (e.g., the domain name part can be more restricted due to limitations of hosting and DNS/BIND providers).
The equal sign you mention is a legitimate part of the query part of a URL, where it separates each parameter name from its value. If it is used for anything else, it should be URL-encoded. International characters are allowed in IRIs but not in URIs. In URIs they should be escaped according to the URL encoding scheme, which is, in short, a %-sign plus two hex digits for each UTF-8 byte of the character. The ¢, for instance, becomes %c2%a2 inside a URL.
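For what it's worth, here is how this looks with Python's standard urllib.parse module, as one example of a UrlEncode-style function. The parameter name expr is made up for the illustration. Note that quote emits uppercase hex digits; %C2%A2 and %c2%a2 decode to the same character.

```python
from urllib.parse import quote, unquote, urlencode, parse_qsl

# A non-ASCII character is escaped as its UTF-8 bytes.
print(quote("¢"))         # %C2%A2
print(unquote("%c2%a2"))  # ¢

# An "=" inside a parameter value is escaped, so it cannot be
# mistaken for the separator between name and value.
query = urlencode({"expr": "a=b"})  # "expr" is a hypothetical parameter name
print(query)             # expr=a%3Db
print(parse_qsl(query))  # [('expr', 'a=b')]
```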
-- Abel --