Correctly encode URLs in C# for W3C XHTML validation

Hi All,

Sorry, but I only have a basic understanding of encoding.

I am running the W3C validator on one of my pages, and one of the errors is this:

reference to external entity in attribute value

This is generally the sign of an ampersand that was not properly escaped for inclusion in an attribute, in an href for example. You will need to escape all instances of '&' into '&amp;'.

    * Line 528, column 489: reference to external entity in attribute value
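As the validator message itself suggests, the error is about the markup rather than the URL: inside an attribute value a literal & has to be written as the entity &amp;, and the browser turns it back into & before it requests the URL. A minimal C# sketch of that escaping, assuming a placeholder URL on example.com and a hypothetical helper name (EscapeForAttribute), using System.Net.WebUtility.HtmlEncode; in ASP.NET, System.Web's HttpUtility.HtmlAttributeEncode serves a similar purpose:

```csharp
using System;
using System.Net;

static class AttributeEscaping
{
    // Escapes a URL so it can sit inside an (X)HTML attribute value.
    // WebUtility.HtmlEncode rewrites & as &amp; (it also escapes <, >, " and ').
    // The URL itself does not change: the browser decodes &amp; back to & before requesting it.
    public static string EscapeForAttribute(string url)
    {
        return WebUtility.HtmlEncode(url);
    }

    static void Main()
    {
        // Hypothetical third-party URL with two query-string parameters.
        string url = "http://example.com/image.php?id=42&size=large";

        // Prints: <img src="http://example.com/image.php?id=42&amp;size=large" alt="" />
        Console.WriteLine("<img src=\"" + EscapeForAttribute(url) + "\" alt=\"\" />");
    }
}
```

The :, / and ? characters are left alone, which is why this avoids the problem of the http:// being mangled.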


So I assume I need to start URL-encoding my (third-party-supplied) URLs in href/src etc. in my HTML, especially the & ampersands?

I have tried the following (example src link is ...):

HttpUtility.UrlPathEncode - but this does not touch the &s

HttpUtility.UrlEncode - but this also encodes the http://, which breaks the URL

Can I just URL-encode everything after the http://? Is there a built-in function for this?

Have I understood this?  You must always URLENCODE your url paths in an XHTML html document.


Paul MacDonald (Director, Information Systems) commented:
"Can I just urlencode everything after the http://?"
"is there a built in function for this"
No, but you can strip the prefix before URL-encoding the URL and re-add it afterwards.
"You must always URLENCODE your url paths in an XHTML html document."
I wouldn't generally consider this an issue unless I were passing the URL as a parameter in another URL.  Since you're just trying to pass validation, it really comes down to how important that is for you.
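For the case Paul describes, where one URL is passed as a parameter inside another, URL-encoding the inner URL is what keeps its own ?, = and & from being read as part of the outer query string. A short C# sketch, assuming hypothetical URLs, a hypothetical parameter name (target) and a hypothetical helper (BuildLink):

```csharp
using System;
using System.Net;

static class NestedUrl
{
    // Builds a link that carries another URL as the value of a query-string parameter.
    // The inner URL is URL-encoded so that its own "?", "=" and "&" are not mistaken
    // for separators of the outer URL's query string.
    public static string BuildLink(string page, string parameterName, string innerUrl)
    {
        return page + "?" + parameterName + "=" + WebUtility.UrlEncode(innerUrl);
    }

    static void Main()
    {
        // Both URLs and the "target" parameter name are hypothetical placeholders.
        string link = BuildLink("http://example.com/go.aspx", "target",
                                "http://example.org/page?id=42&lang=en");
        Console.WriteLine(link);
    }
}
```

Here encoding the http:// of the inner URL is intended, since the whole inner URL is just a value of the outer one.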

wickedw (Author) commented:
Hi Paulmacd,

That's great. Do you know of a regular expression I can use to grab everything up to the ?, so I can remove the start, encode the rest, and add the start back again?
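One possible answer to the regex question, sketched in C#: the pattern ^([^?]*)\?(.*)$ captures everything before the first ? and everything after it, and only the second group is URL-encoded. The class and method names and the example URL are illustrative assumptions:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class RegexSplit
{
    // Group 1: everything before the first "?"; group 2: the query string after it.
    static readonly Regex UrlParts = new Regex(@"^([^?]*)\?(.*)$");

    // Encodes only the part after the first "?", leaving scheme, host and path untouched.
    public static string EncodeQueryPart(string url)
    {
        Match m = UrlParts.Match(url);
        if (!m.Success)
            return url; // no query string, nothing to encode
        return m.Groups[1].Value + "?" + WebUtility.UrlEncode(m.Groups[2].Value);
    }

    static void Main()
    {
        // Hypothetical URL with a two-parameter query string.
        Console.WriteLine(EncodeQueryPart("http://example.com/images/pic.php?id=42&size=large"));
    }
}
```

Because the whole query string is encoded as one value, the = and & separators become %3D and %26 as well, so a server parsing that query string would no longer see them as parameter boundaries.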

Paul MacDonald (Director, Information Systems) commented:
I'm not a C# guy and it's not clear if you're using codebehind, but you could just REPLACE(url, "http://", "")
You could do the whole thing on one line like:
Dim strEncodedURL as String = "http://" & Server.UrlEncode( Replace(url, "http://", "") )
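A C# version of the same strip-and-re-add idea might look like the following sketch (the class name, method name and URL are hypothetical). It checks for the prefix with StartsWith rather than calling Replace, so an http:// appearing later in the URL, for example inside a query parameter, is left alone:

```csharp
using System;
using System.Net;

static class PrefixEncoding
{
    const string Prefix = "http://";

    // Removes a leading "http://", URL-encodes the rest, then puts the prefix back.
    // The rest is encoded as a whole, so "/", "?", "=" and "&" after the host are encoded too.
    public static string EncodeAfterPrefix(string url)
    {
        if (!url.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return WebUtility.UrlEncode(url); // no prefix: encode everything
        return Prefix + WebUtility.UrlEncode(url.Substring(Prefix.Length));
    }

    static void Main()
    {
        // Hypothetical URL; the slashes in its path come out as %2F.
        Console.WriteLine(EncodeAfterPrefix("http://example.com/images/pic.php?id=42"));
    }
}
```

As the next reply points out, encoding everything after the prefix also turns the slashes in the path into %2F.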
wickedw (Author) commented:
Yeah, thanks Paul, but the only trouble with that is that it also encodes the /s after the domain, say ...

I need to strip out everything before the ? and encode the rest.

No worries, I'll sort it; you can have the points :)
wickedw (Author) commented:
A code example would have been great.
Paul MacDonald (Director, Information Systems) commented:
I don't think you'll run into any problems URLEncoding the whole thing.  
Certainly you can use something like:
Uri.UriSchemeHttp & Uri.SchemeDelimiter & Request.Url.Authority & Server.UrlEncode(Request.Url.PathAndQuery)
Lastly, you could brute force it:
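Dim strOldURL As String = " "  ' the URL to convert
Dim intStart As Integer = strOldURL.IndexOf("?"c)
Dim strNewURL As String = strOldURL  ' unchanged if there is no "?"
If intStart >= 0 Then  ' encode only the query string; its "=" and "&" get encoded too
    strNewURL = strOldURL.Substring(0, intStart) & "?" & Server.UrlEncode(strOldURL.Substring(intStart + 1))
End If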
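The Uri-based suggestion can also be sketched in C# with System.Uri. This variation reuses the URL's own Scheme instead of the fixed Uri.UriSchemeHttp constant and encodes only the query string rather than PathAndQuery, so the path's slashes survive; the helper name and example URL are assumptions:

```csharp
using System;
using System.Net;

static class UriRebuild
{
    // Rebuilds a URL from System.Uri parts, keeping scheme, host and path as they are
    // and URL-encoding only the query string. Any #fragment is dropped.
    public static string EncodeQueryOnly(string url)
    {
        var uri = new Uri(url);
        string start = uri.Scheme + Uri.SchemeDelimiter + uri.Authority + uri.AbsolutePath;
        if (uri.Query.Length <= 1)
            return start; // no query string (Uri.Query is "" or just "?")
        // Uri.Query starts with "?", so skip that character before encoding.
        return start + "?" + WebUtility.UrlEncode(uri.Query.Substring(1));
    }

    static void Main()
    {
        // Hypothetical https URL, to show that the original scheme is kept.
        Console.WriteLine(EncodeQueryOnly("https://example.com/images/pic.php?id=42&size=large"));
    }
}
```

Uri.Query returns the query in its escaped form, so any %XX sequences already present would be encoded a second time.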
wickedw (Author) commented:
Thanks Paul,

I went the brute-force way as you suggested. Other posts on Stack Overflow seemed to indicate not to encode the whole URL, and this seemed the best compromise. Thanks for your help :)
Paul MacDonald (Director, Information Systems) commented:
No one sees the code but us anyway. As long as it works...
