wickedw
asked on
Correctly encode URLs in C# for W3C XHTML validation
Hi All,
Sorry, but I only have a basic understanding of encoding.
I am running the W3C validator on one of my pages, and one of the errors is this:
reference to external entity in attribute value
This is generally the sign of an ampersand that was not properly escaped for inclusion in an attribute, in a href for example. You will need to escape all instances of '&' into '&amp;'.
* Line 528, column 489: reference to external entity in attribute value
…woods/A912_SP333_02_UA483?fmt=jpeg&qlt=90&wid=245&hei=410&color=255,255,255&si…
So I assume I need to start URL-encoding my (third-party supplied) URLs, especially the ampersands, for href/src etc. in my HTML?
I have tried
(example src link is http://s7v1.scene7.com/is/image/Littlewoods/A912_SP333_02_UA483?fmt=jpeg&qlt=90&wid=245&hei=410)
HttpUtility.UrlPathEncode - but this does not touch the &'s
HttpUtility.UrlEncode - but this messes up the http:// by encoding that also
Can I just URL-encode everything after the http://? Is there a built-in function for this?
Have I understood this correctly? Must you always URL-encode your URL paths in an XHTML document?
Thanks
Matt
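The validator message quoted above asks for each '&' in an attribute value to be written as '&amp;', which is HTML escaping rather than URL encoding. A minimal C# sketch of that reading, assuming `WebUtility.HtmlEncode` from `System.Net` is available (.NET 4.0 or later); the class and method names are made up for illustration:

```csharp
using System;
using System.Net;

public static class AttributeEscapeSketch
{
    // Hypothetical helper: HTML-escapes a URL so it can sit inside an
    // href/src attribute. The URL itself is NOT URL-encoded here.
    public static string EscapeForAttribute(string url)
    {
        return WebUtility.HtmlEncode(url);
    }

    public static void Main()
    {
        // Example src link taken from the question.
        string src = "http://s7v1.scene7.com/is/image/Littlewoods/A912_SP333_02_UA483?fmt=jpeg&qlt=90&wid=245&hei=410";

        // Each '&' in the query string is written out as "&amp;".
        Console.WriteLine("<img src=\"" + EscapeForAttribute(src) + "\" alt=\"\" />");
    }
}
```

The browser turns '&amp;' back into '&' before requesting the image, so the third-party URL should still resolve as before. Inside an ASP.NET page, `HttpUtility.HtmlAttributeEncode` from `System.Web` does a similar job.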
ASKER
Yeah, thanks Paul, but the only trouble with that is that it encodes the /'s in the domain, say http://abc.com/a/b/page?blah=blah& ...
I need to strip out everything before the ? and encode the rest.
No worries, I'll sort it. You can have the points :)
ASKER
A code example would have been great.
I don't think you'll run into any problems URLEncoding the whole thing.
Certainly you can use something like:
Uri.UriSchemeHttp & Uri.SchemeDelimiter & Request.Url.Authority & URLEncode(Request.Url.PathAndQuery)
Lastly, you could brute force it:
Dim strOldURL As String = "http://abc.com/a/b/page?blah=blah&"
Dim intStart As Integer = strOldURL.IndexOf(CChar("?"))
Dim strNewURL As String = strOldURL.Substring(0, intStart) & "?" & Server.UrlEncode(strOldURL.Substring(intStart + 1))
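Since the question asks for C#, the brute-force VB snippet above might translate roughly as follows. This is only a sketch: `WebUtility.UrlEncode` (System.Net, .NET 4.5 or later) stands in for `Server.UrlEncode` so it runs outside ASP.NET, the class and method names are hypothetical, and returning a URL with no '?' unchanged is an added assumption.

```csharp
using System;
using System.Net;

public static class BruteForceSketch
{
    // Keeps everything up to and including the first '?' as-is and
    // URL-encodes whatever follows it.
    public static string EncodeAfterQuestionMark(string url)
    {
        int start = url.IndexOf('?');
        if (start < 0)
        {
            // Assumption: no query string means nothing to encode.
            return url;
        }

        string head = url.Substring(0, start + 1);   // includes the '?'
        string query = url.Substring(start + 1);
        return head + WebUtility.UrlEncode(query);
    }

    public static void Main()
    {
        Console.WriteLine(EncodeAfterQuestionMark("http://abc.com/a/b/page?blah=blah&x=1"));
    }
}
```

Note that URL-encoding the query this way percent-encodes the '=' and '&' separators as well, so it is worth checking that the target server still reads the parameters as expected.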
ASKER
Thanks Paul,
I went the brute-force way as you suggested. Other posts on Stack Overflow seemed to indicate not to encode the lot, and this seemed the best compromise. Thanks for your help :)
No one sees the code but us anyway. As long as it works...
ASKER
That's great. Do you know of a regular expression that I can use to grab everything up to the ? so I can remove the start, encode the rest, and add the start back again?
Thanks
Matt
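One possible shape for such a regular expression, sketched in C#. The pattern and the helper names are illustrative assumptions, not something from the thread: the part before the first '?' is captured as `head`, and anything after it as `query`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlSplitSketch
{
    // "head" = everything before the first '?'; "query" = whatever follows it.
    // The query part is optional, so URLs without a '?' still match.
    private static readonly Regex SplitPattern =
        new Regex(@"^(?<head>[^?]*)(?:\?(?<query>.*))?$", RegexOptions.Singleline);

    public static string HeadOf(string url)
    {
        return SplitPattern.Match(url).Groups["head"].Value;
    }

    public static string QueryOf(string url)
    {
        // Empty string when the URL has no '?'.
        return SplitPattern.Match(url).Groups["query"].Value;
    }

    public static void Main()
    {
        string url = "http://abc.com/a/b/page?blah=blah&x=1";
        Console.WriteLine(HeadOf(url));
        Console.WriteLine(QueryOf(url));
    }
}
```

The two pieces can then be processed separately and joined back together with a '?' in between. A plain `IndexOf('?')` does the same job without a regex, which is probably why the brute-force version was suggested.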