wickedw

asked on

Correctly encode URLs c# for w3c xhtml validation

Hi All,

Sorry, but I only have a basic understanding of encoding.

I am running the W3C validator on one of my pages, and one of the errors is this -

reference to external entity in attribute value

This is generally the sign of an ampersand that was not properly escaped for inclusion in an attribute, in a href for example. You will need to escape all instances of '&' into '&amp;'.

    * Line 528, column 489: reference to external entity in attribute value

      …woods/A912_SP333_02_UA483?fmt=jpeg&qlt=90&wid=245&hei=410&color=255,255,255&si…

So I assume I need to start urlencoding (especially the & ampersands) my (3rd party supplied) URLs for href/src etc in my html?

I have tried

(example src link is http://s7v1.scene7.com/is/image/Littlewoods/A912_SP333_02_UA483?fmt=jpeg&qlt=90&wid=245&hei=410)

HttpUtility.UrlPathEncode - but this does not touch the &'s

HttpUtility.UrlEncode - but this messes up the http:// by encoding that also

Can I just urlencode everything after the http://? Is there a built-in function for this?

Have I understood this?  You must always URLENCODE your url paths in an XHTML html document.

Thanks
Matt


ASKER CERTIFIED SOLUTION
Paul MacDonald

wickedw

ASKER

Hi Paulmacd,

That's great. Do you know of a regular expression that I can use to grab everything up to the ?, so I can remove the start, encode the rest, and add the start back again?

Thanks
Matt
I'm not a C# guy and it's not clear if you're using codebehind, but you could just REPLACE(url, "http://", "")
You could do the whole thing on one line like:
Dim strEncodedURL as String = "http://" & Server.UrlEncode( Replace(url, "http://", "") )
 
wickedw

ASKER

Yeah, thanks Paul, but the only trouble with that is that it also encodes the /'s in the URL, say http://abc.com/a/b/page?blah=blah& ...

I need to strip out everything before the ? and encode the rest.

No worries, I'll sort it. You can have the points :)
wickedw

ASKER

A code example would have been great.
I don't think you'll run into any problems URLEncoding the whole thing.  
Certainly you can use something like:
Uri.UriSchemeHttp & Uri.SchemeDelimiter & Request.Url.Authority & Server.UrlEncode(Request.Url.PathAndQuery)
Lastly, you could brute force it:
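' Split at the first "?" and URL-encode only the query string.
Dim strOldURL As String = "http://abc.com/a/b/page?blah=blah& "
Dim intStart As Integer = strOldURL.IndexOf("?"c)
' IndexOf returns -1 when there is no "?", so leave such URLs unchanged.
Dim strNewURL As String = If(intStart < 0, strOldURL, strOldURL.Substring(0, intStart) & "?" & Server.UrlEncode(strOldURL.Substring(intStart + 1)))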
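Since the question asks about C#, the brute-force split above might look like the sketch below. The class and method names (QueryEncoder, EncodeAfterQuestionMark) are made up for illustration, and WebUtility.UrlEncode from System.Net stands in for Server.UrlEncode so the snippet does not depend on a page context; that substitution is an assumption, not something from the thread.

```csharp
using System;
using System.Net;

public static class QueryEncoder
{
    // Hypothetical helper: keeps everything up to and including the first '?'
    // untouched and URL-encodes the remainder as a single string.
    public static string EncodeAfterQuestionMark(string url)
    {
        int start = url.IndexOf('?');
        if (start < 0)
        {
            // No query string, so there is nothing to encode.
            return url;
        }

        string prefix = url.Substring(0, start + 1); // scheme, host, path and the '?'
        string query = url.Substring(start + 1);     // everything after the '?'
        return prefix + WebUtility.UrlEncode(query);
    }
}
```

Note that URL-encoding the whole query string also percent-encodes the '=' and '&' separators, so it is worth checking that the target server still reads the parameters as intended.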
wickedw

ASKER

Thanks Paul,  

I went down the brute force route as you suggested. Other posts on Stack Overflow seemed to indicate not to encode the lot, and this seemed the best compromise. Thanks for your help :)
No one sees the code but us anyway.  As long as it works...
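
The validator message quoted in the question asks for '&' to be written as '&amp;' inside attribute values, which is HTML escaping rather than URL encoding. As an alternative to the approaches in the thread, here is a minimal sketch of that idea. It assumes .NET's WebUtility.HtmlEncode is available (HttpUtility.HtmlAttributeEncode in System.Web would be the classic ASP.NET counterpart), and the helper name is made up for illustration.

```csharp
using System;
using System.Net;

public static class AttributeEscaper
{
    // Hypothetical helper: HTML-encodes a URL so it can be written into an
    // href or src attribute. The URL itself is left intact apart from
    // characters that are special in HTML, so '&' becomes "&amp;".
    public static string ForAttribute(string url)
    {
        return WebUtility.HtmlEncode(url);
    }
}
```

With the example src link from the question, AttributeEscaper.ForAttribute(...) leaves http:// and the slashes alone and only rewrites the separators, giving ...?fmt=jpeg&amp;qlt=90&amp;wid=245&amp;hei=410. The browser decodes '&amp;' back to '&' before making the request, so the query string reaches the server unchanged.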