Hello, I'm writing a simple web spider for whole-web crawling. I'm using the System.Uri class to resolve relative URIs found on web pages. Everything works fine with standard URIs, but some servers use a URL rewriting scheme that the Uri class does not interpret the way browsers do. Example:
Base URL: http://www.wosp.org.pl/fundacja/index.php/11/2
Relative URL: index.php/11/2/0
All browsers resolve that relative URL to: http://www.wosp.org.pl/fundacja/index.php/11/2/0
But the Uri class resolves it to: http://www.wosp.org.pl/fundacja/index.php/11/index.php/11/2/0
Unfortunately, that URL does not return a 404 error, so each crawl adds more and more URLs to my DB, like http://www.wosp.org.pl/fundacja/index.php/11/index.php/11/2/index.php/11/2/index.php/11/2/
I've also tested the Java URI class, which behaves the same way.
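For reference, a minimal sketch along these lines reproduces the behaviour with java.net.URI. The class name RelativeUrlDemo and the helper resolve are just illustrative names, and the second base URL (the directory http://www.wosp.org.pl/fundacja/) is an assumption added purely for comparison; it is not taken from the page itself.

```java
import java.net.URI;

class RelativeUrlDemo {
    // Resolve a relative reference against a base URL using java.net.URI.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String relative = "index.php/11/2/0";

        // Base is the page's own URL: the last path segment ("2") is dropped
        // before the relative path is appended.
        System.out.println(resolve("http://www.wosp.org.pl/fundacja/index.php/11/2", relative));
        // prints http://www.wosp.org.pl/fundacja/index.php/11/index.php/11/2/0

        // Assumption: a hypothetical directory base, shown only for comparison.
        System.out.println(resolve("http://www.wosp.org.pl/fundacja/", relative));
        // prints http://www.wosp.org.pl/fundacja/index.php/11/2/0
    }
}
```

The first call gives the same result as the Uri class. The second call gives the URL the browsers produce, but whether the page actually supplies such a base (for example through a `<base href>` element) is only a guess here, not something confirmed.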
So is there a bug in the Uri class, or is that relative URL invalid? And if it is invalid, why does it work in all browsers?
I would also welcome any suggestions on how to deal with this problem.