skyrise11

asked on

Getting a uniform URL

How do you get a uniform web address using C#?
Example: converting strings such as "www.yahoo.com", "http://yahoo.com", "http://yahoo.com/", or any other possible combination into one unique string such as "http://www.yahoo.com". Is there a function out there that I can use?
DropZone

What you are asking for is called "canonicalization", from the verb "to canonicalize", meaning to convert something to its base or canonical form.  You can use a Regex object to do this.  However, there is a problem: you have to know, and be sure of, what the canonical form is.

For example, "yahoo.com" may be a valid URL.  It is certainly syntactically valid according to the RFC that defines URIs, so how do you "know" for sure that it requires a "www" before it?  Perhaps it's "w3c.yahoo.com", or maybe "my.yahoo.com".  The only way to find out would be to have a list of all known URLs beforehand and look it up, which we can agree is not a very practical solution.

You'll also have to consider that the URL may contain a path at the end, or perhaps a query string, such as: "http://www.yahoo.com/mypage" or "http://www.yahoo.com/mypage?id=123".  These are all valid URLs, so you'll have to make sure that you canonicalize only the domain part.
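
As a rough illustration of isolating the domain part, here is a minimal sketch that leans on the framework's `System.Uri` class rather than a hand-written pattern. The class and method names (`UrlCanonicalizer`, `CanonicalizeSiteRoot`) are hypothetical, and defaulting to "http" when no scheme is given is an assumption, not a rule.

```csharp
using System;

public static class UrlCanonicalizer
{
    // Hypothetical helper: reduces a URL to scheme://host[:port],
    // dropping any path, query string, or fragment.
    public static string CanonicalizeSiteRoot(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException("URL must not be empty.", nameof(input));

        string candidate = input.Trim();

        // Assumption: input without a scheme (e.g. "www.yahoo.com") means http.
        if (!candidate.Contains("://"))
            candidate = "http://" + candidate;

        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri))
            throw new FormatException("Not a valid absolute URL: " + input);

        // GetLeftPart(UriPartial.Authority) keeps only the scheme and authority.
        return uri.GetLeftPart(UriPartial.Authority).ToLowerInvariant();
    }
}
```

With this sketch, "http://www.yahoo.com/mypage?id=123" and "www.yahoo.com" both reduce to "http://www.yahoo.com", while "http://yahoo.com/" reduces to "http://yahoo.com". It deliberately does not decide whether a "www" prefix belongs there.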

Once you settle on the specific criteria that you want to evaluate, and you are comfortable that they define the canonical form for your URLs, then it's straightforward to create a regular expression pattern for it.  I can help with that.
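
To make that concrete, here is a minimal sketch of what such a pattern might look like, assuming the chosen canonical form is "http://www." followed by the lower-cased host. That form is purely an assumption for illustration; the pattern and the `RegexCanonicalizer` name are hypothetical, and the sketch only handles input that is bare or starts with http:// or https://.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCanonicalizer
{
    // Optional http(s) scheme, optional "www.", then the host name up to
    // the first '/', ':', '?', '#' or whitespace character.
    private static readonly Regex UrlPattern = new Regex(
        @"^\s*(?:https?://)?(?:www\.)?(?<host>[^/:?#\s]+)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Hypothetical helper: rewrites the input to "http://www.<host>".
    public static string Canonicalize(string input)
    {
        Match match = UrlPattern.Match(input ?? string.Empty);
        if (!match.Success)
            throw new FormatException("Could not find a host name in: " + input);

        string host = match.Groups["host"].Value.ToLowerInvariant();

        // Assumption: every site is reachable under "www." and over http.
        return "http://www." + host;
    }
}
```

Under these assumptions "www.yahoo.com", "http://yahoo.com" and "http://yahoo.com/" all become "http://www.yahoo.com". The same rule turns "my.yahoo.com" into "http://www.my.yahoo.com", which is exactly the kind of wrong guess described above, so the criteria need to be settled before relying on it.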

    -dZ.
skyrise11

ASKER

Well, my goal is basically to parse images from many sites and store them locally for quick access. In order to figure out which images are stored for which sites, I need to store their URLs. Since yahoo.com, www.yahoo.com, etc. all point to the same site, I want to reduce the number of times I have to parse a site and store its images.

Not sure which options would be best.
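
One possible option for this use case, sketched under stated assumptions rather than offered as the definitive approach: key the local store on the host name alone, with a leading "www." removed. Treating "www.example.com" and "example.com" as the same site is a policy choice that will not hold for every site, and the `SiteIndex`, `KeyFor` and `TryRegister` names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class SiteIndex
{
    // Hosts whose images have already been parsed and stored.
    private readonly HashSet<string> _seenHosts =
        new HashSet<string>(StringComparer.Ordinal);

    // Hypothetical helper: builds a lookup key from the host only.
    public static string KeyFor(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            throw new ArgumentException("URL must not be empty.", nameof(url));

        string candidate = url.Trim();

        // Assumption: input without a scheme means http.
        if (!candidate.Contains("://"))
            candidate = "http://" + candidate;

        // Throws UriFormatException if the input is not a usable URL.
        string host = new Uri(candidate).Host.ToLowerInvariant();

        // Assumption: a leading "www." does not identify a different site.
        if (host.StartsWith("www.", StringComparison.Ordinal))
            host = host.Substring(4);

        return host;
    }

    // Returns true the first time a site is seen (so it should be parsed),
    // and false when its key is already in the index.
    public bool TryRegister(string url)
    {
        return _seenHosts.Add(KeyFor(url));
    }
}
```

Calling `TryRegister` with "www.yahoo.com", then "http://yahoo.com", then "http://yahoo.com/" returns true only for the first call, since all three share the key "yahoo.com". A host such as "my.yahoo.com" keeps its own key and would be parsed separately.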
ASKER CERTIFIED SOLUTION
DropZone