Getting a uniform URL

Posted on 2007-08-07
Last Modified: 2010-04-06
How do you get a uniform web address using C#?
For example: converting the strings "" or "" or "" (or any other possible combination) to one unique string such as "". Is there a function out there that I can use?
Question by:skyrise11
    LVL 18

    Expert Comment

    What you are asking for is called "canonicalization", from the verb "to canonicalize", meaning to convert something to its base, or canonical, form. You can use a Regex object (System.Text.RegularExpressions.Regex) to do this. However, there is a problem: you'll have to know, and be sure of, what the canonical form is.

    For example, "" may be a valid URL. It is certainly syntactically valid according to the RFC that defines URIs, so how do you "know" for sure that it requires a "www" before it? Perhaps it's "", or maybe "". The only way to find out would be to have a list of all known URLs beforehand and look it up, which we can agree is not a very practical solution.

    You'll also have to consider that the URL may contain a path at the end, or perhaps a query string, such as "" or "". These are all valid URLs, so you'll have to make sure that you canonicalize only the domain part.
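    As a minimal sketch of isolating the domain part, the built-in System.Uri class already separates the host from the path and query string. It assumes that addresses typed without a scheme should be treated as HTTP; the names DomainExtractor and GetHost are hypothetical. It deliberately does not add or remove "www", for the reasons given above.

```csharp
using System;

// Hypothetical helper: returns only the host of a user-entered address,
// ignoring scheme, port, path and query string.
public static class DomainExtractor
{
    public static string GetHost(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException("Address is empty.", nameof(input));

        string candidate = input.Trim();

        // Assumption: an address typed without a scheme is treated as HTTP.
        if (!candidate.Contains("://"))
            candidate = "http://" + candidate;

        // Uri exposes the host separately from the path and the query string.
        var uri = new Uri(candidate, UriKind.Absolute);

        // Host names are case-insensitive, so normalize to lower case.
        return uri.Host.ToLowerInvariant();
    }
}
```

    For example, GetHost("http://www.example.com/images/a.png?page=2") returns "www.example.com".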

    Once you settle on the specific criteria you want to evaluate, and you are comfortable that they define the canonical form for your URLs, it's straightforward to create a regular expression pattern for it. I can help with that.
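    Purely as an illustration, here is a pattern for one possible set of criteria: drop an optional http:// or https:// prefix, drop an optional leading "www.", ignore everything after the host (port, path, query string), and lower-case the result. These criteria are an assumption, not a universal rule; as discussed above, they will merge hosts that could in fact be different servers. RegexCanonicalizer is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical canonicalizer for ONE assumed set of criteria:
//   - an optional "http://" or "https://" prefix is dropped
//   - an optional leading "www." is dropped
//   - anything after the host (port, path, query string) is dropped
//   - the result is lower-cased
public static class RegexCanonicalizer
{
    private static readonly Regex HostPattern = new Regex(
        @"^\s*(?:https?://)?(?:www\.)?(?<host>[a-z0-9.-]+)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string Canonicalize(string input)
    {
        Match m = HostPattern.Match(input ?? string.Empty);
        if (!m.Success)
            throw new FormatException("No host found in: " + input);

        // The "host" group stops at the first ':', '/' or '?'.
        return m.Groups["host"].Value.ToLowerInvariant();
    }
}
```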


    Author Comment

    Well, my goal is basically to parse images from many sites and store them locally for quick access. In order to figure out which images are stored for which sites, I need to store their URL. Since, and, etc. all point to the same site, I want to reduce the number of times I have to parse a site and store its images.

    I'm not sure which option would be best.
    LVL 18

    Accepted Solution

    I understand what you want to do, but as I said, there isn't a perfect solution without knowing beforehand what the correct URL is.

    For example, if you already had "" on your list, when someone enters "" you could perform a domain search in your list, notice that they match, and complete it. However, what if "" was the first one entered? And what if both point to different servers? It is very common for web URLs to start with "www", but it is not universal: perhaps "" resolves directly to "".
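    A rough sketch of that lookup, assuming hosts have already been extracted and that "differs only by a leading www." is your matching rule. SiteRegistry and FindOrAdd are hypothetical names. Whichever form is entered first becomes the stored one, which is exactly the ordering problem described above, and it will wrongly merge the two forms if they are served by different machines.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry of sites already processed. The first form
// entered for a domain is stored; later variants that differ only by a
// leading "www." are matched to it.
public class SiteRegistry
{
    private readonly Dictionary<string, string> _byKey =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Comparison key: the host with one leading "www." removed.
    private static string MatchKey(string host)
    {
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }

    // Returns the previously stored host if one matches; otherwise stores
    // and returns the new host.
    public string FindOrAdd(string host)
    {
        string key = MatchKey(host);
        if (_byKey.TryGetValue(key, out string existing))
            return existing;

        _byKey[key] = host;
        return host;
    }
}
```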

    It's a delicate issue. You could force at least a third-level domain (one of the form "third.second.tld") and match it against the existing ones, or you could keep a list of the most common addresses you expect users to enter and canonicalize those.
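    Both options could look something like the following. The helper names and the alias entries are hypothetical placeholders. The third-level rule is only a heuristic: for instance, it leaves "example.co.uk" untouched because that host already has three labels.

```csharp
using System;
using System.Collections.Generic;

public static class HostHeuristics
{
    // Option 1 (heuristic): force at least a third-level domain by
    // prefixing "www." to two-label hosts such as "example.com".
    public static string ForceThirdLevel(string host)
    {
        string[] labels = host.Split('.');
        return labels.Length == 2 ? "www." + host : host;
    }

    // Option 2: a hand-maintained table of the variants you expect users
    // to type, mapped to the form you treat as canonical.
    // These entries are hypothetical placeholders.
    private static readonly Dictionary<string, string> KnownAliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["example.com"] = "www.example.com",
            ["www.example.com"] = "www.example.com",
        };

    // Unknown hosts are returned unchanged.
    public static string FromKnownList(string host)
    {
        return KnownAliases.TryGetValue(host, out string canonical) ? canonical : host;
    }
}
```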

    A third option, and perhaps the best one, is to perform the search or the automatic canonicalization and confirm it with the user: if the user enters "", present them with "" and ask if it is correct. Additionally, you could send an HTTP request to that address behind the scenes just to make sure it exists and is valid (I do that with an old site directory I used to keep).
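    For the background check, a sketch using System.Net.Http.HttpClient might look like this. It assumes a HEAD request is acceptable to the target server (some servers reject HEAD, in which case a GET would be needed) and treats any network error, unusable address, or timeout as "does not exist". UrlProbe and ExistsAsync are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper that checks whether an address answers at all.
public static class UrlProbe
{
    // One shared client; the 10-second timeout is an arbitrary choice.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(10)
    };

    // True if the server answers with a 2xx status code (HttpClient
    // follows redirects by default). Assumption: HEAD is sufficient.
    public static async Task<bool> ExistsAsync(string url)
    {
        try
        {
            using (var request = new HttpRequestMessage(HttpMethod.Head, url))
            using (HttpResponseMessage response = await Client.SendAsync(request))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            // DNS failure, refused connection, and similar errors.
            return false;
        }
        catch (TaskCanceledException)
        {
            // The request timed out.
            return false;
        }
        catch (InvalidOperationException)
        {
            // Relative or otherwise unusable address.
            return false;
        }
    }
}
```

    From synchronous code it could be called as UrlProbe.ExistsAsync(url).GetAwaiter().GetResult(), although staying async is preferable.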

