Eliminate Duplicate URLs

Posted on 2005-05-13
Last Modified: 2010-03-31

I have a set of urls, count ranging from 3000-5000. Now I want to eliminate the duplicate urls in the set.

For example. and are duplicates. The slash is the difference.

One way is to use URL class equals method.  Since I have a huge set of urls, it might take some time.

Is there any other way to do this???
Question by:sumantedla
    LVL 92

    Expert Comment

    Try adding them to a HashSet, that should get rid of all dupes.

    Author Comment

    The urls are represented as String objects. If I use HashSet, will it work. I think here I have to conver them to URL objects.

    Am I right??? But what about performance??
    LVL 92

    Accepted Solution

    Yes, add URL's to the Set
    LVL 92

    Expert Comment

    Performance would most likely be better than if you tried handling the parsing of the string yourself.
    I wouldn't imagine a few thousand would be that slow, in fact probably pretty quick

    Author Comment

    It is taking nearly 3-4 minutes for just 2000 urls. Can we make it fast??
    LVL 92

    Expert Comment

    can u post your code

    Author Comment

    I need a method which takes HashMap as input and returns HashMap. For the HashMaps,
    key : url ( as string)
    value : not important  (Its a kind of vague design. I will change it later.)

          public static HashMap eliminateDuplicates(HashMap urls)
          {      HashMap uniqueUrls = new HashMap();
                {      Set keys = urls.keySet();
                      Iterator iterator = keys.iterator();
                      HashSet set = new HashSet();
                      while (iterator.hasNext())
                      {      String key = (String);
                            URL url = new URL(key);
                      Iterator setIterator = set.iterator();
                      while (setIterator.hasNext())
                }//      try
                catch(Exception e){      
                      return null;
                return uniqueUrls;      

    Featured Post

    Find Ransomware Secrets With All-Source Analysis

    Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

    Join & Write a Comment

    Suggested Solutions

    Title # Comments Views Activity
    nested class vs inner class 5 38
    Increment alphanumeric sequence 6 58
    countHi challenge 25 63
    Unable to start eclipse ? 17 51
    Java had always been an easily readable and understandable language.  Some relatively recent changes in the language seem to be changing this pretty fast, and anyone that had not seen any Java code for the last 5 years will possibly have issues unde…
    Java functions are among the best things for programmers to work with as Java sites can be very easy to read and prepare. Java especially simplifies many processes in the coding industry as it helps integrate many forms of technology and different d…
    Viewers will learn one way to get user input in Java. Introduce the Scanner object: Declare the variable that stores the user input: An example prompting the user for input: Methods you need to invoke in order to properly get  user input:
    This tutorial covers a practical example of lazy loading technique and early loading technique in a Singleton Design Pattern.

    746 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    18 Experts available now in Live!

    Get 1:1 Help Now