  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 195

Eliminate Duplicate URLs


I have a set of URLs, somewhere between 3000 and 5000 of them, and I want to eliminate the duplicates in the set.

For example, http://google.com and http://google.com/ are duplicates; the trailing slash is the only difference.

One way is to use the equals method of the URL class. Since I have a large set of URLs, that might take some time.

Is there any other way to do this?
Asked by: sumantedla
1 Solution
 
objects Commented:
Try adding them to a HashSet; that should get rid of all the dupes.
 
sumantedla (Author) Commented:
The URLs are represented as String objects. If I use a HashSet, will it work? I think I have to convert them to URL objects.

Am I right? And what about performance?
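To make the String question concrete, here is a minimal sketch of what a HashSet of plain strings would do. The class and method names (StringSetDemo, distinctCount) are made up for illustration. String equality is character by character, so exact repeats collapse but the trailing-slash variant from the question does not.

```java
import java.util.HashSet;
import java.util.Set;

final class StringSetDemo {

    // Hypothetical helper: counts how many distinct strings a HashSet keeps.
    static int distinctCount(String... urls) {
        Set<String> set = new HashSet<>();
        for (String url : urls) {
            set.add(url);   // add() is a no-op when an equal string is already present
        }
        return set.size();
    }

    public static void main(String[] args) {
        // Exact duplicates are merged.
        System.out.println(distinctCount("http://google.com", "http://google.com"));   // prints 1
        // The trailing slash makes the strings differ, so both are kept.
        System.out.println(distinctCount("http://google.com", "http://google.com/"));  // prints 2
    }
}
```

So a set of raw strings only helps once the strings have been brought into one canonical spelling first.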
 
objects Commented:
Yes, add URLs to the Set.
 
objects Commented:
Performance would most likely be better than if you tried to handle the parsing of the strings yourself.
I wouldn't imagine a few thousand would be that slow; in fact, it would probably be pretty quick.
 
sumantedla (Author) Commented:
It is taking nearly 3-4 minutes for just 2000 URLs. Can we make it faster?
 
objects Commented:
Can you post your code?
 
sumantedla (Author) Commented:
I need a method that takes a HashMap as input and returns a HashMap. For both HashMaps:
key: the URL (as a String)
value: not important (It's a rather vague design. I will change it later.)


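      import java.net.MalformedURLException;
      import java.net.URL;
      import java.util.HashMap;
      import java.util.HashSet;
      import java.util.Set;

      public static HashMap<String, String> eliminateDuplicates(HashMap<String, String> urls)
      {
            HashMap<String, String> uniqueUrls = new HashMap<>();
            try
            {
                  // Collect the keys as URL objects; the set drops URLs it considers equal.
                  // Note: URL.hashCode() and URL.equals() may resolve the host name,
                  // which is a blocking DNS lookup for every URL added.
                  Set<URL> set = new HashSet<>();
                  for (String key : urls.keySet())
                  {
                        set.add(new URL(key));
                        System.out.print(".");      // progress indicator
                  }
                  System.out.println();
                  // Copy the surviving URLs back into a map; the values are not used.
                  for (URL url : set)
                  {
                        uniqueUrls.put(url.toString(), "");
                  }
            }
            catch (MalformedURLException e)
            {
                  e.printStackTrace();
                  return null;      // a single malformed key aborts the whole run
            }
            return uniqueUrls;
      }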
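One possible way to avoid the slowdown, sketched here as a suggestion rather than as the accepted answer: java.net.URL may do a host name lookup inside hashCode() and equals(), which is a likely cause of the minutes-long run, while java.net.URI never touches the network. The sketch below builds a comparison key from each URI and keeps the first entry per key. The names UrlDedup and normalize are hypothetical, and the normalization rules are assumptions chosen to match the example in the question: scheme and host are lower-cased, an empty path is treated as "/", and user info and fragments are ignored.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class UrlDedup {

    // Hypothetical helper: builds a comparison key without any network access.
    // Assumes hierarchical URLs such as http://host/path; throws
    // IllegalArgumentException for anything it cannot parse that way.
    static String normalize(String raw) {
        URI uri = URI.create(raw.trim()).normalize();   // also removes "." and ".." segments
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("not an absolute URL with a host: " + raw);
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";                                 // http://google.com == http://google.com/
        }
        StringBuilder key = new StringBuilder();
        key.append(uri.getScheme().toLowerCase(Locale.ROOT)).append("://");
        key.append(uri.getHost().toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            key.append(':').append(uri.getPort());
        }
        key.append(path);
        if (uri.getRawQuery() != null) {
            key.append('?').append(uri.getRawQuery());
        }
        return key.toString();
    }

    // Keeps the first entry seen for each normalized key, with its original spelling.
    static <V> Map<String, V> eliminateDuplicates(Map<String, V> urls) {
        Map<String, V> unique = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        for (Map.Entry<String, V> entry : urls.entrySet()) {
            String key;
            try {
                key = normalize(entry.getKey());
            } catch (IllegalArgumentException e) {
                key = entry.getKey();                   // unparseable: compare as a raw string
            }
            if (seen.add(key)) {
                unique.put(entry.getKey(), entry.getValue());
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, String> urls = new LinkedHashMap<>();
        urls.put("http://google.com", "");
        urls.put("http://google.com/", "");
        urls.put("http://google.com/search?q=a", "");
        // prints [http://google.com, http://google.com/search?q=a]
        System.out.println(eliminateDuplicates(urls).keySet());
    }
}
```

With a plain HashMap as input the iteration order is unspecified, so which spelling of a duplicate survives is unspecified too. Whether paths like /a and /a/ should also be merged depends on the sites involved, so this sketch leaves them distinct.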