Solved

Finding the duplicates in a big collection

Posted on 2007-04-03
Last Modified: 2013-11-07
Hello all,

I have a little problem.
I must find duplicates in a CollectionBase object.
Three properties together determine the uniqueness of a record.

I am reading an XML file that gives me the collection of objects in the CollectionBase object.
Then I must report (display) which records are duplicated according to some tag node values.

 "public class XmlEmployesCollection : CollectionBase"

The problem is that sometimes there are more than 15,000 objects in the XmlEmployesCollection.
What I need is some guidelines for completing this task: "Finding the duplicates in a big collection."

I am using .NET Framework v2.0 with C#.

Thanks in advance,
So.



Question by:barbulea
3 Comments
 
LVL 33

Expert Comment

by:raterus
ID: 18845606
Whenever I need to find duplicates, I pull out the trusty Hashtable object and start adding values to it based on a "should be unique" key.  Before you add a value to the Hashtable, call its ContainsKey method to see whether that key is already there.  If it is, you know you have a duplicate.
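A minimal sketch of this idea, written against .NET 2.0. The Employee type, its Name/Age/Salary fields, and the HashtableDuplicateFinder helper are hypothetical stand-ins for the asker's real record type and its three key properties. The three values are joined into one string key, on the assumption that the separator character never occurs inside a name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical record type; replace the fields with the real key properties.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

public static class HashtableDuplicateFinder
{
    // Assumption: this control character never appears inside a Name.
    private const string Separator = "\u001F";

    // Joins the three properties into a single "should be unique" key.
    private static string MakeKey(Employee e)
    {
        return e.Name + Separator
             + e.Age.ToString(CultureInfo.InvariantCulture) + Separator
             + e.Salary.ToString("R", CultureInfo.InvariantCulture);
    }

    // Accepts a non-generic IEnumerable so a CollectionBase can be passed directly.
    public static List<Employee> FindDuplicates(IEnumerable employees)
    {
        Hashtable seen = new Hashtable();
        List<Employee> duplicates = new List<Employee>();
        foreach (Employee e in employees)
        {
            string key = MakeKey(e);
            if (seen.ContainsKey(key))
                duplicates.Add(e);   // key already present: this record is a duplicate
            else
                seen.Add(key, e);    // first record with this key
        }
        return duplicates;
    }
}
```

Each record costs one hash lookup, so 15,000 objects should not be a problem for this approach.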
 
LVL 16

Expert Comment

by:AlexNek
ID: 18845796
A key made of 3 properties is not quite as easy, but it only needs a few additional steps.
For a single key (which can also be a composite key) you have at least 2 methods:
- Sort the collection by key, then remove one of each pair of neighbouring items that have the same key
- When you build the collection, keep an additional map by key and don't add items whose key is already in the map
A sorted insert using binary search, which refuses to insert an item whose key is already present, would work too.
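The sort-based method could look roughly like the sketch below. Employee, EmployeeKeyComparer and SortedDuplicateFinder are hypothetical names, and the comparison order (Name, then Age, then Salary) is only an assumption about what the three key properties might be. It reports the duplicates rather than removing them, since that is what the question asks for.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; replace the fields with the real key properties.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

// Orders employees by the composite key: Name, then Age, then Salary.
public class EmployeeKeyComparer : IComparer<Employee>
{
    public int Compare(Employee a, Employee b)
    {
        int c = string.CompareOrdinal(a.Name, b.Name);
        if (c != 0) return c;
        c = a.Age.CompareTo(b.Age);
        if (c != 0) return c;
        return a.Salary.CompareTo(b.Salary);
    }
}

public static class SortedDuplicateFinder
{
    public static List<Employee> FindDuplicates(IEnumerable<Employee> employees)
    {
        // Sort a copy so the caller's collection keeps its original order.
        List<Employee> sorted = new List<Employee>(employees);
        EmployeeKeyComparer comparer = new EmployeeKeyComparer();
        sorted.Sort(comparer);

        // After sorting, records with equal keys sit next to each other.
        List<Employee> duplicates = new List<Employee>();
        for (int i = 1; i < sorted.Count; i++)
        {
            if (comparer.Compare(sorted[i - 1], sorted[i]) == 0)
                duplicates.Add(sorted[i]);
        }
        return duplicates;
    }
}
```

List<T>.Sort is not a stable sort, so which record of an equal-key group counts as the "original" is arbitrary here; only the number of duplicates per key is reliable.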
 
LVL 6

Accepted Solution

by:
thuannguy earned 500 total points
ID: 18848656
You can use three Dictionary<> to store the objects. Let's consider a concrete example in which the three "KEYS" are Age, Salary and Name
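      // Requires: using System.Collections.Generic;
      // One dictionary per key property, each mapping a value to the first employee seen with it.
      Dictionary<string, Employee> nameDict = new Dictionary<string, Employee>();
      Dictionary<int, Employee> ageDict = new Dictionary<int, Employee>();
      Dictionary<double, Employee> salaryDict = new Dictionary<double, Employee>();
      // Employees whose Name, Age and Salary values had all been seen before.
      List<Employee> duplicateList = new List<Employee>();

      public void Add(Employee employee)
      {
         // Assume a duplicate until one of the three values turns out to be new.
         // Note: each value is checked independently, so the three matches
         // may come from different earlier records.
         bool isDuplicate = true;

         if (!nameDict.ContainsKey(employee.Name))
         {
            isDuplicate = false;
            nameDict.Add(employee.Name, employee);
         }

         if (!ageDict.ContainsKey(employee.Age))
         {
            isDuplicate = false;
            ageDict.Add(employee.Age, employee);
         }

         if (!salaryDict.ContainsKey(employee.Salary))
         {
            isDuplicate = false;
            salaryDict.Add(employee.Salary, employee);
         }

         if (isDuplicate)
            duplicateList.Add(employee); // this object is a duplicate, add it to the duplicate list
      }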

When you read an object from the XML file, just use the Add method to add it to the container. In the Add method, we check whether the three "KEYS" already exist. Since we only store references to the objects in the three dictionaries, the memory overhead is small.
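One thing to watch with three separate dictionaries: a record is flagged as soon as its Name, its Age and its Salary have each appeared before, even if they appeared in three different records. If uniqueness is meant to be the combination of the three values, a single dictionary keyed on all of them avoids that. The following is only a sketch under that assumption; EmployeeKey and GroupedDuplicateFinder are hypothetical names, and Employee is a stand-in with the same three properties as above. It also groups the records, so you can display which ones are duplicates of each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type with the three key properties from the example above.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

// Composite key: two keys are equal only when all three values match.
public struct EmployeeKey : IEquatable<EmployeeKey>
{
    public readonly string Name;
    public readonly int Age;
    public readonly double Salary;

    public EmployeeKey(Employee e)
    {
        Name = e.Name;
        Age = e.Age;
        Salary = e.Salary;
    }

    public bool Equals(EmployeeKey other)
    {
        return Name == other.Name && Age == other.Age && Salary.Equals(other.Salary);
    }

    public override bool Equals(object obj)
    {
        return obj is EmployeeKey && Equals((EmployeeKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int h = (Name == null) ? 0 : Name.GetHashCode();
            h = h * 31 + Age;
            h = h * 31 + Salary.GetHashCode();
            return h;
        }
    }
}

public static class GroupedDuplicateFinder
{
    // Returns one list per key that occurs more than once.
    public static List<List<Employee>> FindDuplicateGroups(IEnumerable<Employee> employees)
    {
        Dictionary<EmployeeKey, List<Employee>> buckets =
            new Dictionary<EmployeeKey, List<Employee>>();

        foreach (Employee e in employees)
        {
            EmployeeKey key = new EmployeeKey(e);
            List<Employee> bucket;
            if (!buckets.TryGetValue(key, out bucket))
            {
                bucket = new List<Employee>();
                buckets.Add(key, bucket);
            }
            bucket.Add(e);
        }

        List<List<Employee>> result = new List<List<Employee>>();
        foreach (List<Employee> candidates in buckets.Values)
        {
            if (candidates.Count > 1)
                result.Add(candidates);
        }
        return result;
    }
}
```

For example, given (Ann, 30, 1000), (Bob, 40, 2000) and (Ann, 40, 1000), the per-property check flags the third record, while the composite key treats all three as distinct.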