Solved

Finding the duplicates in a big collection

Posted on 2007-04-03
Last Modified: 2013-11-07
Hello all,

I have a little problem.
I must find duplicates in a CollectionBase object.
Three properties together determine the uniqueness of a record.

I am reading an XML file that gives me the collection of objects in the CollectionBase object.
Then I must "say/display" which records are duplicated according to some TAG node values.

 "public class XmlEmployesCollection : CollectionBase"

The problem is that sometimes there are more than 15,000 objects in the XmlEmployesCollection.
What I need is some guidelines for completing this task: "Finding the duplicates in a big collection."

I am using .NET Framework v2.0 with C#.

Thanks in advance,
So.



Question by:barbulea
3 Comments
 
LVL 33

Expert Comment

by:raterus
ID: 18845606
Whenever I need to find duplicates, I pull out the trusty Hashtable object and start adding values to it based on a "should be unique" key.  Before you add a value to the Hashtable, use the ContainsKey method to see if you've already put that key there.  If you have, you know you have a duplicate.
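
The approach above could look roughly like this for a three-property key. This is a minimal sketch, not the poster's code: the Employee type, its Name, Age and Salary fields, and the HashtableDuplicateFinder helper are hypothetical stand-ins for the question's unspecified types, and the "|" separator is assumed not to occur inside a name. It sticks to C# 2.0 features.

```csharp
using System;
using System.Collections;
using System.Globalization;

// Hypothetical record type; Name, Age and Salary stand in for the
// three (unspecified) properties that define uniqueness.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

public static class HashtableDuplicateFinder
{
    // Joins the three values into one string key.
    // Assumption: "|" never appears inside Name.
    private static string MakeKey(Employee e)
    {
        return e.Name + "|"
            + e.Age.ToString(CultureInfo.InvariantCulture) + "|"
            + e.Salary.ToString("R", CultureInfo.InvariantCulture);
    }

    // Returns every record whose combined key was already seen
    // earlier in the input. Accepts any IEnumerable, so a
    // CollectionBase-derived collection can be passed directly.
    public static ArrayList FindDuplicates(IEnumerable employees)
    {
        Hashtable seen = new Hashtable();
        ArrayList duplicates = new ArrayList();
        foreach (Employee e in employees)
        {
            string key = MakeKey(e);
            if (seen.ContainsKey(key))
                duplicates.Add(e);      // key already present: duplicate
            else
                seen.Add(key, e);       // first occurrence of this key
        }
        return duplicates;
    }
}
```

Each lookup is a hash probe, so a pass over 15,000 records does one ContainsKey call per record rather than comparing every pair.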
 
LVL 16

Expert Comment

by:AlexNek
ID: 18845796
With a key made of three properties it is not quite as easy, but it only takes a few additional steps.
For a single key (which can also be a composite key) you have at least two methods:
- Sort the collection by key and remove one of each pair of equal neighbouring items.
- While building the collection, maintain an additional map by key and don't add items that are already in the map.
A sorted insert using binary search that rejects items already present would work too.
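
The first method (sort, then compare neighbours) might be sketched as follows. The Employee type, its Name, Age and Salary fields, and the SortDuplicateFinder helper are hypothetical names chosen for illustration; the sketch reports the duplicates instead of removing them, since that is what the question asks for.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical record type with three key properties.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

public static class SortDuplicateFinder
{
    // Orders by Name, then Age, then Salary, so that records with
    // the same combined key end up next to each other.
    private static int CompareByKey(Employee a, Employee b)
    {
        int c = string.CompareOrdinal(a.Name, b.Name);
        if (c != 0) return c;
        c = a.Age.CompareTo(b.Age);
        if (c != 0) return c;
        return a.Salary.CompareTo(b.Salary);
    }

    public static List<Employee> FindDuplicates(IEnumerable employees)
    {
        // Work on a copy so the caller's collection keeps its order.
        List<Employee> sorted = new List<Employee>();
        foreach (Employee e in employees)
            sorted.Add(e);
        sorted.Sort(CompareByKey);

        // After sorting, a record is a duplicate exactly when it
        // compares equal to the record just before it.
        List<Employee> duplicates = new List<Employee>();
        for (int i = 1; i < sorted.Count; i++)
        {
            if (CompareByKey(sorted[i - 1], sorted[i]) == 0)
                duplicates.Add(sorted[i]);
        }
        return duplicates;
    }
}
```

Sorting costs O(n log n) comparisons, which is slower than a hash-based map in theory but needs no key object and no hash function.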
 
LVL 6

Accepted Solution

by:
thuannguy earned 500 total points
ID: 18848656
You can use three Dictionary<> instances to store the objects. Consider a concrete example in which the three "KEYS" are Age, Salary and Name:

      Dictionary<string, Employee> nameDict = new Dictionary<string, Employee>();
      Dictionary<int, Employee> ageDict = new Dictionary<int, Employee>();
      Dictionary<double, Employee> salaryDict = new Dictionary<double, Employee>();
      List<Employee> duplicateList = new List<Employee>();

      public void Add(Employee employee)
      {
         // Stays true only if all three values were seen before
         // (each may have come from a different earlier record).
         bool isDuplicate = true;

         if (!nameDict.ContainsKey(employee.Name))
         {
            isDuplicate = false;
            nameDict.Add(employee.Name, employee);
         }

         if (!ageDict.ContainsKey(employee.Age))
         {
            isDuplicate = false;
            ageDict.Add(employee.Age, employee);
         }

         if (!salaryDict.ContainsKey(employee.Salary))
         {
            isDuplicate = false;
            salaryDict.Add(employee.Salary, employee);
         }

         if (isDuplicate)
            duplicateList.Add(employee); // this object is a duplicate, add it to the duplicate list
      }

When you read an object from the XML file, just call the Add method to add it to the container. The Add method checks whether each of the three "KEYS" already exists. Since the three dictionaries store only references to the objects, the extra memory cost is small.
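
One thing to watch with three separate dictionaries: a record is flagged once each of its values has appeared somewhere before, even if never together. After ("Ann", 30, 1000) and ("Bob", 40, 2000), the record ("Ann", 40, 1000) would be reported although no earlier record matches it on all three properties. If uniqueness is defined by the combination, a single dictionary keyed on all three values may be closer to what the question asks. A minimal sketch, assuming a hypothetical Employee type with Name, Age and Salary; EmployeeKey and CompositeKeyDuplicateFinder are illustrative names, not part of the answer above:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical record type with three key properties.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

// Value-type key that is equal only when all three values match.
public struct EmployeeKey : IEquatable<EmployeeKey>
{
    private readonly string name;
    private readonly int age;
    private readonly double salary;

    public EmployeeKey(Employee e)
    {
        name = e.Name;
        age = e.Age;
        salary = e.Salary;
    }

    public bool Equals(EmployeeKey other)
    {
        return name == other.name && age == other.age
            && salary.Equals(other.salary);
    }

    public override bool Equals(object obj)
    {
        return obj is EmployeeKey && Equals((EmployeeKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap instead of throwing
        {
            int h = name == null ? 0 : name.GetHashCode();
            h = h * 31 + age;
            h = h * 31 + salary.GetHashCode();
            return h;
        }
    }
}

public static class CompositeKeyDuplicateFinder
{
    // Groups records by combined key and returns only the groups
    // that contain more than one record.
    public static List<List<Employee>> FindDuplicateGroups(IEnumerable employees)
    {
        Dictionary<EmployeeKey, List<Employee>> buckets =
            new Dictionary<EmployeeKey, List<Employee>>();
        foreach (Employee e in employees)
        {
            EmployeeKey key = new EmployeeKey(e);
            List<Employee> bucket;
            if (!buckets.TryGetValue(key, out bucket))
            {
                bucket = new List<Employee>();
                buckets.Add(key, bucket);
            }
            bucket.Add(e);
        }

        List<List<Employee>> result = new List<List<Employee>>();
        foreach (List<Employee> bucket in buckets.Values)
        {
            if (bucket.Count > 1)
                result.Add(bucket);
        }
        return result;
    }
}
```

Returning whole groups makes it straightforward to display every record that shares a key, including the first occurrence.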