Solved

Finding the duplicates in a big collection

Posted on 2007-04-03
Last Modified: 2013-11-07
Hello all,

I have a little problem.
I must find duplicates in a CollectionBase object.
There are 3 properties that together determine the uniqueness of a record.

I am reading an XML file that gives me the collection of objects in the CollectionBase object.
Then I must report/display which records are duplicated according to some tag node values.

 "public class XmlEmployesCollection : CollectionBase"

The problem is that sometimes there are more than 15,000 objects in the XmlEmployesCollection.
What I need is some guidelines for completing this task: "Finding the duplicates in a big collection."

I am using .NET Framework v2.0 with C#.

Thanks in advance,
So.



Question by:barbulea

Expert Comment

by:raterus
ID: 18845606
Whenever I need to find duplicates, I pull out the trusty Hashtable object and start adding values to it based on a "should be unique" key.  Before you add a value to the hashtable, use the ContainsKey method to see if you've already put that key there.  If you have, you know you have a duplicate.
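
A minimal sketch of this idea, under stated assumptions: the Employee class, its Name/Age/Salary fields, and the '|' separator are hypothetical stand-ins for whatever three properties identify a record in the real XML. The three values are joined into a single string that serves as the "should be unique" key.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical record type; the real properties come from the XML tags.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name; Age = age; Salary = salary;
    }
}

public static class HashtableDuplicateFinder
{
    // Joins the three properties into one key.
    // Assumption: '|' never occurs inside Name.
    public static string MakeKey(Employee e)
    {
        return e.Name + "|"
             + e.Age.ToString(CultureInfo.InvariantCulture) + "|"
             + e.Salary.ToString("R", CultureInfo.InvariantCulture);
    }

    // Accepts any non-generic IEnumerable, so a CollectionBase can be passed directly.
    public static List<Employee> FindDuplicates(IEnumerable items)
    {
        Hashtable seen = new Hashtable();
        List<Employee> duplicates = new List<Employee>();
        foreach (Employee e in items)
        {
            string key = MakeKey(e);
            if (seen.ContainsKey(key))
                duplicates.Add(e);   // key already present: this record is a repeat
            else
                seen.Add(key, e);    // first time this key combination is seen
        }
        return duplicates;
    }
}
```

Each record costs one hash lookup, so 15,000 objects should be handled in a single pass over the collection.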

Expert Comment

by:AlexNek
ID: 18845796
With a key made up of the 3 properties it is not quite so easy, but it only needs a few additional steps.
For a single key (it can be a complex key too) you have at least 2 methods:
- Sort the collection by key and remove one of each pair of equal neighbouring items
- When you build the collection, maintain an additional map by key and don't add items which are already in the map
It can also be a binary (insertion) sort that prevents item duplication.
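
The first method (sort, then compare neighbours) could look roughly like the sketch below. The Employee type and its Name/Age/Salary fields are assumptions for illustration only, and the sketch reports the repeated records rather than removing them, since the question asks to display them.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical record type; the real properties come from the XML tags.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name; Age = age; Salary = salary;
    }
}

public static class SortedDuplicateFinder
{
    // Orders by Name, then Age, then Salary, so records with equal keys become neighbours.
    public static int CompareByKey(Employee a, Employee b)
    {
        int c = string.CompareOrdinal(a.Name, b.Name);
        if (c != 0) return c;
        c = a.Age.CompareTo(b.Age);
        if (c != 0) return c;
        return a.Salary.CompareTo(b.Salary);
    }

    public static List<Employee> FindDuplicates(IEnumerable items)
    {
        // Copy first so the original collection keeps its order.
        List<Employee> sorted = new List<Employee>();
        foreach (Employee e in items)
            sorted.Add(e);
        sorted.Sort(CompareByKey);

        // After sorting, a record equal to its predecessor is a repeat.
        // List<T>.Sort is not stable, so which of the equal records counts
        // as "the original" is arbitrary.
        List<Employee> duplicates = new List<Employee>();
        for (int i = 1; i < sorted.Count; i++)
        {
            if (CompareByKey(sorted[i - 1], sorted[i]) == 0)
                duplicates.Add(sorted[i]);
        }
        return duplicates;
    }
}
```

Sorting costs O(n log n) comparisons but needs no extra map, which may matter if the key is awkward to hash.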

Accepted Solution

by:
thuannguy earned 500 total points
ID: 18848656
You can use three Dictionary<> objects to store the records. Let's consider a concrete example in which the three "KEYS" are Age, Salary and Name:
      // Requires: using System.Collections.Generic;
      // One dictionary per key property; each maps a value to the first employee seen with it.
      Dictionary<string, Employee> nameDict = new Dictionary<string, Employee>();
      Dictionary<int, Employee> ageDict = new Dictionary<int, Employee>();
      Dictionary<double, Employee> salaryDict = new Dictionary<double, Employee>();
      List<Employee> duplicateList = new List<Employee>();

      public void Add(Employee employee)
      {
         // Assume a duplicate until one of the three values turns out to be new.
         bool isDuplicate = true;

         if (!nameDict.ContainsKey(employee.Name))
         {
            isDuplicate = false;
            nameDict.Add(employee.Name, employee);
         }

         if (!ageDict.ContainsKey(employee.Age))
         {
            isDuplicate = false;
            ageDict.Add(employee.Age, employee);
         }

         if (!salaryDict.ContainsKey(employee.Salary))
         {
            isDuplicate = false;
            salaryDict.Add(employee.Salary, employee);
         }

         // Note: each value is checked on its own, so an employee is flagged when its
         // Name, Age and Salary have all been seen before, even on different earlier records.
         if (isDuplicate)
            duplicateList.Add(employee); // this object is a duplicate, add it to the duplicate list
      }

When you read an object from the XML file, just call the Add method to add it to the container. The Add method checks whether the three "KEYS" already exist. Since the three dictionaries store only references to the objects, the memory cost is small.
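
Because the three dictionaries above track each property separately, a record such as ("Ann", 40, 1000) would be flagged after ("Ann", 30, 1000) and ("Bob", 40, 2000) have been added, although no earlier record matches it on all three values. If uniqueness is defined by the combination of the three properties, as the question suggests, one dictionary keyed on a composite value may be a better fit. The sketch below is one possible shape for that, not the accepted answer's code; Employee, EmployeeKey and the Name/Age/Salary fields are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical record type; the real properties come from the XML tags.
public class Employee
{
    public string Name;
    public int Age;
    public double Salary;

    public Employee(string name, int age, double salary)
    {
        Name = name; Age = age; Salary = salary;
    }
}

// Value-type key combining the three properties that define uniqueness.
public struct EmployeeKey : IEquatable<EmployeeKey>
{
    private readonly string name;
    private readonly int age;
    private readonly double salary;

    public EmployeeKey(Employee e)
    {
        name = e.Name; age = e.Age; salary = e.Salary;
    }

    public bool Equals(EmployeeKey other)
    {
        return string.Equals(name, other.name)
            && age == other.age
            && salary.Equals(other.salary);
    }

    public override bool Equals(object obj)
    {
        return obj is EmployeeKey && Equals((EmployeeKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int h = name == null ? 0 : name.GetHashCode();
            h = h * 31 + age;
            h = h * 31 + salary.GetHashCode();
            return h;
        }
    }
}

public static class CompositeKeyDuplicateFinder
{
    // Groups records by composite key and returns only groups with more than one member.
    public static List<List<Employee>> FindDuplicateGroups(IEnumerable items)
    {
        Dictionary<EmployeeKey, List<Employee>> groups =
            new Dictionary<EmployeeKey, List<Employee>>();
        foreach (Employee e in items)
        {
            EmployeeKey key = new EmployeeKey(e);
            List<Employee> group;
            if (!groups.TryGetValue(key, out group))
            {
                group = new List<Employee>();
                groups.Add(key, group);
            }
            group.Add(e);
        }

        List<List<Employee>> result = new List<List<Employee>>();
        foreach (List<Employee> g in groups.Values)
        {
            if (g.Count > 1)
                result.Add(g);
        }
        return result;
    }
}
```

Returning whole groups makes it possible to display every record that shares a key, including the first one encountered.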