Solved

C# remove duplicates in a List

Posted on 2010-11-12
Last Modified: 2013-12-17
Hi experts,

Below is a method that removes duplicate strings from any given List<string>.

My question is: can you modify this method so that it removes DataRow objects that contain duplicate DataColumn values (from a List of DataRows)?

static List<string> removeDuplicates(List<string> inputList)
{
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<string> finalList = new List<string>();

    foreach (string currValue in inputList)
    {
        if (!uniqueStore.ContainsKey(currValue))
        {
            uniqueStore.Add(currValue, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}

Additionally, when a duplicate is found, the value of another column in the row that was kept should be replaced with the value from the most recently seen duplicate.

i.e., in pseudocode:

if (row["column"] is a duplicate)
{
    /* then do not add the entire row to the new list */
    /* but take the row["column2"] value and use it to replace the ["column2"] value already present in the final list */
}



Question by:gnihar

Accepted Solution

by:
kraiven earned 500 total points
ID: 34121203
Assuming the DataRows you are interested in have no existing identifier column, an efficient way to solve this problem is to generate your own identifier (say, using GetHashCode()) and use it as the dictionary key, with the DataRow as the dictionary value.
i.e., given an existing DataTable, iterate its row collection, generate an id for each row, and add the row to the dictionary only if that id has not been seen yet.
 
var dict = new Dictionary<int, DataRow>();
foreach(DataRow row in dt.Rows)
{
	int id = Uniqueify(row);
	if (!dict.ContainsKey(id))
		dict.Add(id, row);
}


where the Uniqueify function is:
 
public int Uniqueify(DataRow dr)
{
	string concat = string.Empty;
	for (int i = 0; i < dr.ItemArray.Length; i++)
	{
		concat += dr[i].ToString();
	}
	return concat.GetHashCode();
}


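WARNING: Different strings can occasionally produce the same hash code, so when the if (!dict.ContainsKey...) test is false you may want to confirm that the two rows really match, for example by iterating through the column collection and comparing each value. Comparing every row against every other row that way would be the brute-force approach; checking the hash first keeps those full comparisons to a minimum. Note also that concatenating values without a separator makes rows such as ("ab", "c") and ("a", "bc") produce the same string, and therefore the same hash.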
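The check described in the warning could look something like the sketch below. It is only an illustration under a few assumptions: RowDeduper, RemoveDuplicates and the bucket dictionary are made-up names, and this variant of Uniqueify joins the values with a separator character (an assumption that the data never contains that character) instead of concatenating them directly.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowDeduper
{
    // Variant of Uniqueify: joins column values with a unit-separator character
    // so that ("ab", "c") and ("a", "bc") no longer build the same string.
    public static int Uniqueify(DataRow dr)
    {
        return string.Join("\u001F", dr.ItemArray.Select(v => v.ToString())).GetHashCode();
    }

    // Returns the rows of dt with full-row duplicates removed (first occurrence kept).
    public static List<DataRow> RemoveDuplicates(DataTable dt)
    {
        // Rows that share a hash code go into the same bucket.
        var buckets = new Dictionary<int, List<DataRow>>();
        var result = new List<DataRow>();

        foreach (DataRow row in dt.Rows)
        {
            int id = Uniqueify(row);

            List<DataRow> bucket;
            if (!buckets.TryGetValue(id, out bucket))
            {
                bucket = new List<DataRow>();
                buckets.Add(id, bucket);
            }

            // An equal hash is only a hint; confirm by comparing every column value.
            bool duplicate = bucket.Any(seen => seen.ItemArray.SequenceEqual(row.ItemArray));
            if (!duplicate)
            {
                bucket.Add(row);
                result.Add(row);
            }
        }
        return result;
    }
}
```

The full column-by-column comparison only runs against rows that already share a hash code, so in the common case it costs one extra comparison per duplicate.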

Expert Comment

by:jasonduan
ID: 34121361
Use LINQ:

static List<string> removeDuplicates(List<string> inputList)
{
    return inputList.Distinct().ToList();
}

static List<DataRow> removeDuplicates(List<DataRow> rows)
{
    return rows.Distinct(new MyRowComparer()).ToList();
}

public class MyRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // put your logic here: return true when x and y count as duplicates
        throw new NotImplementedException();
    }

    public int GetHashCode(DataRow obj)
    {
        // put your logic here: hash the same columns that Equals compares
        throw new NotImplementedException();
    }
}
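As a rough illustration of how the comparer stub above might be filled in when only one column decides whether two rows are duplicates, here is a minimal sketch. ColumnRowComparer and RowFilter are hypothetical names, and the column is passed in by name as an assumption about how the rows are identified.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are "equal" when they hold the same value
// in one named column, regardless of their other columns.
public class ColumnRowComparer : IEqualityComparer<DataRow>
{
    private readonly string _column;

    public ColumnRowComparer(string column)
    {
        _column = column;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // The indexer returns DBNull.Value (never null) for missing data,
        // so object.Equals is safe to call here.
        return object.Equals(x[_column], y[_column]);
    }

    public int GetHashCode(DataRow obj)
    {
        // Must be built from the same column that Equals compares.
        return obj[_column].GetHashCode();
    }
}

public static class RowFilter
{
    // Drops every row whose value in the given column was already seen.
    public static List<DataRow> RemoveDuplicates(List<DataRow> rows, string column)
    {
        return rows.Distinct(new ColumnRowComparer(column)).ToList();
    }
}
```

Distinct only drops the later rows; it does not copy any value from a dropped row into the one that is kept, so that part of the question would still need a separate pass. For comparing whole rows rather than a single column, the framework's DataRowComparer.Default (in System.Data.DataSetExtensions) may be usable in place of a hand-written comparer.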
 

Author Comment

by:gnihar
ID: 34127274
Hi,

I probably wasn't specific enough.


In the first answer, it seems that the Uniqueify function takes all column values from a single row:

for (int i = 0; i < dr.ItemArray.Length; i++)
{
    concat += dr[i].ToString();
}


and from that combined value produces a hash code, which is then used as the dictionary key that uniquely identifies the row in the new collection.

If I am not mistaken, this compares all columns of a row, so a row is treated as a duplicate (and left out of the new collection) only when it matches an earlier row in every column.

But in my case I have to:
look only at specific columns to decide whether a row is a duplicate;
if it is, not insert the row into the collection, but take another column's value from the 'duplicate' row and use it to overwrite the value of the column with the same name in the row already in the new collection.

i.e.

iterate through the DataTable's rows
{
    if impurecollection.row["column name 1"] == purecollection.row["column name 1"] (regardless of the other column values in the same row, which may or may not be duplicates)

    then

        do not add the whole row to the new collection

    but

        take impurecollection.row["column name 2"] and insert its value in place of purecollection.row["column name 2"] where impurecollection.row["column name 1"] == purecollection.row["column name 1"]


    and if impurecollection.row["column name 1"] != purecollection.row["column name 1"]

    then

        add that row from impurecollection into purecollection unchanged
}

Hope that clarifies things a bit.


Assisted Solution

by:gnihar
gnihar earned 0 total points
ID: 34127501
Hi again, it seems I have found a solution by modifying kraiven's code.

So, here it is:

var dict = new Dictionary<int, DataRow>();
foreach (DataRow row1 in dataSet1.Tables[0].Rows)
{
    int id = Uniqueify(row1);

    if (!dict.ContainsKey(id))
    {
        // first row with this key: keep it
        dict.Add(id, row1);
    }
    else
    {
        // duplicate key: overwrite the kept row's value with the latest one
        dict[id]["column name 2"] = row1["column name 2"];
    }
}


and the function:

public int Uniqueify(DataRow dr)
{
    // I knew the position of the specific column in a row which must not be a duplicate.
    return dr[1].ToString().GetHashCode();
}
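Since two different column values can in rare cases share a hash code, one possible variation is to key the dictionary on the column value itself rather than on its hash. The following is only a sketch of that idea: ColumnDeduper, Merge and the parameter names are hypothetical, and the columns are looked up by name as an assumption.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ColumnDeduper
{
    // Keeps the first row for each distinct value of keyColumn. When a later row
    // repeats that value, it is not added; instead its valueColumn value overwrites
    // the one in the row that was kept.
    public static List<DataRow> Merge(DataTable table, string keyColumn, string valueColumn)
    {
        // Keyed on the column value itself, so hash collisions cannot merge unrelated rows.
        var firstSeen = new Dictionary<object, DataRow>();
        var result = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            object key = row[keyColumn]; // DBNull.Value for missing data, never null

            DataRow kept;
            if (firstSeen.TryGetValue(key, out kept))
            {
                // Duplicate: carry over the value from the latest duplicate.
                kept[valueColumn] = row[valueColumn];
            }
            else
            {
                firstSeen.Add(key, row);
                result.Add(row);
            }
        }
        return result;
    }
}
```

The dictionary still hashes the key internally, but it also checks equality of the actual values, which is the step a bare GetHashCode() comparison leaves out.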


Expert Comment

by:kraiven
ID: 34130357
Hi gnihar,

Thank you for accepting my solution. I'm afraid I had missed the end of your post, which is why I didn't follow up with that solution. I'm glad my solution was adaptable, however.

Author Closing Comment

by:gnihar
ID: 34162435
see my last post