Solved

C# remove duplicates in a List

Posted on 2010-11-12
Last Modified: 2013-12-17
Hi experts,

Below is a method that removes duplicate strings from any given List<string>.

My question is: can you modify this method so that it removes DataRow objects that contain duplicate DataColumn values (from a List of DataRows)?

static List<string> removeDuplicates(List<string> inputList)
{
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<string> finalList = new List<string>();

    foreach (string currValue in inputList)
    {
        if (!uniqueStore.ContainsKey(currValue))
        {
            uniqueStore.Add(currValue, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}


Additionally, when a duplicate is found, replace the value of another column in the row already kept in the list with the value from the most recently spotted duplicate.

i.e., in pseudocode:

if (row["column"] is a duplicate)
{
    /* then do not add the entire row to the new list */
    /* but take the row["column2"] value and use it to overwrite the ["column2"] value of the matching row already present in the final list */
}



Question by:gnihar
 
LVL 3

Accepted Solution

by:
kraiven earned 500 total points
ID: 34121203
Assuming that the DataRows you are interested in have no existing identifier column, the best (or at least an efficient) way to solve this problem is to generate your own identifier (say, using GetHashCode()) and then use it as your dictionary key, with the data row as the dictionary value.
i.e., given an existing DataTable, iterate its row collection, generating an id for each row and adding the row to a dictionary only if that id has not been seen before.
 
var dict = new Dictionary<int, DataRow>();
foreach(DataRow row in dt.Rows)
{
	int id = Uniqueify(row);
	if (!dict.ContainsKey(id))
		dict.Add(id, row);
}


where the Uniqueify function is:
 
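public int Uniqueify(DataRow dr)
{
	// Concatenate every column value into one string, then hash that string.
	// Note: with no separator, ("ab", "c") and ("a", "bc") produce the same string.
	string concat = string.Empty;
	for (int i = 0; i < dr.ItemArray.Length; i++)
	{
		concat += dr[i].ToString();
	}
	return concat.GetHashCode();
}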


WARNING: On rare occasions different strings can produce the same hash code, so you may want to test for this whenever the if (!dict.ContainsKey...) check is false, for example by iterating through the column collection and comparing each value. Comparing column by column is the brute-force approach, but keying on the hash first means it only has to run for rows whose hashes match.
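One possible way to sidestep the collision risk described in the warning is to key on the concatenated string itself rather than on its hash code, since a HashSet<string> compares the full strings on lookup. The following is a minimal sketch, not part of the original answer: the names RowDeduplicator, BuildKey and RemoveDuplicateRows are hypothetical, and it assumes the separator character U+001F never occurs in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowDeduplicator
{
    // Joins every column value with a separator (assumed absent from the data)
    // so that ("ab", "c") and ("a", "bc") produce different keys.
    public static string BuildKey(DataRow row)
    {
        return string.Join("\u001F", row.ItemArray);
    }

    // Returns the first occurrence of each distinct row, in original order.
    public static List<DataRow> RemoveDuplicateRows(DataTable table)
    {
        var seen = new HashSet<string>();
        var unique = new List<DataRow>();
        foreach (DataRow row in table.Rows)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(BuildKey(row)))
            {
                unique.Add(row);
            }
        }
        return unique;
    }
}
```

Because the set stores the whole key string, two different rows that happen to share a hash code are still treated as different rows.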
 
LVL 11

Expert Comment

by:jasonduan
ID: 34121361
Use LINQ:

static List<string> removeDuplicates(List<string> inputList)
{
    return inputList.Distinct().ToList();
}

static List<DataRow> removeDuplicates(List<DataRow> rows)
{
    return rows.Distinct(new MyRowComparer()).ToList();
}

public class MyRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // put your logic here
        throw new NotImplementedException();
    }

    public int GetHashCode(DataRow obj)
    {
        // put your logic here (must return equal values for rows that Equals treats as equal)
        throw new NotImplementedException();
    }
}
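As an illustration of what the comparer logic might look like, here is a minimal sketch that treats two rows as equal when a single column matches. It is an assumption about the intended logic, not the answerer's code: ColumnRowComparer, RowFilter and the keyColumn parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Compares DataRows by the value of one named column only.
public class ColumnRowComparer : IEqualityComparer<DataRow>
{
    private readonly string _columnName;

    public ColumnRowComparer(string columnName)
    {
        _columnName = columnName;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return object.Equals(x[_columnName], y[_columnName]);
    }

    public int GetHashCode(DataRow row)
    {
        // Must agree with Equals: rows with equal column values hash alike.
        object value = row[_columnName];
        return value == null ? 0 : value.GetHashCode();
    }
}

public static class RowFilter
{
    public static List<DataRow> RemoveDuplicates(List<DataRow> rows, string keyColumn)
    {
        return rows.Distinct(new ColumnRowComparer(keyColumn)).ToList();
    }
}
```

Note that Distinct only drops rows; it does not copy any value from a dropped duplicate into the row that is kept, so on its own it would not cover the second part of the question.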
 

Author Comment

by:gnihar
ID: 34127274
Hi,

I probably wasn't specific enough.


In the first answer, it seems that the Uniqueify function takes all column values from a single row:

for (int i = 0; i < dr.ItemArray.Length; i++)
{
	concat += dr[i].ToString();
}

and from that combined value derives a hash code, which is then used as the dictionary key that uniquely identifies the row in the new collection.

If I am not mistaken, this compares all columns of a row, so a row is treated as a duplicate (and left out of the new collection) only when every one of its columns matches an earlier row.

But in my example I have to:
look at only specific columns to decide whether a row is a duplicate;
if it is, not insert the row into the collection, but take another column's value out of the 'duplicate' row and use it to overwrite the same-named column of the matching row already in the new collection.

i.e.

iterate through the datatable's rows
{
    if impurecollection.row["column name1"] == purecollection.row["column name1"] (regardless of the other column values in the same row, which may or may not be duplicates)
    then
        do not add the whole row to the new collection
    but
        take impurecollection.row["column name 2"] and insert its value in place of purecollection.row["column name 2"] where impurecollection.row["column name1"] == purecollection.row["column name1"]

    and if impurecollection.row["column name1"] != purecollection.row["column name1"]
    then
        add that row from impurecollection to purecollection unchanged
}

Hope that clarifies things a bit.

 

Assisted Solution

by:gnihar
gnihar earned 0 total points
ID: 34127501
Hi again, it seems that I have found a solution using a modified version of kraiven's code.

So, here it is :

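// dict is the Dictionary<int, DataRow> from kraiven's answer.
foreach (DataRow row1 in dataSet1.Tables[0].Rows)
{
    int id = Uniqueify(row1);

    if (!dict.ContainsKey(id))
    {
        // First time this key is seen: keep the row.
        dict.Add(id, row1);
    }
    else
    {
        // Duplicate key: overwrite the kept row's column with the latest value.
        dict[id]["column name 2"] = row1["column name 2"];
    }
}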


and the function:

public int Uniqueify(DataRow dr)
{
    // I knew the position of the specific column in a row which must not be a duplicate.
    string concat = dr[1].ToString();
    return concat.GetHashCode();
}
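The same approach can also be written without going through GetHashCode, by using the column's string value directly as the dictionary key, so two different values can never share a key. This is only a sketch under assumptions: DuplicateMerger, MergeDuplicates and the keyColumn and overwriteColumn parameters are hypothetical names, and key values are compared by their string form.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DuplicateMerger
{
    // Keeps the first row seen for each keyColumn value. For every later row
    // with the same key, copies its overwriteColumn value into the kept row.
    // Like the code above, this modifies rows of the original table in place.
    public static List<DataRow> MergeDuplicates(
        DataTable table, string keyColumn, string overwriteColumn)
    {
        var firstSeen = new Dictionary<string, DataRow>();
        var result = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            string key = Convert.ToString(row[keyColumn]);
            DataRow kept;
            if (firstSeen.TryGetValue(key, out kept))
            {
                // Duplicate: do not add the row, only carry its value over.
                kept[overwriteColumn] = row[overwriteColumn];
            }
            else
            {
                firstSeen.Add(key, row);
                result.Add(row);
            }
        }
        return result;
    }
}
```

A call such as DuplicateMerger.MergeDuplicates(dataSet1.Tables[0], "column name1", "column name 2") would return the kept rows in their original order, with "column name 2" holding the value from the last duplicate encountered.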

 
LVL 3

Expert Comment

by:kraiven
ID: 34130357
Hi gnihar,

Thank you for accepting my solution. I'm afraid I had missed the end of your post, which is why I didn't follow up with that solution. I'm glad my solution was adaptable, however.
 

Author Closing Comment

by:gnihar
ID: 34162435
See my last post.