C# remove duplicates in a List

Hi experts,

Below is a method that removes duplicate strings from a given List<string>.

My question is: can you modify this method so that it removes DataRow objects that contain duplicate DataColumn values (from a List of DataRows)?

static List<string> removeDuplicates(List<string> inputList)
{
    // Tracks the values already seen; the int value is unused.
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<string> finalList = new List<string>();

    foreach (string currValue in inputList)
    {
        if (!uniqueStore.ContainsKey(currValue))
        {
            uniqueStore.Add(currValue, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}
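
As a side note, the same first-occurrence-wins behaviour could be written with a HashSet<string>, whose Add method returns false when the value is already present. This is only a sketch; the class name ListUtil is made up for illustration. One difference from the Dictionary version: a HashSet accepts a null entry, whereas Dictionary.ContainsKey throws on a null key.

```csharp
using System.Collections.Generic;

public static class ListUtil
{
    public static List<string> RemoveDuplicates(List<string> inputList)
    {
        var seen = new HashSet<string>();
        var finalList = new List<string>();

        foreach (string value in inputList)
        {
            // Add returns false when the value is already in the set.
            if (seen.Add(value))
            {
                finalList.Add(value);
            }
        }
        return finalList;
    }
}
```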


Additionally, when a duplicate is found, replace the value of another column in the row already kept with the value from the last duplicate seen.

i.e., in pseudocode:

if (row["column"] == /*duplicate*/ )
{
/*then do not add the entire row to the new list*/
/*but take the row["column2"] value and use it to overwrite the ["column2"] value of the matching row already present in the final list*/
}



gnihar Asked:
 
kraiven Commented:
Assuming that the DataRows you are interested in have no existing identifier column, the best (or at least an efficient) way to solve this problem is to generate your own identifier (say, using GetHashCode()) and use it as the dictionary key, with the DataRow as the dictionary value.
That is, given an existing DataTable, iterate over its row collection, generate an id for each row, and add the row to the dictionary only if that id is not already present.
 
var dict = new Dictionary<int, DataRow>();
foreach(DataRow row in dt.Rows)
{
	int id = Uniqueify(row);
	if (!dict.ContainsKey(id))
		dict.Add(id, row);
}

where the Uniqueify function is:
 
public int Uniqueify(DataRow dr)
{
	// Join the column values with a separator so that ("ab", "c") and
	// ("a", "bc") do not collapse into the same string before hashing.
	string concat = string.Join("|", dr.ItemArray);
	return concat.GetHashCode();
}


WARNING: Different strings can occasionally produce the same hash code, so when dict.ContainsKey(id) returns true you may want to confirm that the two rows really match, for example by iterating through the column collection and comparing each value. Comparing every row against every other row that way would be the brute-force approach; using the hash code first limits the full comparison to rows whose hash codes already match.
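
The check described in the warning could be sketched roughly as follows. This is an illustration only, not kraiven's code: the names RowDeduper, DistinctRows, HashOf and SameValues are made up, and HashOf assumes a "|" separator between column values. Each hash code maps to a small list of rows, and a row is treated as a duplicate only if a full column-by-column comparison also matches.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowDeduper
{
    // Hash of all column values, joined with an assumed "|" separator.
    private static int HashOf(DataRow row)
    {
        return string.Join("|", row.ItemArray).GetHashCode();
    }

    // True only when every column value matches.
    private static bool SameValues(DataRow a, DataRow b)
    {
        return a.ItemArray.SequenceEqual(b.ItemArray);
    }

    public static List<DataRow> DistinctRows(DataTable dt)
    {
        // Each hash code maps to the distinct rows that share it.
        var buckets = new Dictionary<int, List<DataRow>>();
        var result = new List<DataRow>();

        foreach (DataRow row in dt.Rows)
        {
            int id = HashOf(row);
            List<DataRow> bucket;
            if (!buckets.TryGetValue(id, out bucket))
            {
                bucket = new List<DataRow>();
                buckets.Add(id, bucket);
            }

            // A matching hash is only a hint; confirm with a full comparison.
            if (!bucket.Any(existing => SameValues(existing, row)))
            {
                bucket.Add(row);
                result.Add(row);
            }
        }
        return result;
    }
}
```

Calling RowDeduper.DistinctRows(dt) returns the first occurrence of each distinct row in table order.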
 
jasonduan Commented:
Use LINQ:

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static List<string> removeDuplicates(List<string> inputList)
{
    return inputList.Distinct().ToList();
}

static List<DataRow> removeDuplicates(List<DataRow> rows)
{
    return rows.Distinct(new MyRowComparer()).ToList();
}

public class MyRowComparer : IEqualityComparer<DataRow>
{
      public bool Equals(DataRow x, DataRow y)
      {
            // put your logic here
            throw new NotImplementedException();
      }

      public int GetHashCode(DataRow obj)
      {
            // put your logic here
            throw new NotImplementedException();
      }
}
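
One way the comparer skeleton above might be filled in, assuming two rows count as duplicates when a single named column matches. The names ColumnRowComparer and RowFilter and the columnName parameter are hypothetical. Note that this only drops the duplicate rows; it does not copy any value from the dropped row onto the kept one.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: rows are "equal" when one named column matches.
public class ColumnRowComparer : IEqualityComparer<DataRow>
{
    private readonly string columnName;

    public ColumnRowComparer(string columnName)
    {
        this.columnName = columnName;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return object.Equals(x[columnName], y[columnName]);
    }

    public int GetHashCode(DataRow obj)
    {
        // Must agree with Equals: hash only the compared column.
        object value = obj[columnName];
        return value == null ? 0 : value.GetHashCode();
    }
}

public static class RowFilter
{
    public static List<DataRow> RemoveDuplicates(List<DataRow> rows, string columnName)
    {
        return rows.Distinct(new ColumnRowComparer(columnName)).ToList();
    }
}
```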
 
gnihar (Author) Commented:
Hi,

I probably wasn't specific enough.


In the first answer, it seems that the Uniqueify function takes all the column values from a single row:

	

for (int i = 0; i < dr.ItemArray.Length; i++)
{
	concat += dr[i].ToString();
}


and from that combined value produces a hash code, which is then used as the dictionary key that identifies the row in the new collection.

If I am not mistaken, this compares all columns of a row, and a row is treated as a duplicate (and left out of the new collection) only when every column matches a row that was added earlier.

But in my example I have to:
- look at only specific columns to decide whether a row is a duplicate;
- if it is, not insert the row into the collection, but take another column's value from the 'duplicate' row and use it to overwrite the same-named column of the matching row already in the new collection.

i.e.

iterate through the DataTable's rows
{
if impurecollection.row["column name1"] == purecollection.row["column name1"] (regardless of the other column values in the same row, which may or may not be duplicates)

then

do not add the whole row to the new collection

but

take impurecollection.row["column name 2"] and insert its value in place of purecollection.row["column name 2"] where impurecollection.row["column name1"] == purecollection.row["column name1"]


and if impurecollection.row["column name1"] != purecollection.row["column name1"]

then add that row from impurecollection into purecollection unchanged

}

Hope that clarifies things a bit.

 
gnihar (Author) Commented:
Hi again, it seems that I have found a solution by modifying kraiven's code.

So, here it is:

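                var dict = new Dictionary<int, DataRow>();

                foreach (DataRow row1 in dataSet1.Tables[0].Rows)
                {
                    int id = Uniqueify(row1);

                    if (!dict.ContainsKey(id))
                    {
                        // First row with this key: keep it.
                        dict.Add(id, row1);
                    }
                    else
                    {
                        // Duplicate key: keep the earlier row, but take this row's "column name 2" value.
                        dict[id]["column name 2"] = row1["column name 2"];
                    }
                }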


and the function:

        public int Uniqueify(DataRow dr)
        {
            // I knew the position of the specific column in a row which must not be a duplicate.
            return dr[1].ToString().GetHashCode();
        }
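
Because the key here comes from a single column, the dictionary could also be keyed on that column's string value directly rather than on its hash code, which would sidestep the hash-collision risk mentioned in kraiven's warning. The following is a sketch under that assumption; RowMerger, MergeDuplicates, keyColumn and valueColumn are illustrative names, not part of the original code. Like the loop above, it modifies the kept DataRow in place.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowMerger
{
    // keyColumn and valueColumn are placeholder names supplied by the caller.
    public static List<DataRow> MergeDuplicates(DataTable table, string keyColumn, string valueColumn)
    {
        var seen = new Dictionary<string, DataRow>();
        var result = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            string key = row[keyColumn].ToString();
            DataRow kept;
            if (seen.TryGetValue(key, out kept))
            {
                // Duplicate key: skip the row, but copy its value onto the kept row.
                kept[valueColumn] = row[valueColumn];
            }
            else
            {
                // First row with this key: keep it, preserving table order.
                seen.Add(key, row);
                result.Add(row);
            }
        }
        return result;
    }
}
```

A call such as RowMerger.MergeDuplicates(dataSet1.Tables[0], "column name1", "column name 2") would return one row per distinct "column name1" value, each carrying the "column name 2" value of the last duplicate seen.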

 
kraiven Commented:
Hi gnihar,

Thank you for accepting my solution. I'm afraid I had missed the end of your post, which is why I didn't follow up with that solution. I'm glad my solution was adaptable, however.
 
gnihar (Author) Commented:
See my last post.
Question has a verified solution.
