Solved

C# remove duplicates in a List

Posted on 2010-11-12
Last Modified: 2013-12-17
Hi experts,

I am pasting a method that removes duplicate String objects from any given List of strings.

My question is: can this method be modified to remove DataRow objects that contain duplicate DataColumn values from a List of DataRows?

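using System.Collections.Generic;

static List<string> removeDuplicates(List<string> inputList)
{
    // The dictionary is used only as a set of values already seen;
    // the int stored with each key is never read.
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<string> finalList = new List<string>();

    foreach (string currValue in inputList)
    {
        // Keep only the first occurrence of each string, in input order.
        if (!uniqueStore.ContainsKey(currValue))
        {
            uniqueStore.Add(currValue, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}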
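For comparison, a minimal sketch of the same order-preserving de-duplication using HashSet<string>, whose Add method returns false when the value is already present, so no separate lookup or dummy value is needed. The class name StringDedup is hypothetical. Unlike the Dictionary version, HashSet<string> also accepts a null entry instead of throwing.

```csharp
using System.Collections.Generic;

public static class StringDedup
{
    // Keeps the first occurrence of each string and preserves input order.
    public static List<string> RemoveDuplicates(List<string> inputList)
    {
        var seen = new HashSet<string>();
        var finalList = new List<string>();

        foreach (string currValue in inputList)
        {
            // HashSet<T>.Add returns false if the value was already in the set.
            if (seen.Add(currValue))
            {
                finalList.Add(currValue);
            }
        }
        return finalList;
    }
}
```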




Additionally, when a duplicate is found, I need to replace the value of another DataRow column in the row already kept in the final list with the value from the most recently found duplicate.

i.e., in pseudocode:

if (row["column"] is a duplicate of a value already in the final list)
{
    /* then do not add the entire row to the new list, */
    /* but take row["column2"] and write that value over the ["column2"] value of the matching row already present in the final list */
}



Question by: gnihar

Accepted Solution

by: kraiven
kraiven earned 500 total points
ID: 34121203
Assuming that the DataRows you are interested in have no existing identifier column, the best (or at least an efficient) way to solve this problem is to generate your own identifier (say, using GetHashCode()) and use it as your dictionary key, with the data row as the dictionary value.
i.e., given an existing DataTable, iterate over its Rows collection, generate an id for each row, and add the row to the dictionary only if that id is not already present.
 
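// dt is an existing DataTable; a row is skipped if its generated id is already in the dictionary.
var dict = new Dictionary<int, DataRow>();
foreach (DataRow row in dt.Rows)
{
    int id = Uniqueify(row);
    if (!dict.ContainsKey(id))
        dict.Add(id, row);
}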



where the Uniqueify function is:
 
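public int Uniqueify(DataRow dr)
{
    string concat = string.Empty;
    for (int i = 0; i < dr.Table.Columns.Count; i++)
    {
        // Append each column value followed by a separator (U+001F) so that,
        // e.g., ("ab", "c") and ("a", "bc") do not produce the same string.
        concat += dr[i].ToString() + "\u001F";
    }
    return concat.GetHashCode();
}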
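Concatenating column values with no separator is ambiguous: two different rows can produce the same string, and therefore the same hash code, even without a genuine hash collision. A small demonstration, assuming a hypothetical two-column table; ConcatDemo, PlainKey and SeparatedKey are illustrative names, and the U+001F separator only helps if that character never occurs in the data.

```csharp
using System.Data;

public static class ConcatDemo
{
    // Builds a two-column table whose two rows concatenate to "abc" without a separator.
    public static DataTable MakeTable()
    {
        var dt = new DataTable();
        dt.Columns.Add("A", typeof(string));
        dt.Columns.Add("B", typeof(string));
        dt.Rows.Add("ab", "c");
        dt.Rows.Add("a", "bc");
        return dt;
    }

    // Plain concatenation of all column values, as in a loop with no separator.
    public static string PlainKey(DataRow dr) => string.Concat(dr.ItemArray);

    // The same values joined with the ASCII unit separator (U+001F).
    public static string SeparatedKey(DataRow dr) => string.Join("\u001F", dr.ItemArray);
}
```

With this table, PlainKey returns "abc" for both rows, while SeparatedKey keeps them apart.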



WARNING: Different strings can occasionally produce the same hash code, so when the if (!dict.ContainsKey(...)) test is false you may want to confirm that the rows really are duplicates, for example by iterating through the column collection and comparing each value. That is the brute-force approach, but its cost stays low because it only runs when two hash codes match.
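A minimal sketch of the check described in the warning, assuming the whole row defines identity: rows are bucketed by hash code, and a hash match is confirmed by comparing every column value (ItemArray with SequenceEqual) before a row is treated as a duplicate. RowDedup, RowHash and RemoveDuplicateRows are hypothetical names, and instead of returning the dictionary the sketch builds a de-duplicated copy of the table with DataTable.Clone and ImportRow.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowDedup
{
    // Hypothetical helper: hash of all column values. Any hash works here,
    // because equal hashes are double-checked below.
    static int RowHash(DataRow dr) => string.Join("\u001F", dr.ItemArray).GetHashCode();

    // Returns a new table (same schema) holding the first occurrence of each distinct row.
    public static DataTable RemoveDuplicateRows(DataTable dt)
    {
        var buckets = new Dictionary<int, List<DataRow>>();
        DataTable result = dt.Clone(); // copies the schema, not the data

        foreach (DataRow row in dt.Rows)
        {
            int id = RowHash(row);
            if (!buckets.TryGetValue(id, out List<DataRow> candidates))
            {
                candidates = new List<DataRow>();
                buckets.Add(id, candidates);
            }

            // Same hash: confirm by comparing every column value of the kept rows.
            bool isDuplicate = candidates.Any(kept => kept.ItemArray.SequenceEqual(row.ItemArray));
            if (!isDuplicate)
            {
                candidates.Add(row);
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

Keeping a list per hash code means a genuine collision only costs an extra comparison; it never causes two different rows to be merged.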

Expert Comment

by:jasonduan
ID: 34121361
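Use LINQ:

using System.Collections.Generic;
using System.Data;
using System.Linq;

static List<string> removeDuplicates(List<string> inputList)
{
    return inputList.Distinct().ToList();
}

static List<DataRow> removeDuplicates(List<DataRow> rows)
{
    return rows.Distinct(new MyRowComparer()).ToList();
}

public class MyRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // put your logic here: return true when x and y count as duplicates
        throw new System.NotImplementedException();
    }

    public int GetHashCode(DataRow obj)
    {
        // put your logic here: rows that are Equals must return the same hash code
        throw new System.NotImplementedException();
    }
}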
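One way the comparer could be filled in for the case described in the question, assuming that a single key column decides whether two rows are duplicates. KeyColumnComparer and RowListDedup are hypothetical names. GetHashCode must agree with Equals, so it hashes the same column value that Equals compares.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are "equal" when one chosen column holds equal values.
public class KeyColumnComparer : IEqualityComparer<DataRow>
{
    private readonly string _column;

    public KeyColumnComparer(string column) => _column = column;

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // object.Equals compares boxed values and strings by value.
        return object.Equals(x[_column], y[_column]);
    }

    // Consistent with Equals: rows with equal key values get equal hash codes.
    public int GetHashCode(DataRow obj) => obj[_column].GetHashCode();
}

public static class RowListDedup
{
    // Distinct yields one row per key value; in LINQ to Objects this is the first row seen.
    public static List<DataRow> RemoveDuplicates(List<DataRow> rows, string column) =>
        rows.Distinct(new KeyColumnComparer(column)).ToList();
}
```

Distinct only drops rows; it has no hook for copying a value from a duplicate into the kept row, so the overwrite step the question asks for still needs an explicit loop.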

Author Comment

by:gnihar
ID: 34127274
Hi,

I probably wasn't specific enough.


In the first answer, it seems that the Uniqueify function concatenates all column values of a single row:

for (int i = 0; i < dr.ItemArray.Length; i++)
{
    concat += dr[i].ToString();
}

and from that value produces a hash code, which the dictionary then uses to identify that row in the new collection.

If I am not mistaken, this compares all columns of a row, and a row is treated as a duplicate (and not added to the new collection) only if every one of its column values matches a row that was already added.

But in my case I have to look only at specific columns to decide whether a row is a duplicate. If it is, I do not insert the row into the collection; instead I take another column's value from the 'duplicate' row and use it to overwrite the column with the same name in the row already in the new collection.

i.e.

iterate through the DataTable's rows
{
if impurecollection.row["column name1"] == purecollection.row["column name1"] (regardless of the other column values in the same row; they may or may not be duplicates)

then

do not add the whole row to the new collection

but

take impurecollection.row["column name 2"] and write its value over purecollection.row["column name 2"] in the row where impurecollection.row["column name1"] == purecollection.row["column name1"]


and if impurecollection.row["column name1"] != purecollection.row["column name1"] for every row already in purecollection

then add that row from impurecollection to purecollection unchanged

}

Hope that clarifies things a bit.


Assisted Solution

by:gnihar
gnihar earned 0 total points
ID: 34127501
Hi again, it seems that I have found a solution using a modified version of kraiven's code.

So, here it is:

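var dict = new Dictionary<int, DataRow>();

foreach (DataRow row1 in dataSet1.Tables[0].Rows)
{
    int id = Uniqueify(row1);

    if (!dict.ContainsKey(id))
    {
        // First time this key value is seen: keep the row.
        dict.Add(id, row1);
    }
    else
    {
        // Duplicate key value: do not keep this row, but copy its
        // "column name 2" value into the row that was kept.
        dict[id]["column name 2"] = row1["column name 2"];
    }
}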



and the function:

public int Uniqueify(DataRow dr)
{
    // Only the key column is hashed: I knew the position (index 1) of the
    // specific column in a row which must not contain duplicates.
    // As kraiven warned, two different values can still share a hash code.
    string concat = dr[1].ToString();
    return concat.GetHashCode();
}
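A sketch of the same loop that keys the dictionary on the key column's value itself rather than on its hash code, so two different key values can never be merged by a hash collision. It takes the rows as a List<DataRow> (as in the original question) and returns the kept rows in their original order. RowMerge, MergeDuplicates and its parameter names are hypothetical; the column names in the usage are the placeholders from this thread.

```csharp
using System.Collections.Generic;
using System.Data;

public static class RowMerge
{
    // Keeps the first row for each keyColumn value. When a later row has the same
    // key, its updateColumn value overwrites that column in the kept row, so the
    // kept row ends up holding the value from the last duplicate seen.
    public static List<DataRow> MergeDuplicates(IEnumerable<DataRow> rows, string keyColumn, string updateColumn)
    {
        var kept = new Dictionary<object, DataRow>();
        var result = new List<DataRow>();

        foreach (DataRow row in rows)
        {
            // The value itself is the key; object.Equals compares it by value.
            object key = row[keyColumn];
            if (kept.TryGetValue(key, out DataRow keptRow))
            {
                // Mutates the kept DataRow, which still belongs to its original table.
                keptRow[updateColumn] = row[updateColumn];
            }
            else
            {
                kept.Add(key, row);
                result.Add(row);
            }
        }
        return result;
    }
}
```

For example, with rows ("a", 1), ("b", 2), ("a", 3), ("a", 4) in columns "column name1" and "column name 2", the result holds two rows, and the "a" row's "column name 2" becomes 4.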



Expert Comment

by:kraiven
ID: 34130357
Hi gnihar,

Thank you for accepting my solution. I'm afraid I had missed the end of your post, which is why I didn't follow up with that solution. I'm glad my solution was adaptable, however.

Author Closing Comment

by:gnihar
ID: 34162435
See my last post.
Question has a verified solution.