Solved

C# remove duplicates in a List

Posted on 2010-11-12
6
1,144 Views
Last Modified: 2013-12-17
Hi experts,

I am pasting the method to remove duplicate String objects from any given List of string.

My question is, can you modify this method to be able to remove DataRow objects that contain duplicate DataColumn values (from the List of DataRows):

static List<string> removeDuplicates(List<string> inputList)
{

Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
List<string> finalList = new List<string>();
 
foreach (string currValue in inputList)
{

if (!uniqueStore.ContainsKey(currValue))
{

uniqueStore.Add(currValue, 0);
finalList.Add(currValue);

}

}
return finalList;

}

Open in new window



Additionally, replace the value of another datarow column in the original list with the value from last spotted duplicate.

ie. in pseudo code

if (DataRow row.DataColumn["column"]= /*duplicate*/ )
{
/*then do not add entire row in new list*/
/*but take row.DataColumn["column2"] value and insert that value instead of the original ["column2"] already present in the final list*/
}



0
Comment
Question by:gnihar
  • 3
  • 2
6 Comments
 
LVL 3

Accepted Solution

by:
kraiven earned 500 total points
ID: 34121203
Assuming that the DataRows you are interested in have no existing identifier column then the best (or at least an efficient) way to solve this problem is to generate your own identifier (say using GetHashCode()) and then use this as your dictionary key with the data row as the dictionary value.
i.e Given an existing  DataTable, iterate its row collection generating an id and selectively adding to a dictionary.
 
var dict = new Dictionary<int, DataRow>();
foreach(DataRow row in dt.Rows)
{
	int id = Uniqueify(row);
	if (!dict.ContainsKey(id))
		dict.Add(id, row);
}

Open in new window


where the Uniqueify function is:
 
public int Uniqueify(DataRow dr)
{
	string concat = string.Empty;
	for (int i = 0; i < dr.ItemArray.Length; i++)
	{
		concat += dr[i].ToString();
	}
	return concat.GetHashCode();
}

Open in new window


WARNING Rarely you can generate the same hash code for different strings so you might want to test for this on the occasion that the if (!dict.ContainsKey...) is false; for example by iterating through the column collection comparing each value. This would be the brute force approach to this solution but is minimised by utilising the more efficient method given.
0
 
LVL 11

Expert Comment

by:jasonduan
ID: 34121361
use LINQ:

static List<string> removeDuplicates(List<string> inputList)
{
    return inputList.Distinct().ToList();
}

static List<DataRow> removeDuplicates(List<DataRow> rows)
{
    return rows.Distinct(new MyRowComnparer()).ToList();
}

public class MyRowComnparer : IEqualityComparer<DataRow>
{
      public bool Equals(DataRow x, DataRow y)
      {
            // put your logic here
      }

      public int GetHashCode(DataRow obj)
      {
            // put your logic here
      }
}
0
 

Author Comment

by:gnihar
ID: 34127274
Hi,

I probably wasn't specific enough.


In the first answer, it seems that Uniquify function takes all column values from a single row:

	

for (int i = 0; i < dr.ItemArray.Length; i++)
	{
		concat += dr[i].ToString();
	}

Open in new window


and out of that value, gives a dictionary an unique value - hash code which is then used to uniquely identify that row in a new collection.

If I am not mistaken, this will look for all differences in all columns of a specific row, and if precisely 0 differences are found, will then add that row in a new collection.

But in my example I have to :
look in only specific columns if it's a duplicate
if it is, then I do not insert the row in a collection, but take another column value out of the 'duplicate' row and overwrite the new collections' row's column value with the same name with the value from 'duplicate' row's column.

ie.

iterate through datatables' rows
{
if impurecollection.row["column name1"] == purecollection.row["column name1] (regardless of other column values in the same row, they can be duplicates)

then

do not add the whole row in new collection

but

take impurecollection.row["column name 2"] and insert it's value instead of purecollection. row["column name 2"] where impurecollection.row["column name1"] == purecollection.row["column name1"]


and if impurecollection.row["column name1"] != purecollection.row["column name1]

then add that row from impurecollection into purecollection unchanged

}

Hope that clarifies things a bit.

0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Assisted Solution

by:gnihar
gnihar earned 0 total points
ID: 34127501
Hi again, it seems that I have found a solution using modified kraiven's piece of code.

So, here it is :

                foreach (DataRow row1 in dataSet1.Tables[0].Rows)
                {

                        int id = Uniqueify(row1);

                        if (!dict.ContainsKey(id))
                            {
                             dict.Add(id, row1);
                            }       
                        else 
                            if (dict.ContainsKey(id))
                            {
                                dict[id]["column name 2"] = row1["column name 2"];
                            }
                 }

Open in new window


and the function:

        public int Uniqueify(DataRow dr)
        {
            string concat = string.Empty;
            concat = dr[1].ToString();   // i knew the position of the specific column in a row which must not be a duplicate
            return concat.GetHashCode();
        }

Open in new window

0
 
LVL 3

Expert Comment

by:kraiven
ID: 34130357
Hi gnihar,

Thankyou for accepting my solution. I'm afraid I had missed the end of your post which is why I didn't follow-up with that solution. I'm glad my solution was adaptable however.
0
 

Author Closing Comment

by:gnihar
ID: 34162435
see my last post
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

For those of you who don't follow the news, or just happen to live under rocks, Microsoft Research released a beta SDK (http://www.microsoft.com/en-us/download/details.aspx?id=27876) for the Xbox 360 Kinect. If you don't know what a Kinect is (http:…
Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now