• Status: Solved

Dedup a List<Hashtable> in C#

I have a requirement where I need to eliminate duplicates from a List<Hashtable>.

Each Hashtable is one row of data. For example, if there are 500 records with around 10 fields, the 10 field names are the keys of the Hashtable and the row's data is held in the values.
Each row (Hashtable) is added to the List, so at the end the List holds 500 Hashtable records.

I need to remove duplicates from this List<> based on certain criteria.

How can I make this possible? Please help me; I need a solution in 2 days, as I have to finish it by Monday evening.

Thanks in advance.

Could anyone suggest some sample code?

Additional info:
List<Hashtable> executeDedup(List<Hashtable> sourceData, List<string> uniqueFieldList, DedupRuleTypes dedupRuleType)
sourceData holds one Hashtable (column name -> column value) for each row of actual data.
uniqueFieldList is the list of field/column names that identify a row as unique in the dedup.
DedupRuleTypes is an enum with the options Keep First Record and Keep the Most Complete Record.

1 Solution
Anurag ThakurCommented:
To be very frank, I don't like the design of using a generic list of Hashtables. First, it takes away the advantages of the typed list, because the Hashtable still stores its keys and values as object, so boxing and unboxing remain.

Second, your design can be improved a little more.
Please explain your requirements first, as most of the features provided by the Hashtable can be achieved with just a generic list.
nithinmohantkAuthor Commented:
My requirement is that we have a set of addresses in an Excel sheet that the user uploads.

We need to remove duplicate records from the records in this Excel file.
We read the Excel sheet, build a Hashtable for each row of data, and append it to a List<Hashtable>.
That is just the parameter I need to pass; inside the method I can import it into a generic list of my choice and do the operations.
But my PM is specific about the input parameter type.

For duplicate removal we pass an extra parameter, List<string> uniqueFieldList, which holds the column names on which we need to find unique data. This is the user's choice: the user specifies which columns must be unique.

I hope this information helps.
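
For readers following the thread, here is a minimal sketch of the "keep first record" case under the requirements described above. It is not code from the original posters: the class and method names (KeepFirstDedup, KeepFirst, BuildKey) are hypothetical, and it assumes column values are compared as trimmed, case-insensitive strings, which may not match every data set.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class KeepFirstDedup
{
    // Builds one comparison key from the values of the chosen columns.
    // Assumption: values are compared as trimmed, upper-cased strings.
    static string BuildKey(Hashtable row, List<string> uniqueFieldList)
    {
        var parts = new List<string>();
        foreach (string field in uniqueFieldList)
        {
            object value = row[field]; // null when the column is missing
            parts.Add(value == null ? "" : value.ToString().Trim().ToUpperInvariant());
        }
        // U+001F (unit separator) is unlikely to occur in spreadsheet cells.
        return string.Join("\u001F", parts.ToArray());
    }

    // Returns the rows in their original order, skipping every row whose
    // key has already been seen, so the first record of each group is kept.
    public static List<Hashtable> KeepFirst(List<Hashtable> sourceData, List<string> uniqueFieldList)
    {
        var seen = new HashSet<string>();
        var result = new List<Hashtable>();
        foreach (Hashtable row in sourceData)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(BuildKey(row, uniqueFieldList)))
            {
                result.Add(row);
            }
        }
        return result;
    }
}
```

The input stays a List<Hashtable>, as the PM requires, and the original order of the surviving rows is preserved.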
Anurag ThakurCommented:
Now I understand your problem better.

My suggestion for handling the problem is as follows:
Put the data you have into a DataTable (either from Excel or from the address objects).
Then write a function for finding distinct values, based on the following link, http://weblogs.asp.net/eporter/archive/2005/02/10/370548.aspx, and use it to get the unique values.
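
As a rough illustration of the DataTable route, and not the function from the linked article, the built-in DataView.ToTable(bool distinct, params string[] columnNames) overload can also return distinct rows. The wrapper name DistinctBy is hypothetical. Note the limitation: the result contains only the listed columns, so this fits best when just the unique column values are needed.

```csharp
using System.Data;

public static class DataTableDistinct
{
    // Returns a new table holding the distinct combinations of the given
    // columns. Columns that are not listed are dropped from the result.
    public static DataTable DistinctBy(DataTable source, params string[] columnNames)
    {
        var view = new DataView(source);
        // distinct = true removes rows that repeat the same column values.
        return view.ToTable(true, columnNames);
    }
}
```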
nithinmohantkAuthor Commented:
Hi ragi, I will try that. But there is one more condition I need to check.

There are two options: the user can specify that the first record is kept and the rest are removed, or that the record with the most data (the most complete data) is kept. This is what I'm a bit confused about.

Keeping the first record I can do. But how would I know which record has the most complete data? It seems a bit funny, right? I could compare string character lengths, but that won't get me anywhere.

Suppose there are 2 or 3 records with the same character length; what do I do then?

Any ideas?
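
One possible reading of "most complete", offered as an assumption rather than an answer given in the thread, is "the record with the most non-empty fields", with ties resolved by keeping the earliest record. The sketch below follows that reading; the names MostCompleteDedup, KeepMostComplete, CountFilledFields and BuildKey are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class MostCompleteDedup
{
    // Counts the fields whose value is neither null nor blank.
    static int CountFilledFields(Hashtable row)
    {
        int filled = 0;
        foreach (DictionaryEntry entry in row)
        {
            if (entry.Value != null && entry.Value.ToString().Trim().Length > 0)
            {
                filled++;
            }
        }
        return filled;
    }

    // Assumption: key columns are compared as trimmed, upper-cased strings.
    static string BuildKey(Hashtable row, List<string> uniqueFieldList)
    {
        var parts = new List<string>();
        foreach (string field in uniqueFieldList)
        {
            object value = row[field]; // null when the column is missing
            parts.Add(value == null ? "" : value.ToString().Trim().ToUpperInvariant());
        }
        return string.Join("\u001F", parts.ToArray());
    }

    public static List<Hashtable> KeepMostComplete(List<Hashtable> sourceData, List<string> uniqueFieldList)
    {
        var indexByKey = new Dictionary<string, int>(); // key -> position in result
        var result = new List<Hashtable>();
        foreach (Hashtable row in sourceData)
        {
            string key = BuildKey(row, uniqueFieldList);
            int index;
            if (!indexByKey.TryGetValue(key, out index))
            {
                // First record seen for this key.
                indexByKey[key] = result.Count;
                result.Add(row);
            }
            else if (CountFilledFields(row) > CountFilledFields(result[index]))
            {
                // Strictly more filled fields wins; a tie keeps the earlier record.
                result[index] = row;
            }
        }
        return result;
    }
}
```

Because the comparison is strict, two or three records with the same number of filled fields fall back to the "keep first" behaviour, which gives a deterministic result for the tie case raised above. A different tie-break, such as total character length, could be swapped in at the same comparison.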
nithinmohantkAuthor Commented:
Hi ragi, it worked for my requirement, except for the option I mentioned previously,
selecting only the most complete data.

Anyway, thanks again ragi, that was a great help. I think I will modify it and look for alternatives.

