nithinmohantk

asked on

Dedup a List<Hashtable> in C#

I have a requirement where I need to eliminate duplicates from a List<Hashtable>.

Each Hashtable is one row of data. For example, if there are 500 records with around 10 fields, the 10 field names are the keys of the Hashtable and the row's data is held in the values.
Each row (Hashtable) is added to the List, so at the end the List holds 500 Hashtable records.

I need to remove duplicates from this List<> based on certain criteria.

How can I make this possible? Please help me; I need a solution in 2 days, and I have to finish it by Monday evening.

Thanks in advance.

Could anyone suggest some sample code?


Additional info
 
List<Hashtable> executeDedup(List<Hashtable> sourceData,  List<string> uniqueFieldList, DedupRuleTypes dedupRuleType)	
 
sourceData holds one Hashtable (column name -> column value) for each row of actual data
uniqueFieldList is a list of field/column names used to identify a row as unique in the dedup
DedupRuleTypes is an enum with the options: Keep First Record, Keep the most complete record


Anurag Thakur

To be very frank, I don't like the design of using a generic list of Hashtables. First, it completely takes away the advantages of a typed list, since boxing and unboxing still remain.

Second, your design can be improved a little more.
Please explain your requirements first, as most of the features provided by the Hashtable can be achieved by just using a generic list.
nithinmohantk

ASKER

My requirement is that we have a set of addresses, or an Excel sheet, that the user uploads.

From the records in this Excel file we need to remove the duplicate records.
We read the Excel sheet, build a Hashtable for each row of data, and append it to a List<Hashtable>.
It's just a parameter I need to pass; inside the method I can import it into a generic list of my choice and do the operations.
But my PM is specific about the input parameter type.

For duplicate removal there is an extra parameter we pass, List<string> uniqueFieldList, which holds the names of the columns on which we need to find unique data. This is the user's choice: the user specifies which columns should be checked for uniqueness.

I hope this information helps.
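For the "Keep First Record" case, the idea described above can be sketched roughly as follows. This is only a minimal sketch under assumptions, not the accepted solution: the names `DedupSketch`, `BuildKey` and `ExecuteDedupKeepFirst` are made up for illustration, and trimming and upper-casing the values before comparing is an assumption about what should count as "the same" value. A row that lacks one of the columns is treated as having an empty value for it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class DedupSketch
{
    // Builds one composite key from the values of the user-chosen columns.
    // Assumption: values are compared after trimming and upper-casing.
    public static string BuildKey(Hashtable row, List<string> uniqueFieldList)
    {
        var parts = new List<string>();
        foreach (string field in uniqueFieldList)
        {
            object value = row[field]; // null when the column is missing
            parts.Add(value == null ? "" : value.ToString().Trim().ToUpperInvariant());
        }
        // The unit-separator character is unlikely to occur in cell data,
        // so "ab" + "c" and "a" + "bc" do not collapse into the same key.
        return string.Join("\u001F", parts.ToArray());
    }

    // Keeps the first row seen for each composite key and drops later ones.
    public static List<Hashtable> ExecuteDedupKeepFirst(
        List<Hashtable> sourceData, List<string> uniqueFieldList)
    {
        var seen = new HashSet<string>();
        var result = new List<Hashtable>();
        foreach (Hashtable row in sourceData)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(BuildKey(row, uniqueFieldList)))
            {
                result.Add(row);
            }
        }
        return result;
    }
}
```

The input list is not modified; the result holds references to the same Hashtable objects, in their original order.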
ASKER CERTIFIED SOLUTION
Anurag Thakur

This solution is only available to members.
Hi ragi, I will try that, but there is one more condition I need to check.

I have two options to check: the user can specify that we keep the first record and remove the rest, or he can say to keep the record which has more data or complete data. This is what I'm a bit confused about.

The first record I can do. But how would I know which record has the most perfect or complete data? Seems a bit funny, right? I can check by string character length, but it won't take me anywhere.

Suppose there are 2 or 3 records with the same character length; what will I do then?

Any ideas?
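One possible way to define "most complete", offered only as a sketch: instead of character length, count how many columns in a row actually hold a non-blank value, and when two or three duplicates tie, keep the one that appears first. This definition of completeness is an assumption, and the names `CompletenessSketch`, `CountFilledFields` and `PickMostComplete` are hypothetical. The list passed in is assumed to be one group of rows already found to be duplicates of each other on the unique fields.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CompletenessSketch
{
    // Number of columns whose value is neither null nor blank.
    public static int CountFilledFields(Hashtable row)
    {
        int filled = 0;
        foreach (DictionaryEntry entry in row)
        {
            if (entry.Value != null && entry.Value.ToString().Trim().Length > 0)
            {
                filled++;
            }
        }
        return filled;
    }

    // From one group of duplicates, picks the row with the most filled columns.
    // The strict ">" means an earlier row wins a tie. Returns null for an empty group.
    public static Hashtable PickMostComplete(List<Hashtable> duplicates)
    {
        Hashtable best = null;
        int bestScore = -1;
        foreach (Hashtable row in duplicates)
        {
            int score = CountFilledFields(row);
            if (score > bestScore)
            {
                best = row;
                bestScore = score;
            }
        }
        return best;
    }
}
```

If some columns matter more than others (for example a postcode over a middle name), the count could be replaced by a weighted score; the tie-breaking rule would stay the same.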
Hi Ragi, it worked for my requirement, except for the option I mentioned previously,
to select only the most perfect data.

Anyway, thanks again ragi, it was a great help. I think I will modify it and find alternatives.