CallConnection asked:

Deduplicating in C# code

Hello Experts,
I am writing a method that needs to deduplicate data in code; just plain old C#, no SQL here.
I'm reading in a DataTable and a list of fields to dedup on, and I want to return the deduped data.
I was wondering if you could please help me fill in the "to do" section?

(I suspect the challenge for me is making this reasonably quick, so that might be worth keeping in mind if there are several ways of doing it. :) )

Thank you!!

public static DataTable Deduplicate(DataTable ds, string dedup_on_fields_str)
{
    // Comma-separated list of the column names to dedup on
    string[] dedup_on_fields_arr = dedup_on_fields_str.Split(',');
    //to do..
    return ds;
}

apresto

Could you elaborate slightly? As I understand it, you are pulling data from a database and you want to de-dup the returned data.
If so, question #1:
Are you trying to de-dup on the entire row, or on a particular field, e.g. Name or Surname?
Question #2:
Why are you not using SQL to do this if you are trying to remove duplicate rows?
CallConnection


Hi Apresto - thanks for looking into this..

It's not actually going to load from a database. I am writing a service that will accept requests from elsewhere, and I'd like to make it a little more generic, so I wanted to use a DataTable to dedup the data (before I insert the data into the database).

The reason for not doing it on the database is that I am hoping to avoid pushing a high load of data when it's not needed, reducing the network load, disk I/O etc.

I would actually like to dedup on particular fields; I would like to read them in as a string and turn them into an array (dedup_on_fields_arr)...

This is just the general idea, but if you think there are better ways of doing this I'm all ears.
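
For illustration, turning the comma-separated field string into a checked array might look roughly like the sketch below. The class and method names (DedupFields.Parse) and the choice to throw on an unknown column are assumptions made for this example; nothing in the thread prescribes them.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DedupFields
{
    // Hypothetical helper: splits "Name, Surname" into column names and
    // checks each one against the table before any rows are touched.
    public static string[] Parse(DataTable table, string dedupOnFieldsStr)
    {
        string[] fields = dedupOnFieldsStr
            .Split(',')
            .Select(f => f.Trim())        // tolerate spaces after commas
            .Where(f => f.Length > 0)     // ignore empty entries such as "a,,b"
            .ToArray();

        foreach (string field in fields)
        {
            // Assumed policy: fail early instead of silently skipping a field
            if (!table.Columns.Contains(field))
                throw new ArgumentException("Unknown column: " + field);
        }
        return fields;
    }
}
```

Validating up front means a misspelled field name surfaces as one clear error rather than as an exception part-way through the rows.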

Dmitry G


As I understand it, you would like to remove duplicates when the fields "A", "B", "C" are the same across rows, and it does not matter what the other fields contain. Am I right?
What are the data types of these fields? (It might not be too important...)

Another Dmitry
CallConnection


Essentially that's right. You would always want to dedup on something, be it ID or Surname, or a bunch of fields.
The content of the other fields is less important here than removing the duplicates.
I have an idea how to treat these 'extra' fields, but I wouldn't want the basic example to delve too deeply into it...
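
One way the "to do" section could be filled in is a single pass over the rows with a HashSet of keys, which keeps the work roughly linear in the number of rows. The sketch below is only that, a sketch: the class name TableDedup, the rule of keeping the first row seen for each key, comparing values as strings, and the separator character are all assumptions, not something settled in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TableDedup
{
    // Sketch: keeps the first row seen for each distinct combination of the
    // listed fields and drops later ones. Values are compared as strings.
    public static DataTable Deduplicate(DataTable source, string dedupOnFieldsStr)
    {
        string[] fields = dedupOnFieldsStr.Split(',');
        for (int i = 0; i < fields.Length; i++)
            fields[i] = fields[i].Trim();

        var seen = new HashSet<string>();
        DataTable result = source.Clone();   // same schema, no rows

        foreach (DataRow row in source.Rows)
        {
            // Join with a control character assumed not to occur in the data,
            // so ("ab", "c") and ("a", "bc") produce different keys.
            var parts = new string[fields.Length];
            for (int i = 0; i < fields.Length; i++)
                parts[i] = Convert.ToString(row[fields[i]]);
            string key = string.Join("\u001F", parts);

            if (seen.Add(key))               // Add returns false for a repeated key
                result.ImportRow(row);
        }
        return result;
    }
}
```

With string keys, a null (DBNull) value and an empty string end up as the same key, so they count as duplicates of each other here. The 'extra' fields simply come from whichever row was seen first.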

As for data types -- I suppose it needs to be generic enough to deal with different data types.
Generally you can find the data type in the DataType property of each DataColumn, so if your DataTable is dt:
            foreach (DataColumn col in dt.Columns)
                Console.WriteLine(col.DataType); // body assumed; the original post shows only the loop header
That said, one can assume "string" to begin with, and it can be expanded to other types later...
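
If the string assumption is later dropped, one possible direction is to build each key from the raw column values and let a custom comparer decide equality, so that ints, dates and strings are each compared by their own rules. The RowKeyComparer below is a hypothetical name and a minimal sketch of that idea, not an established API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: treats two object[] keys as equal when every
// position holds equal values, using each value's own Equals/GetHashCode.
public sealed class RowKeyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (!object.Equals(x[i], y[i])) return false;
        }
        return true;
    }

    public int GetHashCode(object[] key)
    {
        unchecked   // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            foreach (object value in key)
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            return hash;
        }
    }
}
```

A set created as new HashSet<object[]>(new RowKeyComparer()) could then take keys such as new object[] { row["ID"], row["Surname"] }. String values compare case-sensitively under this comparer.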
Dmitry G

In form_load I just create the DataTable and fill it with data. Rows 0 and 2 are duplicates...
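
The code from that post is not shown here, so the following is not it. It is a made-up sample with the same shape (rows 0 and 2 duplicated on the key fields) plus a built-in alternative, DataView.ToTable with distinct set to true. The class name DistinctDemo, the column names and the values are all invented for illustration.

```csharp
using System;
using System.Data;

public static class DistinctDemo
{
    // Builds a small made-up table in which rows 0 and 2 carry the same
    // Name and Surname; the column names and values are invented.
    public static DataTable BuildSample()
    {
        var dt = new DataTable("People");
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Surname", typeof(string));
        dt.Columns.Add("Age", typeof(int));
        dt.Rows.Add("John", "Smith", 30);   // row 0
        dt.Rows.Add("Mary", "Jones", 25);   // row 1
        dt.Rows.Add("John", "Smith", 41);   // row 2: same Name + Surname as row 0
        return dt;
    }

    // DataView.ToTable(true, ...) returns the distinct combinations of the
    // named columns; columns that are not named are left out of the result.
    public static DataTable DistinctOn(DataTable dt, params string[] columns)
    {
        return new DataView(dt).ToTable(true, columns);
    }
}
```

Calling DistinctDemo.DistinctOn(DistinctDemo.BuildSample(), "Name", "Surname") gives a table with only the Name and Surname columns, so this route fits only when the other fields are not needed in the output.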