Link to home
Start Free TrialLog in
Avatar of zillion_kamesh
zillion_kamesh

asked on

Finding Duplicate rows in a datatable to mark as duplicate and have to append the duplicate row in new datatable

Am using vb.net windows application
I have imported 50K rows from a CSV file to a datatable.  There may be some duplicate rows. I have one column "ISDUP" in the same datatable. i have to update the ISDUP column with True or False based on Duplicate row or not. When i find duplicate row i have to append the duplicate row in another datatable.

Here, the performence is most important thing.

Please provide the solution for this Problem.
Avatar of utter77
utter77

Hi.

The fastes solution is to create a function that loops all rows ordered by the column you want to check for duplets and compare the current row with the last one and of it is a duplet, mark it.

sql = "select name from table1 order by name"

string lastName = ""

loop
  if currentName = lastName
    'mark this record
  end if
  lastName = currentName
end loop

Hope you understand my code example.

/MAtt
Avatar of Sharath S
I would suggest you to import the 50k rows first into your database and then you can upate the ISDUP field based on the duplicate records.

What is your backend database? Provide the version also.
You can check duplication based on some perticular columns. After getting the data in first table, you can loop for all the rows for first table.
Try this.....

  'Create second table having same structure as first table
        Dim SecondDt As New DataTable
        SecondDt = dTable.Clone

        'Loop for all the rows of first datatable
        For Each dtRow As DataRow In dTable.Rows
            'Check is there any row having same record
            Dim dtSearchedRows As DataRow() = dTable.Select("ProductId='" & dtRow("ProductId") & "' and BrochureId='" & dtRow("ProductId") & "'")
            'If there is any duplicate row
            If dtSearchedRows.Length > 0 Then
                'Set ISDUP as true
                dtRow("ISDUP") = "True"
                'Import that row in second table
                SecondDt.ImportRow(dtRow)
            End If
        Next
Sorry there is some problem in above code so try this one........

'Create second table having same structure as first table
        Dim SecondDt As New DataTable
        SecondDt = dTable.Clone

        'Loop for all the rows of first datatable
        For Each dtRow As DataRow In dTable.Rows
            'Check is there any row having same record
            Dim dtSearchedRows As DataRow() = dTable.Select("ProductId='" & dtRow("ProductId") & "' and BrochureId='" & dtRow("ProductId") & "'")
            'If there is any duplicate row
            If dtSearchedRows.Length > 1 Then
                'Set ISDUP as true
                dtRow("ISDUP") = "True"
                'Import that row in second table
                SecondDt.ImportRow(dtRow)
           else
                'Set ISDUP as true
                dtRow("ISDUP") = "False"
            End If
        Next
This function helps you

pass datatable and duplicate col name like "ISDUP" to it

Copied and changed from following link
http://kellychronicles.spaces.live.com/Blog/cns!A0D71E1614E8DBF8!406.entry

Sub SetDuplicateFlag(ByRef dt As DataTable, ByVal DupColName As String)
    Dim ssort As String = ""
    Dim col As DataColumn
    Dim scol As DataColumn
    For Each scol In dt.Columns
      If ssort <> "" Then ssort &= ","
      ssort &= scol.ColumnName
    Next 
    dt.DefaultView.Sort = ssort
    Dim i As Integer, bEquals As Boolean, c As Integer, ccount As Integer
    ccount = dt.Columns.Count
    For i = dt.Rows.Count - 1 To 0 Step -1
      bEquals = True
      For c = 0 To ccount - 1
        If Not i = 0 Then
          If dt.DefaultView(i)(c).ToString() <> dt.DefaultView(i - 1)(c).ToString() Then
            bEquals = False
            Exit For
          End If
        Else
          bEquals = False
        End If
      Next c 
      If bEquals Then
        dt.DefaultView(i).Item(DupColName) = True '(i).Delete()
      Else
        dt.DefaultView(i).Item(DupColName) = False
      End If
    Next 
  End Sub

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Trideep Patel
Trideep Patel
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of zillion_kamesh

ASKER

Me itself found accurate solution for this.


Thanks for all of your replies and suggesstions..

But me itself found one solution for this problem.
In the vision of performence, we cant use dTable.Select() or  dTable.Compute() statements.

trideep's methodalogy will be somewhat useful i think. Because there is no time consuming statements.

Following is the method i used.

MainDataTable is a datatable having duplicate rows. Column used to find duplicate row is DupColName.

            Dim dtDistinct As DataTable = MainDataTable.Clone

            Dim dtDistinctkeyColumn(1) As DataColumn
            dtDistinctkeyColumn(0) = dtDistinct.Columns("DupColName")
            dtDistinct.PrimaryKey = dtDistinctkeyColumn
            Dim dtDistinctfindThese(0) As Object

            Dim drdist As DataRow
            For Each dr As DataRow In MainDataTable.Rows
                dtDistinctfindThese(0) = dr("DupColName")
                drdist = dtDistinct.Rows.Find(dtDistinctfindThese(0))
                If (drdist Is Nothing) Then
                    dr("ISDUP") = False
                    dtDistinct.Rows.Add(dr.ItemArray)
                Else
                    dr("ISDUP") = True
                    dtDuplicates.Rows.Add(dr.ItemArray)
                End If
            Next