?
Solved

Finding Duplicate rows in a datatable to mark as duplicate and have to append the duplicate row in new datatable

Posted on 2009-12-15
8
Medium Priority
?
988 Views
Last Modified: 2012-05-08
Am using vb.net windows application
I have imported 50K rows from a CSV file to a datatable.  There may be some duplicate rows. I have one column "ISDUP" in the same datatable. i have to update the ISDUP column with True or False based on Duplicate row or not. When i find duplicate row i have to append the duplicate row in another datatable.

Here, the performence is most important thing.

Please provide the solution for this Problem.
0
Comment
Question by:zillion_kamesh
  • 2
  • 2
  • 2
  • +2
8 Comments
 
LVL 4

Expert Comment

by:utter77
ID: 26060341
Hi.

The fastes solution is to create a function that loops all rows ordered by the column you want to check for duplets and compare the current row with the last one and of it is a duplet, mark it.

sql = "select name from table1 order by name"

string lastName = ""

loop
  if currentName = lastName
    'mark this record
  end if
  lastName = currentName
end loop

Hope you understand my code example.

/MAtt
0
 
LVL 41

Expert Comment

by:Sharath
ID: 26060531
I would suggest you to import the 50k rows first into your database and then you can upate the ISDUP field based on the duplicate records.

What is your backend database? Provide the version also.
0
 
LVL 11

Expert Comment

by:ROMA CHAUHAN
ID: 26060688
You can check duplication based on some perticular columns. After getting the data in first table, you can loop for all the rows for first table.
Try this.....

  'Create second table having same structure as first table
        Dim SecondDt As New DataTable
        SecondDt = dTable.Clone

        'Loop for all the rows of first datatable
        For Each dtRow As DataRow In dTable.Rows
            'Check is there any row having same record
            Dim dtSearchedRows As DataRow() = dTable.Select("ProductId='" & dtRow("ProductId") & "' and BrochureId='" & dtRow("ProductId") & "'")
            'If there is any duplicate row
            If dtSearchedRows.Length > 0 Then
                'Set ISDUP as true
                dtRow("ISDUP") = "True"
                'Import that row in second table
                SecondDt.ImportRow(dtRow)
            End If
        Next
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 11

Expert Comment

by:ROMA CHAUHAN
ID: 26060939
Sorry there is some problem in above code so try this one........

'Create second table having same structure as first table
        Dim SecondDt As New DataTable
        SecondDt = dTable.Clone

        'Loop for all the rows of first datatable
        For Each dtRow As DataRow In dTable.Rows
            'Check is there any row having same record
            Dim dtSearchedRows As DataRow() = dTable.Select("ProductId='" & dtRow("ProductId") & "' and BrochureId='" & dtRow("ProductId") & "'")
            'If there is any duplicate row
            If dtSearchedRows.Length > 1 Then
                'Set ISDUP as true
                dtRow("ISDUP") = "True"
                'Import that row in second table
                SecondDt.ImportRow(dtRow)
           else
                'Set ISDUP as true
                dtRow("ISDUP") = "False"
            End If
        Next
0
 
LVL 9

Expert Comment

by:Trideep Patel
ID: 26060986
This function helps you

pass datatable and duplicate col name like "ISDUP" to it

Copied and changed from following link
http://kellychronicles.spaces.live.com/Blog/cns!A0D71E1614E8DBF8!406.entry

Sub SetDuplicateFlag(ByRef dt As DataTable, ByVal DupColName As String)
    Dim ssort As String = ""
    Dim col As DataColumn
    Dim scol As DataColumn
    For Each scol In dt.Columns
      If ssort <> "" Then ssort &= ","
      ssort &= scol.ColumnName
    Next 
    dt.DefaultView.Sort = ssort
    Dim i As Integer, bEquals As Boolean, c As Integer, ccount As Integer
    ccount = dt.Columns.Count
    For i = dt.Rows.Count - 1 To 0 Step -1
      bEquals = True
      For c = 0 To ccount - 1
        If Not i = 0 Then
          If dt.DefaultView(i)(c).ToString() <> dt.DefaultView(i - 1)(c).ToString() Then
            bEquals = False
            Exit For
          End If
        Else
          bEquals = False
        End If
      Next c 
      If bEquals Then
        dt.DefaultView(i).Item(DupColName) = True '(i).Delete()
      Else
        dt.DefaultView(i).Item(DupColName) = False
      End If
    Next 
  End Sub

Open in new window

0
 
LVL 9

Accepted Solution

by:
Trideep Patel earned 250 total points
ID: 26061097
Here it is again with some modifications
Sub SetDuplicateFlag(ByRef dt As DataTable, ByVal DupColName As String)
    Dim ssort As String = ""
    Dim col As DataColumn
    Dim scol As DataColumn
    For Each scol In dt.Columns
      If scol.ColumnName = DupColName Then Continue For
      If ssort <> "" Then ssort &= ","
      ssort &= scol.ColumnName
    Next 
    dt.DefaultView.Sort = ssort
    Dim i As Integer, bEquals As Boolean, c As Integer, ccount As Integer
    ccount = dt.Columns.Count
    For i = dt.Rows.Count - 1 To 0 Step -1
      bEquals = True
      For c = 0 To ccount - 1
        If dt.Columns(c).ColumnName = DupColName Then Continue For
        If Not i = 0 Then
          If dt.DefaultView(i)(c).ToString() <> dt.DefaultView(i - 1)(c).ToString() Then
            bEquals = False
            Exit For
          End If
        Else
          bEquals = False
        End If
      Next c 
      If bEquals Then
        dt.DefaultView(i - 1).Item(DupColName) = True
        dt.DefaultView(i).Item(DupColName) = True '(i).Delete()
      Else
        If dt.DefaultView(i).Item(DupColName) <> True Then
          dt.DefaultView(i).Item(DupColName) = False
        End If
        End If
    Next 
  End Sub

Open in new window

0
 

Author Closing Comment

by:zillion_kamesh
ID: 31666636
Me itself found accurate solution for this.
0
 

Author Comment

by:zillion_kamesh
ID: 26061316


Thanks for all of your replies and suggesstions..

But me itself found one solution for this problem.
In the vision of performence, we cant use dTable.Select() or  dTable.Compute() statements.

trideep's methodalogy will be somewhat useful i think. Because there is no time consuming statements.

Following is the method i used.

MainDataTable is a datatable having duplicate rows. Column used to find duplicate row is DupColName.

            Dim dtDistinct As DataTable = MainDataTable.Clone

            Dim dtDistinctkeyColumn(1) As DataColumn
            dtDistinctkeyColumn(0) = dtDistinct.Columns("DupColName")
            dtDistinct.PrimaryKey = dtDistinctkeyColumn
            Dim dtDistinctfindThese(0) As Object

            Dim drdist As DataRow
            For Each dr As DataRow In MainDataTable.Rows
                dtDistinctfindThese(0) = dr("DupColName")
                drdist = dtDistinct.Rows.Find(dtDistinctfindThese(0))
                If (drdist Is Nothing) Then
                    dr("ISDUP") = False
                    dtDistinct.Rows.Add(dr.ItemArray)
                Else
                    dr("ISDUP") = True
                    dtDuplicates.Rows.Add(dr.ItemArray)
                End If
            Next
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you're writing a .NET application to connect to an Access .mdb database and use pre-existing queries that require parameters, you've come to the right place! Let's say the pre-existing query(qryCust) in Access takes a Date as a parameter and l…
Since .Net 2.0, Visual Basic has made it easy to create a splash screen and set it via the "Splash Screen" drop down in the Project Properties.  A splash screen set in this manner is automatically created, displayed and closed by the framework itsel…
This video shows how to quickly and easily deploy an email signature for all users in Office 365 and prevent it from being added to replies and forwards. (the resulting signature is applied on the server level in Exchange Online) The email signat…
Integration Management Part 2
Suggested Courses

862 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question