zillion_kamesh
asked on
Finding Duplicate rows in a datatable to mark as duplicate and have to append the duplicate row in new datatable
Am using vb.net windows application
I have imported 50K rows from a CSV file to a datatable. There may be some duplicate rows. I have one column "ISDUP" in the same datatable. i have to update the ISDUP column with True or False based on Duplicate row or not. When i find duplicate row i have to append the duplicate row in another datatable.
Here, the performence is most important thing.
Please provide the solution for this Problem.
I have imported 50K rows from a CSV file to a datatable. There may be some duplicate rows. I have one column "ISDUP" in the same datatable. i have to update the ISDUP column with True or False based on Duplicate row or not. When i find duplicate row i have to append the duplicate row in another datatable.
Here, the performence is most important thing.
Please provide the solution for this Problem.
I would suggest you to import the 50k rows first into your database and then you can upate the ISDUP field based on the duplicate records.
What is your backend database? Provide the version also.
What is your backend database? Provide the version also.
You can check duplication based on some perticular columns. After getting the data in first table, you can loop for all the rows for first table.
Try this.....
'Create second table having same structure as first table
Dim SecondDt As New DataTable
SecondDt = dTable.Clone
'Loop for all the rows of first datatable
For Each dtRow As DataRow In dTable.Rows
'Check is there any row having same record
Dim dtSearchedRows As DataRow() = dTable.Select("ProductId=' " & dtRow("ProductId") & "' and BrochureId='" & dtRow("ProductId") & "'")
'If there is any duplicate row
If dtSearchedRows.Length > 0 Then
'Set ISDUP as true
dtRow("ISDUP") = "True"
'Import that row in second table
SecondDt.ImportRow(dtRow)
End If
Next
Try this.....
'Create second table having same structure as first table
Dim SecondDt As New DataTable
SecondDt = dTable.Clone
'Loop for all the rows of first datatable
For Each dtRow As DataRow In dTable.Rows
'Check is there any row having same record
Dim dtSearchedRows As DataRow() = dTable.Select("ProductId='
'If there is any duplicate row
If dtSearchedRows.Length > 0 Then
'Set ISDUP as true
dtRow("ISDUP") = "True"
'Import that row in second table
SecondDt.ImportRow(dtRow)
End If
Next
Sorry there is some problem in above code so try this one........
'Create second table having same structure as first table
Dim SecondDt As New DataTable
SecondDt = dTable.Clone
'Loop for all the rows of first datatable
For Each dtRow As DataRow In dTable.Rows
'Check is there any row having same record
Dim dtSearchedRows As DataRow() = dTable.Select("ProductId=' " & dtRow("ProductId") & "' and BrochureId='" & dtRow("ProductId") & "'")
'If there is any duplicate row
If dtSearchedRows.Length > 1 Then
'Set ISDUP as true
dtRow("ISDUP") = "True"
'Import that row in second table
SecondDt.ImportRow(dtRow)
else
'Set ISDUP as true
dtRow("ISDUP") = "False"
End If
Next
'Create second table having same structure as first table
Dim SecondDt As New DataTable
SecondDt = dTable.Clone
'Loop for all the rows of first datatable
For Each dtRow As DataRow In dTable.Rows
'Check is there any row having same record
Dim dtSearchedRows As DataRow() = dTable.Select("ProductId='
'If there is any duplicate row
If dtSearchedRows.Length > 1 Then
'Set ISDUP as true
dtRow("ISDUP") = "True"
'Import that row in second table
SecondDt.ImportRow(dtRow)
else
'Set ISDUP as true
dtRow("ISDUP") = "False"
End If
Next
This function helps you
pass datatable and duplicate col name like "ISDUP" to it
Copied and changed from following link
http://kellychronicles.spaces.live.com/Blog/cns!A0D71E1614E8DBF8!406.entry
pass datatable and duplicate col name like "ISDUP" to it
Copied and changed from following link
http://kellychronicles.spaces.live.com/Blog/cns!A0D71E1614E8DBF8!406.entry
Sub SetDuplicateFlag(ByRef dt As DataTable, ByVal DupColName As String)
Dim ssort As String = ""
Dim col As DataColumn
Dim scol As DataColumn
For Each scol In dt.Columns
If ssort <> "" Then ssort &= ","
ssort &= scol.ColumnName
Next
dt.DefaultView.Sort = ssort
Dim i As Integer, bEquals As Boolean, c As Integer, ccount As Integer
ccount = dt.Columns.Count
For i = dt.Rows.Count - 1 To 0 Step -1
bEquals = True
For c = 0 To ccount - 1
If Not i = 0 Then
If dt.DefaultView(i)(c).ToString() <> dt.DefaultView(i - 1)(c).ToString() Then
bEquals = False
Exit For
End If
Else
bEquals = False
End If
Next c
If bEquals Then
dt.DefaultView(i).Item(DupColName) = True '(i).Delete()
Else
dt.DefaultView(i).Item(DupColName) = False
End If
Next
End Sub
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Me itself found accurate solution for this.
ASKER
Thanks for all of your replies and suggesstions..
But me itself found one solution for this problem.
In the vision of performence, we cant use dTable.Select() or dTable.Compute() statements.
trideep's methodalogy will be somewhat useful i think. Because there is no time consuming statements.
Following is the method i used.
MainDataTable is a datatable having duplicate rows. Column used to find duplicate row is DupColName.
Dim dtDistinct As DataTable = MainDataTable.Clone
Dim dtDistinctkeyColumn(1) As DataColumn
dtDistinctkeyColumn(0) = dtDistinct.Columns("DupCol
dtDistinct.PrimaryKey = dtDistinctkeyColumn
Dim dtDistinctfindThese(0) As Object
Dim drdist As DataRow
For Each dr As DataRow In MainDataTable.Rows
dtDistinctfindThese(0) = dr("DupColName")
drdist = dtDistinct.Rows.Find(dtDis
If (drdist Is Nothing) Then
dr("ISDUP") = False
dtDistinct.Rows.Add(dr.Ite
Else
dr("ISDUP") = True
dtDuplicates.Rows.Add(dr.I
End If
Next
The fastes solution is to create a function that loops all rows ordered by the column you want to check for duplets and compare the current row with the last one and of it is a duplet, mark it.
sql = "select name from table1 order by name"
string lastName = ""
loop
if currentName = lastName
'mark this record
end if
lastName = currentName
end loop
Hope you understand my code example.
/MAtt