G_M
asked on
Using VB.NET, remove rows containing duplicate values in Excel unless cell contains only numbers
Ok, I had a previous post requesting removal of rows containing duplicate values. This was answered promptly and well (thank you imnorie). Now I need to remove those duplicate values without removing the cells/rows containing only numerical data.
Initially I thought of copying the numerical data out of the document, sorting it and then putting it back in. But my inexperience got the better of me. Below is the script so far. If someone could lend me a hand, I'd really appreciate the help.
Cheers
G_M
Initially I thought of copying the numerical data out of the document, sorting it and then putting it back in. But my inexperience got the better of me. Below is the script so far. If someone could lend me a hand, I'd really appreciate the help.
Cheers
G_M
Public Sub SortDocumentInExcel()
Dim strWorkingFile As String
Dim objExcel As Excel.Application
Dim objWorkbook As Excel.Workbook
Dim objWorkSheet As Excel.Worksheet
Dim objExcelRange As Excel.Range
Dim LastRow As Long
strWorkingFile = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
objExcel = CreateObject("Excel.Application")
objWorkbook = objExcel.Workbooks.Open(strWorkingFile)
objWorkSheet = objWorkbook.Sheets("sheet1")
' Sort the data in the "A" column; expanding selection to other columns
objWorkSheet.Cells.Sort(Key1:=objWorkSheet.Range("A2"), _
Order1:=Excel.XlSortOrder.xlAscending, Header:=Excel.XlYesNoGuess.xlYes, _
Orientation:=Excel.XlSortOrientation.xlSortColumns, SortMethod:=Excel.XlSortMethod.xlPinYin)
'Remove rows based on duplicate values from "A" column
LastRow = objWorkSheet.Range("A" & objWorkSheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
objExcelRange = objWorkSheet.Range("A1").Resize(LastRow, 3)
objExcelRange.RemoveDuplicates(Columns:=1, Header:=Excel.XlYesNoGuess.xlYes)
objWorkbook.Save()
objWorkbook.Close()
objExcel.Quit()
objExcel = Nothing
objWorkbook = Nothing
objWorkSheet = Nothing
End Sub
ASKER
A B C
1 Reference Page Document
2 2002 Page 1 test.docx
3 2002 Page 2 test.docx
4 1999 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data Page 3 test.docx
7 Test Data, 2002 Page 5 test.docx
A B C
1 Reference Page Document
2 1999 Page 1 test.docx
3 2002 Page 2 test.docx
4 2002 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data, 2002 Page 3 test.docx
1 Reference Page Document
2 2002 Page 1 test.docx
3 2002 Page 2 test.docx
4 1999 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data Page 3 test.docx
7 Test Data, 2002 Page 5 test.docx
A B C
1 Reference Page Document
2 1999 Page 1 test.docx
3 2002 Page 2 test.docx
4 2002 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data, 2002 Page 3 test.docx
ASKER
Sorry... hit Submit instead of Preview.
This is the unsorted data:
A B C
1 Reference Page Document
2 2002 Page 1 test.docx
3 2002 Page 2 test.docx
4 1999 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data Page 3 test.docx
7 Test Data, 2002 Page 5 test.docx
This is the sorted data:
A B C
1 Reference Page Document
2 1999 Page 1 test.docx
3 2002 Page 2 test.docx
4 2002 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data, 2002 Page 3 test.docx
Cheers G_M
This is the unsorted data:
A B C
1 Reference Page Document
2 2002 Page 1 test.docx
3 2002 Page 2 test.docx
4 1999 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data Page 3 test.docx
7 Test Data, 2002 Page 5 test.docx
This is the sorted data:
A B C
1 Reference Page Document
2 1999 Page 1 test.docx
3 2002 Page 2 test.docx
4 2002 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data, 2002 Page 3 test.docx
Cheers G_M
ASKER
And that is still incorrect...
The unsorted data should be
A B C
1 Reference Page Document
2 2002 Page 1 test.docx
3 2002 Page 2 test.docx
4 1999 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data Page 3 test.docx
7 Test Data, 2002 Page 5 test.docx
8 Test Data, 2002 Page 4 test.docx
The unsorted data should be
A B C
1 Reference Page Document
2 2002 Page 1 test.docx
3 2002 Page 2 test.docx
4 1999 Page 1 test.docx
5 Test Data Page 3 test.docx
6 Test Data Page 3 test.docx
7 Test Data, 2002 Page 5 test.docx
8 Test Data, 2002 Page 4 test.docx
I think I see what you mean.
Not sure how to approach it though.
One thing that might work is to add a formula in column D that indicates whether or no the reference should be kept.
Something like this:
=IF(ISNUMBER(A2),"Yes",IF( COUNTIF($A $2:A2,A2)= 1, "YES","NO"))
That would put YES in column D if the row is to be kept and NO otherwise.
Then the data could be filtered to leave just the rows to keep.
I'm sure there's a some other, better? way but can't think of anything else off the top of my head.
Not sure how to approach it though.
One thing that might work is to add a formula in column D that indicates whether or no the reference should be kept.
Something like this:
=IF(ISNUMBER(A2),"Yes",IF(
That would put YES in column D if the row is to be kept and NO otherwise.
Then the data could be filtered to leave just the rows to keep.
I'm sure there's a some other, better? way but can't think of anything else off the top of my head.
ASKER
Ok imnorie, I used your idea (sort of) identifying if the data in the A column contained a a numerical value by inserting the following equation into column D:
=IF(ISNUMBER(A#), "Yes", "NO")
I then attempt to extract the data to a second workbook called "wf2_" & strWorkingFileTo
I have managed to copy the first two values into the new document; however, after that I get an error with the objExcelRange.offset(1) increment that I have created.
The intent is as follows:
1. Sort the data in "A" column in objWorkbook Alphabetically (done)
2. Move/Cut all the rows containing numbers in "A" column of objWorkbook into objWorkbookNew (errors in script)
3. Remove duplicates from "A" column in objWorkbook (done)
4. Move/Cut the remaining data in objWorkbook to objWorkbookNew
5. Delete objWorkbook
I have gone around in circles with this and again, my inexperience got the better of me.
I'm sure there is something small I am missing here. I have attached some sample data and the code to the point I have described above.
Hope someone can help.
Cheers
G_M
=IF(ISNUMBER(A#), "Yes", "NO")
I then attempt to extract the data to a second workbook called "wf2_" & strWorkingFileTo
I have managed to copy the first two values into the new document; however, after that I get an error with the objExcelRange.offset(1) increment that I have created.
The intent is as follows:
1. Sort the data in "A" column in objWorkbook Alphabetically (done)
2. Move/Cut all the rows containing numbers in "A" column of objWorkbook into objWorkbookNew (errors in script)
3. Remove duplicates from "A" column in objWorkbook (done)
4. Move/Cut the remaining data in objWorkbook to objWorkbookNew
5. Delete objWorkbook
I have gone around in circles with this and again, my inexperience got the better of me.
I'm sure there is something small I am missing here. I have attached some sample data and the code to the point I have described above.
Hope someone can help.
Cheers
G_M
Public Sub SortDocumentInExcel()
Dim objExcel As Excel.Application
Dim strWorkingFileFrom As String
Dim objWorkbook As Excel.Workbook
Dim objWorkSheet As Excel.Worksheet
Dim objExcelRange As Excel.Range
Dim strWorkingFileTo As String
Dim objWorkbookNew As Excel.Workbook
Dim objWorkSheetNew As Excel.Worksheet
Dim objExcelRange2 As Excel.Range
Dim objExcelRange3 As Excel.Range
Dim LastRow As Long
Dim misValue As Object = System.Reflection.Missing.Value
'Values for original Excel document... The document with the raw data
strWorkingFileFrom = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
objExcel = CreateObject("Excel.Application")
objWorkbook = objExcel.Workbooks.Open(strWorkingFileFrom)
objWorkSheet = objWorkbook.Sheets("sheet1")
objExcelRange = objWorkSheet.Range("A2")
'Values for the new Excel document... The document with the filtered data
strWorkingFileTo = frmGetReferences.lblOutputFolder.Text & "wf2_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
objWorkbookNew = objExcel.Workbooks.Add(misValue)
objWorkSheetNew = objWorkbookNew.Sheets("sheet1")
objWorkSheetNew.SaveAs(strWorkingFileTo)
objExcelRange2 = objWorkSheetNew.Range("A" & objWorkSheetNew.Rows.Count).End(Excel.XlDirection.xlUp)
'Sort the data in the "A" column; expanding selection to other columns
objWorkSheet.Cells.Sort(Key1:=objWorkSheet.Range("A2"), _
Order1:=Excel.XlSortOrder.xlAscending, Header:=Excel.XlYesNoGuess.xlYes, _
Orientation:=Excel.XlSortOrientation.xlSortColumns, SortMethod:=Excel.XlSortMethod.xlPinYin)
'Extract data if column "D" is equal to "Yes"... Extracts numeric data only
Do While objExcelRange.Value <> 0
If objExcelRange.Row <> 1 Then
If objExcelRange.Offset(, 3).Value = "Yes" Then
objExcelRange.Cut(objExcelRange2)
'objExcelRange2.Paste()
objWorkbookNew.Save()
'objExcelRange = objExcelRange.Offset(1)
objExcelRange2 = objExcelRange2.Offset(1)
Else
Exit Do
End If
objExcelRange = objExcelRange.Offset(1)
End If
Loop
'Remove rows based on duplicate values from "A" column
LastRow = objWorkSheet.Range("A" & objWorkSheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
objExcelRange3 = objWorkSheet.Range("A1").Resize(LastRow, 3)
objExcelRange3.RemoveDuplicates(Columns:=1, Header:=Excel.XlYesNoGuess.xlYes)
objWorkbookNew.Save()
objWorkbookNew.Close()
objWorkbook.Save()
objWorkbook.Close()
objExcel.Quit()
objExcel = Nothing
objWorkbook = Nothing
objWorkSheet = Nothing
End Sub
End Module
wf-ch1.xlsx
I'll have a look at it, just want to clarify what you want to do first:
1 Extract all the references with nos to new sheet.
2 Remove duplicates from remaining references.
3 Combine 2 lists.
One thing I notice you seem to be doing in the code is creating a new workbook.
Is that for the eventual result?
Does it have to be a new workbook?
This could all be done within the same workbook and the original list could be left untouched.
If you really want a new workbook that could be done right at the end.
Using just one workbook would makes things a lot easier than working with 2 workbooks.
1 Extract all the references with nos to new sheet.
2 Remove duplicates from remaining references.
3 Combine 2 lists.
One thing I notice you seem to be doing in the code is creating a new workbook.
Is that for the eventual result?
Does it have to be a new workbook?
This could all be done within the same workbook and the original list could be left untouched.
If you really want a new workbook that could be done right at the end.
Using just one workbook would makes things a lot easier than working with 2 workbooks.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hmmm... I didn't really know how to handle it with one workbook and though it easier to use two. But, being a novice leaves what ever you think appropriate as best :o) I am working with the intent to eventually process this data to three different formats at the end... .docx, .txt and .xlsx. I am hoping to pick up enough knowledge along the way to get this done.
I don't really care if I keep the origainal data, I just want to filter out what is not needed. This working file will eventually be deleted; however, it would probably be handy for debugging my mistakes.
Looking into it further, I seem to have an issue with my Do While Loop and the string data from the workbook isn't validating. I feel like I'm so close though.
Cheers
G_M
I don't really care if I keep the origainal data, I just want to filter out what is not needed. This working file will eventually be deleted; however, it would probably be handy for debugging my mistakes.
Looking into it further, I seem to have an issue with my Do While Loop and the string data from the workbook isn't validating. I feel like I'm so close though.
Cheers
G_M
There are no duplicates in the sample file.
The code I posted works with the data you posted earlier and copies the references/rows required to another worksheet.
It still needs tidied up as bit, deleting the helper column, the criteria etc
The code I posted works with the data you posted earlier and copies the references/rows required to another worksheet.
It still needs tidied up as bit, deleting the helper column, the criteria etc
ASKER
Thank you imnorie, that worked a treat (although I'll have to study it a bit to work out how) ;o)
I'll leave it at that for the moment. I'll see if I can push on a little further with this on my own.
Thanks again. You're a champion :o)
I'll leave it at that for the moment. I'll see if I can push on a little further with this on my own.
Thanks again. You're a champion :o)
Public Sub SortDocumentInExcel()
Dim objExcel As Excel.Application
Dim objWorkbook As Excel.Workbook
Dim objWorksheet As Excel.Worksheet
Dim objWorksheetNew As Excel.Worksheet
Dim objExcelRange As Excel.Range
Dim objExcelRange2 As Excel.Range
Dim strInputFile As String
Dim LastRow As Long
strInputFile = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
objExcel = New Excel.Application
objWorkbook = objExcel.Workbooks.Open(strInputFile)
objWorksheet = objWorkbook.Worksheets(1)
LastRow = objWorksheet.Range("A" & objWorksheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
objExcelRange = objWorksheet.Range("A1").Resize(LastRow, 3)
objWorksheet.Range("D1").Value = "Keep"
objWorksheet.Range("D2:D" & LastRow).Formula = "=IF(ISNUMBER(A2),""Yes"",IF(COUNTIF($A$2:A2,A2)=1, ""Yes"",""No""))"
objExcelRange = objExcelRange.Resize(, 4)
objWorksheetNew = objWorkbook.Worksheets.Add(After:=objWorksheet)
objWorksheetNew.Range("A1").Value = "Keep"
objWorksheetNew.Range("A2").Value = "Yes"
objExcelRange2 = objWorksheetNew.Range("A1:A2")
objExcelRange.AdvancedFilter(Action:=Excel.XlFilterAction.xlFilterCopy, _
CriteriaRange:=objExcelRange2, CopyToRange:=objWorksheetNew.Range("D1"), Unique:=True)
objExcel.Visible = True
objExcel.UserControl = True
objWorkbook.Close(True)
objExcel.Quit()
objWorkbook = Nothing
objExcel = Nothing
It's quite simple really.
1 Open file.
2 Identify data.
3 Add extra (helper) column with formula to identify rows to keep.
4 Add a worksheet for the results
5 Set up criteria that will only return the rows to keep.
6 Apply advanced filter to copy ony the rows to keep to the newly created worksheet.
7 Delete criteria.
8 Delete helper column(s)
9 Sort.
10 Save and close workbook.
Obviously I've not done 7-10 but this is how 7 & 8 can be done.
1 Open file.
2 Identify data.
3 Add extra (helper) column with formula to identify rows to keep.
4 Add a worksheet for the results
5 Set up criteria that will only return the rows to keep.
6 Apply advanced filter to copy ony the rows to keep to the newly created worksheet.
7 Delete criteria.
8 Delete helper column(s)
9 Sort.
10 Save and close workbook.
Obviously I've not done 7-10 but this is how 7 & 8 can be done.
objXLDst.Range("A1:C1").EntireColumn.Delete
objXLSrc.Range("D1").EntireColumn.Delete
objXLDst.Range("D1").EntireColumn.Delete
ASKER
I was just about to work all that out... Thank you again imnorie
Perhaps it would be easier to only get the references you want from the document.
Can you post some sample data before/after?
PS I think there's a way to relate questions.
That might help you keep tabs on things, and also alert people that answered previous questions.