Link to home
Start Free TrialLog in
Avatar of G_M
G_MFlag for Australia

asked on

Using VB.NET, remove rows containing duplicate values in Excel unless cell contains only numbers

Ok, I had a previous post requesting removal of rows containing duplicate values. This was answered promptly and well (thank you imnorie). Now I need to remove those duplicate values without removing the cells/rows containing only numerical data.

Initially I thought of copying the numerical  data out of the document, sorting it and then putting it back in. But my inexperience got the better of me. Below is the script so far. If someone could lend me a hand, I'd really appreciate the help.

Cheers
G_M

 
Public Sub SortDocumentInExcel()

        Dim strWorkingFile As String
        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorkSheet As Excel.Worksheet
        Dim objExcelRange As Excel.Range
        Dim LastRow As Long


        strWorkingFile = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = CreateObject("Excel.Application")
        objWorkbook = objExcel.Workbooks.Open(strWorkingFile)
        objWorkSheet = objWorkbook.Sheets("sheet1")

        ' Sort the data in the "A" column; expanding selection to other columns
        objWorkSheet.Cells.Sort(Key1:=objWorkSheet.Range("A2"), _
                                Order1:=Excel.XlSortOrder.xlAscending, Header:=Excel.XlYesNoGuess.xlYes, _
                                Orientation:=Excel.XlSortOrientation.xlSortColumns, SortMethod:=Excel.XlSortMethod.xlPinYin)

        'Remove rows based on duplicate values from "A" column
        LastRow = objWorkSheet.Range("A" & objWorkSheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
        objExcelRange = objWorkSheet.Range("A1").Resize(LastRow, 3)
        objExcelRange.RemoveDuplicates(Columns:=1, Header:=Excel.XlYesNoGuess.xlYes)

        objWorkbook.Save()
        objWorkbook.Close()
        objExcel.Quit()

        objExcel = Nothing
        objWorkbook = Nothing
        objWorkSheet = Nothing


    End Sub

Open in new window

Avatar of Norie
Norie

This is quite a bit different  as far as I can see.

Perhaps it would be easier to only get the references you want from the document.

Can you post some sample data before/after?

PS I think there's a way to relate questions.

That might help you keep tabs on things, and also alert people that answered previous questions.
Avatar of G_M

ASKER

       A                              B                    C
1 Reference                  Page         Document
2 2002                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 1999                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data                   Page 3      test.docx
7 Test Data, 2002         Page 5      test.docx

        A                              B                    C
1 Reference                  Page         Document
2 1999                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 2002                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data, 2002     Page 3      test.docx

Avatar of G_M

ASKER

Sorry... hit Submit instead of Preview.

This is the unsorted data:
       A                              B                    C
1 Reference                  Page         Document
2 2002                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 1999                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data                   Page 3      test.docx
7 Test Data, 2002         Page 5      test.docx

This is the sorted data:
        A                              B                    C
1 Reference                  Page         Document
2 1999                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 2002                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data, 2002         Page 3      test.docx

Cheers G_M
Avatar of G_M

ASKER

And that is still incorrect...

The unsorted data should be

       A                              B                    C
1 Reference                  Page         Document
2 2002                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 1999                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data                   Page 3      test.docx
7 Test Data, 2002         Page 5      test.docx
8 Test Data, 2002         Page 4      test.docx
I think I see what you mean.

Not sure how to approach it though.

One thing that might work is to add a formula in column D that indicates whether or no the reference should be kept.

Something like this:

=IF(ISNUMBER(A2),"Yes",IF(COUNTIF($A$2:A2,A2)=1, "YES","NO"))

That would put YES in column D if the row is to be kept and NO otherwise.

Then the data could be filtered to leave just the rows to keep.

I'm sure there's a some other, better? way but can't think of anything else off the top of my head.
Avatar of G_M

ASKER

Ok imnorie, I used your idea (sort of) identifying if the data in the A column contained a a numerical value by inserting the following equation into column D:

=IF(ISNUMBER(A#), "Yes", "NO")

I then attempt to extract the data to a second workbook called "wf2_" & strWorkingFileTo

I have managed to copy the first two values into the new document; however, after that I get an error with the objExcelRange.offset(1) increment that I have created.

The intent is as follows:
1. Sort the data in "A" column in objWorkbook Alphabetically (done)
2. Move/Cut all the rows containing numbers in "A" column of objWorkbook into objWorkbookNew (errors in script)
3. Remove duplicates from "A" column in objWorkbook (done)
4. Move/Cut the remaining data in objWorkbook to objWorkbookNew
5. Delete objWorkbook

I have gone around in circles with this and again, my inexperience got the better of me.
I'm sure there is something small I am missing here. I have attached some sample data and the code to the point I have described above.

Hope someone can help.
Cheers
G_M

 
Public Sub SortDocumentInExcel()

        Dim objExcel As Excel.Application

        Dim strWorkingFileFrom As String
        Dim objWorkbook As Excel.Workbook
        Dim objWorkSheet As Excel.Worksheet
        Dim objExcelRange As Excel.Range

        Dim strWorkingFileTo As String
        Dim objWorkbookNew As Excel.Workbook
        Dim objWorkSheetNew As Excel.Worksheet
        Dim objExcelRange2 As Excel.Range

        Dim objExcelRange3 As Excel.Range
        Dim LastRow As Long
        Dim misValue As Object = System.Reflection.Missing.Value

        'Values for original Excel document... The document with the raw data
        strWorkingFileFrom = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = CreateObject("Excel.Application")
        objWorkbook = objExcel.Workbooks.Open(strWorkingFileFrom)
        objWorkSheet = objWorkbook.Sheets("sheet1")
        objExcelRange = objWorkSheet.Range("A2")
        
        'Values for the new Excel document... The document with the filtered data
        strWorkingFileTo = frmGetReferences.lblOutputFolder.Text & "wf2_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objWorkbookNew = objExcel.Workbooks.Add(misValue)
        objWorkSheetNew = objWorkbookNew.Sheets("sheet1")
        objWorkSheetNew.SaveAs(strWorkingFileTo)
        objExcelRange2 = objWorkSheetNew.Range("A" & objWorkSheetNew.Rows.Count).End(Excel.XlDirection.xlUp)

        'Sort the data in the "A" column; expanding selection to other columns
        objWorkSheet.Cells.Sort(Key1:=objWorkSheet.Range("A2"), _
                                Order1:=Excel.XlSortOrder.xlAscending, Header:=Excel.XlYesNoGuess.xlYes, _
                                Orientation:=Excel.XlSortOrientation.xlSortColumns, SortMethod:=Excel.XlSortMethod.xlPinYin)

        'Extract data if column "D" is equal to "Yes"... Extracts numeric data only
        Do While objExcelRange.Value <> 0

            If objExcelRange.Row <> 1 Then

                If objExcelRange.Offset(, 3).Value = "Yes" Then
                    objExcelRange.Cut(objExcelRange2)
                    'objExcelRange2.Paste()
                    objWorkbookNew.Save()
                    'objExcelRange = objExcelRange.Offset(1)
                    objExcelRange2 = objExcelRange2.Offset(1)
                Else
                    Exit Do
                End If
                objExcelRange = objExcelRange.Offset(1)
            End If
        Loop

        'Remove rows based on duplicate values from "A" column
        LastRow = objWorkSheet.Range("A" & objWorkSheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
        objExcelRange3 = objWorkSheet.Range("A1").Resize(LastRow, 3)
        objExcelRange3.RemoveDuplicates(Columns:=1, Header:=Excel.XlYesNoGuess.xlYes)


        objWorkbookNew.Save()
        objWorkbookNew.Close()
        objWorkbook.Save()
        objWorkbook.Close()
        objExcel.Quit()

        objExcel = Nothing
        objWorkbook = Nothing
        objWorkSheet = Nothing



    End Sub



End Module

Open in new window

wf-ch1.xlsx
I'll have a look at it, just want to clarify what you want to do first:

1 Extract all the references with nos to new sheet.

2 Remove duplicates from remaining references.

3 Combine 2 lists.

One thing I notice you seem to be doing in the code is creating a new workbook.

Is that for the eventual result?

Does it have to be a new workbook?

This could all be done within the same workbook and the original list could be left untouched.

If you really want a new workbook that could be done right at the end.

Using just one workbook would makes things a lot easier than working with 2 workbooks.

ASKER CERTIFIED SOLUTION
Avatar of Norie
Norie

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of G_M

ASKER

Hmmm... I didn't really know how to handle it with one workbook and though it easier to use two. But, being a novice leaves what ever you think appropriate as best :o) I am working with the intent to eventually process this data to three different formats at the end... .docx, .txt and .xlsx. I am hoping to pick up enough knowledge along the way to get this done.

I don't really care if I keep the origainal data, I just want to filter out what is not needed. This working file will eventually be deleted; however, it would probably be handy for debugging my mistakes.

Looking into it further, I seem to have an issue with my Do While Loop and the string data from the workbook isn't validating. I feel like I'm so close though.

Cheers
G_M
There are no duplicates in the sample file.

The code I posted works with the data you posted earlier and copies the references/rows required to another worksheet.

It still needs tidied up as bit, deleting the helper column, the criteria etc
Avatar of G_M

ASKER

Thank you imnorie, that worked a treat (although I'll have to study it a bit to work out how) ;o)

I'll leave it at that for the moment. I'll see if I can push on a little further with this on my own.

Thanks again. You're a champion :o)

 
Public Sub SortDocumentInExcel()

        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorksheet As Excel.Worksheet
        Dim objWorksheetNew As Excel.Worksheet
        Dim objExcelRange As Excel.Range
        Dim objExcelRange2 As Excel.Range
        Dim strInputFile As String
        Dim LastRow As Long

        strInputFile = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"

        objExcel = New Excel.Application
        objWorkbook = objExcel.Workbooks.Open(strInputFile)
        objWorksheet = objWorkbook.Worksheets(1)

        LastRow = objWorksheet.Range("A" & objWorksheet.Rows.Count).End(Excel.XlDirection.xlUp).Row

        objExcelRange = objWorksheet.Range("A1").Resize(LastRow, 3)

        objWorksheet.Range("D1").Value = "Keep"
        objWorksheet.Range("D2:D" & LastRow).Formula = "=IF(ISNUMBER(A2),""Yes"",IF(COUNTIF($A$2:A2,A2)=1, ""Yes"",""No""))"

        objExcelRange = objExcelRange.Resize(, 4)

        objWorksheetNew = objWorkbook.Worksheets.Add(After:=objWorksheet)
        objWorksheetNew.Range("A1").Value = "Keep"
        objWorksheetNew.Range("A2").Value = "Yes"

        objExcelRange2 = objWorksheetNew.Range("A1:A2")
        objExcelRange.AdvancedFilter(Action:=Excel.XlFilterAction.xlFilterCopy, _
                                      CriteriaRange:=objExcelRange2, CopyToRange:=objWorksheetNew.Range("D1"), Unique:=True)

        objExcel.Visible = True
        objExcel.UserControl = True


        objWorkbook.Close(True)
        objExcel.Quit()
        objWorkbook = Nothing
        objExcel = Nothing

Open in new window

It's quite simple really.

1 Open file.

2 Identify data.

3 Add extra (helper) column with formula to identify rows to keep.

4 Add a worksheet for the results

5 Set up criteria that will only return the rows to keep.

6 Apply advanced filter to copy ony the rows to keep to the newly created worksheet.

7 Delete criteria.

8 Delete helper column(s)

9 Sort.

10 Save and close workbook.

Obviously I've not done 7-10 but this is how 7 & 8 can be done.
objXLDst.Range("A1:C1").EntireColumn.Delete

objXLSrc.Range("D1").EntireColumn.Delete 

objXLDst.Range("D1").EntireColumn.Delete

Open in new window

Avatar of G_M

ASKER

I was just about to work all that out... Thank you again imnorie