Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

Using VB.NET, remove rows containing duplicate values in Excel unless cell contains only numbers

Posted on 2011-10-16
13
Medium Priority
?
1,346 Views
Last Modified: 2012-05-12
Ok, I had a previous post requesting removal of rows containing duplicate values. This was answered promptly and well (thank you imnorie). Now I need to remove those duplicate values without removing the cells/rows containing only numerical data.

Initially I thought of copying the numerical  data out of the document, sorting it and then putting it back in. But my inexperience got the better of me. Below is the script so far. If someone could lend me a hand, I'd really appreciate the help.

Cheers
G_M

 
Public Sub SortDocumentInExcel()

        Dim strWorkingFile As String
        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorkSheet As Excel.Worksheet
        Dim objExcelRange As Excel.Range
        Dim LastRow As Long


        strWorkingFile = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = CreateObject("Excel.Application")
        objWorkbook = objExcel.Workbooks.Open(strWorkingFile)
        objWorkSheet = objWorkbook.Sheets("sheet1")

        ' Sort the data in the "A" column; expanding selection to other columns
        objWorkSheet.Cells.Sort(Key1:=objWorkSheet.Range("A2"), _
                                Order1:=Excel.XlSortOrder.xlAscending, Header:=Excel.XlYesNoGuess.xlYes, _
                                Orientation:=Excel.XlSortOrientation.xlSortColumns, SortMethod:=Excel.XlSortMethod.xlPinYin)

        'Remove rows based on duplicate values from "A" column
        LastRow = objWorkSheet.Range("A" & objWorkSheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
        objExcelRange = objWorkSheet.Range("A1").Resize(LastRow, 3)
        objExcelRange.RemoveDuplicates(Columns:=1, Header:=Excel.XlYesNoGuess.xlYes)

        objWorkbook.Save()
        objWorkbook.Close()
        objExcel.Quit()

        objExcel = Nothing
        objWorkbook = Nothing
        objWorkSheet = Nothing


    End Sub

Open in new window

0
Comment
Question by:G_M
  • 7
  • 6
13 Comments
 
LVL 36

Expert Comment

by:Norie
ID: 36976339
This is quite a bit different  as far as I can see.

Perhaps it would be easier to only get the references you want from the document.

Can you post some sample data before/after?

PS I think there's a way to relate questions.

That might help you keep tabs on things, and also alert people that answered previous questions.
0
 

Author Comment

by:G_M
ID: 36976373
       A                              B                    C
1 Reference                  Page         Document
2 2002                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 1999                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data                   Page 3      test.docx
7 Test Data, 2002         Page 5      test.docx

        A                              B                    C
1 Reference                  Page         Document
2 1999                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 2002                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data, 2002     Page 3      test.docx

0
 

Author Comment

by:G_M
ID: 36976378
Sorry... hit Submit instead of Preview.

This is the unsorted data:
       A                              B                    C
1 Reference                  Page         Document
2 2002                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 1999                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data                   Page 3      test.docx
7 Test Data, 2002         Page 5      test.docx

This is the sorted data:
        A                              B                    C
1 Reference                  Page         Document
2 1999                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 2002                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data, 2002         Page 3      test.docx

Cheers G_M
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 

Author Comment

by:G_M
ID: 36976383
And that is still incorrect...

The unsorted data should be

       A                              B                    C
1 Reference                  Page         Document
2 2002                           Page 1      test.docx
3 2002                           Page 2      test.docx
4 1999                           Page 1      test.docx
5 Test Data                   Page 3      test.docx
6 Test Data                   Page 3      test.docx
7 Test Data, 2002         Page 5      test.docx
8 Test Data, 2002         Page 4      test.docx
0
 
LVL 36

Expert Comment

by:Norie
ID: 36976676
I think I see what you mean.

Not sure how to approach it though.

One thing that might work is to add a formula in column D that indicates whether or no the reference should be kept.

Something like this:

=IF(ISNUMBER(A2),"Yes",IF(COUNTIF($A$2:A2,A2)=1, "YES","NO"))

That would put YES in column D if the row is to be kept and NO otherwise.

Then the data could be filtered to leave just the rows to keep.

I'm sure there's a some other, better? way but can't think of anything else off the top of my head.
0
 

Author Comment

by:G_M
ID: 36979271
Ok imnorie, I used your idea (sort of) identifying if the data in the A column contained a a numerical value by inserting the following equation into column D:

=IF(ISNUMBER(A#), "Yes", "NO")

I then attempt to extract the data to a second workbook called "wf2_" & strWorkingFileTo

I have managed to copy the first two values into the new document; however, after that I get an error with the objExcelRange.offset(1) increment that I have created.

The intent is as follows:
1. Sort the data in "A" column in objWorkbook Alphabetically (done)
2. Move/Cut all the rows containing numbers in "A" column of objWorkbook into objWorkbookNew (errors in script)
3. Remove duplicates from "A" column in objWorkbook (done)
4. Move/Cut the remaining data in objWorkbook to objWorkbookNew
5. Delete objWorkbook

I have gone around in circles with this and again, my inexperience got the better of me.
I'm sure there is something small I am missing here. I have attached some sample data and the code to the point I have described above.

Hope someone can help.
Cheers
G_M

 
Public Sub SortDocumentInExcel()

        Dim objExcel As Excel.Application

        Dim strWorkingFileFrom As String
        Dim objWorkbook As Excel.Workbook
        Dim objWorkSheet As Excel.Worksheet
        Dim objExcelRange As Excel.Range

        Dim strWorkingFileTo As String
        Dim objWorkbookNew As Excel.Workbook
        Dim objWorkSheetNew As Excel.Worksheet
        Dim objExcelRange2 As Excel.Range

        Dim objExcelRange3 As Excel.Range
        Dim LastRow As Long
        Dim misValue As Object = System.Reflection.Missing.Value

        'Values for original Excel document... The document with the raw data
        strWorkingFileFrom = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objExcel = CreateObject("Excel.Application")
        objWorkbook = objExcel.Workbooks.Open(strWorkingFileFrom)
        objWorkSheet = objWorkbook.Sheets("sheet1")
        objExcelRange = objWorkSheet.Range("A2")
        
        'Values for the new Excel document... The document with the filtered data
        strWorkingFileTo = frmGetReferences.lblOutputFolder.Text & "wf2_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"
        objWorkbookNew = objExcel.Workbooks.Add(misValue)
        objWorkSheetNew = objWorkbookNew.Sheets("sheet1")
        objWorkSheetNew.SaveAs(strWorkingFileTo)
        objExcelRange2 = objWorkSheetNew.Range("A" & objWorkSheetNew.Rows.Count).End(Excel.XlDirection.xlUp)

        'Sort the data in the "A" column; expanding selection to other columns
        objWorkSheet.Cells.Sort(Key1:=objWorkSheet.Range("A2"), _
                                Order1:=Excel.XlSortOrder.xlAscending, Header:=Excel.XlYesNoGuess.xlYes, _
                                Orientation:=Excel.XlSortOrientation.xlSortColumns, SortMethod:=Excel.XlSortMethod.xlPinYin)

        'Extract data if column "D" is equal to "Yes"... Extracts numeric data only
        Do While objExcelRange.Value <> 0

            If objExcelRange.Row <> 1 Then

                If objExcelRange.Offset(, 3).Value = "Yes" Then
                    objExcelRange.Cut(objExcelRange2)
                    'objExcelRange2.Paste()
                    objWorkbookNew.Save()
                    'objExcelRange = objExcelRange.Offset(1)
                    objExcelRange2 = objExcelRange2.Offset(1)
                Else
                    Exit Do
                End If
                objExcelRange = objExcelRange.Offset(1)
            End If
        Loop

        'Remove rows based on duplicate values from "A" column
        LastRow = objWorkSheet.Range("A" & objWorkSheet.Rows.Count).End(Excel.XlDirection.xlUp).Row
        objExcelRange3 = objWorkSheet.Range("A1").Resize(LastRow, 3)
        objExcelRange3.RemoveDuplicates(Columns:=1, Header:=Excel.XlYesNoGuess.xlYes)


        objWorkbookNew.Save()
        objWorkbookNew.Close()
        objWorkbook.Save()
        objWorkbook.Close()
        objExcel.Quit()

        objExcel = Nothing
        objWorkbook = Nothing
        objWorkSheet = Nothing



    End Sub



End Module

Open in new window

wf-ch1.xlsx
0
 
LVL 36

Expert Comment

by:Norie
ID: 36979761
I'll have a look at it, just want to clarify what you want to do first:

1 Extract all the references with nos to new sheet.

2 Remove duplicates from remaining references.

3 Combine 2 lists.

One thing I notice you seem to be doing in the code is creating a new workbook.

Is that for the eventual result?

Does it have to be a new workbook?

This could all be done within the same workbook and the original list could be left untouched.

If you really want a new workbook that could be done right at the end.

Using just one workbook would makes things a lot easier than working with 2 workbooks.

0
 
LVL 36

Accepted Solution

by:
Norie earned 2000 total points
ID: 36979822
I've come up with something based on my idea from yesterday but as far as I can see the data in the sample file doesn't have any duplicates.

I tried it on the sample data you posted here and it seemed to work.

Anyway, here's the code.
Option Explicit

Dim objXL As Excel.Application
Dim objWB As Excel.Workbook
Dim objSrcWS As Excel.Worksheet
Dim objDstWS As Excel.Worksheet
Dim objXLDataRange As Excel.Range
Dim objXlCritRange As Excel.Range
Dim strInputFile As String
Dim LastRow As Long


'strInputFile = "C:\DocRefs.xlsx"
strInputFile = "C:\wf-ch1.xlsx"

objXL = New Excel.Application

objWB = objXL.Workbooks.Open(strInputFile)

objSrcWS = objWB.Worksheets(1)

LastRow = objSrcWS.Range("A" & objSrcWS.Rows.Count).End(Excel.XlDirection.xlUp).Row

objXLDataRange = objSrcWS.Range("A1").Resize(LastRow, 3)

objSrcWS.Range("D1").Value = "Keep"

objSrcWS.Range("D2:D" & LastRow).Formula = "=IF(ISNUMBER(A2),""Yes"",IF(COUNTIF($A$2:A2,A2)=1, ""Yes"",""No""))"

objXLDataRange = objXLDataRange.Resize(, 4)

objDstWS = objWB.Worksheets.Add(After:=objSrcWS)

objDstWS.Range("A1").Value = "Keep"
objDstWS.Range("A2").Value = "Yes"

objXlCritRange = objDstWS.Range("A1:A2")

objXLDataRange.AdvancedFilter(Action:=Excel.XlFilterAction.xlFilterCopy,CriteriaRange:=objXlCritRange, CopyToRange:=objDstWS.Range("D1"), Unique:=True)

objXL.Visible = True
objXL.UserControl = True

'       objWB.Close(True)

'       objXL.Quit()

objWB = Nothing
objXL = Nothing

Open in new window

0
 

Author Comment

by:G_M
ID: 36979868
Hmmm... I didn't really know how to handle it with one workbook and though it easier to use two. But, being a novice leaves what ever you think appropriate as best :o) I am working with the intent to eventually process this data to three different formats at the end... .docx, .txt and .xlsx. I am hoping to pick up enough knowledge along the way to get this done.

I don't really care if I keep the origainal data, I just want to filter out what is not needed. This working file will eventually be deleted; however, it would probably be handy for debugging my mistakes.

Looking into it further, I seem to have an issue with my Do While Loop and the string data from the workbook isn't validating. I feel like I'm so close though.

Cheers
G_M
0
 
LVL 36

Expert Comment

by:Norie
ID: 36979931
There are no duplicates in the sample file.

The code I posted works with the data you posted earlier and copies the references/rows required to another worksheet.

It still needs tidied up as bit, deleting the helper column, the criteria etc
0
 

Author Comment

by:G_M
ID: 36980010
Thank you imnorie, that worked a treat (although I'll have to study it a bit to work out how) ;o)

I'll leave it at that for the moment. I'll see if I can push on a little further with this on my own.

Thanks again. You're a champion :o)

 
Public Sub SortDocumentInExcel()

        Dim objExcel As Excel.Application
        Dim objWorkbook As Excel.Workbook
        Dim objWorksheet As Excel.Worksheet
        Dim objWorksheetNew As Excel.Worksheet
        Dim objExcelRange As Excel.Range
        Dim objExcelRange2 As Excel.Range
        Dim strInputFile As String
        Dim LastRow As Long

        strInputFile = frmGetReferences.lblOutputFolder.Text & "wf_" & frmGetReferences.txtOutputFilename.Text & ".xlsx"

        objExcel = New Excel.Application
        objWorkbook = objExcel.Workbooks.Open(strInputFile)
        objWorksheet = objWorkbook.Worksheets(1)

        LastRow = objWorksheet.Range("A" & objWorksheet.Rows.Count).End(Excel.XlDirection.xlUp).Row

        objExcelRange = objWorksheet.Range("A1").Resize(LastRow, 3)

        objWorksheet.Range("D1").Value = "Keep"
        objWorksheet.Range("D2:D" & LastRow).Formula = "=IF(ISNUMBER(A2),""Yes"",IF(COUNTIF($A$2:A2,A2)=1, ""Yes"",""No""))"

        objExcelRange = objExcelRange.Resize(, 4)

        objWorksheetNew = objWorkbook.Worksheets.Add(After:=objWorksheet)
        objWorksheetNew.Range("A1").Value = "Keep"
        objWorksheetNew.Range("A2").Value = "Yes"

        objExcelRange2 = objWorksheetNew.Range("A1:A2")
        objExcelRange.AdvancedFilter(Action:=Excel.XlFilterAction.xlFilterCopy, _
                                      CriteriaRange:=objExcelRange2, CopyToRange:=objWorksheetNew.Range("D1"), Unique:=True)

        objExcel.Visible = True
        objExcel.UserControl = True


        objWorkbook.Close(True)
        objExcel.Quit()
        objWorkbook = Nothing
        objExcel = Nothing

Open in new window

0
 
LVL 36

Expert Comment

by:Norie
ID: 36980089
It's quite simple really.

1 Open file.

2 Identify data.

3 Add extra (helper) column with formula to identify rows to keep.

4 Add a worksheet for the results

5 Set up criteria that will only return the rows to keep.

6 Apply advanced filter to copy ony the rows to keep to the newly created worksheet.

7 Delete criteria.

8 Delete helper column(s)

9 Sort.

10 Save and close workbook.

Obviously I've not done 7-10 but this is how 7 & 8 can be done.
objXLDst.Range("A1:C1").EntireColumn.Delete

objXLSrc.Range("D1").EntireColumn.Delete 

objXLDst.Range("D1").EntireColumn.Delete

Open in new window

0
 

Author Comment

by:G_M
ID: 36980110
I was just about to work all that out... Thank you again imnorie
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft's Excel has many features that most people will never need nor take advantage of.  Conditional formatting is one feature that you may find a necessity once you start using it.
If you need to forecast numbers -- typically for finance -- the Windows and Mac versions of Excel 2016 have a basket of tools to get the job done.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.

581 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question