Solved

I need a script to compare rows of results that can auto de-dupe via .VBS script for automation purposes

Posted on 2008-10-24
19
344 Views
Last Modified: 2013-11-10
Hi all,

I am in the midst of seeking a script that can take a spread sheet in a given directory, open it, and check all the rows top heading down.  I would want the code to specify if there are exact matches to remove them on all worksheets (not just the active one) and then save the file as deduped.myfile.xls instead of over writting the original file.

Thanks for your help in advance!
0
Comment
Question by:smyers051972
  • 7
  • 5
  • 4
  • +1
19 Comments
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22797666
You can do the code part with my addin, http://xldynamic.com/source/xld.DupMaster.html

it will remove exact matches over all sheets (ie it doesn't just compare rows within a sheet)

Once downloaded the addin is run via
Tools - The Duplixate Master
See pic below for settings

The save as part will need to be done manually

Cheees

DAve

dm---all-rows.jpg
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22797832
Hey thanks appreciate it.  I am seeking a .VBS script however. I need it to be fully executable through scripting this is for automation purposes.

Thank you!
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22797852
Hi,

You should probably post this in the VBscript area.

I understand your need for automation, but this is a pretty significant piece of code to build from scratch

Cheers

Dave
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22798400
How can I change the zones?
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22798971
Pls hit the Request Attention button at top right

Cheers

Dave
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 22802996
Hello smyers051972,

Please specify what you mean by "exact match", and provide a sample file as an example.  EE now
allows you to directly upload files to your question.

Please be advised that once you upload a file, it can be publicly accessed, and that it
may not be possible to fully and permanently delete it.  Therefore, be very careful about
posting proprietary, confidential, or other sensitive information.  If necessary, use "fake"
and/or obfuscated data in your sample.

Regards,

Patrick
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22803708
It would be a mailing list, when I say exact match I am meaning

#     First Name        Last Name                 address                        city          state           zip          phone#  

1     John                  Doe                           1234 Street Way          City         State           12345    305-555-0555
2     Jane                  Doe                           4321 Street Way          Another   State          13245    786-555-0556
3     Jane                  Doe                           5321 Street Way          ACity        State          23245    306-555-0556
4     John                  Doe                           1234 Street Way          City         State           12345    305-555-0555
5     Jane                  Doe                           4321 Street Way          Another   State          13245    786-555-0556

In this example items 1,4,5 are dupes because they match another field so they should be deleted, number 2 is close but not an exact match. so if anything of those columns doesnt match completely then they should stay for manual review. There will be other columns such as statistics etc but the first set of columns shown above will be used for duplucate matching: first/last name, address, city, state, zip, and phone number.

The reason why I want to have a complete match is because they may be two different people, we are pulling from 8 different database sources (all sql 2000) and merging to one sheet.  What I want the script to do is check all worksheets and find the duplicates and delete them then save the file as deduped.myfile.xls

Thank you!
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22803841
Do you need to find dupes between sheets also, or just dupes that are on each sheet.  In other words, if there is an entry on the second sheet that is a duplicate of one on the first sheet, do you need that one deleted?
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 22803861
smyers051972 said:
>>The reason why I want to have a complete match is because they may be two different people, we are
>>pulling from 8 different database sources (all sql 2000) and merging to one sheet.  

I should think you would be better off doing the duplicate checking on the SQL Server.  How are you
gathering the data from there?
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 23

Accepted Solution

by:
ahammar earned 500 total points
ID: 22804005
Here is code and a working workbook that worked for me...
Note:
When you start the macro, it asks which file to check for dupes

It simply checks the first 7 columns of each row (A through G) for duplicaes.  It does not check any farther than that.

For now, there must be a value in every row of column A..(no blanks in column A until after the last row of data)

It saves the new file as: deduped.myfile.xls  in the same folder as the file you opened to check for dupes

It checks every sheet, but it does not cross reference sheets.  In other words, when it changes sheets, it starts over.  It does not compare any data in 1 sheet, with data in another sheet.  This can be easily changed simply by removing the 3 lines that empty the collection object.

Here is the code and a working file.  You can download the file and just run the macro...

If you have any questions or need something changed...let me know
:-)
Albert
Here is the workbook and the code:



Public Sub DeleteDupes()
 

    Dim OtherWb As Workbook

    Dim FileName As String

    Dim r As Range

    Dim rval As String

    Dim collString As Collection

    Dim n As Integer

    Dim x as Integer

    Dim sh As Worksheet

    

    

    Set collString = New Collection

   

FileName = Application.GetOpenFilename("Excel Workbook Files (*.xls), *.xls")

   

Application.ScreenUpdating = False

If FileName <> "False" Then

      Set OtherWb = Workbooks.Open(FileName, False, True)
 

        For Each sh In OtherWb.Sheets

              Set r = sh.Range("A1")

            

                On Error GoTo errhandler

                Do Until r = ""

                    rval = ""

                    For n = 0 To 6

                        rval = rval & Trim(r.Offset(0, n).Value)

                    Next n

                    collString.Add rval, rval

                    Set r = r.Offset(1, 0)

cont:

                Loop

                For x = 1 To collString.Count

                    collString.Remove 1

                Next x

                

        Next sh
 

    OtherWb.SaveAs OtherWb.Path & "\deduped.myfile.xls" ', 56

    OtherWb.Close False

        

Exit Sub
 

errhandler:

    If Err.Number = 457 Then

        Set r = r.Offset(1, 0)

        r.Offset(-1, 0).EntireRow.Delete xlUp

        Resume cont

    Else

        MsgBox Err.Description

        Exit Sub

    End If

    

End If

   

Application.ScreenUpdating = True

   
 

End Sub

Open in new window

DeleteDupes.xls
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22804024
remember to always make copies of your files before testing...
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805179
Thank you for the replies! I will test this out as a possible solution.

I wanted to touch on one thing though just in case a revision is needed and I am sure you will know this once I touch out on this.  We take all data from 8 different databases on 8 servers, reason why is each server is self contained because they are out at sea aboard a vessel.  Currently I have some VB scripts I got here that after I export to excel as 8 different files, I can take them all and merge to one single file, either 1 worksheet or tabbed for each ship.  I would probably like to check all work sheets.  So dupe check at the sql server would be impossible to do, we only have 384k via satellite :)

Thank you!
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805201
When you say I can run the macro is this ok from the command prompt? I am scripting it out with out running excel manually? i.e. I can save this as a .VBS?  also to specify the directory can I change *.xls (the second one shown) to C:\myfiles\*.xls ?


Thanks again!
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22805842
The code I provided is not script...it is vba code.  It is meant to be run directly from Excel.  I don't know anything about script.  There may be a way to run this from the command prompt, but if there is, I do not know...sorry about that...
As far as the VBA goes, if you want to specify a starting directory, then in line 14 of the code I posted above...(it is a blank line) put this:
chdir "C:\myfiles"

You will still be asked which file to open to check, but it will start in that folder.
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805858
Ahh darn!

I really need it to be .VBS format I was unaware that you were suggesting VBA, to use it I would need to actually be running excel and I am looking to streamline the entire process and have it save the file automatically.

I will see if theres a way to convert your script to VBS format in the mean time any help would be great for anyone that can create a .VBS script.

Thank you!
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805870
I found out from an article at: http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Controls/Q_23074943.html?sfQueryTermInfo=1+convert+vb+vba

that you can not convert. I will close the question and accept your solution then post a new one in VB script section I think I miss categorized this in the first place.

Thank you for your help
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22806869
Hi smyers,
Thanks for the points and the grade!  I'm sorry it's not exactly what you wanted.  If you were not a premium member I would say have this question deleted and your points refunded.  You never have to accept a solution that does not work for you....plus you did specify script in your question.  I just wrote it in vba for 2 reasons....lots of times the asker thinks vba is called script, and the other is a lot of times the vba works for them just as well even if they do want script....but neither is the case for you....but since you are a premium member, it's a lot easier to just accept an answer than write to support and ask to have a question deleted...and you don't lose anything...
As far as that link you posted....that was not talking about .vbs script.  That was talking about converting to VB (Visual Basic).

I can take this code and convert it to VB with no problem and it would be a standalone app, but that's still not the same as script...
Let me know if that's what you want, and I'll do that for you.  I'll be gone until late tonight though, so it wouldn't be today...just so you'll know...

:-)
Albert
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22817837
Good to see that you got this sorted with a vbs elsewhere. I had assumed that you had wanted to find duplicate rows betwen sheets - hence my significant comment - finding duplicates within a sheet is a much easier process

Regards

Dave
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

No matter the version of Windows you are using, you may have some problems with Windows Search running too slow or possibly not running at all. Before jumping into how you can solve this issue, just know there are many other viable alternative deskt…
In this article we discuss how to recover the missing Outlook 2011 for Mac data like Emails and Contacts manually.
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now