Solved

I need a script to compare rows of results that can auto de-dupe via .VBS script for automation purposes

Posted on 2008-10-24
19
350 Views
Last Modified: 2013-11-10
Hi all,

I am in the midst of seeking a script that can take a spread sheet in a given directory, open it, and check all the rows top heading down.  I would want the code to specify if there are exact matches to remove them on all worksheets (not just the active one) and then save the file as deduped.myfile.xls instead of over writting the original file.

Thanks for your help in advance!
0
Comment
Question by:smyers051972
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
  • 4
  • +1
19 Comments
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22797666
You can do the code part with my addin, http://xldynamic.com/source/xld.DupMaster.html

it will remove exact matches over all sheets (ie it doesn't just compare rows within a sheet)

Once downloaded the addin is run via
Tools - The Duplixate Master
See pic below for settings

The save as part will need to be done manually

Cheees

DAve

dm---all-rows.jpg
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22797832
Hey thanks appreciate it.  I am seeking a .VBS script however. I need it to be fully executable through scripting this is for automation purposes.

Thank you!
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22797852
Hi,

You should probably post this in the VBscript area.

I understand your need for automation, but this is a pretty significant piece of code to build from scratch

Cheers

Dave
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 1

Author Comment

by:smyers051972
ID: 22798400
How can I change the zones?
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22798971
Pls hit the Request Attention button at top right

Cheers

Dave
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 22802996
Hello smyers051972,

Please specify what you mean by "exact match", and provide a sample file as an example.  EE now
allows you to directly upload files to your question.

Please be advised that once you upload a file, it can be publicly accessed, and that it
may not be possible to fully and permanently delete it.  Therefore, be very careful about
posting proprietary, confidential, or other sensitive information.  If necessary, use "fake"
and/or obfuscated data in your sample.

Regards,

Patrick
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22803708
It would be a mailing list, when I say exact match I am meaning

#     First Name        Last Name                 address                        city          state           zip          phone#  

1     John                  Doe                           1234 Street Way          City         State           12345    305-555-0555
2     Jane                  Doe                           4321 Street Way          Another   State          13245    786-555-0556
3     Jane                  Doe                           5321 Street Way          ACity        State          23245    306-555-0556
4     John                  Doe                           1234 Street Way          City         State           12345    305-555-0555
5     Jane                  Doe                           4321 Street Way          Another   State          13245    786-555-0556

In this example items 1,4,5 are dupes because they match another field so they should be deleted, number 2 is close but not an exact match. so if anything of those columns doesnt match completely then they should stay for manual review. There will be other columns such as statistics etc but the first set of columns shown above will be used for duplucate matching: first/last name, address, city, state, zip, and phone number.

The reason why I want to have a complete match is because they may be two different people, we are pulling from 8 different database sources (all sql 2000) and merging to one sheet.  What I want the script to do is check all worksheets and find the duplicates and delete them then save the file as deduped.myfile.xls

Thank you!
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22803841
Do you need to find dupes between sheets also, or just dupes that are on each sheet.  In other words, if there is an entry on the second sheet that is a duplicate of one on the first sheet, do you need that one deleted?
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 22803861
smyers051972 said:
>>The reason why I want to have a complete match is because they may be two different people, we are
>>pulling from 8 different database sources (all sql 2000) and merging to one sheet.  

I should think you would be better off doing the duplicate checking on the SQL Server.  How are you
gathering the data from there?
0
 
LVL 23

Accepted Solution

by:
ahammar earned 500 total points
ID: 22804005
Here is code and a working workbook that worked for me...
Note:
When you start the macro, it asks which file to check for dupes

It simply checks the first 7 columns of each row (A through G) for duplicaes.  It does not check any farther than that.

For now, there must be a value in every row of column A..(no blanks in column A until after the last row of data)

It saves the new file as: deduped.myfile.xls  in the same folder as the file you opened to check for dupes

It checks every sheet, but it does not cross reference sheets.  In other words, when it changes sheets, it starts over.  It does not compare any data in 1 sheet, with data in another sheet.  This can be easily changed simply by removing the 3 lines that empty the collection object.

Here is the code and a working file.  You can download the file and just run the macro...

If you have any questions or need something changed...let me know
:-)
Albert
Here is the workbook and the code:



Public Sub DeleteDupes()
 
    Dim OtherWb As Workbook
    Dim FileName As String
    Dim r As Range
    Dim rval As String
    Dim collString As Collection
    Dim n As Integer
    Dim x as Integer
    Dim sh As Worksheet
    
    
    Set collString = New Collection
   
FileName = Application.GetOpenFilename("Excel Workbook Files (*.xls), *.xls")
   
Application.ScreenUpdating = False
If FileName <> "False" Then
      Set OtherWb = Workbooks.Open(FileName, False, True)
 
        For Each sh In OtherWb.Sheets
              Set r = sh.Range("A1")
            
                On Error GoTo errhandler
                Do Until r = ""
                    rval = ""
                    For n = 0 To 6
                        rval = rval & Trim(r.Offset(0, n).Value)
                    Next n
                    collString.Add rval, rval
                    Set r = r.Offset(1, 0)
cont:
                Loop
                For x = 1 To collString.Count
                    collString.Remove 1
                Next x
                
        Next sh
 
    OtherWb.SaveAs OtherWb.Path & "\deduped.myfile.xls" ', 56
    OtherWb.Close False
        
Exit Sub
 
errhandler:
    If Err.Number = 457 Then
        Set r = r.Offset(1, 0)
        r.Offset(-1, 0).EntireRow.Delete xlUp
        Resume cont
    Else
        MsgBox Err.Description
        Exit Sub
    End If
    
End If
   
Application.ScreenUpdating = True
   
 
End Sub

Open in new window

DeleteDupes.xls
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22804024
remember to always make copies of your files before testing...
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805179
Thank you for the replies! I will test this out as a possible solution.

I wanted to touch on one thing though just in case a revision is needed and I am sure you will know this once I touch out on this.  We take all data from 8 different databases on 8 servers, reason why is each server is self contained because they are out at sea aboard a vessel.  Currently I have some VB scripts I got here that after I export to excel as 8 different files, I can take them all and merge to one single file, either 1 worksheet or tabbed for each ship.  I would probably like to check all work sheets.  So dupe check at the sql server would be impossible to do, we only have 384k via satellite :)

Thank you!
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805201
When you say I can run the macro is this ok from the command prompt? I am scripting it out with out running excel manually? i.e. I can save this as a .VBS?  also to specify the directory can I change *.xls (the second one shown) to C:\myfiles\*.xls ?


Thanks again!
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22805842
The code I provided is not script...it is vba code.  It is meant to be run directly from Excel.  I don't know anything about script.  There may be a way to run this from the command prompt, but if there is, I do not know...sorry about that...
As far as the VBA goes, if you want to specify a starting directory, then in line 14 of the code I posted above...(it is a blank line) put this:
chdir "C:\myfiles"

You will still be asked which file to open to check, but it will start in that folder.
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805858
Ahh darn!

I really need it to be .VBS format I was unaware that you were suggesting VBA, to use it I would need to actually be running excel and I am looking to streamline the entire process and have it save the file automatically.

I will see if theres a way to convert your script to VBS format in the mean time any help would be great for anyone that can create a .VBS script.

Thank you!
0
 
LVL 1

Author Comment

by:smyers051972
ID: 22805870
I found out from an article at: http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Controls/Q_23074943.html?sfQueryTermInfo=1+convert+vb+vba 

that you can not convert. I will close the question and accept your solution then post a new one in VB script section I think I miss categorized this in the first place.

Thank you for your help
0
 
LVL 23

Expert Comment

by:ahammar
ID: 22806869
Hi smyers,
Thanks for the points and the grade!  I'm sorry it's not exactly what you wanted.  If you were not a premium member I would say have this question deleted and your points refunded.  You never have to accept a solution that does not work for you....plus you did specify script in your question.  I just wrote it in vba for 2 reasons....lots of times the asker thinks vba is called script, and the other is a lot of times the vba works for them just as well even if they do want script....but neither is the case for you....but since you are a premium member, it's a lot easier to just accept an answer than write to support and ask to have a question deleted...and you don't lose anything...
As far as that link you posted....that was not talking about .vbs script.  That was talking about converting to VB (Visual Basic).

I can take this code and convert it to VB with no problem and it would be a standalone app, but that's still not the same as script...
Let me know if that's what you want, and I'll do that for you.  I'll be gone until late tonight though, so it wouldn't be today...just so you'll know...

:-)
Albert
0
 
LVL 50

Expert Comment

by:Dave Brett
ID: 22817837
Good to see that you got this sorted with a vbs elsewhere. I had assumed that you had wanted to find duplicate rows betwen sheets - hence my significant comment - finding duplicates within a sheet is a much easier process

Regards

Dave
0

Featured Post

Office 365 Training for Admins - 7 Day Trial

Learn how to provision tenants, synchronize on-premise Active Directory, implement Single Sign-On, customize Office deployment, and protect your organization with eDiscovery and DLP policies.  Only from Platform Scholar.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Today, the web development industry is booming, and many people consider it to be their vocation. The question you may be asking yourself is – how do I become a web developer?
You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are …
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.
Many functions in Excel can make decisions. The most simple of these is the IF function: it returns a value depending on whether a condition you describe is true or false. Once you get the hang of using the IF function, you will find it easier to us…

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question