Solved

Compare multiple columns to determine matches

Posted on 2014-03-07
11
494 Views
Last Modified: 2014-03-12
I have been given an audit that I do every month that requires comparing different sources to determine which items from each source are not in the others.  Essentially, I am given a directory dump with files in it, and told to validate that each of the files in the directory is also present in a spreadsheet.  The spreadsheet has 2 different columns that I have to validate the directory list against, but it only has to exist in one of the columns in order to pass the audit.  So, what I have been doing is scraping the .txt document that contains the files, and copying that into a column in Excel.  Then, I take the 2 columns out of the other Excel spreadsheet and compare them one at a time against the new sheet that I created.  It ends up working, but it's really not repeatable, or consistent, and it takes forever.  I'm looking for a more elegant solution.  It doesn't matter to me if it's in Excel, Access, VBA, SQL, or whatever.  I just need some way to do it better than I've been doing.
0
Comment
Question by:SarahDaisy8
11 Comments
 
LVL 16

Expert Comment

by:DcpKing
ID: 39913883
What do you have for tools?

If SQL Server, Access, or something that accepts SQL commands, you could import and keep the two columns from the Excel file, and also import the text file (is it a "dir >> textfile.txt ?) into another table. If that's possible then it's simple to write a query like

select * from DirectoryListTable
where FileNameField in (select FileNameField from ExcelTableOne)

to see what filenames are present. Change the "in" to "not in" to see the extras!

hth

Mike
0
 
LVL 34

Assisted Solution

by:PatHartman
PatHartman earned 250 total points
ID: 39913984
If you can see the directory from your computer, I would ignore the word document and get the file list directly from windows using FSO (file system object).  Then I would link to the spreadsheet and use an append query to append column 1 to a work table and a second query to append column 2.  You can add an extra column if you need to retain the fact that the name came from column 1 or column 2  Now you have one table with a list of files from the directory and a second table with a list of files from the spreadsheet.  So far all of these steps are repeatable and can be automated.

To do the match, I would use a union query and tag the results
qry1 is an inner join and will return all rows in both tables.
qry2 is a left join and returns rows in the directory list but not the spreadsheet
qry3 is a right join and returns rows in the spreadsheet but not in the directory list (I know you didn't ask for this but you should account for typos)

Then you can make a report that lists and counts the three situations from the union query.
0
 
LVL 17

Accepted Solution

by:
dbaSQL earned 250 total points
ID: 39914143
You know, I have done something very similar.  I had an ftp process that ran every day, to upload new files to our system.  The files were triggered by events, so there was no way to schedule, or know when to expect new files.  

I created a 'ProcessedFiles' table that just held FileName and Processed, where Processed was a bit flag, with a DEFAULT of 0.  Every day I would check the target directory for new files.  I don't have the full code on hand right now, but it was basically like the statement below.  Scan the directory, load all filenames to a working table, then compare to ProcessedFiles.  Anything where NOT EXISTS was then uploaded with a SSIS data flow task.  Upon completion, the Processed flag gets changed to 1.

You could use this type of approach to gather your filenames from the directory, and then compare to the spreadsheet.  

-- change this to your target directory path and run it.  you'll see what I mean.
DECLARE @path VARCHAR(100) = 'dir C:\tmp\'   --- your target directory
DECLARE @cmd VARCHAR(1024) = @path + ' /A-D /B'

DECLARE @Files TABLE (Filename VARCHAR(255))
INSERT @Files (Filename)
EXEC master..xp_cmdshell @cmd

-- Look at your file list
SELECT * FROM @Files
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 39914753
One advice: use Python. By far the easiest way to accomplish this kind of tasks. It will be a well spent time to learn and get familiar with it. There is a module to deal with any situation you might come across and they are all FREE!

CSV with Python: http://www.pythonforbeginners.com/systems-programming/using-the-csv-module-in-python/
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39915146
1. is the file comparison for a single directory's files or is there a tree involved?
2. do you need to compare any of the file attributes (modified date, size, hash) or just the names?
3. do you only need to know the exceptions or something else? (new and missing, for instance)
4. at the end of the process, what is to become of the Excel worksheet contents?
5. how many files are in the directory?
6. how many file names are in the worksheet?
7. is there a directory of files that matches the list in the worksheet?
0
Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.

 
LVL 45

Expert Comment

by:aikimark
ID: 39915161
If the files from the prior month (time period) exist in a directory, you should also be able to use Robocopy to do the comparison.
For testing purposes, I created two folders "new" and "old", copied some files to each folder and ran the following command.
robocopy /L /X new old

Open in new window

I then deleted a file and added a file in the "new" folder and reran the command.  This was the result.
C:\Users\Mark\Downloads>robocopy /L /X new old

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows

-------------------------------------------------------------------------------

  Started : Sat Mar 08 17:21:32 2014

   Source : C:\Users\Mark\Downloads\new\
     Dest : C:\Users\Mark\Downloads\old\

    Files : *.*

  Options : *.* /X /L /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------------

                           3    C:\Users\Mark\Downloads\new\
          *EXTRA File              84784        fciv.exe
            New File                   0        New Text Document.txt

------------------------------------------------------------------------------

                Total    Copied   Skipped  Mismatch    FAILED    Extras
     Dirs :         1         0         1         0         0         0
    Files :         3         1         2         0         0         1
    Bytes :    47.6 k         0    47.6 k         0         0    82.7 k
    Times :   0:00:00   0:00:00                       0:00:00   0:00:00

    Ended : Sat Mar 08 17:21:32 2014

Open in new window

Note that it reported the new and missing files.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39915187
If you only have the list in the worksheet, you can most quickly do this comparison in the Excel VBA or VBScript environment using a dictionary object.
Example with parameter for directory.  The list of files is assumed to be in column F, starting on row 2.
Option Explicit

Public Sub Q_28383341(parmFolder As String)
    Dim oDic As Object
    Dim vFiles() As Variant
    Dim strFilename As String
    Dim vItem As Variant
    Set oDic = CreateObject("scripting.dictionary")
    vFiles = ActiveSheet.Range(ActiveSheet.Range("F2"), ActiveSheet.Range("F2").End(xlDown)).Value
    For Each vItem In vFiles
        oDic.Add CStr(vItem), 0
    Next
    strFilename = Dir(parmFolder & "\*.*", vbNormal)
    Do Until Len(strFilename) = 0
        If oDic.exists(strFilename) Then
            oDic(strFilename) = 1
        Else
            Debug.Print "New file: " & strFilename
        End If
        strFilename = Dir
    Loop
    For Each vItem In oDic
        If oDic(vItem) = 0 Then
            Debug.Print "Missing file: " & vItem
        End If
    Next
End Sub

Open in new window

Invoking this in the Immediate Window gives:
Q_28383341 "C:\Users\Mark\Downloads\New"
New file: New Text Document.txt
Missing file: fciv.exe

Open in new window

0
 
LVL 1

Author Comment

by:SarahDaisy8
ID: 39918086
Let me expand on my topic to add clarification, and to talk about my difficulties.  Right now, I receive a .txt file that contains files in directories that have been backed up.  The location of the files doesn't matter, just the name of the files, so I strip out only the names of the files from this .txt file and create a table in SQL.  I end up with one column with items named [username].txt.  The [username] portion can come from 2 different variables.  Those being the 2 columns in Excel that I spoke of previously.  The first column in the email address of the user, and the 2nd column is the user's login name.  Therefore, the corresponding .txt file for the user can either be named [email].txt or [login].txt.  Technically a user can have an email and a username, and the .txt file could be named either one, but I wouldn't know this in advance.  So, basically I need to compare the .txt file to see which ones don't match either an email address OR login name.  The queries I've been using aren't working because I'm getting duplicate data, because if there is an email, it will still check the login name, and if it doesn't find one, it will tell me it's missing.  I need to combine login + email into one check against the .txt file name.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39918306
Please post a sample of both the database table and the text file contents.

This question is in the Excel zone as well as some database zones.  Are these appropriate to the environment of the solution you seek?  If not, we can change the zones.

What do you think of my posted code solution?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39918313
If you're working in a database environment I would normally import the text file and let the database do the comparison with left and right (outer) joins.
0
 
LVL 1

Author Closing Comment

by:SarahDaisy8
ID: 39924597
Thanks.  Between the two different solutions, I came up with something that works for me.
0

Featured Post

NAS Cloud Backup Strategies

This article explains backup scenarios when using network storage. We review the so-called “3-2-1 strategy” and summarize the methods you can use to send NAS data to the cloud

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction The Visual Basic for Applications (VBA) language is at the heart of every application that you write. It is your key to taking Access beyond the world of wizards into a world where anything is possible. This article introduces you to…
A simple tool to export all objects of two Access files as text and compare it with Meld, a free diff tool.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
Using Microsoft Access, learn some simple rules for how to construct tables in a relational database. Split up all multi-value fields into single values: Split up fields that belong to other things into separate tables: Make sure that all record…

910 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now