Solved

Get two strings in a filename and search it in a two dimensional array.

Posted on 2011-09-20
18
272 Views
Last Modified: 2012-06-27
Hi guys,

I have a 2 Dim array from an excel file with 3 columns.  The first column is ID, next is lastname, the 3rd is firstname.

I already put all the data in a 2 dim array where Array(NumberOfRowsInExcel, 1-3). 1 for the ID, 2 for the lastname and 3 for the firstname col in excel.

Now I have several folders and directories with document files that has this filenaming format.
Firstname Lastname-########_.doc

The # represents any random number.
Other samples.
Bryan Adams-31467854_.doc
Jeff McBride-78945677_.doc

Now what i want to do is crawl all docs in a folder then parse the firstname and lastname from the filename then compare that to the 2 dim array. If there is a perfect match, rename the file appending the Array(x,1) value or the ID column in excel to that doc file.

Example: There is a match in the array.  Append it to the very start of the filename then add underscore something like this.

000001_Bryan Adams-31467854_.doc
000002_Jeff McBride-78945677_.doc

If there's no match on the array, move it to a folder named "unmatched".


Above example lets say Bryan Adams has been parsed from the filename.
Pseudocode:
Let x = counter.
Loop
   If Array(x, 2) = Adams and Array(x,3) = Bryan then
     Name "path\filename.doc" as "path\" & array(x,1)  & "_" & "filename.doc"
   else
     filecopy to unmatched folder
     kill file
...
End Loop

Basically that's what i want to do. Can somebody help me out patch things right?  I haven't done any programming in VB6 for a year. Thanks.

By the way, this is my code to to extract the data from excel and put it in an array just in case.
Set xlApp = New Excel.Application
Set xlWbk = xlApp.Workbooks.Open("C:\EMR\drv_patient_list.xls")
Set xlWks = xlWbk.Sheets(1)
arv = xlWks.Range("A1:H" & xlWks.UsedRange.Rows.Count).Value

lngRowCount = LastRowInSheet(xlWks)
ReDim arrData(1 To lngRowCount, 1 To 3)
For i = 2 To lngRowCount
    For j = 1 To 3
        arrData(i, j) = xlWks.Cells(i, j).Value
    Next j
Next i

xlWbk.Close False
xlApp.Quit

Open in new window



TIA
0
Comment
Question by:m3mdicl
  • 10
  • 7
18 Comments
 
LVL 7

Expert Comment

by:shaydie
ID: 36570738
Something like this maybe?


Set objFolders = objFSO.getfolder(BasePath)
Set objFiles = objFolders.Files

 For Each FILE In objFiles
 strFileName = FILE.Name
 
 strFirstName = Left(strFileName, (InStr(1, strFileName, " ") - 1))
 strLastName = Mid(strFileName, (InStr(1, strFileName, " ") + 1), (InStr(1, strFileName, "-")) - (InStr(1, strFileName, " ") + 1))
 
 bIsMatch = False
 
 For ArrayItem = 0 To UBound(vArray)
 
 If strFirstName = vArray(ArrayItem, 1) And strLastName = vArray(ArrayItem, 2) Then
 
    objFSO.MoveFile FILE.Path, BasePath & vArray(ArrayItem, 0) & "_" & strFileName
    bIsMatch = True
    Exit For
 
 End If
 
 Next ArrayItem
 
 If bIsMatch = False Then objFSO.MoveFile FILE.Path, UnmatchedPath & strFileName
 
 Next
0
 

Author Comment

by:m3mdicl
ID: 36571753
Thanks for the reply.  

I was coding it using native syntax like Dir, Name, Filecopy and Kill because i though it's much faster than FSO.  But that should do it for the meantime.

I will test it and let you know.  Thanks mate.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 36573628
1. You can save yourself some processing by skipping the first row in your range specification on line 4.

arv = xlWks.Range("A2:H" & xlWks.UsedRange.Rows.Count).Value

Open in new window


2. Once this statement has executed, all your data is in the arv variable.  I don't see a need to transfer it to arrData in the item-by-item assignment.

3. From a performance perspective, you would be much better off in your loop referring to
arv(i, j)
rather than
xlWks.Cells(i, j).Value

4. If you only need the first three columns of data from the worksheet, why are you getting the first eight columns?

5. You should declare all your variables.  Add an Option Explicit statement to the top of your General Declarations section.

============
Depending on the number of names you might have to match and the number of files that are matching candidates, I might suggest using a dictionary object in place of an array.  Also, I would use the combined (space delimited) firstname and lastname data to match against the parsed file name.

Can we assume that the numbers associated with the names are unique?

In the following example, the Dir() function is not supplied a path.  You might add a path to the initial call or use a ChDir statement to set the current directory.

Option Explicit

Public Sub FileRename()
    Set xlApp = New Excel.Application
    Set xlWbk = xlApp.Workbooks.Open("C:\EMR\drv_patient_list.xls")
    Set xlWKS = xlWbk.Sheets(1)
    
    Dim lngRowCount As Long
    Dim lngLoop As Long
    Dim dicNames As Object
    Dim strFilename As String
    Dim strNamePart() As String
    Set dicNames = CreateObject("Scripting.Dictionary")
    
    'get data from workbook
    arv = xlWKS.Range("A2:C" & xlWKS.UsedRange.Rows.Count).Value
    xlWbk.Close False
    xlApp.Quit
    
    'populate dictionary with name and number data
    lngRowCount = UBound(arv)
    For lngLoop = 1 To lngRowCount
        dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), arv(lngLoop, 1)
    Next lngLoop
    
    'iterate files and rename if match in dictionary
    strFilename = Dir("*.doc")
    Do Until Len(strFilename) = 0
        If LCase(strFilename) Like "#*_*-#_.doc" Then   'skip already renamed files
        Else
            strNamePart = Split(strFilename, "-")
            If dicNames.Exists(strNamePart(0)) Then
                Name strFilename As dicNames(strNamePart(0)) & "_" & strFilename
            End If
        End If
        strFilename = Dir
    Loop

    Set dicNames = Nothing

End Sub

Open in new window

0
 

Author Comment

by:m3mdicl
ID: 36574496
Nice catch aikimark.   I will test it and let you know.  Thanks for the detailed help.
0
 

Author Comment

by:m3mdicl
ID: 36574792
@aikimark

I got a runtime error 457.  The key is already associated with an element of this collection.

This is the offending line

dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), arv(lngLoop, 1)

I also declared the variables like this.  Is this correct?

Option Explicit
Dim xlApp As Excel.Application
Dim xlWbk As Excel.Workbook
Dim xlWks As Excel.Worksheet
Dim arv As Variant

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 36574814
that means that the names in your worksheet are not unique.  What do you want to do with the duplicate names?
0
 

Author Comment

by:m3mdicl
ID: 36574919
Yes, firstnames and lastnames are not unique.  Only the ID is unique. so let's go back to array?
0
 

Author Comment

by:m3mdicl
ID: 36574991
I got 2000+ records on the excel sheet.  So the names will have duplicates.  That's why i want to compare both the firstname and lastnames.  It should match the parsed first and last names from the .doc filenames.

Is there any workaround for this if the dictionary wont allow dups?  I can only think of 2 dimensional arrays.  Any thoughts?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 36575206
>>So the names will have duplicates.  That's why i want to compare both the firstname and lastnames.

Let me explain.  Your list has duplicate whole names (firstname and lastname combined).  It doesn't matter whether you iterate an array or search a dictionary/collection.  I can prevent the error you encountered with the dictionary object very easily.  However, you need to think about this scenario.

Your Excel list has 3 rows like this
ID	Last	First
000134	Roberts	Bob
001236	Roberts	Bob
000012	Roberts	Bob

Open in new window


When you encounter a file like
Bob Roberts-78987654_.doc
what number should be used for the new file name? (000134, 001236, 000012)

So, this is a processing and data decision for you.
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 

Author Comment

by:m3mdicl
ID: 36575368
I see. Okay if dups have been found to exists, don't rename the file instead move it to another folder named "duplicates" some thing like that.

So to recap.

If found a perfect match = Rename file
If not found = move the file to a folder named "unmatched"
if dups found = move the file to a folder named "duplicate"

Thanks for the help.
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 36575775
The following code assumes that the files are in the current directory and that the unmatched and duplicate sub-directories exist.

Option Explicit

Public Sub FileRename()
    Set xlApp = New Excel.Application
    Set xlWbk = xlApp.Workbooks.Open("C:\EMR\drv_patient_list.xls")
    Set xlWKS = xlWbk.Sheets(1)
    
    Dim lngRowCount As Long
    Dim lngLoop As Long
    Dim dicNames As Object
    Dim strFilename As String
    Dim strNamePart() As String
    Set dicNames = CreateObject("Scripting.Dictionary")
    
    'get data from workbook
    arv = xlWKS.Range("A2:C" & xlWKS.UsedRange.Rows.Count).Value
    xlWbk.Close False
    xlApp.Quit
    
    'populate dictionary with name and number data
    lngRowCount = UBound(arv)
    For lngLoop = 1 To lngRowCount
        If dicNames.Exists(arv(lngLoop, 3) & " " & arv(lngLoop, 2)) Then
            dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), -1
        Else
            dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), arv(lngLoop, 1)
        End If
    Next lngLoop
    
    'iterate files and rename if match in dictionary
    'move file to unmatched sub-directory if no name match
    'move file to duplicate sub-directory if Excel list for this whole name was a duplicate
    'NOTE: assume that .\unmatched and .\duplicate sub-directories exist
    strFilename = Dir("*.doc")
    Do Until Len(strFilename) = 0
        If LCase(strFilename) Like "#*_*-#_.doc" Then   'skip already renamed files
        Else
            strNamePart = Split(strFilename, "-")
            If dicNames.Exists(strNamePart(0)) Then
                If Sgn(dicNames(strNamePart(0))) = -1 Then
                    'duplicate
                    FileCopy strFilename, ".\duplicate\" & strFilename
                    Kill strFilename
                Else
                    Name strFilename As dicNames(strNamePart(0)) & "_" & strFilename
                End If
            Else
                'unmatched
                FileCopy strFilename, ".\unmatched\" & strFilename
                Kill strFilename
            End If
        End If
        strFilename = Dir
    Loop

    Set dicNames = Nothing

End Sub

Open in new window

0
 

Author Comment

by:m3mdicl
ID: 36580934
Hi I still get an error on line 24.

dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), -1

same error.  The key is already associated with an element of this collection.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 36580993
oops.  I was too quick with the copy/paste and didn't go back and change the line before posting here.

Line 24 should be

            dicNames(arv(lngLoop, 3) & " " & arv(lngLoop, 2))=-1

Open in new window


0
 

Author Comment

by:m3mdicl
ID: 36581102
Hi one quick question.  What data type should arv variable be?  It's not working I declared arv as variant type.
0
 

Author Comment

by:m3mdicl
ID: 36581231
Hi I modified the codes to work. I added this API to create folders automatically
Private Declare Function MakeSureDirectoryPathExists Lib "imagehlp.dll" (ByVal lpPath As String) As Long

I also put a DirListBox control to make it easier.
Dir1.Path = is the current path.  

The problem is even if I have a legit document matching it's not renaming it instead it moved to unmatch folder.
Here is the complete code.

Option Explicit
Private Declare Function MakeSureDirectoryPathExists Lib "imagehlp.dll" (ByVal lpPath As String) As Long
Dim xlApp As Excel.Application
Dim xlWbk As Excel.Workbook
Dim xlWks As Excel.Worksheet
Dim arv() As Variant

Public Sub FileRename()
    Set xlApp = New Excel.Application
    Set xlWbk = xlApp.Workbooks.Open("C:\EMR\drv_patient_list.xls")
    Set xlWks = xlWbk.Sheets(1)
    
    Dim lngRowCount As Long
    Dim lngLoop As Long
    Dim dicNames As Object
    Dim strFilename As String
    Dim strNamePart() As String
    Set dicNames = CreateObject("Scripting.Dictionary")
    
    'get data from workbook
    arv = xlWks.Range("A2:C" & xlWks.UsedRange.Rows.Count).Value
    xlWbk.Close False
    xlApp.Quit
    
    'populate dictionary with name and number data
    lngRowCount = UBound(arv)
    For lngLoop = 1 To lngRowCount
        If dicNames.Exists(arv(lngLoop, 3) & " " & arv(lngLoop, 2)) Then
            dicNames(arv(lngLoop, 3) & " " & arv(lngLoop, 2)) = -1
        Else
            dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), arv(lngLoop, 1)
        End If
    Next lngLoop
    
    'iterate files and rename if match in dictionary
    'move file to unmatched sub-directory if no name match
    'move file to duplicate sub-directory if Excel list for this whole name was a duplicate
    'NOTE: assume that .\unmatched and .\duplicate sub-directories exist

    strFilename = Dir(Dir1.Path & "\" & "*.doc")
    Do Until Len(strFilename) = 0
        If LCase(strFilename) Like "#*_*-#_.doc" Then   'skip already renamed files
        Else
            strNamePart = Split(strFilename, "-")
            If dicNames.Exists(strNamePart(0)) Then
                If Sgn(dicNames(strNamePart(0))) = -1 Then
                    'duplicate
                    MakeSureDirectoryPathExists Dir1.Path & "\duplicate\"
                    FileCopy strFilename, "\duplicate\" & strFilename
                    Kill strFilename
                Else
                    Name strFilename As dicNames(strNamePart(0)) & "_" & strFilename
                End If
            Else
                'unmatched
                MakeSureDirectoryPathExists Dir1.Path & "\unmatched\"
                FileCopy Dir1.Path & "\" & strFilename, Dir1.Path & "\unmatched\" & strFilename
                Kill Dir1.Path & "\" & strFilename
            End If
        End If
        strFilename = Dir
    Loop

    Set dicNames = Nothing

End Sub
Private Sub cmdConvert_Click()
    Call FileRename
End Sub

Private Sub Form_Load()
    Dir1.Path = "c:"
End Sub

Open in new window

0
 

Author Comment

by:m3mdicl
ID: 36581424
Hi.  I figured it out.  Seems that the names added has more space in between the first name and last name.

Debug.print (arv(lngLoop, 3) & " " & arv(lngLoop, 2))
Bryan                  Adams

I just added trim and it worked fine.

 'populate dictionary with name and number data
    lngRowCount = UBound(arv)
    For lngLoop = 1 To lngRowCount
        If dicNames.Exists(Trim(arv(lngLoop, 3)) & " " & Trim(arv(lngLoop, 2))) Then
            dicNames(Trim(arv(lngLoop, 3)) & " " & Trim(arv(lngLoop, 2))) = -1
        Else
            dicNames.Add Trim(arv(lngLoop, 3)) & " " & Trim(arv(lngLoop, 2)), Trim(arv(lngLoop, 1))
        End If
    Next lngLoop

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 36581480
You are pretty close.  I moved the two API calls up to the start of the routine.

I think the reason that the results for matched files is wrong is that your description of the file names differs from your description of the Excel data.  Please post a directory list (use the /b switch) and the worksheet.  Please understand that any extra characters in the file name or the Excel data will cause this to misbehave in the way you described (nothing would match).

Option Explicit

Private Declare Function MakeSureDirectoryPathExists Lib "imagehlp.dll" (ByVal lpPath As String) As Long

Dim xlApp As Excel.Application
Dim xlWbk As Excel.Workbook
Dim xlWks As Excel.Worksheet
Dim arv() As Variant

Public Sub FileRename()
    Set xlApp = New Excel.Application
    Set xlWbk = xlApp.Workbooks.Open("C:\EMR\drv_patient_list.xls")
    Set xlWks = xlWbk.Sheets(1)
    
    Dim lngRowCount As Long
    Dim lngLoop As Long
    Dim dicNames As Object
    Dim strFilename As String
    Dim strNamePart() As String

     MakeSureDirectoryPathExists Dir1.Path & "\duplicate\"
     MakeSureDirectoryPathExists Dir1.Path & "\unmatched\"

    Set dicNames = CreateObject("Scripting.Dictionary")
    
    'get data from workbook
    arv = xlWks.Range("A2:C" & xlWks.UsedRange.Rows.Count).Value
    xlWbk.Close False
    xlApp.Quit
    
    'populate dictionary with name and number data
    lngRowCount = UBound(arv)
    For lngLoop = 1 To lngRowCount
        If dicNames.Exists(arv(lngLoop, 3) & " " & arv(lngLoop, 2)) Then
            dicNames(arv(lngLoop, 3) & " " & arv(lngLoop, 2)) = -1
        Else
            dicNames.Add arv(lngLoop, 3) & " " & arv(lngLoop, 2), arv(lngLoop, 1)
        End If
    Next lngLoop
    
    'iterate files and rename if match in dictionary
    'move file to unmatched sub-directory if no name match
    'move file to duplicate sub-directory if Excel list for this whole name was a duplicate
    'NOTE: assume that .\unmatched and .\duplicate sub-directories exist

    strFilename = Dir(Dir1.Path & "\" & "*.doc")
    Do Until Len(strFilename) = 0
        If LCase(strFilename) Like "#*_*-#_.doc" Then   'skip already renamed files
        Else
            strNamePart = Split(strFilename, "-")
            If dicNames.Exists(strNamePart(0)) Then
                If Sgn(dicNames(strNamePart(0))) = -1 Then
                    'duplicate
                    FileCopy strFilename, "\duplicate\" & strFilename
                    Kill strFilename
                Else
                    Name strFilename As dicNames(strNamePart(0)) & "_" & strFilename
                End If
            Else
                'unmatched
                FileCopy Dir1.Path & "\" & strFilename, Dir1.Path & "\unmatched\" & strFilename
                Kill Dir1.Path & "\" & strFilename
            End If
        End If
        strFilename = Dir
    Loop

    Set dicNames = Nothing

End Sub
Private Sub cmdConvert_Click()
    Call FileRename
End Sub

Private Sub Form_Load()
    Dir1.Path = "c:"
End Sub

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 36581494
Ah.  That was it, then.  I'm glad you found it.  I posted my last comment prior to seeing your concluding comment.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Introduction In a recent article (http://www.experts-exchange.com/A_7811-A-Better-Concatenate-Function.html) for the Excel community, I showed an improved version of the Excel Concatenate() function.  While writing that article I realized that no o…
Have you ever wanted to restrict the users input in a textbox to numbers, and while doing that make sure that they can't 'cheat' by pasting in non-numeric text? Of course you can do that with code you write yourself but it's tedious and error-prone …
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now