  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 175
  • Last Modified:

Fastest way to search 5000 Excel Files in a directory

I have 5000 Excel files in a directory. I need to go through all of them, search for some text, and change a cell if the text is found. I'm looking for an alternative to my current method of Automation, where I open, search, and close each file. Opening and closing each file takes too long.
jbauer22
Asked:
1 Solution
 
ophirgCommented:
Hi.

I have something that might work, but only in some cases.

Try referring to each cell from some sheet like this:

='full path[filename.xls]sheet name'!A1

Excel appears to do a much better job of opening files this way.

Ophir.
 
inthedarkCommented:
You must have a reason for not using a database?

But if you quickly find all of the files that contain the text, you can then use your current method to change only those files. This method should take just a few seconds, but will take longer if loads of files have to be changed.

If the files are not encrypted you could find the text this way:

redim fls$(0)

dim fl as long
dim flc as long

' read all file names
fl=-1
D$=Dir(YourFolder + "\*.XLS")
Do While Len(D$)
   fl=fl+1
   if fl>ubound(fls) Then
     redim preserve fls(fl+1000)
   end if
   fls(fl)=d$
   D$ = Dir() ' ** warning this can fail
Loop

if fl<0 then
  msgbox "No files found"
  exit sub
end if

redim preserve fls(fl)

For FLc = 0 to fl
   d$=fls(flc)
   DT$ = ReadFile(D$)
   If Instr(ucase$(D$),SearchText)>0 then
     Call ChangeThisFile(YourFolder + "\" + D$)
   End if
Next

etc.



Public Function ReadFile(filename As String) As String

Dim wlfn As Long

wlfn = FreeFile
On Error Resume Next

If Len(Dir(filename)) > 0 Then
    If Err.Number Then
        ReadFile = "Invalid resource file path: " + filename
        Exit Function
    End If
   
    Open filename For Input Shared As #wlfn
    ReadFile = Input(LOF(wlfn), wlfn)
    Close wlfn
Else
    ReadFile = "ERROR"
End If

End Function
 
inthedarkCommented:
Whoops....

I used d$ where it should have been dt$, but you get the drift of the fast search:

For FLc = 0 to fl
  d$=fls(flc)
  DT$ = ReadFile(D$)
  If Instr(ucase$(DT$),SearchText)>0 then ' corrected
    Call ChangeThisFile(YourFolder + "\" + D$)
  End if
Next


 
Ryan ChongCommented:
The faster way to search for files in folders is to use the Win32 API; take a look at the example below:

http://www.mvps.org/vbnet/code/fileapi/recursivefiles_minimal.htm

By the way, the time to open, read, and then close the Excel files may not be able to get any faster, as that part is handled by the Excel Object Library.
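For readers outside VB, the same fast-enumeration idea can be sketched in Python: `os.scandir` wraps the FindFirstFile/FindNextFile API on Windows, avoiding a separate stat call per file. The `.xls`-only filter is an assumption taken from the question:

```python
import os

def list_xls_files(root):
    """Recursively collect .xls file paths under root using os.scandir,
    which wraps the fast FindFirstFile/FindNextFile API on Windows."""
    found = []
    stack = [root]
    while stack:
        folder = stack.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.lower().endswith(".xls"):
                    found.append(entry.path)
    return found
```

This only speeds up finding the files; as noted above, the open/read/close cost inside Excel itself is unaffected.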
 
mcoopCommented:
Are you looking for cells with specific text, or filenames with specific text?

For data within cells...

Excel's 'Find' function is very fast when called from VB via Automation. On the other hand, you do have to open every file... Although 5000 files isn't that many - I'd say around an hour to open and process on a fast machine... (of course, saving each file after the change will double this time - or more)

I suppose if someone has some code that understands the raw Excel file format, you may have some luck there - but it will likely break when each new version of Excel is released.
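Short of fully parsing the format, a raw byte scan (the same trick as inthedark's ReadFile approach above) can sometimes pre-filter the files. A minimal Python sketch, assuming the search text is stored uncompressed in the file - legacy .xls files may hold strings as either 8-bit or UTF-16LE, so both encodings are checked:

```python
def files_containing(paths, search_text):
    """Return the subset of paths whose raw bytes contain search_text.
    Case-insensitive; checks both 8-bit and UTF-16LE encodings, since
    legacy .xls files may store strings either way."""
    needles = [search_text.upper().encode("latin-1"),
               search_text.upper().encode("utf-16-le")]
    hits = []
    for path in paths:
        with open(path, "rb") as f:
            data = f.read().upper()  # bytes.upper() uppercases ASCII bytes
        if any(needle in data for needle in needles):
            hits.append(path)
    return hits
```

Files that pass this filter would still be opened via Automation to make the actual change, so false positives only cost time, not correctness.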

 
inthedarkCommented:
There is another way which would only take a few seconds to run, but it would work better if most of the files do not change too often.

1) Load each Excel document, then save it as a CSV file - this should only take a split second.

2) Scan the CSV file, extract all of the words, and save them in a database. You would need 3 tables: documents, words, and a word-by-document index. You would save each word in a table so it could have a numeric key to enable very fast searching. Give each document a numeric key too. If the scan finds the same document already exists, revise the previous words. Then save the date last changed in the table.

3) Create a service which checks the date/time last changed; recent changes can then be updated in the database.

Each word would be given a numeric key. As the word and document index key would only be 8 bytes, you could search millions of words in a split second. It would only take a few minutes to create the code.

Just a thought......
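The three-table design above can be sketched in miniature - a hypothetical in-memory Python version; a real deployment would use actual database tables, as described:

```python
import re
from collections import defaultdict

class WordIndex:
    """In-memory sketch of the three tables: documents, words,
    and a word-by-document index keyed by numeric ids."""

    def __init__(self):
        self.word_ids = {}                # word -> numeric key
        self.doc_ids = {}                 # document name -> numeric key
        self.doc_names = {}               # numeric key -> document name
        self.postings = defaultdict(set)  # word key -> set of document keys

    def add_document(self, name, text):
        doc_id = self.doc_ids.setdefault(name, len(self.doc_ids))
        self.doc_names[doc_id] = name
        # If the document was indexed before, revise its previous words.
        for docs in self.postings.values():
            docs.discard(doc_id)
        for word in re.findall(r"\w+", text):
            word_id = self.word_ids.setdefault(word.upper(), len(self.word_ids))
            self.postings[word_id].add(doc_id)

    def search(self, word):
        word_id = self.word_ids.get(word.upper())
        if word_id is None:
            return []
        return sorted(self.doc_names[d] for d in self.postings[word_id])
```

Looking up a word is then a single key fetch into the postings table, which is why the search stays fast no matter how many documents are indexed.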
 
mcoopCommented:
On deeper thought... work on a long-term solution...
Migrate to a database.
 
inthedarkCommented:
I'd go with mcoop's suggestion....but bosses can sometimes only see one colour.
