• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 179
  • Last Modified:

Fastest way to search 5000 Excel Files in a directory

I have 5000 Excel files in a directory.  I need to go through all of them, search for some text, and change a cell if the text is found.  I'm looking for an alternative to my current method of Automation, where I open, search and close each file.  Opening and closing each file takes too long.
Asked by: jbauer22

1 Solution
 
ophirgCommented:
Hi.

I have something that might work, but only in some cases.

Try referring to each cell in some sheet like this:

='full path[filename.xls]sheet name'!A1

Excel appears to do a much better job of opening files this way.

Ophir.
 
inthedarkCommented:
You must have a reason for not using a database?

But if you quickly find all of the files that contain the text, you can then use your current method to change only those files. This method should take just a few seconds, but will take longer if loads of files have to be changed.

If the files are not encrypted you could find the text this way:

ReDim fls$(0)

Dim fl As Long
Dim flc As Long

' read all file names into an array first (Dir is not re-entrant,
' so don't call Dir() again from inside the processing loop)
fl = -1
D$ = Dir(YourFolder + "\*.XLS")
Do While Len(D$)
   fl = fl + 1
   If fl > UBound(fls) Then
     ReDim Preserve fls(fl + 1000)
   End If
   fls(fl) = D$
   D$ = Dir() ' ** warning: this can fail if anything else calls Dir()
Loop

If fl < 0 Then
  MsgBox "No files found"
  Exit Sub
End If

ReDim Preserve fls(fl)

For flc = 0 To fl
   D$ = fls(flc)
   DT$ = ReadFile(YourFolder + "\" + D$)
   If InStr(UCase$(D$), SearchText) > 0 Then
     Call ChangeThisFile(YourFolder + "\" + D$)
   End If
Next

etc.



Public Function ReadFile(filename As String) As String

' Reads the whole file into a string in one gulp.
Dim wlfn As Long

wlfn = FreeFile
On Error Resume Next

If Len(Dir(filename)) > 0 Then
    If Err.Number Then
        ReadFile = "Invalid resource file path: " + filename
        Exit Function
    End If

    Open filename For Input Shared As #wlfn
    ReadFile = Input(LOF(wlfn), wlfn)
    Close wlfn
Else
    ReadFile = "ERROR"
End If

End Function
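The same raw-scan pre-filter can be sketched in a few lines of Python (the function name and folder layout here are illustrative, not from the thread): read each workbook's raw bytes and only hand the matching files to the slow open/search/save automation step. Searching both an 8-bit and a UTF-16 encoding of the needle helps, since .xls files can store strings either way; matches can include false positives, which the automation pass then filters.

```python
from pathlib import Path

def find_candidate_files(folder, search_text):
    """Fast pre-filter: scan each .xls file's raw bytes for the text.

    May return false positives (the bytes can match inside formatting
    data), so only the files it returns need the slow Excel pass.
    """
    upper = search_text.upper()
    # .xls stores strings either as 8-bit or UTF-16, so try both forms
    needles = [upper.encode("latin-1"), upper.encode("utf-16-le")]
    matches = []
    for path in Path(folder).glob("*.xls"):
        data = path.read_bytes().upper()  # case-insensitive byte search
        if any(n in data for n in needles):
            matches.append(path.name)
    return sorted(matches)
```

The key point is the same as in the VB version: reading raw bytes avoids starting Excel at all for files that cannot possibly match.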
 
inthedarkCommented:
Whoops....

I used d$ where it should have been dt$, but you get the drift of the fast search:

For flc = 0 To fl
  D$ = fls(flc)
  DT$ = ReadFile(YourFolder + "\" + D$)
  If InStr(UCase$(DT$), SearchText) > 0 Then ' corrected
    Call ChangeThisFile(YourFolder + "\" + D$)
  End If
Next

 
Ryan ChongCommented:
A faster way to search for files in folders is to use the Win32 API; take a look at the example below:

http://www.mvps.org/vbnet/code/fileapi/recursivefiles_minimal.htm

By the way, the time to open, read, then close the Excel files may not be able to get much faster, as it's handled by the Excel Object Library.
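The linked VB sample walks directories with FindFirstFile/FindNextFile; for comparison, the same recursive enumeration is a single loop in Python (a sketch of the idea, not the code from the link):

```python
import os

def list_xls_files(root):
    """Recursively collect every .xls file under root, much like the
    FindFirstFile/FindNextFile loop in the linked API example."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".xls"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```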
 
mcoopCommented:
Are you looking for cells with specific text, or filenames with specific text?

For data within cells...

The Excel 'Find' function is very fast when called from VB - via automation.    On the other hand, you do have to open every file....  Although 5000 files isn't that many - I'd say around an hour to open and process on a fast machine (of course, saving the file after the change will double this time - or more).

I suppose if someone has some code that understands the raw Excel file format, you may have some luck there - but it will likely break when each new version of Excel is released.

 
inthedarkCommented:
There is another way which would only take a few seconds to run, but it works best if most of the files don't change too often.

1) Load each Excel document, then save it as a CSV file - this should only take a split second.

2) Scan the CSV file, extract all of the words and save them in a database. You would need 3 tables: documents, words, and a word-by-document index.  You would save each word in a table so it could have a numeric key to enable very fast searching. Give each document a numeric key too.  If the scan finds that the document already exists in the index, revise its words, then save the date last changed in the table.

3) Create a service which checks the date/time last changed, so recent changes can be updated in the database.

Each word would be given a numeric key. As the word and document index keys would only be 8 bytes, you could search millions of words in a split second.  It would only take a few minutes to create the code.

Just a thought......
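That three-table word index can be sketched with SQLite in Python (the table and column names here are made up for illustration): each word and document gets a numeric key, and the join table turns a text search into a fast integer lookup.

```python
import sqlite3

def build_index(rows):
    """rows: iterable of (doc_name, text). Returns an in-memory index
    with the three tables described above."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE words     (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE word_doc  (word_id INTEGER, doc_id INTEGER,
                                PRIMARY KEY (word_id, doc_id));
    """)
    for name, text in rows:
        doc_id = db.execute(
            "INSERT INTO documents (name) VALUES (?)", (name,)).lastrowid
        for word in set(text.upper().split()):  # one row per distinct word
            db.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
            (word_id,) = db.execute(
                "SELECT word_id FROM words WHERE word = ?", (word,)).fetchone()
            db.execute("INSERT OR IGNORE INTO word_doc VALUES (?, ?)",
                       (word_id, doc_id))
    return db

def docs_containing(db, word):
    """Integer-keyed lookup: which documents contain this word?"""
    return sorted(name for (name,) in db.execute(
        "SELECT d.name FROM documents d "
        "JOIN word_doc wd ON wd.doc_id = d.doc_id "
        "JOIN words w ON w.word_id = wd.word_id "
        "WHERE w.word = ?", (word.upper(),)))
```

A re-scan of a changed document would delete its old word_doc rows and re-insert, as step 2 above suggests.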
 
mcoopCommented:
On deeper thought... work on a long-term solution:
Migrate to a database.
 
inthedarkCommented:
I'd go with mcoop's suggestion....but bosses can sometimes only see one colour.
Question has a verified solution.
