Solved

Fastest way to search 5000 Excel Files in a directory

Posted on 2002-07-29
8
151 Views
Last Modified: 2012-05-04
I have 5000 Excel files in a directory.  I need to go through all of them and search for some text and change a cell if the text is found.  I'm looking for an alternative to my current method of Automation, where I open, search and close each file.  Openning and closing each file takes to long.
0
Comment
Question by:jbauer22
8 Comments
 
LVL 1

Expert Comment

by:ophirg
ID: 7187107
Hi.

I have something that might work, but only in some cases.

Try refering each cell in some sheet like this:

='full path[filename.xls]sheet name'!A1

The excel appears to be doing a much better job in openning files.

Ophir.
0
 
LVL 17

Accepted Solution

by:
inthedark earned 150 total points
ID: 7187174
You must have a reason for not using a database?

But if you quikly find all of the files that contain the text, you can then use your current method to change the file. This method should take just a few seconds, but will increas if loads of files have to be changed.

If the files are not encrypted you could find the text this way:

redim fls$(0)

dim fl as long
fim flc as long

' read all file names
fl=-1
D$=Dir(YourFolder + "\*.XLS")
Do While Len(D$)
   fl=fl+1
   if fl>ubound(fls) Then
     redim preserve fls(fl+1000)
   end if
   fls(fl)=d$
   D$ = Dir() ' ** warning this can fail
Loop

if fl<0 then
  msgbox "No files found"
  exit sub ?
end if

redim preserve fls(fl)

For FLc = 0 to fl
   d$=fls(flc)
   DT$ = ReadFile(D$)
   If Instr(ucase$(D$),SearchText)>0 then
     CallChangeThisFile YourFolder + "\" + D$
   End if
Next

etc.



Public Function ReadFile(filename As String) As String

Dim wlfn As Long

wlfn = FreeFile
On Error Resume Next

If Len(Dir(filename)) > 0 Then
    If Err.Number Then
        ReadFile = "Invalid resource file path: " + filename
        Exit Function
    End If
   
    Open filename For Input Shared As #wlfn
    ReadFile = Input(LOF(wlfn), wlfn)
    Close wlfn
Else
    ReadFile = "ERROR"
End If

End Function
0
 
LVL 17

Expert Comment

by:inthedark
ID: 7187185
woops....

I used d$, should have been dt$
 but you get the drift of fast search

For FLc = 0 to fl
  d$=fls(flc)
  DT$ = ReadFile(D$)
  If Instr(ucase$(DT$),SearchText)>0 then ' corrected    CallChangeThisFile YourFolder + "\" + D$
  End if
Next

0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 50

Expert Comment

by:Ryan Chong
ID: 7187270
The faster way to search files in folders should be using API, take a look for the example below:

http://www.mvps.org/vbnet/code/fileapi/recursivefiles_minimal.htm

By the way, the time to Open, Read then Close the Excel files may not able to be faster as it's handle by Excel Object Library.
0
 
LVL 4

Expert Comment

by:mcoop
ID: 7187439
Are you looking for cells with specific text, or filenames with specific text...

For data wihin cells...

The excel 'find' function is very fast when called from VB - via automation.    On the other hand you do have to open every file....  Although 5000 files isn;t that many - I'd say around an hour to open and process on a fast machine... (of course saving the file after the change will double this time - or more)

I suppose if someone has some code that understands the raw Excel file format, you may have some luck there - but will likely have problems when each new version of Excel is released.

0
 
LVL 17

Expert Comment

by:inthedark
ID: 7187543
There is another way which would only take a few seconds to run. But would work better is most of the files did not change to often.

1) Load each Excel document, then save as a CSV file - this sould only take a split second.

2) Scan the csv file and extract all of the words and save then in a database. You would need 3 tables: documents, words and word by document index.  You would save each word in a table so it could have a numeric key to enable very fast searching. Give each document a numeric key too.  If the scan finds the same document already exists, revise the previous words. Then Save the date last changed in the table.

3) Create a service which checks the date/time last changed, recent changes can then be update in the database.

Each word would be given a numeric key. As the word and document index key would only be 8 bytes you could search  millions of words in a split second.  It would only take a few minutes to create the code.

Just a thought......
0
 
LVL 4

Expert Comment

by:mcoop
ID: 7187547
On deeper thought... work on a long term solution...
Migrate to a database.
0
 
LVL 17

Expert Comment

by:inthedark
ID: 7187557
I'd go with mcoop's suggestion....but bosses can sometimes only see one colour.
0

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction I needed to skip over some file processing within a For...Next loop in some old production code and wished that VB (classic) had a statement that would drop down to the end of the current iteration, bypassing the statements that were c…
There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question