Solved

Fastest way to search 5000 Excel Files in a directory

Posted on 2002-07-29
8
153 Views
Last Modified: 2012-05-04
I have 5000 Excel files in a directory.  I need to go through all of them and search for some text and change a cell if the text is found.  I'm looking for an alternative to my current method of Automation, where I open, search and close each file.  Openning and closing each file takes to long.
0
Comment
Question by:jbauer22
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 1

Expert Comment

by:ophirg
ID: 7187107
Hi.

I have something that might work, but only in some cases.

Try refering each cell in some sheet like this:

='full path[filename.xls]sheet name'!A1

The excel appears to be doing a much better job in openning files.

Ophir.
0
 
LVL 17

Accepted Solution

by:
inthedark earned 150 total points
ID: 7187174
You must have a reason for not using a database?

But if you quikly find all of the files that contain the text, you can then use your current method to change the file. This method should take just a few seconds, but will increas if loads of files have to be changed.

If the files are not encrypted you could find the text this way:

redim fls$(0)

dim fl as long
fim flc as long

' read all file names
fl=-1
D$=Dir(YourFolder + "\*.XLS")
Do While Len(D$)
   fl=fl+1
   if fl>ubound(fls) Then
     redim preserve fls(fl+1000)
   end if
   fls(fl)=d$
   D$ = Dir() ' ** warning this can fail
Loop

if fl<0 then
  msgbox "No files found"
  exit sub ?
end if

redim preserve fls(fl)

For FLc = 0 to fl
   d$=fls(flc)
   DT$ = ReadFile(D$)
   If Instr(ucase$(D$),SearchText)>0 then
     CallChangeThisFile YourFolder + "\" + D$
   End if
Next

etc.



Public Function ReadFile(filename As String) As String

Dim wlfn As Long

wlfn = FreeFile
On Error Resume Next

If Len(Dir(filename)) > 0 Then
    If Err.Number Then
        ReadFile = "Invalid resource file path: " + filename
        Exit Function
    End If
   
    Open filename For Input Shared As #wlfn
    ReadFile = Input(LOF(wlfn), wlfn)
    Close wlfn
Else
    ReadFile = "ERROR"
End If

End Function
0
 
LVL 17

Expert Comment

by:inthedark
ID: 7187185
woops....

I used d$, should have been dt$
 but you get the drift of fast search

For FLc = 0 to fl
  d$=fls(flc)
  DT$ = ReadFile(D$)
  If Instr(ucase$(DT$),SearchText)>0 then ' corrected    CallChangeThisFile YourFolder + "\" + D$
  End if
Next

0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 51

Expert Comment

by:Ryan Chong
ID: 7187270
The faster way to search files in folders should be using API, take a look for the example below:

http://www.mvps.org/vbnet/code/fileapi/recursivefiles_minimal.htm

By the way, the time to Open, Read then Close the Excel files may not able to be faster as it's handle by Excel Object Library.
0
 
LVL 4

Expert Comment

by:mcoop
ID: 7187439
Are you looking for cells with specific text, or filenames with specific text...

For data wihin cells...

The excel 'find' function is very fast when called from VB - via automation.    On the other hand you do have to open every file....  Although 5000 files isn;t that many - I'd say around an hour to open and process on a fast machine... (of course saving the file after the change will double this time - or more)

I suppose if someone has some code that understands the raw Excel file format, you may have some luck there - but will likely have problems when each new version of Excel is released.

0
 
LVL 17

Expert Comment

by:inthedark
ID: 7187543
There is another way which would only take a few seconds to run. But would work better is most of the files did not change to often.

1) Load each Excel document, then save as a CSV file - this sould only take a split second.

2) Scan the csv file and extract all of the words and save then in a database. You would need 3 tables: documents, words and word by document index.  You would save each word in a table so it could have a numeric key to enable very fast searching. Give each document a numeric key too.  If the scan finds the same document already exists, revise the previous words. Then Save the date last changed in the table.

3) Create a service which checks the date/time last changed, recent changes can then be update in the database.

Each word would be given a numeric key. As the word and document index key would only be 8 bytes you could search  millions of words in a split second.  It would only take a few minutes to create the code.

Just a thought......
0
 
LVL 4

Expert Comment

by:mcoop
ID: 7187547
On deeper thought... work on a long term solution...
Migrate to a database.
0
 
LVL 17

Expert Comment

by:inthedark
ID: 7187557
I'd go with mcoop's suggestion....but bosses can sometimes only see one colour.
0

Featured Post

On Demand Webinar - Networking for the Cloud Era

This webinar discusses:
-Common barriers companies experience when moving to the cloud
-How SD-WAN changes the way we look at networks
-Best practices customers should employ moving forward with cloud migration
-What happens behind the scenes of SteelConnect’s one-click button

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction While answering a recent question about filtering a custom class collection, I realized that this could be accomplished with very little code by using the ScriptControl (SC) library.  This article will introduce you to the SC library a…
Background What I'm presenting in this article is the result of 2 conditions in my work area: We have a SQL Server production environment but no development or test environment; andWe have an MS Access front end using tables in SQL Server but we a…
As developers, we are not limited to the functions provided by the VBA language. In addition, we can call the functions that are part of the Windows operating system. These functions are part of the Windows API (Application Programming Interface). U…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question