BHForum


VBS or Batch - Search for string and remove rows that match

I need to search 100 MB text files for a string and remove every line that contains it. Below is my current code, but even with a 5 MB file it took 8 minutes to run. Two things I would like, if possible:

1. A faster search, as there will be multiple search strings to look for and remove.
2. A status or progress bar.


Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("c:\server.log", ForReading)

Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    If InStr(strLine, "Login Credentials:") = 0 Then
        strNewContents = strNewContents & strLine & vbCrLf
    End If
Loop

objFile.Close

Set objFile = objFSO.OpenTextFile("c:\server.log", ForWriting)
objFile.Write strNewContents

objFile.Close


Bill Prew

Have you considered using a command-line utility, namely FINDSTR? It would be trivial code, and may execute faster. Here's an example of what you could do:

findstr /v /c:"Login Credentials:"  /c:"Other String" c:\server.log > c:\newfile.log


~bp
ASKER CERTIFIED SOLUTION
Avatar of Paul Tomasi
Paul Tomasi

This solution is only available to members.
Paul,
That's not correct: it will eliminate any line containing any one of the search keywords, not only lines containing the whole phrase, and it does not allow for multiple search phrases. Bill's code handles both.
I don't know whether we have to consider searching in more than one file; the example does not say so. Either way, that code would also work for a single file.
I'm not sure whether findstr with all alternative search phrases is the fastest way with "native" tools, or whether a cascaded find is faster. I'm positive that findstr /L is faster than findstr /R (which is the default), as the checking for semi-regular expressions is skipped.

With find, we would have something like
< c:\server.log find /v "Login Credentials:" | find /v "Other String" > C:\server.new.log



However, native tools do not allow for a progress indicator of any kind. And I don't think you will get anything much faster than your VB code once a few optimizations are applied, such as writing each kept line out as you go instead of building the complete content in memory and writing it at once, or reading and writing in larger blocks instead of line by line.
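
As a rough sketch of the first of those optimizations (not a tested, definitive solution): the script below writes each kept line straight to a temporary file instead of growing one huge string, checks several search strings per line, and prints an approximate percentage. The temp file path, the second search string, the 50,000-line reporting interval and the assumption of an ANSI file with CRLF line endings are all mine, not from the question. It needs to be run with cscript.exe, since WScript.StdOut is not available under wscript.exe.

```vbscript
Option Explicit

Const ForReading = 1

' Assumed paths; adjust to your environment.
Const SOURCE_PATH = "c:\server.log"
Const TEMP_PATH = "c:\server.log.tmp"   ' hypothetical temp file name

Dim arrSearch, objFSO, objIn, objOut
Dim strLine, strTerm, blnKeep
Dim lngTotal, lngRead, lngLines

' Lines containing ANY of these strings are dropped (second entry is a placeholder).
arrSearch = Array("Login Credentials:", "Other String")

Set objFSO = CreateObject("Scripting.FileSystemObject")
lngTotal = objFSO.GetFile(SOURCE_PATH).Size
Set objIn = objFSO.OpenTextFile(SOURCE_PATH, ForReading)
Set objOut = objFSO.CreateTextFile(TEMP_PATH, True)   ' True = overwrite if present

lngRead = 0
lngLines = 0

Do Until objIn.AtEndOfStream
    strLine = objIn.ReadLine
    ' Approximate bytes consumed: +2 assumes CRLF endings and one byte per character.
    lngRead = lngRead + Len(strLine) + 2
    lngLines = lngLines + 1

    blnKeep = True
    For Each strTerm In arrSearch
        If InStr(1, strLine, strTerm, vbBinaryCompare) > 0 Then
            blnKeep = False
            Exit For
        End If
    Next

    ' Write the line immediately instead of appending to a growing string.
    If blnKeep Then objOut.WriteLine strLine

    ' Report progress every 50,000 lines, rewriting the same console line.
    If lngLines Mod 50000 = 0 And lngTotal > 0 Then
        WScript.StdOut.Write vbCr & "Processed about " & Int((lngRead / lngTotal) * 100) & "%"
    End If
Loop

objIn.Close
objOut.Close

' Replace the original with the filtered copy. Keep a backup if the data matters.
objFSO.DeleteFile SOURCE_PATH
objFSO.MoveFile TEMP_PATH, SOURCE_PATH
WScript.StdOut.WriteLine vbCr & "Done."
```

Run it as, for example, cscript //nologo filter.vbs (the script name is just an example). The slow part of the original code is most likely the repeated strNewContents = strNewContents & ... concatenation, which copies the whole accumulated string on every kept line; writing line by line avoids that.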
BHForum

ASKER

I thought that a VBS would be faster, as I had an earlier question about searching through 40,000 files for a string. The files were between 50 KB and tens of MB. The batch file took 45 minutes, whereas the VBS took about 4 or 5. That is where my head was when I asked for a VBS solution. As this is a single file, I don't need a progress indicator if it is fast enough. I will be searching for several different strings, so I will see how this works.

Thanks
BHForum

ASKER

Can't see the last posts...but this is what fixed it. Did the search correctly. Now with all of this done, I was able to get the SQL database fixed and no longer need the solution -_-. Thanks a bunch