Read & manipulate large text file

Using VB5
Do you know of a control (or something similar) that would allow me to:
1) Read a plain ASCII text file of any size (usually no more than 1-2 MB) very fast
2) Search to the first occurrence of a given word (very fast)
3) Get the line number of that word
4) Return the contents of any one line by line number

The lines may be delimited with Chr$(13) & Chr$(10), or with Chr$(13) only when the file originates on a Macintosh.

Items 1 & 2 can be reversed, making item 3 always 1 (or 0), if that would work better or faster.

Thanks for your help.

Asked by: EEI

traygreen Commented:
Try the following code if you are willing to open the file yourself.
If you're looking for a fast grep-like utility, you might want to keep looking.
Option Explicit

Const cSEARCHTXT = "only"

Private Sub Search()
   Dim MyChar As String
   Dim LineStr As String
   Dim LineCount As Long
   Dim Found As Boolean
   Dim FileNum As Integer
   
   FileNum = FreeFile
   Open "D:\TEMP\TEST.txt" For Input As #FileNum  ' Open file.
   
   LineCount = 0
   
   Do While Not EOF(FileNum) And Not Found  ' Loop until found or end of file.
      LineStr = ""
     
      ' Build one line: read characters up to the next CR or end of file.
      Do While Not EOF(FileNum)
         MyChar = Input(1, #FileNum)   ' Get one character.
         If MyChar = vbCr Then Exit Do
         ' Drop the LF of a CR/LF pair so it does not start the next line.
         If MyChar <> vbLf Then LineStr = LineStr & MyChar
      Loop
     
      LineCount = LineCount + 1
      Found = (InStr(LineStr, cSEARCHTXT) > 0)
   Loop
   Close #FileNum ' Close file.
   
   If Found Then
      MsgBox "The text " & cSEARCHTXT & " was found on line #" & LineCount
   End If
End Sub

Private Sub Command1_Click()
   Call Search
End Sub

traygreen Commented:
If you use the above, this will handle returning a line by number:
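' pTarget is a Long because large files can hold more than 32,767 lines.
Private Function GetLine(ByVal pTarget As Long) As String
   Dim MyChar As String
   Dim LineStr As String
   Dim LineCount As Long
   Dim FileNum As Integer
   
   FileNum = FreeFile
   Open "D:\TEMP\TEST.txt" For Input As #FileNum  ' Open file.
   
   LineCount = 0
   
   Do While Not EOF(FileNum)  ' Loop until end of file.
      LineStr = ""
     
      ' Build one line: read characters up to the next CR or end of file.
      Do While Not EOF(FileNum)
         MyChar = Input(1, #FileNum)   ' Get one character.
         If MyChar = vbCr Then Exit Do
         ' Drop the LF of a CR/LF pair so it does not start the next line.
         If MyChar <> vbLf Then LineStr = LineStr & MyChar
      Loop
     
      LineCount = LineCount + 1
     
      If LineCount = pTarget Then
         Exit Do
      End If
   Loop
   Close #FileNum ' Close file.
   
   If LineCount = pTarget Then
      GetLine = LineStr
   Else
      GetLine = "Only " & LineCount & " lines in the file.  Requested line not found."
   End If
   
End Function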
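
Because GetLine rescans the file on every call, one possible alternative is to index the line starts once and then slice lines out of memory. This is only a sketch under stated assumptions: the names mText, mStart, mCount, BuildIndex and LineText are made up for illustration, and mText is assumed to already hold the entire file contents.

```vb
' Hypothetical module-level state for this sketch.
Private mText As String      ' whole file contents (assumed already loaded)
Private mStart() As Long     ' mStart(n) = position of the first character of line n
Private mCount As Long       ' number of lines found

' Scan mText once and record where every line begins.
Private Sub BuildIndex()
   Dim p As Long

   ReDim mStart(1 To 1024)
   mCount = 1
   mStart(1) = 1
   p = InStr(1, mText, vbCr, vbBinaryCompare)
   Do While p > 0
      p = p + 1
      ' Skip the LF of a CR/LF pair; CR-only files have none.
      If Mid$(mText, p, 1) = vbLf Then p = p + 1
      mCount = mCount + 1
      If mCount > UBound(mStart) Then ReDim Preserve mStart(1 To mCount * 2)
      mStart(mCount) = p
      p = InStr(p, mText, vbCr, vbBinaryCompare)
   Loop
End Sub

' Return line n (1-based) without rereading the file.
' Returns "" when n is out of range.
Private Function LineText(ByVal n As Long) As String
   Dim e As Long

   If n < 1 Or n > mCount Then Exit Function
   e = InStr(mStart(n), mText, vbCr, vbBinaryCompare)
   If e = 0 Then e = Len(mText) + 1   ' last line has no trailing CR
   LineText = Mid$(mText, mStart(n), e - mStart(n))
End Function
```

Note that a file ending in a line break is counted as having one extra, empty last line in this sketch.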

EEI (Author) Commented:
I have tried this method before. It is much too slow. These files are often very large and large quantities of them are sometimes batch processed making speed important. I was hoping for a control or something that is done in assembler that will handle this task very fast.

Thanks

mark2150 Commented:
It's not the speed of the search routine that's killing you. It's the speed of reading a 2 MB file in and back out again. You're in I/O limbo more than you're CPU bound. The requirement that it correctly interpret files delimited only by Chr(13) forces you into bytewise scanning, as line-based input won't work. The disk I/O is the culprit, and no OCX or control is going to be able to help with that.

M
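
If I/O is the bottleneck, one thing worth trying is to pull the file in with a single read instead of one character at a time, and then search it in memory. The sketch below is an illustration only: FindInFile and its parameters are hypothetical names, and it assumes the file fits comfortably in memory (the 1-2 MB files described in the question should).

```vb
' Hypothetical helper: returns the 1-based character position of
' SearchText in the file, or 0 if it is not found.
Private Function FindInFile(ByVal FileName As String, _
                            ByVal SearchText As String) As Long
   Dim FileNum As Integer
   Dim Buffer As String

   FileNum = FreeFile
   Open FileName For Binary Access Read As #FileNum

   ' Size the buffer to the file length, then fill it with one Get.
   Buffer = Space$(LOF(FileNum))
   Get #FileNum, , Buffer
   Close #FileNum

   ' Case-sensitive binary comparison.
   FindInFile = InStr(1, Buffer, SearchText, vbBinaryCompare)
End Function
```

A call such as pos = FindInFile("D:\TEMP\TEST.txt", "only") returns a character position rather than a line number, so a further step is still needed to map it to a line.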

EEI (Author) Commented:
I have used an OCX called MhFileDisplay from MicroHelp in the past, in the 16-bit version of this application. This OCX will load a file of virtually any size in a few milliseconds. The OCX has a search property, but it returns the byte offset and not the line number of the found text, so I still have to do a lot of string manipulation to process the contents. But file loading is fast. When I moved to VB5, the MicroHelp OCXs would not register. I have talked to BeCubed (now supporting MicroHelp) and Wise tech support. Together we have determined that all of the dependencies are included, and we have not been able to find the problem. So I thought it would be nice to just replace MhFileDisplay with something that is perhaps better and faster anyway.

I hope you can come up with something.

Thanks again
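
For what it's worth, turning a byte offset into a line number does not need much string manipulation if the file contents are available as a string: counting the CRs that precede the offset is enough, and that works for both CR/LF and CR-only files. A minimal sketch, with LineNumberAt as a hypothetical name and a 1-based position assumed (if the control reports a 0-based offset, add 1 first):

```vb
' Hypothetical helper: given the whole file in Text and a 1-based
' character position Pos, returns the 1-based line number of Pos.
' Text is passed by reference so the large string is not copied.
Private Function LineNumberAt(Text As String, ByVal Pos As Long) As Long
   Dim Count As Long
   Dim p As Long

   Count = 1
   p = InStr(1, Text, vbCr, vbBinaryCompare)
   ' Every CR found before Pos means one more completed line.
   Do While p > 0 And p < Pos
      Count = Count + 1
      p = InStr(p + 1, Text, vbCr, vbBinaryCompare)
   Loop
   LineNumberAt = Count
End Function
```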
 
EEI (Author) Commented:
I kept getting an Internal Server Error back so I kept resending. Sorry about all the entries. Hope they can clean it up.

cedricd Commented:
First solution:

Did you try putting the whole file into a table
and running an SQL command like select field from table where field like "..."?

Open a recordset with this command and search the recordset for the line.

Or, second solution:
Calculate the position from the offset (if a line is 255 characters long and a character takes 2 bytes, a line is 255 * 2 * 8 = 510 * 8 = 4080 bits).

Good luck :-)

EEI (Author) Commented:
That's pretty close, but I do not have a table with a search property. If I loop through the table in VB to search, it will take too long.

I don't quite understand the part of your first solution that addresses the search aspect of the problem. Also, I am not quite sure how to apply your second solution.

Perhaps if you could make an example.

Thanks
 
cedricd Commented:
For the first solution you can create a temporary Access database and, in it, a temporary table.
When that's done, read the file and put every line into this table
ex : field
     line 1
     line 2
     line 3 etc..

When that's done, you can run a select query:

set rs = db.openrecordset("Select * from table where field like ""string*""",dbopendynaset)

This command will find all the fields beginning with string.
If you want to find the fields containing string, search for *string*, and if you want to find exactly string, search for string.

ex : like '*string*'
     like 'string'
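
The same three wildcard patterns can be tried out quickly with VB's own Like operator before building any database. This is only an illustration: the test strings and the name LikeDemo are made up, and VB's Like is case-sensitive under the default Option Compare Binary, which may differ from how the database engine compares text.

```vb
' Hypothetical demo: prints the result of each pattern to the Immediate window.
Private Sub LikeDemo()
   Debug.Print "string at start" Like "string*"      ' begins with: True
   Debug.Print "has string inside" Like "*string*"   ' contains: True
   Debug.Print "has string inside" Like "string"     ' exact match: False
End Sub
```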

I used this method to build a code analyser (for different languages such as PL/1, COBOL, JCL, etc.) that searches for dates that are not year-2000 compliant.

It worked very well.

If you absolutely want to work with the OCX, then you have to find a formula to calculate the position of the string from the offset.
But I think that's too hard (you will save time by using my first method).

A third solution is to use the InStr() function, but I think it will take too long.

If you want the code, post a comment asking for it before accepting or rejecting this answer.

EEI (Author) Commented:
cedricd, I have not worked with an Access database in VB before, so please hold my hand a little. Perhaps send a working example, something that will demonstrate the four steps I initially outlined.

Thanks.

cedricd Commented:
I'll give you an example that creates a new database and a new table from VB (the project needs a reference to the Microsoft DAO Object Library).

Dim madb As Database
Dim tablenew As TableDef
Dim rs As Recordset
Dim sql As String, buffer As String, filename As String
Dim searchtext As String, linetext As String
Dim cpt As Long, numline As Long

Set madb = CreateDatabase(App.Path & "\working.mdb", dbLangGeneral)
Set tablenew = madb.CreateTableDef("tablenew")
With tablenew
      .Fields.Append .CreateField("field1", dbText, 255)   ' text of the line (dbText holds at most 255 characters)
      .Fields.Append .CreateField("field2", dbLong)        ' line number
End With
            
madb.TableDefs.Append tablenew
madb.Close

Now the table is created.
Next, open the database and open a recordset to write the file into it.

' Initialize the table
sql = "Delete * from tablenew"
Set madb = OpenDatabase(App.Path & "\working.mdb")
madb.Execute sql
Set rs = madb.OpenRecordset("tablenew", dbOpenTable)

Open filename For Input As #1      ' filename = path of your text file
cpt = 1
While Not EOF(1)
   Line Input #1, buffer           ' read one line into buffer (string variable)
   rs.AddNew
   If Len(buffer) > 0 Then rs!field1 = buffer   ' blank lines stay Null (zero-length text is rejected by default)
   rs!field2 = cpt
   rs.Update
   cpt = cpt + 1
Wend
Close #1
rs.Close

Now search for the string.

sql = "Select field1, field2 from tablenew where field1 like '*" & searchtext & "*' order by field2"

Set rs = madb.OpenRecordset(sql, dbOpenDynaset)

rs will contain all the records whose field1 contains the string, e.g. "aastring aa".
If you want to match the string exactly, use like '" & searchtext & "'" instead.

If Not rs.EOF Then
   numline = rs!field2    ' line number of the first match
   linetext = rs!field1   ' text of that line
End If

cedricd Commented:
I forgot to tell you one thing: don't use dbMemo for your field, because you can't compare against this type of field (prefer dbText).

EEI (Author) Commented:
cedricd

After trying your example I found it did work with my VB. So I did some research on SQL. I have seen it around but had not needed it in my applications so far. I also learned that it is an add-on that must be purchased separately for VB. It looks like overkill for what I am trying to do. It seems like a lot to put into my application just to read and search an ASCII file, like delivering a toaster with a Mack truck. I prefer something small, slick and simple.

I know you are really trying hard to find a solution for me, and I appreciate it greatly. Perhaps there is just no simple solution available. It seems a shame that this should be so difficult. Fourteen years ago I could have written a program in assembler for the Commodore 64 to do this (of course not for a 2 MB file). Those were the days when computers were simple :)

covington Commented:
Look into VideoSoft's VS-OCX control, which has an 'awk' component. It's extremely fast at doing exactly the tasks you want. You can download a demo version at
www.videosoft.com

EEI (Author) Commented:
covington

I have acquired the control and it works great! I knew there should be something out there that would do this. I have tried it with a 3 MB file and it loads amazingly fast, in just a few milliseconds. The search takes about 19 seconds, but that's a lot faster than WordPad can find it.

Thanks!