Solved

Read & manipulate large text file

Posted on 1998-12-02
Last Modified: 2013-12-25
Using VB5
Do you know of a control (or something) that would allow me to:
1) Read a plain ASCII text file of any size (usually no more than 1-2 MB) very fast
2) Search to the first occurrence of a given word (very fast)
3) Get the line number of that word
4) Return the contents of any one line by line number

The text may be line delimited with Chr$(13) & Chr$(10), or with only Chr$(13) when the file originates on a Macintosh.

Items 1 & 2 can be reversed, making item 3 always 1 (or 0), if that would work better or faster.

Thanks for your help.

Question by:EEI

LVL 3

Expert Comment

by:traygreen
Try the following code if you are willing to open the file.
If you're looking for a fast grep-like utility, you might want to keep looking.
Option Explicit

Const cSEARCHTXT = "only"

Private Sub Search()
   Dim MyChar As String
   Dim LineStr As String
   Dim LineCount As Long
   Dim Found As Boolean
   
   Open "D:\TEMP\TEST.txt" For Input As #1  ' Open file.
   
   LineCount = 0
   LineStr = ""
   
   Do While Not EOF(1)  ' Loop until end of file.
      MyChar = Input(1, #1)   ' Get one character.
     
      If MyChar = vbCr Then
         ' Chr$(13) ends a line in both PC and Macintosh files.
         LineCount = LineCount + 1
         If InStr(LineStr, cSEARCHTXT) > 0 Then
            Found = True
            Exit Do
         End If
         LineStr = ""
      ElseIf MyChar <> vbLf Then
         ' Skip the Chr$(10) that follows Chr$(13) in PC files.
         LineStr = LineStr & MyChar
      End If
   Loop
   Close #1 ' Close file.
   
   ' A last line without a trailing Chr$(13) still has to be checked.
   If Not Found And Len(LineStr) > 0 Then
      LineCount = LineCount + 1
      Found = (InStr(LineStr, cSEARCHTXT) > 0)
   End If
   
   If Found Then
      MsgBox "The text " & cSEARCHTXT & " was found on line #" & LineCount
   End If
End Sub
Private Sub Command1_Click()
   Call Search
End Sub

LVL 3

Expert Comment

by:traygreen
If you use the above, this will handle returning the line by number:
Private Function GetLine(ByVal pTarget As Long) As String
   Dim MyChar As String
   Dim LineStr As String
   Dim LineCount As Long
   Dim Found As Boolean
   
   Open "D:\TEMP\TEST.txt" For Input As #1  ' Open file.
   
   LineCount = 0
   LineStr = ""
   
   Do While Not EOF(1)  ' Loop until end of file.
      MyChar = Input(1, #1)   ' Get one character.
     
      If MyChar = vbCr Then
         ' Chr$(13) ends a line in both PC and Macintosh files.
         LineCount = LineCount + 1
         If LineCount = pTarget Then
            Found = True
            Exit Do
         End If
         LineStr = ""
      ElseIf MyChar <> vbLf Then
         ' Skip the Chr$(10) that follows Chr$(13) in PC files.
         LineStr = LineStr & MyChar
      End If
   Loop
   Close #1 ' Close file.
   
   ' A last line without a trailing Chr$(13) still counts.
   If Not Found And Len(LineStr) > 0 Then
      LineCount = LineCount + 1
      Found = (LineCount = pTarget)
   End If
   
   If Found Then
      GetLine = LineStr
   Else
      GetLine = "Only " & LineCount & " lines in the file.  Requested line not found."
   End If
   
End Function

Author Comment

by:EEI
I have tried this method before. It is much too slow. These files are often very large and large quantities of them are sometimes batch processed making speed important. I was hoping for a control or something that is done in assembler that will handle this task very fast.

Thanks

LVL 12

Expert Comment

by:mark2150
It's not the speed of the search routine that's killing you. It's the speed of reading a 2 MB file in and back out again. You're in I/O limbo more than you're CPU bound. The requirement that it correctly interpret Chr(13)-only delimited files forces you into bytewise scanning, as line-based input won't work. The disk I/O is the culprit, and no OCX or control is going to be able to help with that.

M
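
One way to cut down the per-character I/O described above is to pull the whole file into memory with a single binary Get and then let InStr do the search. The following is only a sketch under stated assumptions: LoadFile and DemoLoadAndSearch are hypothetical names (not from this thread), the path and search word are the ones used in the earlier example, and no error handling is included.

```vb
Option Explicit

' Hypothetical helper: reads the whole file into one string
' with a single binary Get instead of one Input call per character.
Public Function LoadFile(ByVal FileName As String) As String
    Dim f As Integer
    Dim Buffer As String

    f = FreeFile
    Open FileName For Binary Access Read As #f
    Buffer = Space$(LOF(f))   ' size the buffer to the file length
    Get #f, , Buffer          ' one read fills the whole buffer
    Close #f

    LoadFile = Buffer
End Function

' Hypothetical usage: load the file, then search it in memory.
Public Sub DemoLoadAndSearch()
    Dim Contents As String
    Dim Pos As Long

    Contents = LoadFile("D:\TEMP\TEST.txt")
    Pos = InStr(Contents, "only")   ' 1-based position, or 0 if not found

    If Pos > 0 Then
        Debug.Print "Found at character " & Pos
    Else
        Debug.Print "Not found"
    End If
End Sub
```

Because Chr$(13) appears once per line break in both PC and Macintosh files, the same buffer can be searched without converting the delimiters first.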

Author Comment

by:EEI
I have used an OCX called MhFileDisplay from MicroHelp in the past in the 16-bit version of this application. This OCX will load a file of virtually any size in a few milliseconds. The OCX has a search property, but it returns the byte offset and not the line number of the found text, so I still have to do a lot of string manipulation to process the contents. But file loading is fast. When I wanted to move to VB5, the MicroHelp OCXs would not register. I have talked to BeCubed (now supporting MicroHelp) and Wise tech support. Together we have determined that all of the dependencies are included, and we have not been able to find the problem. So I thought it would be nice to just replace MhFileDisplay with something that is perhaps better and faster anyway.

I hope you can come up with something.

Thanks again
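
A byte offset such as the one the search property returns can be turned into a line number by counting the Chr$(13) characters in front of it. A minimal sketch, assuming the file contents are already held in a string and the offset is 1-based (if the control counts from 0, add 1 first). LineFromOffset and LineByNumber are hypothetical names and are not part of MhFileDisplay.

```vb
Option Explicit

' Returns the 1-based line number that contains character position Offset.
' Works for CR-only and CR+LF files: both have one Chr$(13) per line break.
Public Function LineFromOffset(Contents As String, ByVal Offset As Long) As Long
    Dim Pos As Long
    Dim Count As Long

    Count = 1
    Pos = InStr(1, Contents, vbCr)
    Do While Pos > 0 And Pos < Offset
        Count = Count + 1
        Pos = InStr(Pos + 1, Contents, vbCr)
    Loop

    LineFromOffset = Count
End Function

' Returns the text of line LineNumber (1-based),
' or an empty string if the contents have fewer lines.
Public Function LineByNumber(Contents As String, ByVal LineNumber As Long) As String
    Dim StartPos As Long
    Dim EndPos As Long
    Dim i As Long

    StartPos = 1
    For i = 2 To LineNumber
        EndPos = InStr(StartPos, Contents, vbCr)
        If EndPos = 0 Then Exit Function   ' ran out of lines
        StartPos = EndPos + 1
        ' Skip the Chr$(10) of a CR+LF pair.
        If Mid$(Contents, StartPos, 1) = vbLf Then StartPos = StartPos + 1
    Next i

    EndPos = InStr(StartPos, Contents, vbCr)
    If EndPos = 0 Then EndPos = Len(Contents) + 1
    LineByNumber = Mid$(Contents, StartPos, EndPos - StartPos)
End Function
```

Both functions take the string by reference, so a multi-megabyte buffer is not copied on each call.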

LVL 2

Expert Comment

by:cedricd
First solution:

Did you try putting the whole file into a table and running an SQL command such as
select field from table where field like "..."

Open a recordset with this command and look for the line in the recordset.

Or, second solution:
calculate the position from the offset (assuming a line is 255 characters long and a character takes 2 bytes, one line is 255 * 2 * 8 = 510 * 8 = 4080 bits).

good luck :-)

Author Comment

by:EEI
That's pretty close, but I do not have a table with a search property. If I loop through the table in VB to search, it will take too long.

I don't quite understand the part of your first solution that addresses the search aspect of the problem. Also, I am not quite sure how to apply your second solution.

Perhaps if you could make an example.

Thanks

LVL 2

Expert Comment

by:cedricd
For the first solution, you can create a temporary Access database containing a temporary table.
Once that is done, read the file and put every line into this table:
ex : field
     line 1
     line 2
     line 3 etc..

Then you can run a select query:

Set rs = db.OpenRecordset("Select * from table where field like ""string*""", dbOpenDynaset)

This command finds every field beginning with string.
To find the fields containing string, search for *string*; to match string exactly, search for string.

ex : like '*string*'
     like 'string'

I used this method to build a code analyser (for languages such as PL/1, COBOL, JCL, etc.) that searched for dates that were not year 2000 compliant.

It worked very well.

If you absolutely want to keep working with the OCX, you will have to find a formula that calculates the position of the string from the offset.
But I think that is too hard (you will save time by using my first method).

A third solution is to use the InStr() function, but I think it will take too long.

If you want the code, post a comment asking for it before accepting or rejecting this answer.
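
The three patterns can be tried out without a database, because VB's own Like operator accepts the same * wildcard. A small sketch, with DemoLikePatterns as a made-up name; the comparison here is VB's own, so case handling may differ from what the database engine does.

```vb
Option Explicit

' Hypothetical demo of the three pattern styles using VB's Like operator.
Public Sub DemoLikePatterns()
    ' "string*" matches values that begin with "string".
    Debug.Print "string theory" Like "string*"   ' True
    Debug.Print "aastring aa" Like "string*"     ' False

    ' "*string*" matches values that contain "string" anywhere.
    Debug.Print "aastring aa" Like "*string*"    ' True

    ' "string" without wildcards matches only the exact value.
    Debug.Print "aastring aa" Like "string"      ' False
    Debug.Print "string" Like "string"           ' True
End Sub
```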

Author Comment

by:EEI
cedricd, I have not worked with an Access database in VB before, so please hold my hand a little. Perhaps you could send a working example, something that demonstrates the four steps I initially outlined.

Thanks.

LVL 2

Expert Comment

by:cedricd
I'll give you an example that creates a new database and a new table from VB (the project needs a reference to the Microsoft DAO object library).

Dim madb As Database
Dim tableNew As TableDef
Dim rs As Recordset
Dim sql As String
Dim fileName As String     ' path of the text file to load (set this yourself)
Dim buffer As String       ' one line of the file
Dim searchText As String   ' the word to look for
Dim cpt As Long
Dim numline As Long
Dim lineText As String

Set madb = CreateDatabase(App.Path & "\working.mdb", dbLangGeneral)
Set tableNew = madb.CreateTableDef("tablenew")
With tableNew
      .Fields.Append .CreateField("field1", dbText, 255)   ' text of the line
      .Fields.Append .CreateField("field2", dbLong)        ' line number
End With
            
madb.TableDefs.Append tableNew
madb.Close
Now the table is created.
Next, open the database and a recordset to write the file into it.

Initializing the table:
sql = "Delete * from tablenew"
Set madb = OpenDatabase(App.Path & "\working.mdb")
madb.Execute sql
Set rs = madb.OpenRecordset("tablenew", dbOpenTable)

Open the file as #1 and read it line by line into buffer:
Open fileName For Input As #1
cpt = 1
While Not EOF(1)
   Line Input #1, buffer
   rs.AddNew
   rs!field1 = buffer
   rs!field2 = cpt
   rs.Update
   cpt = cpt + 1
Wend
Close #1
rs.Close

Now search for the string:

sql = "Select field1, field2 from tablenew where field1 like '*" & searchText & "*'"

Set rs = madb.OpenRecordset(sql, dbOpenDynaset)

rs will contain every record whose field1 contains the string, ex : aastring aa
If you want an exact match, end the query with like '" & searchText & "'" instead.

If Not rs.EOF Then
   numline = rs!field2
   lineText = rs!field1
End If

LVL 2

Expert Comment

by:cedricd
I forgot to tell you one thing: don't use dbMemo for your field, because you can't run a comparison on that type of field (prefer dbText).

Author Comment

by:EEI
cedricd

After trying your example I found it did work with my VB. So I did some research on SQL. I have seen it around but had not needed it in my applications so far. I also learned that it is an add-on that must be purchased separately for VB. It looks like overkill for what I am trying to do. It seems like a lot to put into my application just to read and search an ASCII file. Like delivering a toaster with a Mack truck. I prefer something small, slick and simple.

I know you are really trying hard to find a solution for me. I appreciate it greatly. Perhaps there is just no simple solution available. It seems a shame that this should be so difficult. Fourteen years ago I could have written a program in assembler for the Commodore 64 to do this (of course not for a 2 MB file). Those were the days when computers were simple :)

LVL 3

Accepted Solution

by:covington (earned 200 total points)
Look into Videosoft's VS-OCX control which has an 'awk' component. It's extremely fast doing exactly the tasks you want. You can download a demo version at www.videosoft.com

Author Comment

by:EEI
covington

I have acquired the control and it works great! I knew there should be something out there that would do this. I have tried it with a 3 MB file and it loads it amazingly fast, in just a few milliseconds. The search takes about 19 seconds, but that's a lot faster than WordPad can find it.

Thanks!