Solved

Read & manipulate large text file

Posted on 1998-12-02
Last Modified: 2013-12-25
Using VB5
Do you know of a control (or something) that would allow me to:
1) Read a plain ASCII text file of any size (usually no more than 1-2 MB) very fast
2) Search to the first occurrence of a given word (very fast)
3) Get the line number of that word
4) Return the contents of any one line by line number

The text may be line delimited with Chr$(13) & Chr$(10), or with only Chr$(13) when the file originates on a Macintosh.

Items 1 & 2 can be reversed, making item 3 always 1 (or 0), if that would work better or faster.

Thanks for your help.
Question by:EEI
20 Comments
 
LVL 3

Expert Comment

by:traygreen
ID: 1488652
Try the following code if you are willing to open the file.
If you're looking for a fast grep-like utility, you might want to keep looking.

Option Explicit

Const cSEARCHTXT = "only"

Private Sub Search()
   Dim MyChar As String
   Dim LineStr As String
   Dim LineCount As Long
   Dim Found As Boolean
   Dim FileNum As Integer

   FileNum = FreeFile                             ' Use the next free file number.
   Open "D:\TEMP\TEST.txt" For Input As #FileNum  ' Open file.

   Do While Not EOF(FileNum) And Not Found        ' Loop until end of file or a match.
      LineStr = ""

      Do While Not EOF(FileNum)
         MyChar = Input(1, #FileNum)              ' Get one character.
         If MyChar = vbCr Then Exit Do            ' Chr$(13) ends the line.
         If MyChar <> vbLf Then LineStr = LineStr & MyChar   ' Drop the Chr$(10) of a CR/LF pair.
      Loop

      LineCount = LineCount + 1
      Found = (InStr(LineStr, cSEARCHTXT) > 0)
   Loop
   Close #FileNum ' Close file.

   If Found Then
      MsgBox "The text " & cSEARCHTXT & " was found on line #" & LineCount
   End If
End Sub

Private Sub Command1_Click()
   Call Search
End Sub

0
 
LVL 3

Expert Comment

by:traygreen
ID: 1488653
If you use the above, this will handle returning the line by number:

Private Function GetLine(pTarget As Long) As String
   Dim MyChar As String
   Dim LineStr As String
   Dim LineCount As Long
   Dim FileNum As Integer

   FileNum = FreeFile                             ' Use the next free file number.
   Open "D:\TEMP\TEST.txt" For Input As #FileNum  ' Open file.

   Do While Not EOF(FileNum) And LineCount < pTarget   ' Loop until end of file or the target line.
      LineStr = ""

      Do While Not EOF(FileNum)
         MyChar = Input(1, #FileNum)              ' Get one character.
         If MyChar = vbCr Then Exit Do            ' Chr$(13) ends the line.
         If MyChar <> vbLf Then LineStr = LineStr & MyChar   ' Drop the Chr$(10) of a CR/LF pair.
      Loop

      LineCount = LineCount + 1
   Loop
   Close #FileNum ' Close file.

   If pTarget > 0 And LineCount = pTarget Then
      GetLine = LineStr
   Else
      GetLine = "Only " & LineCount & " lines in the file.  Requested line not found."
   End If

End Function

0
 

Author Comment

by:EEI
ID: 1488654
I have tried this method before. It is much too slow. These files are often very large and large quantities of them are sometimes batch processed making speed important. I was hoping for a control or something that is done in assembler that will handle this task very fast.

Thanks
0
 
LVL 12

Expert Comment

by:mark2150
ID: 1488655
It's not the speed of the search routine that's killing you. It's the speed of reading a 2 MB file in and back out again. You're in I/O limbo more than you're CPU bound. The requirement that it correctly interpret Chr(13)-only delimited files forces you into bytewise scanning, as line-based input won't work. The disk I/O is the culprit, and no OCX or control is going to be able to help with that.

M
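
For anyone who wants to measure where the time actually goes, here is a minimal sketch of the alternative to character-by-character input: read the whole file with a single Get in Binary mode and let InStr do the search. This is only an illustration, not code posted by anyone in this thread. ReadWholeFile and FindOffset are hypothetical names, and the sketch assumes the file fits in memory (the 1-2 MB files described in the question should).

```vb
Option Explicit

' Hypothetical helper: loads the entire file into one string with a single read.
Public Function ReadWholeFile(ByVal sPath As String) As String
   Dim iFile As Integer
   Dim sBuf As String

   iFile = FreeFile
   Open sPath For Binary Access Read As #iFile
   If LOF(iFile) > 0 Then
      sBuf = Space$(LOF(iFile))   ' Size the buffer to the file length.
      Get #iFile, , sBuf          ' One Get fills the whole buffer.
   End If
   Close #iFile

   ReadWholeFile = sBuf
End Function

' Hypothetical helper: 1-based offset of the first occurrence, or 0 if none.
Public Function FindOffset(sText As String, ByVal sWord As String) As Long
   FindOffset = InStr(1, sText, sWord, vbBinaryCompare)
End Function

' Example use; the path is the test file used earlier in this thread.
Public Sub Demo()
   Dim sText As String

   sText = ReadWholeFile("D:\TEMP\TEST.txt")
   Debug.Print "Bytes read: " & Len(sText)
   Debug.Print "First hit at offset: " & FindOffset(sText, "only")
End Sub
```

Timing Demo against the character loop above (for example with the Timer function) should show whether the disk or the per-character overhead dominates on a given machine.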

0
 

Author Comment

by:EEI
ID: 1488656
I have used an OCX called MhFileDisplay from MicroHelp in the past in the 16-bit version of this application. This OCX will load a file of virtually any size in a few milliseconds. The OCX has a search property, but it returns the byte offset and not the line number of the found text, so I still have to do a lot of string manipulation to process the contents. But file loading is fast. When I wanted to go to VB5, the MicroHelp OCXs would not register. I have talked to BeCubed (now supporting MicroHelp) and Wise tech support. Together we have determined that all of the dependents are included etc., and we have not been able to find the problem. So I thought it would be nice to just replace this MhFileDisplay with something that is perhaps better and faster anyway.

I hope you can come up with something.

Thanks again
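
The string manipulation described here (turning a byte offset into a line number, and fetching a line by its number) can be sketched roughly as follows. This assumes the file contents are already in a string, that a Chr$(13) ends every line (with an optional Chr$(10) after it), and that the offset is 1-based as InStr returns it; an offset reported by a control may be 0-based, in which case add 1 first. LineFromOffset and LineAt are hypothetical names, not part of any control mentioned in this thread.

```vb
Option Explicit

' Hypothetical helper: 1-based line number containing the given 1-based offset.
Public Function LineFromOffset(sText As String, ByVal lOffset As Long) As Long
   Dim lPos As Long
   Dim lLines As Long

   lLines = 1
   lPos = InStr(1, sText, vbCr, vbBinaryCompare)
   Do While lPos > 0 And lPos < lOffset      ' Count the CRs before the offset.
      lLines = lLines + 1
      lPos = InStr(lPos + 1, sText, vbCr, vbBinaryCompare)
   Loop
   LineFromOffset = lLines
End Function

' Hypothetical helper: text of line lLine (1-based), or "" if there is no such line.
Public Function LineAt(sText As String, ByVal lLine As Long) As String
   Dim lStart As Long
   Dim lEnd As Long
   Dim n As Long

   If lLine < 1 Then Exit Function
   lStart = 1
   For n = 2 To lLine                        ' Skip over the first lLine - 1 lines.
      lEnd = InStr(lStart, sText, vbCr, vbBinaryCompare)
      If lEnd = 0 Then Exit Function         ' Fewer lines than requested.
      lStart = lEnd + 1
      If Mid$(sText, lStart, 1) = vbLf Then lStart = lStart + 1   ' Skip the LF of a CR/LF pair.
   Next n

   lEnd = InStr(lStart, sText, vbCr, vbBinaryCompare)
   If lEnd = 0 Then lEnd = Len(sText) + 1    ' Last line has no trailing CR.
   LineAt = Mid$(sText, lStart, lEnd - lStart)
End Function
```

Both functions scan from the start of the string on every call. If many lines are requested from the same file, building an array of line-start offsets once would probably be the better design.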
0
 

 

Author Comment

by:EEI
ID: 1488660
I kept getting an Internal Server Error back so I kept resending. Sorry about all the entries. Hope they can clean it up.
0
 
LVL 2

Expert Comment

by:cedricd
ID: 1488661
First solution:

Did you try putting the whole file into a table and then running a SQL command like select field from table where field like "..."?

Open a recordset with this command and search the recordset for the line.

Or, second solution:
Calculate the line from the offset (if a line is 255 chars long and a char is 2 bytes, that is 255 * 2 * 8 = 510 * 8 = 4080 bits per line).

good luck :-)
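
The offset calculation in the second solution only works when every line has the same length. Under that assumption the arithmetic is a single integer division, sketched below. LineFromFixedOffset is a hypothetical name, the offset is assumed to be 1-based and counted in bytes, and lRecLen is the full length of one line in bytes including its delimiter. It does not apply to files with variable-length lines.

```vb
Option Explicit

' Hypothetical helper: line number for a 1-based byte offset when every
' line occupies exactly lRecLen bytes (delimiter included).
Public Function LineFromFixedOffset(ByVal lOffset As Long, ByVal lRecLen As Long) As Long
   If lRecLen < 1 Or lOffset < 1 Then Exit Function   ' Returns 0 for invalid input.
   LineFromFixedOffset = (lOffset - 1) \ lRecLen + 1  ' \ is integer division.
End Function
```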
0
 

Author Comment

by:EEI
ID: 1488662
That's pretty close, but I do not have a table with a search property. If I loop through the table in VB to search, it will take too long.

I don't quite understand the part of your first solution that addresses the search aspect of the problem. Also, I am not quite sure how to apply your second solution.

Perhaps you could give an example.

Thanks
0
 

 
LVL 2

Expert Comment

by:cedricd
ID: 1488664
For the first solution you can create a temporary Access database in which you create a temporary table.
When that's done, read the file and put every line into this table.
ex : field
     line 1
     line 2
     line 3 etc..

When that's done, you can run a select query:

set rs = db.openrecordset("Select * from table where field like ""string*""",dbopendynaset)

This command will find all the fields beginning with string.
If you want to find the fields containing string, search for *string*; if you want to match string exactly, search for string.

ex : like '*string*'
     like 'string'
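
The three patterns can be tried out without a database, because VB's own Like operator accepts the same * wildcard. The sketch below is only an illustration: MatchKind is a hypothetical name, and Option Compare Text is used on the assumption that the query is case-insensitive. Note that wildcard characters inside sWord itself (*, ?, #, [) would be treated as part of the pattern.

```vb
Option Explicit
Option Compare Text   ' Make Like case-insensitive in this module.

' Hypothetical helper: reports which of the three patterns sField satisfies.
Public Function MatchKind(ByVal sField As String, ByVal sWord As String) As String
   If sField Like sWord Then
      MatchKind = "exact"                   ' like 'string'
   ElseIf sField Like sWord & "*" Then
      MatchKind = "begins with"             ' like 'string*'
   ElseIf sField Like "*" & sWord & "*" Then
      MatchKind = "contains"                ' like '*string*'
   Else
      MatchKind = "no match"
   End If
End Function
```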

I used this method to build a code analyser (for different languages such as PL/1, COBOL, JCL, etc.) that searches for dates that are not year 2000 compliant.

It worked very well.

If you absolutely want to work with the OCX, then you have to find a formula to calculate the position of the string using the offset.
But I think that's too hard (you will save time by using my first method).

A third solution is to use the InStr() function, but I think that it will take too long.

If you want the code, then post a comment to ask for it before accepting or rejecting this answer.
0
 

Author Comment

by:EEI
ID: 1488665
cedricd,  I have not worked with an access db in VB before so please hold my hand a little. Perhaps send a working example. Something that will demonstrate the four steps I initially outlined.

Thanks.
0
 
LVL 2

Expert Comment

by:cedricd
ID: 1488666
I'll give you an example that creates a new database and a new table with VB. It needs a project reference to the Microsoft DAO Object Library.

Dim madb As Database
Dim tableNew As TableDef
Dim rs As Recordset
Dim sql As String
Dim buffer As String
Dim searchText As String
Dim cpt As Long
Dim numLine As Long
Dim lineText As String

Set madb = CreateDatabase(App.Path & "\Working.mdb", dbLangGeneral)
Set tableNew = madb.CreateTableDef("tablenew")
With tableNew
      .Fields.Append .CreateField("field1", dbText, 255)   ' text of the line (255 characters at most)
      .Fields.Append .CreateField("field2", dbLong)        ' line number
End With

madb.TableDefs.Append tableNew
madb.Close

Now the table is created.
Next, open the database and a recordset so the file can be written to it.

Initializing the table:
Set madb = OpenDatabase(App.Path & "\Working.mdb")
madb.Execute "Delete * from tablenew"
Set rs = madb.OpenRecordset("tablenew", dbOpenTable)

Opening the file as #1 and reading it into buffer one line at a time:
Open "D:\TEMP\TEST.txt" For Input As #1   ' example path; use your own file
cpt = 1
Do While Not EOF(1)
   Line Input #1, buffer
   rs.AddNew
   If Len(buffer) > 0 Then rs!field1 = buffer   ' leave empty lines as Null
   rs!field2 = cpt
   rs.Update
   cpt = cpt + 1
Loop
Close #1
rs.Close

Now searching for the string:

searchText = "only"   ' example search word
sql = "Select field1, field2 from tablenew where field1 like '*" & searchText & "*' order by field2"

Set rs = madb.OpenRecordset(sql, dbOpenDynaset)

rs will contain every row whose field1 contains the string, ex : aastring aa
If you want to match the string exactly, use like '" & searchText & "' instead.

If Not rs.EOF Then
   numLine = rs!field2    ' line number of the first match
   lineText = rs!field1   ' text of that line
End If
0
 
LVL 2

Expert Comment

by:cedricd
ID: 1488667
I forgot to tell you one thing: don't use dbMemo for your field, because you can't do a comparison on that type of field (prefer dbText).

0
 

Author Comment

by:EEI
ID: 1488668
cedricd

After trying your example I found it did work with my VB. So I did some research on SQL. I have seen it around but had not had need for it in my applications so far. I also learned that it is an add-on that must be purchased separately for VB. It looks like overkill for what I am trying to do. It seems like a lot to put into my application just to read and search an ASCII file. Like delivering a toaster with a Mack truck. I prefer something small, slick and simple.

I know you are really trying hard to find a solution for me. I appreciate it greatly. Perhaps there is just no simple solution available. It seems a shame that this should be so difficult. Fourteen years ago I could have written a program in assembler for the Commodore 64 to do this (of course not for a 2 MB file). Those were the days when computers were simple :)
0
 
LVL 3

Accepted Solution

by:
covington earned 200 total points
ID: 1488669
Look into Videosoft's VS-OCX control, which has an 'awk' component. It's extremely fast at doing exactly the tasks you want. You can download a demo version at www.videosoft.com
0
 

Author Comment

by:EEI
ID: 1488670
covington

I have acquired the control and it works great! I knew there should be something out there that would do this. I have tried it with a 3 MB file and it loads it amazingly fast, in just a few milliseconds. The search takes about 19 seconds, but that's a lot faster than WordPad can find it.

Thanks!
0
 


Question has a verified solution.
