PDF

Most efficient way to loop through a txt file and find specific "Block of data" using vb

Hi experts,
I have a question related to design and performance.

I need the most efficient way to loop through a text file and find a specific "block of data"
that will be inserted into an SQL table. (I'm fine with the SQL part.)
The file size is at most 50 MB.
The data is well defined, and the file looks like this:

BEGIN DATA FOR ID XXXXXX
**ERRORS M(20) < 10 INDICATES THAT THE CELL IS SUSPECT**ERRORS**
***ERRORS  jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd;ghds;ghsdfhg;dfhgsfd
***ERRORS**dlhagdlajgdlakjalkfjalflajgfljakgflagdljfgalkjdgf
END DATA OUTPUT FOR ID XXXXXX
Qotiqyrtpoqyptqptqw
etypqytqwetyqwyitqor
BEGIN DATA FOR ID YYYYYY
**ERRORS**jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd;ghds;ghsdfhg;dfhgsfd
**ERRORS jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd;ghds;ghsdfhg;dfhgsfd
**ERRORS**12424335353545454646464664
**ERRORS**vnvcbvxcm,vxcbxc,nvx,cb,xnvx,nb,xmnvx,cm
***ERRORS**
***ERRORS**nvnmnbmnnbnnmmn
END DATA OUTPUT FOR ID YYYYYY
Etc, etc…

I want to:
   1. Parse the ID
   2. Find the lines containing "ERRORS"
   3. Build a new record as (Id, Comment), like:

Id          Comment  
xxxxxx **ERRORS M(20) < 10 INDICATES THAT THE CELL IS SUSPECT**ERRORS**
xxxxxx ***ERRORS  jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd;ghds;ghsdfhg;dfhgsfd
xxxxxx ***ERRORS**dlhagdlajgdlakjalkfjalflajgfljakgflagdljfgalkjdgf
YYYYYY **ERRORS**jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd;ghds;ghsdfhg;dfhgsfd
YYYYYY **ERRORS jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd;ghds;ghsdfhg;dfhgsfd
YYYYYY **ERRORS**12424335353545454646464664
YYYYYY **ERRORS**vnvcbvxcm,vxcbxc,nvx,cb,xnvx,nb,xmnvx,cm
YYYYYY ***ERRORS**
YYYYYY ***ERRORS**nvnmnbmnnbnnmmn


Code where I need help (with another loop):
Do Until EOF(1)
    Line Input #1, MyTextLine
    LineNo = LineNo + 1
    If Mid(MyTextLine, 1, 17) = "BEGIN DATA FOR ID" Then
        ID = Mid(MyTextLine, 19, 6)
        Debug.Print ID

        ''' need another loop to find the lines with "ERRORS"

    End If
Loop


I hope this is enough information.
Looking forward to a very elegant solution.
Martin Liss

Unless I misunderstand the question, you don't need a second loop:

Dim FF As Integer
Dim MyTextLine As String
Dim ID As String

FF = FreeFile

Open "C:\temp\errors.txt" For Input As #FF
Do Until EOF(FF)
    Line Input #FF, MyTextLine
    'LineNo = LineNo + 1
    Select Case True
        Case Mid(MyTextLine, 1, 17) = "BEGIN DATA FOR ID"
            ' Start of a block: remember its ID for the lines that follow
            ID = Mid(MyTextLine, 19, 6)
            Debug.Print ID
        Case InStr(1, MyTextLine, "*ERRORS") > 0
            ' InStr takes the string to search first, then the text to find
            ' write error to SQL
    End Select
Loop
Close #FF

PDF

ASKER

MartinLiss,

Thank you for your prompt reply.
Works Perfectly!

I replaced the line
    Case InStr(1, MyTextLine, "*ERRORS") > 0
with
    Case InStr(MyTextLine, "ERRORS") > 0

Thanks again!
Sincerely
PDF
OK. With or without the "1", the search starts at position one because that's the default, though I think it's better to include the value explicitly. In any case, you're welcome, and I'm glad I was able to help.
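
For anyone reading along, here is a small sketch of how InStr treats its arguments. The Sub name and the sample line are made up for illustration (the line is modelled on the data in the question). It shows that leaving out the start position gives the same result as passing 1, and that swapping the two strings silently returns 0 rather than raising an error.

```vb
Sub InStrArgumentOrderDemo()
    ' Hypothetical sample line, modelled on the data in the question
    Dim SampleLine As String
    SampleLine = "***ERRORS**nvnmnbmnnbnnmmn"

    ' InStr([start,] string1, string2) looks for string2 inside string1
    Debug.Print InStr(1, SampleLine, "ERRORS")   ' start given explicitly: prints 4
    Debug.Print InStr(SampleLine, "ERRORS")      ' start omitted, defaults to 1: prints 4

    ' Arguments swapped: this searches for the whole line inside "ERRORS".
    ' The line is longer than "ERRORS", so it can never match: prints 0
    Debug.Print InStr(1, "ERRORS", SampleLine)
End Sub
```

Because a swapped call just returns 0, the Case branch would never fire and no error line would be written, so it is worth checking the argument order if nothing shows up.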

In my profile you'll find links to some articles I've written that may interest you.
Marty - MVP 2009 to 2014
PDF

ASKER

Hi MartinLiss,

Just to be 100% sure that I have all the lines in that block of data that start with "****errors", I would like to use a loop.
If the outputs are identical, then I can use the accepted solution.
(This is in case a line doesn't start with ****errors but is still part of the block.)
Thank you very much, and thank you for the link.

I appreciate you a lot!

I'm sorry but I don't know if you are asking me a question.

In any case you're welcome and I'm glad I was able to help.

PDF

ASKER

Well,
My question was how "to loop through a txt file and find specific "Block of data"".
I accepted your comment, BUT
I want to be absolutely certain that I have all lines, and I would like to use a loop.
Trust me, there's no need to use a second loop. The code as is looks at every line in the file and...

If the line contains "BEGIN DATA FOR ID" it stores the ID
If the line contains "ERRORS" it writes the line to SQL (after you added the proper code to do that)
All other lines are ignored.
PDF

ASKER

Right, if a line does not start with "ERRORS" it is ignored, but it can still be part of the block.

Unfortunately, there can be lines that do not start with "ERRORS" but are part of the block.
 

BEGIN DATA FOR ID XXXXXX
ryyyyywerty
ssgfshghgd
afdgfdsgfh
aaaaaaaaaaaaaaaaadg
**ERRORS-----   Start of block
ggaewgleealf   ---- need to be part of sql
 ***ERRORS  jgkhdfghdkjfhghg;dsfh;ghs;ghsd;ghsd
ahgjajgfdakjkas   ----need to be part of sql
***ERRORS**dlhagdlajgdlakjalkfjalflajgfljakgflagdljfgalkjdgf
END DATA OUTPUT FOR ID XXXXXX
So are you saying that everything between the "BEGIN DATA FOR ID XXXXXX" line and the "END DATA OUTPUT FOR ID XXXXXX" line are all a part of the same block? If not then using the same lines, please show what each block should contain.
PDF

ASKER

The block of data for each entry starts with:
"BEGIN DATA FOR ID XXXXXX"
data
data
data
100 more lines of data
**ERRORS-----   Start of Errors block
 ***ERRORS**dlha
 ***ERRORS**dlha
Data
Data
 ***ERRORS**dlha
Data
End:"END DATA OUTPUT FOR ID XXXXXX"

I need all lines from the first **ERRORS line through "END DATA OUTPUT FOR ID XXXXXX".
Just to be 100% clear, do you mean these lines?

**ERRORS-----   Start of Errors block
 ***ERRORS**dlha
 ***ERRORS**dlha
Data
Data
 ***ERRORS**dlha
Data

or do you mean these lines?

Data
Data
Data
PDF

ASKER

All lines:

**ERRORS-----   Start of Errors block
  ***ERRORS**dlha
  ***ERRORS**dlha
 Data
 Data
  ***ERRORS**dlha
 Data
End:"END DATA OUTPUT FOR ID XXXXXX"
Try this.

Dim FF As Integer
Dim MyTextLine As String
Dim ID As String
Dim bErrorFound As Boolean   ' True once an ERRORS line has been seen in the current block

FF = FreeFile

Open "C:\temp\errors.txt" For Input As #FF
Do Until EOF(FF)
    Line Input #FF, MyTextLine
    Select Case True
        Case InStr(1, UCase(MyTextLine), "BEGIN DATA FOR ID") > 0
            ' New block: store its ID and reset the flag
            ID = Mid(MyTextLine, 19, 6)
            bErrorFound = False
            Debug.Print ""
            Debug.Print ID
        Case InStr(1, UCase(MyTextLine), "END DATA") > 0
            ' End of block: reset the flag
            bErrorFound = False
            ' Write this line to SQL
            Debug.Print "written to SQL: " & MyTextLine
        Case Else
            ' Write the first ERRORS line and every line after it in this block
            If InStr(1, UCase(MyTextLine), "ERRORS") > 0 Or bErrorFound Then
                ' write error to SQL
                Debug.Print "written to SQL: " & MyTextLine
                bErrorFound = True
            End If
    End Select
Loop
Close #FF

PDF

ASKER

Thank you for posting code.
 I'll get back to you tomorrow.
PDF

ASKER

EXCELLENT !!!

Thank you, MartinLiss for your help!
You're welcome.
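
For later readers: the question asked for (Id, Comment) records, and the code in this thread only prints the lines. One possible way to bridge that, sketched below, is to wrap the same single-pass logic in a function that returns the records so the SQL part can stay separate. The names CollectErrorRecords and PrintErrorRecords, and the Tab-separated "ID, Tab, line" record format, are assumptions made for illustration and are not from the thread.

```vb
' Sketch only: the function name and the Tab-separated record format
' are assumptions for illustration, not part of the answers above.
Function CollectErrorRecords(ByVal FilePath As String) As Collection
    Dim FF As Integer
    Dim MyTextLine As String
    Dim ID As String
    Dim bErrorFound As Boolean
    Dim Records As New Collection

    FF = FreeFile
    Open FilePath For Input As #FF
    Do Until EOF(FF)
        Line Input #FF, MyTextLine
        Select Case True
            Case InStr(1, UCase(MyTextLine), "BEGIN DATA FOR ID") > 0
                ' New block: remember the ID, no ERRORS line seen yet
                ID = Mid(MyTextLine, 19, 6)
                bErrorFound = False
            Case InStr(1, UCase(MyTextLine), "END DATA") > 0
                ' The END line is kept as well, as in the code posted above
                bErrorFound = False
                Records.Add ID & vbTab & MyTextLine
            Case Else
                ' Keep the first ERRORS line and everything after it
                If bErrorFound Or InStr(1, UCase(MyTextLine), "ERRORS") > 0 Then
                    Records.Add ID & vbTab & MyTextLine
                    bErrorFound = True
                End If
        End Select
    Loop
    Close #FF

    Set CollectErrorRecords = Records
End Function

' Example use: print every record found in the file
Sub PrintErrorRecords()
    Dim Rec As Variant
    For Each Rec In CollectErrorRecords("C:\temp\errors.txt")
        Debug.Print Rec
    Next Rec
End Sub
```

It is still one pass over the file with Line Input, so the cost should be about the same as the posted version; the only difference is that the lines are held in memory until the caller writes them out.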