Most efficient way to loop through a txt file and find specific "Block of data" using vb
Hi experts,
I have a question related to design and performance.
I need the most efficient way to loop through a text file and find a specific "block of data", which will then be inserted into an SQL table. (I'm fine with the SQL part.)
The file size is at most 50 MB.
The data is well-defined, and the file looks like this:
BEGIN DATA FOR ID XXXXXX
**ERRORS M(20) < 10 INDICATES THAT THE CELL IS SUSPECT**ERRORS**
***ERRORS jgkhdfghdkjfhghg;dsfh;ghs; ghsd;ghsd; ghds;ghsdf hg;dfhgsfd
***ERRORS**dlhagdlajgdlakj alkfjalfla jgfljakgfl agdljfgalk jdgf
END DATA OUTPUT FOR ID XXXXXX
Qotiqyrtpoqyptqptqw
etypqytqwetyqwyitqor
BEGIN DATA FOR ID YYYYYY
**ERRORS**jgkhdfghdkjfhghg ;dsfh;ghs; ghsd;ghsd; ghds;ghsdf hg;dfhgsfd
**ERRORS jgkhdfghdkjfhghg;dsfh;ghs; ghsd;ghsd; ghds;ghsdf hg;dfhgsfd
**ERRORS**1242433535354545 4646464664
**ERRORS**vnvcbvxcm,vxcbxc ,nvx,cb,xn vx,nb,xmnv x,cm
***ERRORS**
***ERRORS**nvnmnbmnnbnnmmn
END DATA OUTPUT FOR ID YYYYYY
Etc, etc…
I want to:
1. Parse the ID
2. Find the lines with "ERRORS"
3. Build a new record as (Id, Comment), like:
Id Comment
xxxxxx **ERRORS M(20) < 10 INDICATES THAT THE CELL IS SUSPECT**ERRORS**
xxxxxx ***ERRORS jgkhdfghdkjfhghg;dsfh;ghs; ghsd;ghsd; ghds;ghsdf hg;dfhgsfd
xxxxxx ***ERRORS**dlhagdlajgdlakj alkfjalfla jgfljakgfl agdljfgalk jdgf
YYYYYY **ERRORS**jgkhdfghdkjfhghg ;dsfh;ghs; ghsd;ghsd; ghds;ghsdf hg;dfhgsfd
YYYYYY **ERRORS jgkhdfghdkjfhghg;dsfh;ghs; ghsd;ghsd; ghds;ghsdf hg;dfhgsfd
YYYYYY **ERRORS**1242433535354545 4646464664
YYYYYY **ERRORS**vnvcbvxcm,vxcbxc ,nvx,cb,xn vx,nb,xmnv x,cm
YYYYYY ***ERRORS**
YYYYYY ***ERRORS**nvnmnbmnnbnnmmn
Code where I need help (with another loop):
Do Until EOF(1)
    Line Input #1, MyTextLine
    LineNo = LineNo + 1
    If Mid(MyTextLine, 1, 17) = "BEGIN DATA FOR ID" Then
        ID = Mid(MyTextLine, 19, 6)
        Debug.Print ID
        ' Need another loop here to find the lines with "ERRORS"
    End If
Loop
I hope this is enough information.
Looking forward to a very elegant solution.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
MartinLiss,
Thank you for your prompt reply.
Works Perfectly!
I replaced the line
Case InStr(1, MyTextLine, "*ERRORS") > 0
with
Case InStr(MyTextLine, "ERRORS") > 0
Thanks again!
Sincerely
OK. With or without the "1", the search starts at position one because that's the default, though I think it's better to include the value explicitly. In any case, you're welcome, and I'm glad I was able to help.
In my profile you'll find links to some articles I've written that may interest you.
Marty - MVP 2009 to 2014
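To illustrate the point about the optional start argument, here is a small sketch. The Sub name DemoInStrStart is made up for illustration, and the sample string is one of the lines from the question. It also shows that InStr treats "*" as a literal character rather than a wildcard, and that start must be supplied whenever the compare argument is used.

```vb
Sub DemoInStrStart()
    Dim s As String
    s = "***ERRORS**dlha"   ' sample line from the question's data

    ' Omitting start and passing 1 are equivalent.
    Debug.Print InStr(s, "ERRORS")                      ' prints 4
    Debug.Print InStr(1, s, "ERRORS")                   ' prints 4

    ' "*" is matched literally, so the match begins one character earlier.
    Debug.Print InStr(1, s, "*ERRORS")                  ' prints 3

    ' start is required when a compare mode is given.
    Debug.Print InStr(1, s, "errors", vbTextCompare)    ' prints 4
End Sub
```

Because every sample line that contains "*ERRORS" also contains "ERRORS", both search strings select the same lines in the data shown in the question.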
ASKER
Hi MartinLiss,
Just to be 100% sure that I have all the lines in the block of data that starts with "****errors", I would like to use a loop.
If the outputs are identical, then I can use the accepted solution.
(This is in case a line doesn't start with ****errors but is still part of the block.)
Thank you very much, and thank you for the link.
I appreciate you a lot!
I'm sorry but I don't know if you are asking me a question.
In any case you're welcome and I'm glad I was able to help.
ASKER
Well,
My question was how to "loop through a txt file and find specific 'Block of data'".
I accepted your comment, BUT
I want to be absolutely certain that I have all the lines, and I would like to use a loop.
Trust me, there's no need to use a second loop. The code as is looks at every line in the file and...
If the line contains "BEGIN DATA FOR ID" it stores the ID
If the line contains "ERRORS" it writes the line to SQL (after you added the proper code to do that)
All other lines are ignored.
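The three rules above can be sketched as a single pass over the lines. This is not the members-only accepted solution, only a minimal reconstruction from the description. The Sub name DemoSinglePass, the in-memory sample lines, and the Debug.Print standing in for the SQL insert are all assumptions made for illustration.

```vb
Sub DemoSinglePass()
    Dim SampleLines As Variant
    Dim i As Long
    Dim ID As String

    ' Hypothetical sample data; in the real code each line comes from Line Input.
    SampleLines = Array( _
        "BEGIN DATA FOR ID XXXXXX", _
        "**ERRORS**first", _
        "unrelated line", _
        "END DATA OUTPUT FOR ID XXXXXX", _
        "BEGIN DATA FOR ID YYYYYY", _
        "***ERRORS**second", _
        "END DATA OUTPUT FOR ID YYYYYY")

    For i = LBound(SampleLines) To UBound(SampleLines)
        Select Case True
            Case InStr(1, SampleLines(i), "BEGIN DATA FOR ID") > 0
                ' Remember the ID for the lines that follow.
                ID = Mid(SampleLines(i), 19, 6)
            Case InStr(1, SampleLines(i), "ERRORS") > 0
                ' Stand-in for the SQL insert of (ID, Comment).
                Debug.Print ID & " " & SampleLines(i)
            ' Any other line matches no Case and is ignored.
        End Select
    Next i
End Sub
```

Because the ID is held in a variable between iterations, each ERRORS line is paired with the most recent BEGIN line without a nested loop.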
ASKER
Right, a line that does not start with "ERRORS" is ignored, but it can still be part of the block.
Unfortunately, there can be some lines that do not start with "ERRORS" but are part of the block:
BEGIN DATA FOR ID XXXXXX
ryyyyywerty
ssgfshghgd
afdgfdsgfh
aaaaaaaaaaaaaaaaadg
**ERRORS----- Start of block
ggaewgleealf ---- needs to be part of SQL
***ERRORS jgkhdfghdkjfhghg;dsfh;ghs; ghsd;ghsd
ahgjajgfdakjkas ---- needs to be part of SQL
***ERRORS**dlhagdlajgdlakj alkfjalfla jgfljakgfl agdljfgalk jdgf
END DATA OUTPUT FOR ID XXXXXX
So are you saying that everything between the "BEGIN DATA FOR ID XXXXXX" line and the "END DATA OUTPUT FOR ID XXXXXX" line is part of the same block? If not, then using the same lines, please show what each block should contain.
ASKER
The block of data for each entry starts with:
"BEGIN DATA FOR ID XXXXXX"
data
data
data
100 more lines of data
**ERRORS----- Start of Errors block
***ERRORS**dlha
***ERRORS**dlha
Data
Data
***ERRORS**dlha
Data
It ends with: "END DATA OUTPUT FOR ID XXXXXX"
I need all the lines between **ERRORS and "END DATA OUTPUT FOR ID XXXXXX".
Just to be 100% clear, do you mean these lines?
**ERRORS----- Start of Errors block
***ERRORS**dlha
***ERRORS**dlha
Data
Data
***ERRORS**dlha
Data
or do you mean these lines?
Data
Data
Data
ASKER
All lines:
**ERRORS----- Start of Errors block
***ERRORS**dlha
***ERRORS**dlha
Data
Data
***ERRORS**dlha
Data
End:"END DATA OUTPUT FOR ID XXXXXX"
Try this.
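Dim FF As Integer
Dim MyTextLine As String
Dim ID As String
Dim bErrorFound As Boolean   ' True once an ERRORS line has been seen in the current block

FF = FreeFile
Open "C:\temp\errors.txt" For Input As #FF
Do Until EOF(FF)
    Line Input #FF, MyTextLine
    Select Case True
        Case InStr(1, UCase(MyTextLine), "BEGIN DATA FOR ID") > 0
            ' Start of a new block: remember its ID and reset the flag
            ID = Mid(MyTextLine, 19, 6)
            bErrorFound = False
            Debug.Print ""
            Debug.Print ID
        Case InStr(1, UCase(MyTextLine), "END DATA") > 0
            ' End of the block: reset the flag and write this line to SQL
            ' (note: this line is written even if the block had no ERRORS line)
            bErrorFound = False
            Debug.Print "written to SQL: " & MyTextLine
        Case Else
            ' Write the first ERRORS line and every line after it in the block
            If InStr(1, UCase(MyTextLine), "ERRORS") > 0 Or bErrorFound Then
                ' write error to SQL
                Debug.Print "written to SQL: " & MyTextLine
                bErrorFound = True
            End If
    End Select
Loop
' Close only the file opened here rather than every open file
Close #FF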
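For reuse or testing, the same flag-based logic could be wrapped in a function that works on an array of lines and returns (ID, Comment) records instead of printing them. This is only a sketch under stated assumptions: the names CollectErrorRecords and DemoCollect, the tab separator between ID and comment, and the sample lines are made up for illustration and are not part of the posted solution.

```vb
' Returns one "ID<Tab>line" string per line that belongs to an errors block.
Function CollectErrorRecords(ByVal TextLines As Variant) As Collection
    Dim Result As New Collection
    Dim i As Long
    Dim ID As String
    Dim U As String
    Dim bErrorFound As Boolean   ' True once ERRORS was seen in the current block

    For i = LBound(TextLines) To UBound(TextLines)
        U = UCase(TextLines(i))
        Select Case True
            Case InStr(1, U, "BEGIN DATA FOR ID") > 0
                ID = Mid(TextLines(i), 19, 6)
                bErrorFound = False
            Case InStr(1, U, "END DATA") > 0
                ' Same behaviour as the posted code: the END line is always kept.
                Result.Add ID & vbTab & TextLines(i)
                bErrorFound = False
            Case Else
                If bErrorFound Or InStr(1, U, "ERRORS") > 0 Then
                    Result.Add ID & vbTab & TextLines(i)
                    bErrorFound = True
                End If
        End Select
    Next i

    Set CollectErrorRecords = Result
End Function

Sub DemoCollect()
    Dim Rec As Variant
    ' Hypothetical sample block modelled on the layout described in the thread.
    For Each Rec In CollectErrorRecords(Array( _
            "BEGIN DATA FOR ID XXXXXX", _
            "data", _
            "**ERRORS----- Start of Errors block", _
            "Data", _
            "END DATA OUTPUT FOR ID XXXXXX"))
        Debug.Print Rec
    Next Rec
End Sub
```

With this sample, the "data" line before the first ERRORS line is skipped, and the three remaining lines are returned, each prefixed with XXXXXX. The SQL insert could then iterate over the returned Collection.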
ASKER
Thank you for posting code.
I'll get back to you tomorrow.
ASKER
EXCELLENT !!!
Thank you, MartinLiss for your help!
You're welcome.