Parse text file - regex or something else?

Hello! I have a text file I would like to parse; I plan to insert the data into a database.
Here is some sample data:
<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>20051227170000
<TRNAMT>-0000000000163.71
<FITID>2005122701
<NAME>PUBLIX 11109 WINTHROP M
<MEMO>12/24 RIVERVIEW    FL 8010I441922
</STMTTRN>

1. Loop through the text and locate the <STMTTRN> ... </STMTTRN> blocks.
2. Extract the data and set variables, e.g. TRNTYPE = POS, DTPOSTED = 20051227170000.
Then I can insert the values into the database.
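The steps above can also be done without a regex. The sample appears to be OFX 1.x data, an SGML format in which leaf elements such as <TRNTYPE> have no closing tag, so each field sits on its own "<TAG>value" line. A minimal line-by-line sketch, assuming one element per line as in the sample, could collect every field of each block into a Dictionary. `ParseTransactions` is a hypothetical helper name, and lines outside <STMTTRN> blocks (file headers and so on) are simply ignored.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module StmtTrnParser
    ' Hypothetical helper: collects the fields of each <STMTTRN> block without a regex.
    ' Assumes one "<TAG>value" element per line, as in the sample data.
    Function ParseTransactions(text As String) As List(Of Dictionary(Of String, String))
        Dim results As New List(Of Dictionary(Of String, String))
        Dim current As Dictionary(Of String, String) = Nothing

        ' Splitting on both CR and LF handles Windows and Unix line endings.
        For Each rawLine As String In text.Split({vbCr, vbLf}, StringSplitOptions.RemoveEmptyEntries)
            Dim line As String = rawLine.Trim()
            If line = "<STMTTRN>" Then
                current = New Dictionary(Of String, String)   ' start of a block
            ElseIf line = "</STMTTRN>" Then
                If current IsNot Nothing Then results.Add(current)   ' end of a block
                current = Nothing
            ElseIf current IsNot Nothing AndAlso line.StartsWith("<") Then
                ' "<TRNTYPE>POS" -> key "TRNTYPE", value "POS"
                Dim tagEnd As Integer = line.IndexOf(">"c)
                If tagEnd > 1 Then
                    current(line.Substring(1, tagEnd - 1)) = line.Substring(tagEnd + 1).Trim()
                End If
            End If
        Next
        Return results
    End Function

    Sub Main()
        Dim sample As String = "<STMTTRN>" & vbCrLf & "<TRNTYPE>POS" & vbCrLf & _
            "<DTPOSTED>20051227170000" & vbCrLf & "</STMTTRN>"
        For Each trn As Dictionary(Of String, String) In ParseTransactions(sample)
            Console.WriteLine(trn("TRNTYPE") & " " & trn("DTPOSTED"))   ' POS 20051227170000
        Next
    End Sub
End Module
```

Because every tag lands in the dictionary, fields such as TRNAMT, FITID, NAME and MEMO come along without changing any pattern.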

Asked by JRockFL

Fernando Soto (Retired) commented:
Hi JRockFL;

The following code should do what you want.

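Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim TRNTYPE As String
        Dim DTPOSTED As String

        ' Named groups capture TRNTYPE and DTPOSTED inside each
        ' <STMTTRN> ... </STMTTRN> block; Singleline lets . match newlines.
        Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
            "<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"
        Dim re As New Regex(pattern, _
            RegexOptions.Compiled Or RegexOptions.Singleline)

        ' Read the whole file; Using closes the reader even if an error occurs.
        Dim input As String
        Using sr As New StreamReader("C:\Temp\InputData.dat")
            input = sr.ReadToEnd()
        End Using

        For Each m As Match In re.Matches(input)
            ' Trim removes the \r left in front of \n by Windows (CRLF) line endings.
            TRNTYPE = m.Groups("TRNTYPE").Value.Trim()
            DTPOSTED = m.Groups("DTPOSTED").Value.Trim()
            ' Do what you need to do and write to Database
        Next
    End Sub
End Module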

I hope that this is of some help.

Fernando
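The DTPOSTED and TRNAMT values arrive as strings. Before inserting them into the database, a sketch like the following could convert them to DateTime and Decimal. It assumes the date is always the 14-digit yyyyMMddHHmmss form seen in the sample (OFX dates may also carry fractional seconds or a bracketed time-zone suffix, which this sketch does not handle) and that amounts use "." as the decimal separator. The pattern extends the one above with a TRNAMT group and uses `[^\r\n]*` with `\r?\n` so CRLF line endings never reach the captures; `ParsePosted` and `ParseAmount` are hypothetical helper names.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TypedFields
    ' Hypothetical helper: "20051227170000" -> 2005-12-27 17:00:00.
    ' Assumes exactly the 14-digit yyyyMMddHHmmss form shown in the sample.
    Function ParsePosted(value As String) As DateTime
        Return DateTime.ParseExact(value.Trim(), "yyyyMMddHHmmss", CultureInfo.InvariantCulture)
    End Function

    ' Hypothetical helper: "-0000000000163.71" -> -163.71D.
    ' InvariantCulture makes "." the decimal separator regardless of machine locale.
    Function ParseAmount(value As String) As Decimal
        Return Decimal.Parse(value.Trim(), NumberStyles.Number, CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim input As String = "<STMTTRN>" & vbCrLf & _
            "<TRNTYPE>POS" & vbCrLf & _
            "<DTPOSTED>20051227170000" & vbCrLf & _
            "<TRNAMT>-0000000000163.71" & vbCrLf & _
            "</STMTTRN>"

        ' [^\r\n]* stops each capture at the end of its line, so no \r is captured.
        Dim re As New Regex( _
            "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>[^\r\n]*)\r?\n" & _
            "<DTPOSTED>(?<DTPOSTED>[^\r\n]*)\r?\n" & _
            "<TRNAMT>(?<TRNAMT>[^\r\n]*).*?</STMTTRN>", _
            RegexOptions.Singleline)

        For Each m As Match In re.Matches(input)
            Dim posted As DateTime = ParsePosted(m.Groups("DTPOSTED").Value)
            Dim amount As Decimal = ParseAmount(m.Groups("TRNAMT").Value)
            ' Prints: POS 2005-12-27 17:00:00 -163.71
            Console.WriteLine(m.Groups("TRNTYPE").Value & " " & _
                posted.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture) & " " & _
                amount.ToString(CultureInfo.InvariantCulture))
        Next
    End Sub
End Module
```

Passing typed values (rather than concatenated strings) to parameterized database commands avoids locale and quoting problems on insert.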

Hey JRockFL; does the FL in JRockFL stand for Florida?

JRockFL (Author) commented:
Hey Fernando

Thank you for the reply; that is exactly what I was looking for. I figured I needed a regex, and I just found an article on CodeProject that covers regexes and what all the symbols mean. Are there any good reference websites?

Yes, I'm in Florida, just outside of Tampa.
Fernando Soto (Retired) commented:
Hi JRockFL;

When I want to test a pattern I use http://regexlib.com/RETester.aspx , and I use the Microsoft documentation because not all regex flavors are the same: Unix tools, the POSIX standard, and of course Microsoft's .NET implementation differ in places. The documentation website is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconRegularExpressionsLanguageElements.asp

You could also download The Regulator, a regex pattern-testing program, from http://sourceforge.net/projects/regulator

BTW, I live in Apopka, just north of Orlando.

Good Luck

Fernando

JRockFL (Author) commented:
Thanks for the links! I will check them out.
I need a more basic example to understand this...
Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
            "<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

How would you write it to pull out the word Apopka?

<city>Apopka</city>

Cool! Are you enjoying this nice weather too? It got cold yesterday!

Fernando Soto (Retired) commented:
OK;

This regex pattern:
"<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

starts by looking for the literal characters <STMTTRN>. The next symbol, ., is a regex metacharacter that stands for any single character (with RegexOptions.Singleline, as in the code, this includes the newline). The * means 0 or more repetitions of the symbol before it. The ? that follows makes the repetition lazy: the engine takes the smallest number of repetitions needed until it finds the literal characters <TRNTYPE>. The next set of symbols, (?<TRNTYPE>.*?), is a named capture group, defined as (?<TheNameOfTheCapture>the characters to capture). The \n is a regex metacharacter that matches the newline character. Then we look for the next literal, <DTPOSTED>, capture its value in another named group, and match another newline. Then we search until we find the end of the block, </STMTTRN>. If there are more characters in the input string, the engine starts again from the beginning of the pattern to look for the next block.
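The effect of the ? in .*? is easiest to see on input containing two blocks. In this small sketch (the test strings are made up), the greedy .* runs on to the last </STMTTRN> and produces one match spanning both blocks, while the lazy .*? stops at the first closing tag and produces one match per block. Both patterns use RegexOptions.Singleline, as in the code above, so . also matches newlines; `CountMatches` is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module GreedyVsLazy
    ' Counts matches of a pattern with Singleline on, so "." also matches newlines.
    Function CountMatches(pattern As String, input As String) As Integer
        Return Regex.Matches(input, pattern, RegexOptions.Singleline).Count
    End Function

    Sub Main()
        Dim twoBlocks As String = _
            "<STMTTRN>" & vbLf & "<TRNTYPE>POS" & vbLf & "</STMTTRN>" & vbLf & _
            "<STMTTRN>" & vbLf & "<TRNTYPE>ATM" & vbLf & "</STMTTRN>"

        ' Greedy: .* grabs as much as possible, so one match covers both blocks.
        Console.WriteLine(CountMatches("<STMTTRN>.*</STMTTRN>", twoBlocks))    ' 1
        ' Lazy: .*? stops at the first </STMTTRN>, so each block is its own match.
        Console.WriteLine(CountMatches("<STMTTRN>.*?</STMTTRN>", twoBlocks))   ' 2
    End Sub
End Module
```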

For your <city> example, the pattern string would be "<city>(?<City>\w+)</city>"

In this pattern we search for the literal string <city>. When we find it, the named group City captures the word characters that follow: \w stands for a word character and + means one or more of them. For plain ASCII text, word characters are [a-zA-Z_0-9]; in .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is specified. When the engine reaches a non-word character, it checks that the text at that point is </city>.
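A short sketch of the <city> pattern in use: Regex.Match returns the first match, and the named group is read by name. Because \w+ stops at a space, a two-word value such as "Winter Park" (a made-up test string) does not match at all; if multi-word values are possible, [^<]+ (everything up to the next <) is one alternative. `GetCity` is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CityExample
    ' Returns the captured city name, or Nothing if the pattern does not match.
    Function GetCity(input As String, pattern As String) As String
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then Return m.Groups("City").Value
        Return Nothing
    End Function

    Sub Main()
        Dim wordPattern As String = "<city>(?<City>\w+)</city>"
        Dim anyTextPattern As String = "<city>(?<City>[^<]+)</city>"

        Console.WriteLine(GetCity("<city>Apopka</city>", wordPattern))                  ' Apopka
        ' \w does not match the space, so there is no match and GetCity returns Nothing.
        Console.WriteLine(GetCity("<city>Winter Park</city>", wordPattern) Is Nothing)  ' True
        ' [^<]+ accepts any characters up to the next "<", including spaces.
        Console.WriteLine(GetCity("<city>Winter Park</city>", anyTextPattern))          ' Winter Park
    End Sub
End Module
```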

There you have it.

Fernando
JRockFL (Author) commented:
That was perfect!! Thank you.
Fernando Soto (Retired) commented:
No problem.
Visual Basic.NET

From novice to tech pro — start learning today.