JRockFL asked:

Parse text file - regex or something else?

Hello! I have a text file I would like to parse, and I plan to insert the data into a database.
Here is some sample data:
<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>20051227170000
<TRNAMT>-0000000000163.71
<FITID>2005122701
<NAME>PUBLIX 11109 WINTHROP M
<MEMO>12/24 RIVERVIEW    FL 8010I441922
</STMTTRN>

1. Loop through the text and locate the <STMTTRN> </STMTTRN> blocks.
2. Extract the data and set variables, e.g. TRNTYPE = POS, DTPOSTED = 20051227170000.
Then I can insert the values into the db.

ASKER CERTIFIED SOLUTION
Fernando Soto

JRockFL

ASKER

Hey Fernando

Thank you for the reply, that is exactly what I am looking for. I figured I needed a regex. I just found an article on Code Project that goes into regex and what all the symbols mean. Are there any good reference web sites?

Yes, I'm in Florida, just outside of Tampa.

Hi JRockFL;

I use this site (http://regexlib.com/RETester.aspx) when I want to test a pattern. I also use the Microsoft documentation, because not all Regex flavors are the same: there is Unix, the POSIX standard, and of course Microsoft. The documentation web site is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconRegularExpressionsLanguageElements.asp

You could also download a program called The Regulator, a Regex pattern testing tool, at this link: http://sourceforge.net/projects/regulator

BTW I live in Apopka, just north of Orlando.

Good Luck

Fernando
JRockFL

ASKER

Thanks for the links! I will check them out.
I need a more basic example to understand this...
Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
            "<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

How would you write it to pull out the word Apopka?

<city>Apopka</city>

Cool! You enjoying this nice weather too? It got cold yesterday!

OK;

This Regex pattern:
"<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

The engine starts by looking for the literal characters <STMTTRN>. The next symbol is a ., a Regex meta-character that stands for any single character (by default any character except a newline; with RegexOptions.Singleline it matches newlines too, which this pattern relies on to cross line breaks). The * that follows stands for 0 or more repetitions of the symbol before it. The ? after the * makes the repetition lazy: the engine takes the smallest number of characters needed until it finds the literal characters <TRNTYPE>. The next set of symbols, (?<TRNTYPE>.*?), is a named capture group, which has the form (?<TheNameOfTheCapture>the characters to capture). The \n is a Regex escape that matches the newline character. Then we look for the next tag, <DTPOSTED>, capture its value in another named capture group, and match another newline. Then we search until we find the end of the block, which is </STMTTRN>. If there are more characters in the input string after that, the engine starts over from the beginning of the pattern to look for the next match.
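
To make the walkthrough above concrete, here is a minimal sketch of how the pattern might be run in VB.NET. It is only an illustration under a few assumptions: the module name StmtTrnSketch and the variable names are made up, the sample data is a shortened copy of the block from the question, and the line breaks are assumed to be plain line feeds (vbLf).

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module StmtTrnSketch
    Sub Main()
        ' A shortened copy of the sample block from the question.
        ' vbLf is assumed to be the line break; Trim() below covers CR+LF files.
        Dim data As String = "<STMTTRN>" & vbLf & _
            "<TRNTYPE>POS" & vbLf & _
            "<DTPOSTED>20051227170000" & vbLf & _
            "<TRNAMT>-0000000000163.71" & vbLf & _
            "<FITID>2005122701" & vbLf & _
            "</STMTTRN>" & vbLf

        Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
            "<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

        ' Singleline lets "." match line breaks, so ".*?" can span lines.
        For Each m As Match In Regex.Matches(data, pattern, RegexOptions.Singleline)
            ' Trim() drops a trailing carriage return if the file uses CR+LF.
            Console.WriteLine("TRNTYPE = " & m.Groups("TRNTYPE").Value.Trim())
            Console.WriteLine("DTPOSTED = " & m.Groups("DTPOSTED").Value.Trim())
        Next
    End Sub
End Module
```

For the sample block this should print TRNTYPE = POS and DTPOSTED = 20051227170000. Regex.Matches returns one Match per <STMTTRN> block, so the loop body is where each database insert would go.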

The pattern string would be "<city>(?<City>\w+)</city>"

In this pattern we search for the literal string <city>. When we find it, the group captures the next run of word characters: \w matches one word character and + means 1 or more of them. For plain ASCII text the word characters are [a-zA-Z_0-9]; by default .NET's \w also matches Unicode letters and digits, and RegexOptions.ECMAScript restricts it to exactly that set. When the engine hits a non-word character, it checks that what follows is </city>.
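
As a rough sketch of the city pattern in use, assuming the sample is meant to end with a closing </city> tag: the module name CityDemo and the second input, "Winter Park", are made up for illustration. The second input shows that \w+ stops at a space, and that a character class such as [^<]+ is one possible alternative when names can contain spaces.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CityDemo
    Sub Main()
        Dim pattern As String = "<city>(?<City>\w+)</city>"

        ' Single-word city: \w+ captures everything between the tags.
        Dim m As Match = Regex.Match("<city>Apopka</city>", pattern)
        If m.Success Then
            Console.WriteLine(m.Groups("City").Value)
        End If

        ' Hypothetical two-word city: the space is not a word character,
        ' so \w+ cannot reach </city> and the match fails.
        Dim twoWords As String = "<city>Winter Park</city>"
        Console.WriteLine(Regex.IsMatch(twoWords, pattern))

        ' [^<]+ captures any run of characters up to the next "<".
        Dim loosePattern As String = "<city>(?<City>[^<]+)</city>"
        Console.WriteLine(Regex.Match(twoWords, loosePattern).Groups("City").Value)
    End Sub
End Module
```

This should print Apopka, then False, then Winter Park.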

There you have it.

Fernando
JRockFL

ASKER

That was perfect!! Thank you.

No problem.
JRockFL

ASKER

I ran into a little snag, so I posted a new question at:

https://www.experts-exchange.com/questions/21789400/Regex-Help.html