JRockFL
asked on
Parse text file - regex or something else?
Hello! I have a text file I would like to parse; I plan to insert the data into a database.
Here is some sample data:
<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>20051227170000
<TRNAMT>-0000000000163.71
<FITID>2005122701
<NAME>PUBLIX 11109 WINTHROP M
<MEMO>12/24 RIVERVIEW FL 8010I441922
</STMTTRN>
1. Loop through the text and locate the <STMTTRN> </STMTTRN> blocks
2. Extract the data and set variables: TRNTYPE = POS, DTPOSTED = 20051227170000
Then I can insert into the db
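The two steps above can be sketched roughly as follows. This is only an illustration, written in Python rather than VB.NET so it can be run as-is; the names parse_transactions, BLOCK and FIELD are made up for the example, and it assumes every field sits on its own line as in the sample data.

```python
import re

# Sample data copied from the question.
SAMPLE = """<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>20051227170000
<TRNAMT>-0000000000163.71
<FITID>2005122701
<NAME>PUBLIX 11109 WINTHROP M
<MEMO>12/24 RIVERVIEW FL 8010I441922
</STMTTRN>
"""

# Step 1: locate each <STMTTRN> ... </STMTTRN> block.
# re.DOTALL lets "." cross line breaks; ".*?" stops at the first closing tag.
BLOCK = re.compile(r"<STMTTRN>(.*?)</STMTTRN>", re.DOTALL)

# Step 2: inside a block, each line looks like <TAG>value.
# re.MULTILINE makes "^" and "$" match at the start and end of every line.
FIELD = re.compile(r"^<(\w+)>(.*)$", re.MULTILINE)


def parse_transactions(text):
    """Return one dict per transaction block (hypothetical helper)."""
    transactions = []
    for block in BLOCK.finditer(text):
        # strip() also removes a trailing "\r" if the file uses Windows line endings.
        fields = {tag: value.strip() for tag, value in FIELD.findall(block.group(1))}
        transactions.append(fields)
    return transactions


transactions = parse_transactions(SAMPLE)
for row in transactions:
    print(row["TRNTYPE"], row["DTPOSTED"], row["TRNAMT"])
```

Each dictionary holds the values for one transaction, so it can be handed to whatever database insert code is used.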
Hi JRockFL;
I use this site when I want to test a pattern: http://regexlib.com/RETester.aspx . I also use the Microsoft documentation, because not all Regex flavors are the same (Unix, the POSIX standard, and of course Microsoft). The documentation web site is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconRegularExpressionsLanguageElements.asp
You could also download The Regulator, a Regex pattern testing program, at this link: http://sourceforge.net/projects/regulator
BTW I live in Apopka, just north of Orlando.
Good Luck
Fernando
ASKER
Thanks for the links! I will check them out.
I need a more basic example to understand this...
Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
"<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"
How would you write it to pull out the word Apopka?
<city>Apopka</city>
Cool! You enjoying this nice weather too? It got cold yesterday!
OK;
This Regex pattern:
"<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"
The engine starts by looking for the literal characters <STMTTRN>. The next symbol, the . , is a Regex meta-character that stands for any single character (in .NET, any character except a newline unless RegexOptions.Singleline is set). The * that follows stands for 0 or more repetitions of the symbol before it. The ? makes that repetition lazy: the Regex engine takes the fewest characters it can until it finds the literal characters <TRNTYPE>. The next set of symbols, (?<TRNTYPE>.*?), is a named capture group, which is written as (?<TheNameOfTheCapture>The characters to capture). The \n is a Regex meta-character for the new line character. Then we look for the next item, <DTPOSTED>, capture its value in another named capture group, and match another new line character. Then we search until we find the end of the block, which is </STMTTRN>. If there are more characters in the input string, matching starts again from the beginning of the pattern.
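To see the named capture groups at work, here is a small sketch of the same pattern. It is written in Python rather than VB.NET so it can be run directly; note that Python spells a named group (?P<name>...) where .NET uses (?<name>...), and re.DOTALL plays the role of RegexOptions.Singleline. The input simply repeats the sample block from the question twice to show the pattern being applied again after the first match.

```python
import re

# One transaction block, copied from the sample data in the question.
block = (
    "<STMTTRN>\n"
    "<TRNTYPE>POS\n"
    "<DTPOSTED>20051227170000\n"
    "<TRNAMT>-0000000000163.71\n"
    "</STMTTRN>\n"
)
text = block * 2  # same block twice, to show repeated matching

# Same pattern as above, with Python's (?P<name>...) spelling for named groups.
# re.DOTALL lets "." match newlines so ".*?" can skip across lines.
pattern = re.compile(
    r"<STMTTRN>.*?<TRNTYPE>(?P<TRNTYPE>.*?)\n"
    r"<DTPOSTED>(?P<DTPOSTED>.*?)\n.*?</STMTTRN>",
    re.DOTALL,
)

# finditer resumes the search after each match, one match per block.
matches = [m.groupdict() for m in pattern.finditer(text)]
for fields in matches:
    print(fields["TRNTYPE"], fields["DTPOSTED"])
```

Each match exposes the captured values by group name, which is what makes named groups convenient for filling variables such as TRNTYPE and DTPOSTED.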
The pattern string would be "<city>(?<City>\w+)</city>"
This pattern searches for the literal string <city>. When it finds it, it captures the word characters that follow: \w represents a single word character and the + means 1 or more of them. For ASCII text, word characters are the set [a-zA-Z_0-9]. When it hits a non-word character, it checks that what follows is </city>.
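A quick way to try the city pattern, again sketched in Python instead of VB.NET (so the named group is spelled (?P<City>...) rather than (?<City>...)). The second input, "Winter Park", is only an illustrative value that is not from the thread; it shows that \w+ stops at the space, so the pattern does not match a two-word name.

```python
import re

# Python spelling of "<city>(?<City>\w+)</city>".
pattern = re.compile(r"<city>(?P<City>\w+)</city>")

match = pattern.search("<city>Apopka</city>")
city = match.group("City") if match else None
print(city)

# Illustrative input: the space is not a word character and is not
# followed by </city>, so the whole pattern fails to match.
no_match = pattern.search("<city>Winter Park</city>")
print(no_match)
```

If multi-word values need to be captured, a lazy .*? inside the group, as in the transaction pattern, is one option.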
There you have it.
Fernando
ASKER
That was perfect!! Thank you.
No problem.
ASKER
I ran into a little snag, so I posted a new question at
https://www.experts-exchange.com/questions/21789400/Regex-Help.html
ASKER
Thank you for the reply, that is exactly what I am looking for. I figured I needed a regex. I just found an article on Code Project that goes into regex and what all the symbols mean. Are there any good reference web sites?
Yes, I'm in Florida, just outside of Tampa.