Solved

Parse text file - regex or something else?

Posted on 2006-03-25
Last Modified: 2006-11-18
Hello! I have a text file that I would like to parse; I plan to insert the data into a database.
Here is some sample data:
<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>20051227170000
<TRNAMT>-0000000000163.71
<FITID>2005122701
<NAME>PUBLIX 11109 WINTHROP M
<MEMO>12/24 RIVERVIEW    FL 8010I441922
</STMTTRN>

1. Loop through the text and locate the <STMTTRN> </STMTTRN> blocks.
2. Extract the data and set variables, e.g. TRNTYPE = POS, DTPOSTED = 20051227170000.
Then I can insert the values into the db.

Question by:JRockFL
8 Comments
 
LVL 64

Accepted Solution

by:
Fernando Soto earned 2000 total points
ID: 16291078
Hi JRockFL;

The following code should do what you want.

Imports System.IO
Imports System.Text.RegularExpressions

        Dim TRNTYPE As String
        Dim DTPOSTED As String

        ' Singleline lets . match line breaks, so .*? can span the lines of a block.
        Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
            "<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"
        Dim re As New Regex(pattern, _
            RegexOptions.Compiled Or RegexOptions.Singleline)

        ' Read the whole file; Using closes the reader even if an error occurs.
        Dim input As String
        Using sr As New StreamReader("C:\Temp\InputData.dat")
            input = sr.ReadToEnd()
        End Using

        ' One match per <STMTTRN> block.
        For Each m As Match In re.Matches(input)
            ' Trim drops the carriage return left in the capture when the
            ' file uses CR/LF line endings (the pattern only consumes the \n).
            TRNTYPE = m.Groups("TRNTYPE").Value.Trim()
            DTPOSTED = m.Groups("DTPOSTED").Value.Trim()
            ' Do what you need to do and write to Database
        Next
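
If you want every field of each block rather than just TRNTYPE and DTPOSTED, one possible variation on the code above is to match the block first and then pick out each <TAG>value pair inside it. This is only a sketch: the module name, the ParseTransactions function and the idea of returning a dictionary per block are illustrative assumptions, not part of the original answer, and the sample text is the data from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical helper module, for illustration only.
Module StmtTrnFieldsSketch

    ' Returns one dictionary per <STMTTRN> block, keyed by tag name.
    Function ParseTransactions(ByVal text As String) As List(Of Dictionary(Of String, String))
        Dim results As New List(Of Dictionary(Of String, String))()

        ' Singleline lets . cross line breaks, so one match covers a whole block.
        Dim blockRe As New Regex("<STMTTRN>(?<body>.*?)</STMTTRN>", RegexOptions.Singleline)

        ' A tag name in angle brackets, then everything up to the end of the line.
        ' [^<\r\n]* stops before a CR or LF, so no line-ending characters are captured.
        Dim fieldRe As New Regex("<(?<tag>\w+)>(?<value>[^<\r\n]*)")

        For Each block As Match In blockRe.Matches(text)
            Dim fields As New Dictionary(Of String, String)()
            For Each f As Match In fieldRe.Matches(block.Groups("body").Value)
                fields(f.Groups("tag").Value) = f.Groups("value").Value.Trim()
            Next
            results.Add(fields)
        Next

        Return results
    End Function

    Sub Main()
        ' Sample data copied from the question.
        Dim sample As String = String.Join(Environment.NewLine, New String() { _
            "<STMTTRN>", _
            "<TRNTYPE>POS", _
            "<DTPOSTED>20051227170000", _
            "<TRNAMT>-0000000000163.71", _
            "<FITID>2005122701", _
            "<NAME>PUBLIX 11109 WINTHROP M", _
            "<MEMO>12/24 RIVERVIEW    FL 8010I441922", _
            "</STMTTRN>"})

        For Each tx As Dictionary(Of String, String) In ParseTransactions(sample)
            Console.WriteLine("{0} posted {1}, amount {2}", _
                tx("TRNTYPE"), tx("DTPOSTED"), tx("TRNAMT"))
        Next
    End Sub

End Module
```

Because the fields are looked up by tag name, this version does not depend on the order of the lines inside a block, which the single combined pattern does.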

I hope that this is of some help.

Fernando

Hey JRockFL; does the FL in JRockFL stand for Florida?
 
LVL 8

Author Comment

by:JRockFL
ID: 16291149
Hey Fernando

Thank you for the reply; that is exactly what I am looking for. I figured I needed a regex, and I just found an article on CodeProject that goes into regex and what all the symbols mean. Are there any good reference web sites?

Yes, I'm in Florida, just outside of Tampa.
 
LVL 64

Expert Comment

by:Fernando Soto
ID: 16291238
Hi JRockFL;

I use this site when I want to test a pattern: http://regexlib.com/RETester.aspx . I also use the Microsoft documentation, because not all Regex flavors are the same: there is Unix, the POSIX standard, and of course Microsoft. The documentation web site is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconRegularExpressionsLanguageElements.asp

You could also download a program called The Regulator, a Regex pattern-testing tool, at this link: http://sourceforge.net/projects/regulator

BTW I live in Apopka, just north of Orlando.

Good Luck

Fernando
 
LVL 8

Author Comment

by:JRockFL
ID: 16291265
Thanks for the links! I will check them out.
I need a more basic example to understand this...
Dim pattern As String = "<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n" & _
            "<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

How would you write it to pull out the word Apopka?

<city>Apopka</city>

Cool! You enjoying this nice weather too? It got cold yesterday!

 
LVL 64

Expert Comment

by:Fernando Soto
ID: 16291622
OK;

This Regex pattern :
"<STMTTRN>.*?<TRNTYPE>(?<TRNTYPE>.*?)\n<DTPOSTED>(?<DTPOSTED>.*?)\n.*?</STMTTRN>"

The engine starts by looking for the literal characters <STMTTRN>. The next symbol, the . (dot), is a Regex meta-character that stands for any single character; because the code uses RegexOptions.Singleline, that includes the new line character. The * stands for 0 or more repetitions of the symbol before it. The ? after it makes the repetition lazy: the Regex engine takes the smallest repetition that still lets it find the literal characters <TRNTYPE>. The next set of symbols, (?<TRNTYPE>.*?), is a named capture group, which is written as (?<TheNameOfTheCapture>The characters to capture). The \n is an escape sequence that matches the new line character. Then we look for the next tag, <DTPOSTED>, capture its value in another named capture group, and match another new line character. After that, .*? skips ahead until we find the end of the block, which is </STMTTRN>. If there are more characters in the input string, the engine starts again from the beginning of the pattern, which is how Matches finds every block.
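
To see what the ? after .* changes, here is a small sketch that counts matches with the lazy and the greedy form of the same pattern. The module name, the CountMatches helper and the two one-line blocks are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module, for illustration only.
Module LazyVersusGreedySketch

    ' Counts how many times the pattern matches, with . allowed to cross line breaks.
    Function CountMatches(ByVal pattern As String, ByVal text As String) As Integer
        Return Regex.Matches(text, pattern, RegexOptions.Singleline).Count
    End Function

    Sub Main()
        ' Two made-up blocks back to back.
        Dim text As String = "<STMTTRN>first</STMTTRN><STMTTRN>second</STMTTRN>"

        ' Lazy: each match stops at the nearest </STMTTRN>, so both blocks are found.
        Console.WriteLine(CountMatches("<STMTTRN>.*?</STMTTRN>", text))   ' prints 2

        ' Greedy: a single match runs from the first <STMTTRN> to the last </STMTTRN>.
        Console.WriteLine(CountMatches("<STMTTRN>.*</STMTTRN>", text))    ' prints 1
    End Sub

End Module
```

This is why the pattern uses .*? everywhere: with a plain .* and Singleline, a file with many transactions would come back as one giant match.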

For your city example, the pattern string would be "<city>(?<City>\w+)</city>"

In this pattern we search for the literal string <city>. When we find it, \w+ captures the next run of one or more word characters. Word characters are the set [a-zA-Z_0-9]; in .NET, \w also matches Unicode letters and digits unless you use RegexOptions.ECMAScript. When the engine hits a non-word character, it checks that what follows is </city>.
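
As a quick sketch of that pattern in use (the module name, the ExtractCity helper and the "Winter Park" input are assumptions added for illustration):

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module, for illustration only.
Module CityPatternSketch

    ' Returns the captured city name, or Nothing when the pattern does not match.
    Function ExtractCity(ByVal text As String) As String
        Dim m As Match = Regex.Match(text, "<city>(?<City>\w+)</city>")
        If m.Success Then
            Return m.Groups("City").Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(ExtractCity("<city>Apopka</city>"))                 ' prints Apopka

        ' \w does not match a space, so a two-word name fails to match at all.
        Console.WriteLine(ExtractCity("<city>Winter Park</city>") Is Nothing) ' prints True
    End Sub

End Module
```

If the values can contain spaces or punctuation, one option is to capture [^<]+ (anything up to the next tag) instead of \w+.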

There you have it.

Fernando
 
LVL 8

Author Comment

by:JRockFL
ID: 16291735
That was perfect!! Thank you.
 
LVL 64

Expert Comment

by:Fernando Soto
ID: 16291794
No problem.
 
LVL 8

Author Comment

by:JRockFL
ID: 16295151