Solved

Multiline match regular expression in .NET

Posted on 2008-10-22
Last Modified: 2012-05-05
Hello experts,

As I am still fairly new to .NET regular expressions, I have another question about them.
Below is a piece of a bank statement file.

I already match everything I need except for the description. The description is given in the lines starting with :86:. If a bank code (9 or 10 characters) is present, it appears on the topmost line starting with :86: (directly below a line starting with :61:).

Now I need a regex that does one of the following things:

- a regex that matches everything (including the bank account) on the various :86: lines (except for :86: itself)

- a regex that skips the bank account (if any) and selects everything else on the various :86: lines (except for :86: itself)


I am very curious about the right regex, because I couldn't solve it myself and I don't even know whether it is possible at all.

Thanks in advance for tips or solutions, if you need more info, just reply!
:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
:86:CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
:86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02

Question by:AGION
24 Comments
 
Expert Comment

by:ddrudik
VB.NET or C#.NET?  Do you just want to match the relevant lines or specific pieces of data in those lines?  If it is specific data you should tell us what submatches you need from each line.

Author Comment

by:AGION
It is a bit difficult to explain what I would like to see, but I will give it a try:
If possible, I want the regular expression(s) to match the underlined text. In the block below a bank account follows :86: on the first line. The bank account is P  12345678; it is always 9 or 10 characters long and always in this position.
:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560  
But it should do something else in the following lines: because there is no bank account, it should select the underlined text starting directly after :86:.
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02

I hope this makes it a bit more clear to you. If a bankaccount is available, it should be skipped and if not it should start directly after :86:

Expert Comment

by:ddrudik
VB.NET or C#.NET?

Expert Comment

by:ddrudik
In C#.NET:
using System;
using System.Text.RegularExpressions;

namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            String sourcestring = @":61:071222D208,00N026
:86:
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
:86:CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
:86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02";
            Regex re = new Regex(@"(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline);
            MatchCollection mc = re.Matches(sourcestring);
            int mIdx = 0;
            foreach (Match m in mc)
            {
                for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                {
                    Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                }
                mIdx++;
            }
        }
    }
}

Expert Comment

by:ddrudik
In testing you could also use the pattern without the ^ and Multiline option:
            Regex re = new Regex(@"(?<=:86:(?:P +\d+)?)[^\r]+");
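
A minimal sketch of what this lookbehind pattern returns on the sample data, for anyone who wants to try it outside a regex tester. The class and method names (`LookbehindDemo`, `Descriptions`) are made up for illustration, and the character class is widened from `[^\r]` to `[^\r\n]` as an assumption so that the sketch also behaves with plain `\n` line endings. Note that the optional account group inside the lookbehind is allowed to match nothing, so the leftmost match starts directly after `:86:` and the account, when present, is part of the matched text (the first of the two behaviours asked for in the question).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern from the comment above; only [^\r] was widened to [^\r\n]
    // (an assumption, so that "\n"-only input is also split per line).
    private static readonly Regex Line86 = new Regex(
        @"(?<=^:86:(?:P +\d+)?)[^\r\n]+", RegexOptions.Multiline);

    // Returns the matched text of every non-empty :86: line, in order.
    public static List<string> Descriptions(string statement)
    {
        var result = new List<string>();
        foreach (Match m in Line86.Matches(statement))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Calling `LookbehindDemo.Descriptions(text)` with the statement text gives one entry per `:86:` line that has content; a bare `:86:` line produces no entry because `+` needs at least one character.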

Author Comment

by:AGION
ddrudik,
I am sorry for my late reply. I am not familiar with the solution you gave in C#.NET. All my regular expressions consist of just one "sentence" (a single pattern), e.g.:
(:86:.*\r\n){0,3}(?=((:61:)|$))
I guess that must be VB.NET?
 

Expert Comment

by:ddrudik
Here's that in VB.NET:
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your sourcestring"
        ' RegexOptions.Multiline is needed so that ^ matches at the start of every line
        Dim re As Regex = New Regex("(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[" & mIdx & "][" & re.GetGroupNames()(groupIdx) & "] = " & m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next
    End Sub
End Module

Author Comment

by:AGION
Hello ddrudik,
That isn't the solution either. As you can see, I posted an example of the sort of regular expression I meant. I have come a long way with help from someone on another forum and by experimenting myself.

Expert Comment

by:Tiggerito
I'm not sure how you determine what is a bank code. You state 9 or 10 characters,

Your example includes this as a bank account:

P  12345678

but not this:

AUTOBETALI

Both are 10 characters!

So I think we need more rules to split out the bank code bit?

e.g. if a bank account is present it always starts with "P " and then is a sequence of digits, with a length of 7 or 8

Do you want the captured text to be concatenated into one single line of text?

Author Comment

by:AGION
Tiggerito,
Thanks for your reply. I will give you the regexes I have made already. These regexes do match whether a bank account is present or not, but I have problems with the line breaks and the :86: on every line (I don't want the regex to return it).
(?<=:86:(P|0).{0,}\s+).*(.*\r\n)*
A Dutch bank account always starts with a P or a 0, so this matches if a P or a 0 is present; if so, it skips all characters up to a whitespace and then selects everything after that (which is not a problem; the line breaks and the :86: on the second line are, however).
(?<=:86:)([^P0].{0}).*(.*\r\n)*
This one checks that there is no P or 0 right after :86:; if so, it starts selecting everything right away. Again, it is not a problem that it selects everything: the regex is part of a tree structure, so it will not really select everything.
I have the possibility of using string functions to convert certain data, but I don't know how to use them correctly. Someone advised me to use translate('stringname',':86:',' '), but that did not work.
I hope you understand the problem a bit better now; if not, keep asking for more info.
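
For readers facing the same line-break and `:86:` problem, here is a rough two-step sketch in C#. It assumes that a post-processing step is available after the block of `:86:` lines has been matched, which may not be true in the application described here; `DescriptionJoiner` and `Join` are hypothetical names. The first replace removes the `:86:` tag at the start of each line, the second removes the line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DescriptionJoiner
{
    // Takes a block of consecutive :86: lines and returns the description
    // as one string, without the :86: tags and without line breaks.
    public static string Join(string block)
    {
        // Multiline makes ^ match at the start of every line in the block.
        string withoutTags = Regex.Replace(block, @"^:86:", "", RegexOptions.Multiline);
        // Remove Windows (\r\n) and Unix (\n) line breaks.
        return Regex.Replace(withoutTags, @"\r?\n", "");
    }
}
```

With the last transaction of the sample, `DescriptionJoiner.Join` glues the split word back together: `MIJN OR` followed by `DERNR. 53846` becomes `MIJN ORDERNR. 53846`.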

Accepted Solution

by: Tiggerito (earned 250 total points)
I'm going on a slightly different tack to solve this.

This code parses the lines into a more manageable form, then applies the regex to this tidied format.

I've changed the bank detecting regex to the following interpretation:

Starts with a P or a 0
then zero or more spaces
then 7 to 9 digits
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

// optional bank code (P or 0, optional spaces, 7 to 9 digits), then the details
Regex regex = new Regex(@"^(?<bank>[P0]{1} *[0-9]{7,9})?(?<details>.*)");

StreamReader reader = File.OpenText("Bank.txt");

// next account data
string account = null;
string bankcode = null;
string details = null;

string line = null;

while ((line = reader.ReadLine()) != null)
{
    if (line.Length >= 4)
    {
        // the first four characters are the tag, e.g. ":61:" or ":86:"
        string header = line.Substring(0, 4);
        string data = line.Substring(4).Trim();

        switch (header)
        {
            case ":61:":
                // new account so process previous account if it exists
                if (account != null)
                {
                    Debug.WriteLine("Account: " + account);
                    Debug.WriteLine("BankCode: " + bankcode);
                    Debug.WriteLine("Details: " + details);
                }

                // start new account
                account = data;
                bankcode = null;
                details = null;
                break;
            case ":86:":
                if (details == null)
                {
                    // first 86 line, so it may start with the bank
                    Match match = regex.Match(data);

                    if (match.Success)
                    {
                        bankcode = match.Groups["bank"].Value;
                        details = match.Groups["details"].Value;
                    }
                    else
                    {
                        bankcode = String.Empty;
                        details = data;
                    }
                }
                else
                    details += data; // or maybe... Environment.NewLine + data;
                break;
            default:
                // unknown line type - skip
                break;
        }
    }
    // else - line with no header - skip
}

if (account != null) // make sure we process the last account
{
    Debug.WriteLine("Account: " + account);
    Debug.WriteLine("BankCode: " + bankcode);
    Debug.WriteLine("Details: " + details);
}

reader.Close();
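
To see what the bank-detecting regex does with a first `:86:` line on its own, here is a small sketch. `BankSplitDemo` and `Split` are names invented for this example, and the redundant `{1}` quantifier is dropped, which should not change what the pattern matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BankSplitDemo
{
    // Same rule as above: P or 0, zero or more spaces, 7 to 9 digits,
    // all optional, followed by the rest of the line as details.
    private static readonly Regex FirstLine =
        new Regex(@"^(?<bank>[P0] *[0-9]{7,9})?(?<details>.*)");

    // Returns { bank, details }; bank is "" when no bank code is found.
    public static string[] Split(string data)
    {
        Match m = FirstLine.Match(data);
        return new[] { m.Groups["bank"].Value, m.Groups["details"].Value };
    }
}
```

`Split("P  12345678BELASTINGDIENST")` separates the account from the text, while `Split("AUTOBETALING 01")` leaves the bank part empty. A continuation line such as `0 1234567891234560` would also be read as starting with a bank code, which is one reason to apply the regex only to the first `:86:` line after a `:61:` line, as the loop above does.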

Author Comment

by:AGION
You all write large pieces of code, but I can't drop them into my program; it really needs to be a short regex, like the ones I mentioned above. I can imagine that such a solution is not possible.

Assisted Solution

by: ddrudik (earned 250 total points)
AGION, the regex pattern:
(?<=^:86:(?:P +\d+)?)[^\r]+
will do what you need; I guess you still need to work out how to use it in your application. I have provided C#.NET and VB.NET code, but if your application doesn't accept VB or C# .NET code I'm not sure how I can help you with the integration. Possibly enlist a programmer to help integrate the pattern into your application.

Author Comment

by:AGION
ddrudik,
this is more like it! I tried it with RegexBuddy; the regex is valid, but it does not match a thing. So I am not sure what is going wrong, but it seems promising: it looks like the kind of regex I need! I could give you a real bank file so you could test it yourself, but I can't share it in public, since it belongs to a customer of mine.

Expert Comment

by:ddrudik
AGION, unfortunately I don't think that's allowed by EE's rules (private messaging not allowed).  Maybe you could sanitize it enough to post here.

Author Comment

by:AGION
Well, the lines I posted in the code snippet above are sanitized already. How do you test your own regex? Also with RegexBuddy?

Expert Comment

by:ddrudik
You would need a .NET regex tester to test the variable lookbehind:
http://www.dotnetcoders.com/web/Learning/Regex/RegexTester.aspx

See example:
example.png

Author Comment

by:AGION
I will check it tonight, thanks for your effort already! My weekend has started, so it's time for a well-deserved beer ;) but again, I will check it tonight to see if it fits.

Author Comment

by:AGION
I tried ddrudik's solution a couple of times during the weekend and checked whether I could modify it a bit, but it doesn't work for me. Nothing is matched when I check it in RegexBuddy, and it doesn't work either when I use it in my own program.

Expert Comment

by:ddrudik
Show your code from your program where you use this regex.  I see the matches in the .NET regex tester with your data so I know it is operational if your data matches the sample you gave in your question.

Author Comment

by:AGION
It is the same code I am using there. To explain the situation a bit: I work with SAP, this financial program has the option to read in bank statements. Those bank statements are being checked by a certain file. This file matches the bank statement file by regular expressions. I checked out what kind of regexes it does support and .NET is the answer. But when I fill in your regex it does not match anything at all.

Expert Comment

by:ddrudik
Either it doesn't have the regex support for that syntax or your sample data posted doesn't match your real data, since that pattern is matching with your provided sample.  I would suggest you contact your tech support for your application and have them provide you specific documentation regarding the use of regex in your app.
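
If the variable-length lookbehind turns out to be the unsupported part, one possible fallback is to consume the `:86:` tag and the optional account in the match itself and read the description from a capturing group. This is only a sketch under that assumption: whether the application can use a group instead of the whole match is not known from this thread, and `CaptureGroupDemo`, `Texts` and the group name `text` are made up for illustration. The account sub-pattern `P +\d+` is the one from the pattern discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // No lookbehind: the tag and the optional account are matched normally,
    // and only the named group "text" holds the description.
    private static readonly Regex Line86 = new Regex(
        @"^:86:(?:P +\d+)?(?<text>[^\r\n]*)", RegexOptions.Multiline);

    // Returns the description part of every :86: line, in order.
    public static List<string> Texts(string statement)
    {
        var result = new List<string>();
        foreach (Match m in Line86.Matches(statement))
        {
            result.Add(m.Groups["text"].Value);
        }
        return result;
    }
}
```

Because the optional account group is matched greedily before the capture starts, a leading `P  12345678` is skipped here, which corresponds to the second behaviour asked for in the question. Accounts starting with `0` are not skipped by this sub-pattern.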

Author Closing Comment

by:AGION
the setup for the regex was good, but too bad it was not supported by my application