Multiline match regular expression in .NET

Hello experts,

As I am still a beginner with .NET regular expressions, I have another question about them.
Below is a piece of a bank statement file.

I already match everything I need except for the description. The description is given in the lines starting with :86:. If a bank code (9 or 10 characters) is present, it is given on the topmost line starting with :86: (directly below a line starting with :61:).

Now I need a regex that does one of the following things:

- a regex that matches everything (including the bank account) on the different :86: lines (except for :86: itself)

- a regex that skips the bank account (if any) and selects everything else on the different :86: lines (except for :86: itself)
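The first option can be written as a single pattern. Below is a minimal C# sketch that assumes the statement text is available as one string with CRLF or LF line endings. The class and method names are made up for illustration. The inline `(?m)` turns on Multiline inside the pattern itself, so `^` matches at the start of each line:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AllDescriptionText
{
    // (?m)       : Multiline, so ^ matches at the start of every line
    // (?<=^:86:) : the line must start with ":86:", but the tag is not part of the match
    // [^\r\n]*   : the rest of the line, stopping before CR or LF
    static readonly Regex Pattern = new Regex(@"(?m)(?<=^:86:)[^\r\n]*");

    public static List<string> Extract(string statement)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(statement))
            result.Add(m.Value);
        return result;
    }

    static void Main()
    {
        string statement = string.Join("\r\n",
            ":61:071225C850,00N078",
            ":86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR",
            ":86:DERNR. 53846 REF. MAIL 21-02");

        foreach (string text in Extract(statement))
            Console.WriteLine(text);
    }
}
```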


I am very curious about the right regex, because I couldn't solve it myself and I don't even know whether it is possible one way or another.

Thanks in advance for tips or solutions; if you need more info, just reply!
:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
:86:CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
:86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02


Asked by: AGION

ddrudik commented:
VB.NET or C#.NET?  Do you just want to match the relevant lines or specific pieces of data in those lines?  If it is specific data you should tell us what submatches you need from each line.
AGION (Author) commented:
It is a bit difficult to explain what I would like to see, but I will give it a try:
If possible, I want the regular expression(s) to match the description that follows the bank account in the lines below (a bank account is present after :86:). The bank account is P  12345678; it is always 9 or 10 characters and always in this position.
:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560  
But it should do something else in the following lines: because there is no bank account, it should select all the text after :86::
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02

I hope this makes it a bit clearer. If a bank account is present, it should be skipped; if not, the match should start directly after :86:.
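Here is a sketch of the second option. It assumes the following rule: a bank account is a P or a 0, then optional spaces, then 7 to 9 digits, and it only appears on the first :86: line below a :61: line. The pattern does not put a lookbehind on the account. It consumes the tag and the optional account, then returns the rest of the line in a named group `desc`. The lookbehind `(?<=^:61:...)` restricts the account check to the first :86: line, so a continuation line such as `:86:0 1234567891234560` is left intact. All names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SkipBankAccount
{
    // Assumed rule: a bank account is a P or a 0, optional spaces and 7 to 9 digits,
    // and it can only appear on the first :86: line, directly below a :61: line.
    public static readonly Regex Pattern = new Regex(
        @"(?m)^:86:" +                          // the tag: consumed, but not part of "desc"
        @"(?:(?<=^:61:[^\r\n]*\r?\n:86:)" +     // only if the previous line is a :61: line
        @"(?<bank>[P0] *\d{7,9}) *)?" +         // optional bank account and trailing spaces
        @"(?<desc>[^\r\n]*)");                  // the rest of the line

    public static List<string> Descriptions(string statement)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(statement))
            result.Add(m.Groups["desc"].Value);
        return result;
    }

    static void Main()
    {
        string statement = string.Join("\r\n",
            ":61:071222D208,00N026",
            ":86:P  12345678BELASTINGDIENST       F8R03782497                $GH",
            ":86:0 1234567891234560",
            ":61:071225C758,70N078",
            ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD");

        foreach (string d in Descriptions(statement))
            Console.WriteLine(d);
    }
}
```

The description is in the `desc` group rather than in the whole match. A tool that only reports the whole match would therefore need a different construction.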
ddrudik commented:
VB.NET or C#.NET?

ddrudik commented:
In C#.NET:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            String sourcestring = @":61:071222D208,00N026
:86:
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
:86:CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
:86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02";
            Regex re = new Regex(@"(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline);
            MatchCollection mc = re.Matches(sourcestring);
            int mIdx = 0;
            foreach (Match m in mc)
            {
                for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                {
                    Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                }
                mIdx++;
            }
        }
    }
}

ddrudik commented:
For testing, you could also use the pattern without the ^ and the Multiline option:
            Regex re = new Regex(@"(?<=:86:(?:P +\d+)?)[^\r]+");
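Here is a quick way to see what this pattern returns in .NET. The group `(?:P +\d+)?` is optional, so the lookbehind is already satisfied directly after `:86:`. The engine reports the leftmost match, which means that on the `P  12345678` line the match starts at `P` and the bank account comes back together with the description. Also, `[^\r]+` relies on CRLF line endings; on LF-only text it continues into the following lines. The sketch compares the pattern with a hypothetical capture-group variant:

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindCheck
{
    // The pattern from the thread, with Multiline as in the C# example above.
    public static readonly Regex Lookbehind =
        new Regex(@"(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline);

    // Hypothetical alternative: consume the tag and the optional account,
    // and read the description from a named group instead of the whole match.
    public static readonly Regex CaptureGroup =
        new Regex(@"^:86:(?:P +\d+)?(?<desc>[^\r\n]*)", RegexOptions.Multiline);

    static void Main()
    {
        string line = ":86:P  12345678BELASTINGDIENST\r\n";

        // Whole match: starts right after ":86:", so the account is included.
        Console.WriteLine(Lookbehind.Match(line).Value);

        // Named group: the account has been consumed before "desc" starts.
        Console.WriteLine(CaptureGroup.Match(line).Groups["desc"].Value);
    }
}
```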

AGION (Author) commented:
ddrudik,
I am sorry for my late reply. I am not familiar with the solution you gave in C#.NET. All my regular expressions consist of just one "sentence", e.g.:
(:86:.*\r\n){0,3}(?=((:61:)|$))
I guess that must be VB.NET?!
ddrudik commented:
Here's that in VB.NET:
Imports System
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your sourcestring"
        ' Multiline makes ^ match at the start of every line, as in the C# version
        Dim re As Regex = New Regex("(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[" & mIdx & "][" & re.GetGroupNames()(groupIdx) & "] = " & m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next
    End Sub
End Module

AGION (Author) commented:
Hello ddrudik,
That isn't the solution either. As you can see, I posted an example of the sort of regular expression I meant. I have come a long way with help from someone on another forum and by trying things myself.
Tony McCreath (Technical SEO Consultant) commented:
I'm not sure how to determine what is a bank code. You state 9 or 10 characters.

Your example includes this as a bank account:

P  12345678

but not this:

AUTOBETALI

Both are 10 characters!

So I think we need more rules to split out the bank code part?

e.g. if a bank account is present, it always starts with "P " followed by a sequence of 7 or 8 digits
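That rule translates directly into a pattern. Here is a minimal sketch (hypothetical names) that checks it against the two 10-character prefixes above:

```csharp
using System;
using System.Text.RegularExpressions;

static class BankCodeRule
{
    // Candidate rule: "P", one or more spaces, then 7 or 8 digits,
    // at the very start of the text that follows ":86:".
    static readonly Regex Rule = new Regex(@"^P +\d{7,8}");

    public static bool StartsWithBankCode(string afterTag) => Rule.IsMatch(afterTag);

    static void Main()
    {
        Console.WriteLine(StartsWithBankCode("P  12345678BELASTINGDIENST")); // True
        Console.WriteLine(StartsWithBankCode("AUTOBETALING 01"));            // False
    }
}
```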

Do you want the captured text to be concatenated into one single line of text?
AGION (Author) commented:
Tiggerito,
Thanks for your reply. I will give you the regexes I have made already. These regexes do match, whether a bank account is present or not. But I have problems with the line breaks and the :86: on every line (I don't want the regex to return it).
(?<=:86:(P|0).{0,}\s+).*(.*\r\n)*
A Dutch bank account always starts with a P or a 0, so this one matches if a P or a 0 is present and, if so, skips all characters up to a whitespace; after that it selects everything (that part is not a problem, but the line breaks and the :86: on the second line are).
(?<=:86:)([^P0].{0}).*(.*\r\n)*
This one checks that there is no P or 0 right after :86:, and if so, it starts selecting everything right away. Again, it is not a problem that it selects everything, because it is part of a tree structure, so it will not actually select everything.
I have the possibility of using string functions to convert certain data, but I don't know how to use them correctly. Someone advised me to use translate('stringname',':86:',' '), but that did not work.
I hope you understand the problem a bit better now; if not, keep asking for more info.
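The line breaks and tags end up in the result because `(.*\r\n)*` consumes whole following lines, and nothing in it stops at a `:86:` tag. One way around this, in the spirit of the translate() suggestion, is to first remove the "line break + :86:" at the start of each continuation line and only then apply the description pattern. Below is a sketch in C#. It assumes a bank account is a P or a 0 followed by optional spaces and 7 to 9 digits, and the names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class JoinThenExtract
{
    // A line break plus ":86:" that follows another ":86:" line, i.e. the
    // start of a continuation line. The :86: line right after :61: is left alone.
    static readonly Regex Continuation = new Regex(@"(?m)(?<=^:86:[^\r\n]*)\r?\n:86:");

    // After joining there is one :86: line per record; skip an optional
    // bank account (assumed: P or 0, optional spaces, 7 to 9 digits).
    static readonly Regex Description =
        new Regex(@"(?m)^:86:(?:[P0] *\d{7,9} *)?(?<desc>[^\r\n]*)");

    public static List<string> Descriptions(string statement)
    {
        // Step 1: remove "line break + :86:" so wrapped text becomes one line.
        string joined = Continuation.Replace(statement, "");

        // Step 2: one match per record, bank account skipped.
        var result = new List<string>();
        foreach (Match m in Description.Matches(joined))
            result.Add(m.Groups["desc"].Value.TrimEnd());
        return result;
    }

    static void Main()
    {
        string statement = string.Join("\r\n",
            ":61:071225C425,05N078",
            ":86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA",
            ":86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN",
            ":61:071225C850,00N078",
            ":86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR",
            ":86:DERNR. 53846 REF. MAIL 21-02");

        foreach (string d in Descriptions(statement))
            Console.WriteLine(d);
    }
}
```

Joining without a separator matches how these records wrap mid-word (HELMEN BIJBETA + LING, MIJN OR + DERNR.).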
Tony McCreath (Technical SEO Consultant) commented:
I'm going on a slightly different tack to solve this.

This code parses the lines into a more manageable form, then applies the regex to this tidied format.

I've changed the bank-detecting regex to the following interpretation:

Starts with a P or a 0
then zero or more spaces
then 7 to 9 digits
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"^(?<bank>[P0] *[0-9]{7,9})?(?<details>.*)");

// data of the account currently being collected
string account = null;
string bankcode = null;
string details = null;

// Note: Debug.WriteLine only produces output in DEBUG builds.
using (StreamReader reader = File.OpenText("Bank.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length >= 4)
        {
            string header = line.Substring(0, 4);
            string data = line.Substring(4).Trim();

            switch (header)
            {
                case ":61:":
                    // new account, so process the previous account if it exists
                    if (account != null)
                    {
                        Debug.WriteLine("Account: " + account);
                        Debug.WriteLine("BankCode: " + bankcode);
                        Debug.WriteLine("Details: " + details);
                    }

                    // start a new account
                    account = data;
                    bankcode = null;
                    details = null;
                    break;
                case ":86:":
                    if (details == null)
                    {
                        // first :86: line, so it may start with the bank code
                        Match match = regex.Match(data);
                        if (match.Success)
                        {
                            bankcode = match.Groups["bank"].Value;
                            details = match.Groups["details"].Value;
                        }
                        else
                        {
                            bankcode = String.Empty;
                            details = data;
                        }
                    }
                    else
                        details += data; // or maybe... Environment.NewLine + data;
                    break;
                default:
                    // unknown line type - skip
                    break;
            }
        }
        // else - line with no header - skip
    }
}

if (account != null) // make sure we process the last account
{
    Debug.WriteLine("Account: " + account);
    Debug.WriteLine("BankCode: " + bankcode);
    Debug.WriteLine("Details: " + details);
}

AGION (Author) commented:
You guys all write really big chunks of code, but I can't drop them into my program; it really has to be a short regex, like the ones I mentioned above. I can imagine that a solution like that is not possible.
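If a single pattern per record is the goal, .NET can at least collect the text of all :86: lines of a record in one match. A named group that is used more than once keeps every capture in `Group.Captures`, without the tags. Gluing the captures together still takes a few lines of code, so this may not help where only a pattern can be configured. Below is a sketch under an assumed bank-account rule (P or 0, optional spaces, 7 to 9 digits); the names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class RecordPattern
{
    // One match per :61: record. "desc" is captured once for the first :86: line
    // (after an optional bank account) and once more for every continuation line.
    static readonly Regex Record = new Regex(
        @"(?m)^:61:[^\r\n]*\r?\n" +
        @":86:(?:(?<bank>[P0] *\d{7,9}) *)?(?<desc>[^\r\n]*)" +
        @"(?:\r?\n:86:(?<desc>[^\r\n]*))*");

    public static List<(string Bank, string Description)> Parse(string statement)
    {
        var result = new List<(string Bank, string Description)>();
        foreach (Match m in Record.Matches(statement))
        {
            var text = new StringBuilder();
            foreach (Capture c in m.Groups["desc"].Captures)
                text.Append(c.Value);   // the :86: tags are outside the group
            result.Add((m.Groups["bank"].Value, text.ToString().TrimEnd()));
        }
        return result;
    }

    static void Main()
    {
        string statement = string.Join("\r\n",
            ":61:071225C758,70N078",
            ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD",
            ":86:CITY 48772-54314",
            ":61:071225C850,00N078",
            ":86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR",
            ":86:DERNR. 53846 REF. MAIL 21-02");

        foreach (var (bank, description) in Parse(statement))
            Console.WriteLine($"[{bank}] {description}");
    }
}
```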
ddrudik commented:
AGION, the regex pattern:
(?<=^:86:(?:P +\d+)?)[^\r]+
will do what you need; I guess you still need to work out how to use it in your application. I have provided C#.NET and VB.NET code, but if your application doesn't accept VB or C# .NET code, I'm not sure how I can help you with the integration. Possibly enlist a programmer to help integrate the pattern into your application.
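Check one thing when the pattern is pasted into a tool or application that only accepts a pattern string. Without Multiline, `^` matches only at the very start of the input, and the inline `(?m)` option switches Multiline on from inside the pattern. Below is a small C# sketch showing the difference. Whether the SAP configuration honors inline options is an assumption, not something the thread confirms:

```csharp
using System;
using System.Text.RegularExpressions;

static class InlineMultiline
{
    public static int CountMatches(string pattern, string input) =>
        Regex.Matches(input, pattern).Count;

    static void Main()
    {
        string statement = ":61:071225C850,00N078\r\n" +
                           ":86:AUTOBETALING 01 TELEFOONSTRAAT 43\r\n" +
                           ":86:DERNR. 53846 REF. MAIL 21-02";

        // Without Multiline, ^ only matches at the very start of the input,
        // which here is ":61:", so the lookbehind never succeeds.
        Console.WriteLine(CountMatches(@"(?<=^:86:(?:P +\d+)?)[^\r]+", statement));     // 0

        // (?m) turns Multiline on from inside the pattern itself.
        Console.WriteLine(CountMatches(@"(?m)(?<=^:86:(?:P +\d+)?)[^\r]+", statement)); // 2
    }
}
```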
AGION (Author) commented:
ddrudik,
This is more like it! I tried it with RegexBuddy, but it does not match anything, although the regex is valid. So I am not sure what is going wrong, but it seems promising: it looks like the regex I need! I could give you a real bank file so you could test it yourself, but I can't share it in public, since it belongs to a customer of mine.
ddrudik commented:
AGION, unfortunately I don't think that's allowed by EE's rules (private messaging is not allowed). Maybe you could sanitize it enough to post it here.
AGION (Author) commented:
Well, the lines I posted in the code snippet above are already sanitized. How do you test your own function? Also with RegexBuddy?
ddrudik commented:
You would need a .NET regex tester to test the variable-length lookbehind:
http://www.dotnetcoders.com/web/Learning/Regex/RegexTester.aspx

See example:
example.png
AGION (Author) commented:
I will check it tonight; thanks for your effort already! It's the weekend now, so it's time for a well-deserved beer ;) but again, I will check it tonight to see if it fits.
Tony McCreath (Technical SEO Consultant) commented:
AGION (Author) commented:
I tried ddrudik's solution a couple of times during the weekend and checked whether I could modify it a bit. But it doesn't work for me: nothing is matched when I check it in RegexBuddy, and it doesn't work either when I use it in my own program.
ddrudik commented:
Show your code from your program where you use this regex.  I see the matches in the .NET regex tester with your data so I know it is operational if your data matches the sample you gave in your question.
AGION (Author) commented:
It is the same code I am using there. To explain the situation a bit: I work with SAP; this financial program has the option to read in bank statements. Those bank statements are checked by a certain file, which matches the bank statement file using regular expressions. I checked what kind of regexes it supports, and .NET is the answer. But when I fill in your regex, it does not match anything at all.
ddrudik commented:
Either it doesn't have regex support for that syntax, or the sample data you posted doesn't match your real data, since that pattern matches your provided sample. I would suggest you contact the tech support for your application and have them provide specific documentation on the use of regex in your app.
AGION (Author) commented:
The setup for the regex was good, but too bad it was not supported by my application.