Solved

Multiline match regular expression in .NET

Posted on 2008-10-22
Last Modified: 2012-05-05
Hello experts,

As I am still a bit of a beginner with .NET regular expressions, I have another question about them.
Below is a fragment of a bank statement file.

I already match everything I need except for the description. The description is given in the lines starting with :86:. If a bank code (9 or 10 characters) is present, it appears on the topmost line starting with :86: (directly below a line starting with :61:).

Now I need a regex that does one of the following things:

- a regex that matches everything (including the bank account) on the various :86: lines (except for :86: itself)

- a regex that skips the bank account (if any) and selects everything else on the various :86: lines (except for :86: itself)


I am very curious about the right regex, because I couldn't solve it myself and I don't even know whether it is possible one way or another.

Thanks in advance for tips or solutions, if you need more info, just reply!
:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
:86:CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
:86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02

Question by:AGION
24 Comments
 
LVL 27

Expert Comment

by:ddrudik
ID: 22777399
VB.NET or C#.NET?  Do you just want to match the relevant lines or specific pieces of data in those lines?  If it is specific data you should tell us what submatches you need from each line.
 

Author Comment

by:AGION
ID: 22777562
It is a bit difficult to explain what I would like to see, but I will give it a try though:
If possible, I want the regular expression(s) to match the underlined part below. Here a bank account is present after :86:. The bank account is P  12345678; it is always 9 or 10 characters long and it is always in this position.
:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560  
But it should do something else in the following lines: because there is no bank account, it should select the underlined part:
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02

I hope this makes it a bit more clear to you. If a bankaccount is available, it should be skipped and if not it should start directly after :86:
 
LVL 27

Expert Comment

by:ddrudik
ID: 22777846
VB.NET or C#.NET?
 
LVL 27

Expert Comment

by:ddrudik
ID: 22778494
In C#.NET:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            String sourcestring = @":61:071222D208,00N026
:86:
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
:86:$0000009                         BETALINGSKENM. 123456789123456
:86:0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
:86:CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
:86:LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
:86:DERNR. 53846 REF. MAIL 21-02";
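            // Lookbehind: the match must directly follow ":86:" at the start of a line,
            // optionally followed by an account written as "P", spaces, digits.
            // [^\r]+ then takes the rest of the line up to the carriage return (assumes CRLF line endings).
            Regex re = new Regex(@"(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline);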
            MatchCollection mc = re.Matches(sourcestring);
            int mIdx = 0;
            foreach (Match m in mc)
            {
                for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                {
                    Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                }
                mIdx++;
            }
        }
    }
}

 
LVL 27

Expert Comment

by:ddrudik
ID: 22778510
In testing you could also use the pattern without the ^ and Multiline option:
            Regex re = new Regex(@"(?<=:86:(?:P +\d+)?)[^\r]+");
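
The two forms differ in one respect: the version with ^ needs RegexOptions.Multiline, because without that option ^ only matches at the very start of the input. A minimal sketch of the difference on a shortened sample follows. The class and method names are made up for illustration, and [^\r\n] is used instead of [^\r] so the sketch does not depend on CRLF line endings.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Counts how many :86: payloads the anchored pattern finds with the given options.
    public static int CountMatches(string input, RegexOptions options)
    {
        var re = new Regex(@"(?<=^:86:(?:P +\d+)?)[^\r\n]+", options);
        return re.Matches(input).Count;
    }

    public static void Main()
    {
        string sample = ":61:071225C850,00N078\r\n:86:AUTOBETALING 01\r\n:86:DERNR. 53846";

        // Without Multiline, ^ only matches at index 0, where the text is ":61:", so nothing is found.
        Console.WriteLine(CountMatches(sample, RegexOptions.None));      // 0

        // With Multiline, ^ also matches after every line break, so both :86: lines are found.
        Console.WriteLine(CountMatches(sample, RegexOptions.Multiline)); // 2
    }
}
```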

 

Author Comment

by:AGION
ID: 22783613
ddrudik,
I am sorry for my late reply. I am not familiar with the solution you gave in C#.NET. All my regular expressions consist of just one "sentence", e.g.:
(:86:.*\r\n){0,3}(?=((:61:)|$))
I guess that must be VB.NET?!
 
 
LVL 27

Expert Comment

by:ddrudik
ID: 22785534
Here's that in VB.NET:
Imports System
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your sourcestring"
        ' RegexOptions.Multiline is needed so that ^ matches at the start of every line
        Dim re As Regex = New Regex("(?<=^:86:(?:P +\d+)?)[^\r]+", RegexOptions.Multiline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[" & mIdx & "][" & re.GetGroupNames()(groupIdx) & "] = " & m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next
    End Sub
End Module

 

Author Comment

by:AGION
ID: 22794026
Hello ddrudik,
That isn't the solution either. As you can see, I posted an example of the sort of regular expression I meant. I have come a long way with help from someone on another forum and by trying myself.
 
LVL 23

Expert Comment

by:Tony McCreath
ID: 22794183
I'm not sure how you determine what is a bank code. You state 9 or 10 characters.

Your example includes this as a bank account:

P  12345678

but not this:

AUTOBETALI

Both are 10 characters!

So I think we need more rules to split out the bank code bit.

e.g. if a bank account is present it always starts with "P " and then is a sequence of digits, with a length of 7 or 8

Do you want the captured text to be concatenated into one single line of text?
 

Author Comment

by:AGION
ID: 22794236
Tiggerito,
Thanks for your reply. I will give you the regexes I have made already. These regexes do match whether a bank account is present or not, but I have problems with the line breaks and the :86: on every line (I don't want the regex to return it).
(?<=:86:(P|0).{0,}\s+).*(.*\r\n)*
A Dutch bank account always starts with a P or a 0, so this one matches if a P or a 0 is present and, if so, skips all characters up to the next whitespace; after that it selects everything. (Selecting everything is not a problem; the line breaks and the :86: on the second line are, however.)
(?<=:86:)([^P0].{0}).*(.*\r\n)*
This one checks that there is no P or 0 right after :86: and, if so, starts selecting right away. Again, it is not a problem that it selects everything: because it is part of a tree structure, it will not actually select everything.
I have the option of using string functions to convert certain data, but I don't know how to use them correctly. Someone advised me to use translate('stringname',':86:',' '), but that did not work.
I hope you understand the problem a bit better now; if not, keep asking for more info.
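
If the host application offers some way to post-process the matched text (an assumption; the thread does not say what the translate function mentioned above actually supports), the :86: tags and the line breaks can be dropped by collecting the payload of each :86: line and joining the pieces. A rough C# sketch of that idea, with a hypothetical helper named Flatten:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class Field86
{
    // Captures the text after a leading ":86:" on each line; the tag itself stays outside the group.
    private static readonly Regex Payload =
        new Regex(@"^:86:(?<text>[^\r\n]*)", RegexOptions.Multiline);

    // Joins the payloads of all :86: lines in the block, trimming the padding on each line.
    public static string Flatten(string block)
    {
        var sb = new StringBuilder();
        foreach (Match m in Payload.Matches(block))
        {
            sb.Append(m.Groups["text"].Value.Trim());
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string block = ":86:AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR\r\n" +
                       ":86:DERNR. 53846 REF. MAIL 21-02";
        Console.WriteLine(Flatten(block));
    }
}
```

Trimming each line before joining restores words that were wrapped mid-word (OR + DERNR.), but it would also drop a space that happened to fall exactly at a wrap point, so the joined text may need a manual check.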
 
LVL 23

Accepted Solution

by:
Tony McCreath earned 250 total points
ID: 22794471
I'm going on a slightly different tack to solve this.

This code parses the lines into a more manageable form, then applies the regex to this tidied format.

I've changed the bank detecting regex to the following interpretation:

Starts with a P or a 0
then zero or more spaces
then 7 to 9 digits
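using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

// Optional bank account (P or 0, optional spaces, 7 to 9 digits), then the rest of the line as details.
Regex regex = new Regex(@"^(?<bank>[P0]{1} *[0-9]{7,9})?(?<details>.*)");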
 
StreamReader reader = File.OpenText("Bank.txt");
 
// next account data
string account = null;
string bankcode = null;
string details = null;
 
string line = null;
 
while ((line = reader.ReadLine()) != null)
{
    if (line.Length >= 4)
    {
        string header = line.Substring(0, 4);
        string data = line.Substring(4).Trim();
        
        switch (header)
        {
            case ":61:":
                // new account so process previous account if it exists
                if (account != null) 
                {
                    Debug.WriteLine("Account: " + account);
                    Debug.WriteLine("BankCode: " + bankcode);
                    Debug.WriteLine("Details: " + details);
                }
                
                // start new account
                account = data;
                bankcode = null;
                details = null;
                break;
            case ":86:":
                if (details==null)
                {
                    // first 86 line. so may start with the bank
 
                    Match match = regex.Match(data);
 
                    if (match.Success)
                    {
                        bankcode = match.Groups["bank"].Value;
                        details = match.Groups["details"].Value;
                    }
                    else
                    {
                        bankcode = String.Empty;
                        details = data;
                    }
                }
                else
                    details += data; // or maybe... Environment.NewLine + data;
                break;
            default:
                // unknown line type - skip
                break;
        }
    }
    // else - line with no header - skip
}
 
if (account!=null) // make sure we process the last account
{
    Debug.WriteLine("Account: " + account);
    Debug.WriteLine("BankCode: " + bankcode);
    Debug.WriteLine("Details: " + details);
}
 
reader.Close();
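
As a quick check of the bank pattern on its own, the sketch below applies it to the first :86: payloads from the question. The BankSplit class and its Split method are names chosen only for this illustration, and the redundant {1} is left out, which does not change what the pattern matches. Every part of the pattern is optional, so Match always succeeds and a missing account simply yields an empty bank group.

```csharp
using System;
using System.Text.RegularExpressions;

static class BankSplit
{
    private static readonly Regex BankPattern =
        new Regex(@"^(?<bank>[P0] *[0-9]{7,9})?(?<details>.*)");

    // Returns the optional account prefix and the remaining description text.
    public static (string Bank, string Details) Split(string data)
    {
        Match m = BankPattern.Match(data);
        return (m.Groups["bank"].Value, m.Groups["details"].Value.Trim());
    }

    public static void Main()
    {
        string[] samples =
        {
            "P  12345678BELASTINGDIENST       F8R03782497                $GH",
            "0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD",
            "AUTOBETALING 01 TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR"
        };
        foreach (string s in samples)
        {
            var (bank, details) = Split(s);
            // An empty pair of brackets means no account was recognised on that line.
            Console.WriteLine("[" + bank + "] " + details);
        }
    }
}
```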

 

Author Comment

by:AGION
ID: 22794685
You all write quite large pieces of code, but I can't drop them into my program; it really should be a short regex, like the ones I mentioned above. I can imagine that such a solution is not possible.
 
LVL 27

Assisted Solution

by:ddrudik
ddrudik earned 250 total points
ID: 22796247
AGION, the regex pattern:
(?<=^:86:(?:P +\d+)?)[^\r]+
will do what you need; I guess you still need to work out how to use it in your application.  I have provided C#.NET and VB.NET code, but if your application doesn't accept VB or C# .NET code I'm not sure how I can help you with the integration.  Possibly enlist a programmer to help integrate the pattern into your application.
 

Author Comment

by:AGION
ID: 22796479
ddrudik,
This is more like it! I tried it with RegexBuddy; the regex is valid, but it does not match a thing. So I am not sure what is going wrong, but it seems promising: it looks like the kind of regex I need! I could give you a normal bank file so you could test it yourself, but I can't share it in public, since it belongs to a customer of mine.
 
LVL 27

Expert Comment

by:ddrudik
ID: 22796610
AGION, unfortunately I don't think that's allowed by EE's rules (private messaging not allowed).  Maybe you could sanitize it enough to post here.
 

Author Comment

by:AGION
ID: 22796720
Well, the lines I posted in the code snippet above are already sanitized. How do you test your own pattern? Also with RegexBuddy?
 
LVL 27

Expert Comment

by:ddrudik
ID: 22796962
You would need a .NET regex tester to test the variable lookbehind:
http://www.dotnetcoders.com/web/Learning/Regex/RegexTester.aspx

See example:
example.png
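
Variable-length lookbehind such as (?<=^:86:(?:P +\d+)?) is supported by the .NET engine, but many other regex flavors do not accept it. If the host engine turns out not to support it, a capture group can do a similar job without any lookbehind. The following is only a sketch of that alternative, not something proposed in this thread; the group name details, the class name, and the [^\r\n] character class are choices made here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureAlternative
{
    // ":86:" at the start of a line, an optional P-style account, then the captured description.
    private static readonly Regex Line86 =
        new Regex(@"^:86:(?:P +\d+)?(?<details>[^\r\n]+)", RegexOptions.Multiline);

    // Returns the description part of every non-empty :86: line.
    public static List<string> Details(string input)
    {
        var result = new List<string>();
        foreach (Match m in Line86.Matches(input))
        {
            result.Add(m.Groups["details"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string sample = ":61:071222D208,00N026\r\n" +
                        ":86:P  12345678BELASTINGDIENST\r\n" +
                        ":86:$0000009";
        foreach (string d in Details(sample))
        {
            Console.WriteLine(d);
        }
    }
}
```

Because the optional account sits in front of the group and is matched greedily, a P-style account is consumed before the capture starts. In the lookbehind form, by contrast, the engine can already satisfy the lookbehind directly after :86:, so the match may begin at the account itself. Accounts starting with 0 are not skipped by either form as written.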
 

Author Comment

by:AGION
ID: 22797102
I will check it tonight; thanks for your effort already! It's the weekend for me now, so it's time for a well-deserved beer ;) But again, I will check it tonight to see if it fits.
 
LVL 23

Expert Comment

by:Tony McCreath
ID: 22797199
 

Author Comment

by:AGION
ID: 22810664
I tried ddrudik's solution a couple of times during the weekend and checked whether I could modify it a bit, but it doesn't work for me. Nothing is matched when I check it in RegexBuddy, and it doesn't work either when I use it in my own program.
 
LVL 27

Expert Comment

by:ddrudik
ID: 22811958
Show your code from your program where you use this regex.  I see the matches in the .NET regex tester with your data so I know it is operational if your data matches the sample you gave in your question.
 

Author Comment

by:AGION
ID: 22812061
It is the same code I am using there. To explain the situation a bit: I work with SAP, this financial program has the option to read in bank statements. Those bank statements are being checked by a certain file. This file matches the bank statement file by regular expressions. I checked out what kind of regexes it does support and .NET is the answer. But when I fill in your regex it does not match anything at all.
 
LVL 27

Expert Comment

by:ddrudik
ID: 22812677
Either it doesn't have the regex support for that syntax or your sample data posted doesn't match your real data, since that pattern is matching with your provided sample.  I would suggest you contact your tech support for your application and have them provide you specific documentation regarding the use of regex in your app.
 

Author Closing Comment

by:AGION
ID: 31508779
the setup for the regex was good, but too bad it was not supported by my application