How do I select the right lines using regular expressions?

Experts,

I have a question about regular expressions. I am a newbie with regular expressions and could use some help on this one. I have tried for some 6 hours, but I can't solve it myself.


Summary of my problem:

In SAP Business One it is possible to use bank statement processing. A file (full of regular expressions) has to be selected so that it can match certain criteria against the bank statement file. The bank statement file follows a certain pattern (see the attached txt file).


I need regular expressions for the following:

- a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
- a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
- a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number

I am looking forward to the right solutions, I can give more info if you need any.

Included: code snippet with a couple of lines


:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
$0000009                         BETALINGSKENM. 123456789123456
0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02


Asked by: AGION

ahoffmann Commented:
 (?=:62(F|M):)

is a positive lookahead, something not every regex flavour supports (POSIX-style tools such as egrep, sed and awk do not)

  *?
is a non-greedy match, which those flavours do not support either

so you see how important it is to know which regex flavour you are using
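
To illustrate the two constructs, here is a minimal sketch in Python, whose re module supports both. The short sample string and the variable names are made up for illustration; only the first pattern comes from this thread.

```python
import re

# Hypothetical miniature statement: two :61: lines, then a closing :62F: line.
text = ":61:A\r\n:86:a\r\n:61:B\r\n:62F:end\r\n"

# Lazy quantifier plus positive lookahead: the match stops in front of
# :62F: or :62M: and does not consume it.
lazy = re.search(r":61:(.*\r\n)*?(?=:62(F|M):)", text)
print(repr(lazy.group(0)))

# Same pattern with a plain literal terminator instead of the lookahead:
# the terminating tag becomes part of the match.
consuming = re.search(r":61:(.*\r\n)*?:62(F|M):", text)
print(repr(consuming.group(0)))
```

A flavour without lookahead can only use the second form and then has to strip the terminator afterwards.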

ahoffmann Commented:
> .. starting with :61: and line :86: including next lines (if available),
Could you please give an example of the *not available* case?

# The following regexes are Perl-style:
> - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
$account=~s/^:86:(.{10}).*/$1/;   # positions 5-14 are ten characters

> - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
$rest=~s/^:86:.{10}(.*)/$1/;      # keep everything after the account number
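
The same idea as a minimal Python sketch, assuming the account number is a fixed-width field of ten characters (positions 5-14) directly after the :86: tag. The sample block is copied from the question; the variable names are only illustrative.

```python
import re

# Sample :86: block from the question, including its continuation line.
block = (
    ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD\n"
    "CITY 48772-54314\n"
)

# Group 1: characters 5-14 of the first line (ten characters).
# Group 2: everything after that; re.DOTALL lets "." cross line breaks so
# the continuation lines are included.
m = re.match(r":86:(.{10})(.*)", block, re.DOTALL)
account = m.group(1)
rest = m.group(2).strip()

print(account)
print(rest)
```

Using one pattern with two groups avoids running two substitutions over the same text.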

AGION (Author) Commented:
Thanks for your answer, but these expressions are not the ones I meant. I will give you an example of the ones that are commonly used.
The program uses a kind of tree structure. The first expression applies to both kinds of rows:      :61:(.*\r\n)*?(?=:62(F|M):)
The level underneath it selects:      (:61:.*\r\n){1,2}(:86:.*\r\n){1,6}
And for the level underneath the previous one it selects:      (:86:[^\r\n]*\r\n){0,6}
 
As you requested, here is an example of a line starting with :86: without second lines:
:61:071225C2330,00N078
:86:0936017481 INVESTGROUP UNKNOWN-BPBETALINGSKENM. 74985  
 
I hope this makes it a bit clearer; if not, please ask again.
 
ahoffmann Commented:
> an example of a line starting with :86: without secondlines:
hmm, and how do you distinguish that from:

:61:071225C758,70N078
:86:0936017481 INVESTGROUP UNKNOWN-BPBETALINGSKENM. 74985
0 1234567891234560                                            

AGION (Author) Commented:
Okay, I experimented a bit. The trick is how to select :61: and all the text after it, until :61: occurs again.

:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
$0000009                         BETALINGSKENM. 123456789123456
0 1234567891234560                  
                         
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314            
                                     
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    

:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02                

Like that..

I'm almost there using: (^:61:).*?(:61:) but the match also includes the next occurring :61:.
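
A minimal Python sketch of why that happens and one possible way around it, assuming the flavour supports lookahead. The sample is a shortened version of the data above and the variable names are illustrative. Because the closing (:61:) is part of the match, it is consumed, so the record that starts there can no longer be found by the next search.

```python
import re

# Shortened sample; lines are separated by "\n" here.
text = (
    ":61:071222D208,00N026\n"
    ":86:P  12345678BELASTINGDIENST\n"
    ":61:071225C758,70N078\n"
    ":86:0116664495 REGULA B.V.\n"
    "CITY 48772-54314\n"
    ":61:071225C425,05N078\n"
    ":86:0329883585 J. MANSSHOT\n"
)

# MULTILINE: "^" matches at every line start. DOTALL: "." crosses newlines.
flags = re.MULTILINE | re.DOTALL

# Pattern from the post: the terminating :61: is consumed by the match.
consuming = [m.group(0) for m in re.finditer(r"(^:61:).*?(:61:)", text, flags)]

# Lookahead variant: stop in front of the next line-initial :61:, or at the
# end of the input, without consuming anything.
records = [m.group(0) for m in re.finditer(r"^:61:.*?(?=^:61:|\Z)", text, flags)]

print(len(consuming), len(records))
```

With the lookahead each of the three records is returned separately, and the last one is found as well because the end of the input also counts as a terminator.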
 

ahoffmann Commented:
in perl you can match such a "record" with:

$record=~s/(:61:)(.*)(?:61:)/$1$2/m

AGION (Author) Commented:
Could you translate that into a plain regular expression for me? I don't use Perl.

ahoffmann Commented:
> (:61:)(.*)(?:61:)

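match (:61:) and keep backreference
match anything up to the final group and keep backreference
(?:61:) is a non-capturing group: match 61: and don't keep a reference

> $1$2
replace the match with the first reference followed by the second (see matches above)

> m
multiline mode: ^ and $ also match at line boundaries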
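
How the quoted pattern behaves depends heavily on the flags. A small Python sketch, where re.MULTILINE and re.DOTALL correspond to Perl's /m and /s; the three-line sample string is made up, and the last pattern is only one possible variant, not necessarily what was intended.

```python
import re

text = ":61:first\n:86:detail\n:61:second\n"
pattern = r"(:61:)(.*)(?:61:)"

# MULTILINE only changes "^" and "$"; "." still stops at a newline.
# No single line contains ":61:" followed by another "61:", so nothing matches.
no_match = re.search(pattern, text, re.MULTILINE)

# DOTALL lets "." cross newlines. "(?:61:)" is a non-capturing group around
# the literal "61:", so the colon in front of it ends up in group 2.
m = re.search(pattern, text, re.DOTALL)
group2 = m.group(2)

# Variant: lazy ".*?" plus a lookahead for the complete ":61:".
fixed = re.search(r"(:61:)(.*?)(?=:61:)", text, re.DOTALL)
record = fixed.group(1) + fixed.group(2)

print(no_match)
print(repr(group2))
print(repr(record))
```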

AGION (Author) Commented:
I tried the regular expression     > (:61:)(.*)(?:61:)     but it does not match a thing.
I need a regex that matches all occurring lines starting with :61:
I need a regex that matches all occurring lines starting with :86: and any following lines, until a line starting with :61: begins again.
Within lines starting with :86: I need a regex that matches the 5th through the 14th character of the first line, and I need another regex that selects the rest of the line(s) up to the next :61:.
 
If this is a tough one, please let me know. I could raise the points you guys can earn, but I need this one badly.

AGION (Author) Commented:
I have already made the following regular expressions myself:
(:61:)(.*)
(:86:)(.*)(\s+.*)(.*)
it selects the following:
> all lines starting with :61:
> all lines starting with :86: and the lines under it if available.
if you need a visualisation of the whole thing, look at the attachment.
 

Regular-Expression.bmp

ahoffmann Commented:
> but it does not match a thing.
does your regex flavour support multiline matching?

AGION (Author) Commented:
I am experimenting in RegexBuddy at the moment.
But in the end it should all fit into another program; I will post a little screenshot of that as well.
If I understand you correctly, both programs support multiline matching (see the other screenshot). But as I said in my first post, I am brand new to this kind of programming.
 
N.B. Some explanation of this screenshot: there are several layers/levels, and they contain different kinds of regular expressions.
For instance, the level I selected in the screenshot matches lines starting with :61: and the first line starting with :86:
The levels below that match more specific parts of the :61: or :86: lines.
 

Tree-Structure.bmp

ahoffmann Commented:
> I am experimenting in RegexBuddy at the moment.
> But in the end it should all fit in another program,
hmm, the regex flavour you're finally using is the most important (I'm not sure if RegexBuddy supports all flavours in detail).

> I am brandnew at this kind of programming.
Regexes are a complex world. I suggest that you get used to the basics (using simple tools like awk, egrep, sed, perl) and then do your sophisticated things in the language you write your code in.
IMHO it's a bad idea to use something like RegexBuddy to test what your language should finally do.
Note: I'm not telling you that RegexBuddy is wrong, but you need to be careful about what it does *and* you need to know how regexes work!

What is your final coding language (the one that supports regex)?

AGION (Author) Commented:
> hmm, the regex flavour you're finally using is the most important (I'm not sure if RegexBuddy supports all flavours in detail).

RegexBuddy supports a lot of languages, but I don't know what type of regular expression the final program uses. I can give you some examples of standard regular expressions the program uses:
:61:(.*\r\n)*?(?=:62(F|M):)
(:61:.*\r\n){1,2}(:86:.*\r\n){1,6}
(:86:[^\r\n]*\r\n){0,6}
> regex are a complex world, I suggest that you make yourself used to the basics of it (using simple tools like awk, egrep, sed, perl) and then do your sophisticated things in the language you write your code.
Since I am new at this, I had to see what I could find out about it. So I started this topic to see some examples of what I needed; I could then take a close look at the regular expressions the experts wrote and learn from them.

AGION (Author) Commented:
Yes, I see. I think your explanation is quite clear. I didn't know that when I started trying to configure this. I could try to find out which particular language is used by the final program; that would make it a lot easier for you experts and for me.

AGION (Author) Commented:
I found out that the regular expressions should be in .NET.
I hope someone can now help me solve my problem.

ahoffmann Commented:
.NET supports all suggested examples, IIRC

AGION (Author) Commented:
I found the solution myself already (with help from SAP), but the points go to ahoffmann for his various attempts, so thanks for your effort.

AGION (Author) Commented:
ahoffmann pointed out the different types of regular expressions, so I knew more exactly what to search for (and where to get help from others), and could solve my own question.

ahoffmann Commented:
would you like to share the solution with us?
Question has a verified solution.