Solved

How do I select the right lines using regular expressions?

Posted on 2008-10-13
Last Modified: 2012-06-22
Experts,

I have a question about regular expressions. I am a newbie at regular expressions and could use some help with this one. I spent some 6 hours on it, but I can't solve it myself.


Summary of my problem:

In SAP Business One it is possible to use bank statement processing. A file (full of regular expressions) has to be selected so that it can match certain criteria against the bank statement file. The bank statement file follows a certain pattern (look at the attached txt file).


I need regular expressions for the following:

- a regular expression that selects lines starting with :61: and line :86: including next lines (if available), so in fact it has to select everything from :86: till :61: again.
- a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
- a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number

I am looking forward to the right solutions. I can give more info if you need any.

Included: code snippet with a couple of lines


:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
$0000009                         BETALINGSKENM. 123456789123456
0 1234567891234560                                             
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314                                                   
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN     
:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02

Question by:AGION
 
LVL 51

Expert Comment

by:ahoffmann
> .. starting with :61: and line :86: including next lines (if available),
Could you please give an example of the *not available* case?

# the following regexes are Perl-style (positions 5-14 are 10 characters, hence .{10}):
> - a regular expression that selects the bank account number (position 5-14) from lines starting with :86:
$account=~s/^:86:(.{10}).*/$1/;

> - a regular expression that selects all other info from lines starting with :86: (and following if any), so all positions that follow after the bank account number
$rest=~s/^:86:.{10}(.*)/$1/;
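
For readers who do not use Perl, the same idea can be sketched with two capture groups in one pattern. This is a minimal sketch, not the method used in the thread: Python is only a stand-in language here, and the variable names are made up for illustration. Positions 5-14 span ten characters, so the first group takes exactly ten characters after the :86: tag.

```python
import re

# Sample :86: line taken from the question; positions 5-14 hold the account number.
line = ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD"

# Group 1: the 10 characters right after ":86:".
# Group 2: everything that follows on the same line.
pattern = re.compile(r"^:86:(.{10})(.*)$")

match = pattern.match(line)
account = match.group(1)
rest = match.group(2).strip()

print(account)  # 0116664495
print(rest)
```

Capturing both parts in one match avoids running two separate substitutions over the same line.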
 

Author Comment

by:AGION
Thanks for your answer, but these expressions are not the ones I meant. Here is an example of the ones that are commonly used.
The program uses a sort of tree structure. The first (top-level) expression applies to both kinds of rows:      :61:(.*\r\n)*?(?=:62(F|M):)
The level underneath it selects:      (:61:.*\r\n){1,2}(:86:.*\r\n){1,6}
And the level underneath that one selects:      (:86:[^\r\n]*\r\n){0,6}
 
As you requested, here is an example of a line starting with :86: without following lines:
:61:071225C2330,00N078
:86:0936017481 INVESTGROUP UNKNOWN-BPBETALINGSKENM. 74985  
 
I hope this makes things a bit clearer; if not, please ask again.
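
The three-level tree described above can be tried outside SAP Business One. The following is only a sketch under stated assumptions: Python stands in for the program's own regex engine, the closing `:62F:PLACEHOLDER` line is invented so that the lookahead has something to find, and the third level uses `{1,6}` instead of `{0,6}` because a plain search with `{0,6}` succeeds immediately with an empty match at the start of the text.

```python
import re

# Sample records from the thread, joined with CRLF as the patterns expect.
# The final ":62F:" line is a made-up placeholder, not real statement data.
statement = "\r\n".join([
    ":61:071225C758,70N078",
    ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD",
    "CITY 48772-54314",
    ":61:071225C2330,00N078",
    ":86:0936017481 INVESTGROUP UNKNOWN-BPBETALINGSKENM. 74985",
    ":62F:PLACEHOLDER",
]) + "\r\n"

# Level 1: from the first :61: up to (not including) the :62F:/:62M: line.
level1 = re.search(r":61:(.*\r\n)*?(?=:62(F|M):)", statement).group(0)

# Level 2: each :61: line plus the :86: line(s) directly after it,
# searched inside the level-1 text only.
level2 = [m.group(0)
          for m in re.finditer(r"(:61:.*\r\n){1,2}(:86:.*\r\n){1,6}", level1)]

# Level 3: the :86: line on its own, searched inside each level-2 match.
level3 = [re.search(r"(:86:[^\r\n]*\r\n){1,6}", rec).group(0) for rec in level2]

for rec in level2:
    print(repr(rec))
```

Note that level 2 stops after the first :86: line: the continuation line `CITY 48772-54314` does not start with :86:, so it is not part of the match.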
 
LVL 51

Expert Comment

by:ahoffmann
> an example of a line starting with :86: without secondlines:
hmm, and how do you distinguish that from:

:61:071225C758,70N078
:86:0936017481 INVESTGROUP UNKNOWN-BPBETALINGSKENM. 74985
0 1234567891234560                                            
 

Author Comment

by:AGION
Okay, I experimented a bit. The trick is how to select :61: and all the text after it, up to the point where :61: occurs again:

:61:071222D208,00N026
:86:P  12345678BELASTINGDIENST       F8R03782497                $GH
$0000009                         BETALINGSKENM. 123456789123456
0 1234567891234560                  
                         
:61:071225C758,70N078
:86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD
CITY 48772-54314            
                                     
:61:071225C425,05N078
:86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA
LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN    

:61:071225C850,00N078
:86:0105327212 POSE TELEFOONSTRAAT 43 6448 SL S-ROTTERDAM MIJN OR
DERNR. 53846 REF. MAIL 21-02                

Like that.

I'm almost there using (^:61:).*?(:61:), but it takes the next occurring :61: along with it.
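
One common way to stop just before the next :61: without including it is a lookahead, which checks what follows but does not consume it. A minimal sketch, assuming a flavour that supports lookahead and lazy quantifiers; Python is used purely for illustration, and the trailing spaces of the sample lines are trimmed.

```python
import re

# Two records from the sample above, separated by plain newlines.
text = (
    ":61:071225C758,70N078\n"
    ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD\n"
    "CITY 48772-54314\n"
    ":61:071225C425,05N078\n"
    ":86:0329883585 J. MANSSHOT PATTRIOTISLAND 38 1996 PT HELMEN BIJBETA\n"
    "LING VOOR RELOOP RMP1 SET ORDERNR* 69866 / SPOEDIG LEVEREN\n"
)

# MULTILINE: ^ matches at the start of every line.
# DOTALL: . also matches line breaks, so .*? can span several lines.
# (?=^:61:|\Z) looks ahead for the next :61: line (or the end of the text)
# without making it part of the match.
record_re = re.compile(r"^:61:.*?(?=^:61:|\Z)", re.MULTILINE | re.DOTALL)

records = record_re.findall(text)
for record in records:
    print(repr(record))
```

Each element of `records` then holds exactly one :61: line together with its :86: line and any continuation lines.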
 
 
LVL 51

Expert Comment

by:ahoffmann
in perl you can match such a "record" with:

$record=~s/(:61:)(.*)(?:61:)/$1$2/m
 

Author Comment

by:AGION
Could you translate that into a normal regular expression for me? I don't use Perl.
 
LVL 51

Expert Comment

by:ahoffmann
> (:61:)(.*)(?:61:)

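match (:61:) and keep backreference
match anything until :61: and keep backreference
match :61: and don't keep a reference (note: (?: opens a non-capturing group, so (?:61:) as written matches the literal 61: without the leading colon)

> $1$2
replace with the first reference followed by the second (see matches above)

> m
multiline flag (note: in Perl, /m only changes where ^ and $ match; it is /s that lets . match line breaks)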
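
The effect of the flags and of the `(?:` group can be checked on a small sample. This is a sketch, not the expert's code: Python stands in for Perl, the sample lines are shortened from the question, and the second pattern is only one guess at the intended behaviour.

```python
import re

# Shortened sample: two records, each a :61: line followed by a :86: line.
text = (
    ":61:071222D208,00N026\n"
    ":86:P  12345678BELASTINGDIENST\n"
    ":61:071225C758,70N078\n"
    ":86:0116664495 REGULA B.V.\n"
)

# As quoted: (?:61:) is a non-capturing group for the literal "61:",
# and without DOTALL the "." never crosses a line break.
as_quoted = re.search(r"(:61:)(.*)(?:61:)", text, re.MULTILINE)

# Variant (an assumption about the intent): lazy ".*?", DOTALL so that "."
# spans lines, and a lookahead so the next ":61:" is checked but not consumed.
variant = re.search(r"(:61:)(.*?)(?=:61:)", text, re.DOTALL)

print(as_quoted)  # None
print(repr(variant.group(0)))
```

The first search finds nothing because no single line contains `61:` again after its leading :61: tag; the variant returns the first record only.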
 

Author Comment

by:AGION
I tried the regular expression     > (:61:)(.*)(?:61:)     but it does not match anything.
I need a regex that matches all lines starting with :61:
I need a regex that matches all lines starting with :86: plus any following lines, up to the next line starting with :61:
Within the lines starting with :86:, I need a regex that matches the 5th through the 14th character of the first line, and another regex that selects the rest of the line(s) up to the next :61:
 
If this is a tough one, please let me know. I could raise the points you can earn, but I need this one badly.
 

Author Comment

by:AGION
I have already made the following regular expressions myself:
(:61:)(.*)
(:86:)(.*)(\s+.*)(.*)
They select the following:
> all lines starting with :61:
> all lines starting with :86: and the lines under them, if available.
If you need a visualisation of the whole thing, look at the attachment.
 

Regular-Expression.bmp
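
An alternative way to take a :86: line together with a variable number of continuation lines is a negative lookahead that refuses any line starting with :61:. The sketch below is an illustration only: Python is a stand-in language, `block_re` and `split_re` are made-up names, and it assumes continuation lines are never empty.

```python
import re

text = (
    ":61:071225C758,70N078\n"
    ":86:0116664495 REGULA B.V. HELPMESTRAAT 243 B 5371 AM HARDCITY HARD\n"
    "CITY 48772-54314\n"
    ":61:071225C2330,00N078\n"
    ":86:0936017481 INVESTGROUP UNKNOWN-BPBETALINGSKENM. 74985\n"
)

# A :86: line, then any directly following non-empty lines
# that do not start with :61: (the negative lookahead (?!:61:)).
block_re = re.compile(r"^:86:.*(?:\n(?!:61:).+)*", re.MULTILINE)

# Inside one block: the 10 characters after ":86:" (positions 5-14), then the rest.
# DOTALL lets the second group run across the continuation lines.
split_re = re.compile(r":86:(.{10})(.*)", re.DOTALL)

blocks = block_re.findall(text)
parsed = [split_re.match(block).groups() for block in blocks]

for account, rest in parsed:
    print(account, repr(rest))
```

This handles both cases raised in the thread: a :86: line with continuation lines and a :86: line without any.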
 
LVL 51

Expert Comment

by:ahoffmann
> but it does not match a thing.
does your regex flavour support multiline matching?

Author Comment

by:AGION
I am experimenting in RegexBuddy at the moment.
But in the end it should all fit into another program; I will post a little screenshot of that as well.
If I understand you correctly, both programs support multiline matching (look at the other screenshot). But as I said in my opening post, I am brand new to this kind of programming.
 
N.B. Some explanation of this screenshot: there are several layers/levels, and they contain different kinds of regular expressions.
For instance, the level I selected in the screenshot matches lines starting with :61: and the first line starting with :86:
The levels below that match more specific parts of the :61: or :86: lines.
 

Tree-Structure.bmp
 
LVL 51

Expert Comment

by:ahoffmann
> I am experimenting in RegexBuddy at the moment.
> But in the end it should all fit in another program,
hmm, the regex flavour you finally use is the most important thing (I'm not sure if RegexBuddy supports all flavours in detail).

> I am brandnew at this kind of programming.
Regexes are a complex world. I suggest that you get used to the basics (using simple tools like awk, egrep, sed, perl) and then do your sophisticated things in the language you write your code in.
IMHO it's a bad idea to use something like RegexBuddy to test what your language should finally do.
Note: I'm not telling you that RegexBuddy is wrong, but you need to take care what it does *and* you need to know how regexes work!

What is your final coding language (which supports regex)?
 

Author Comment

by:AGION
> hmm, the regex flavour you're finally using is the most important (I'm not sure if RegexBuddy supports all flavours in detail).

RegexBuddy supports a lot of languages, but I don't know what type of regular expression the final program uses. I can give you some examples of standard regular expressions the program uses:
:61:(.*\r\n)*?(?=:62(F|M):)
(:61:.*\r\n){1,2}(:86:.*\r\n){1,6}
(:86:[^\r\n]*\r\n){0,6}
> regex are a complex world, I suggest that you make yourself used to the basics of it (using simple tools like awk, egrep, sed, perl) and then do your sophisticated things in the language you write your code.
Since I am new at this, I first had to look at what I could find out about it. That is why I started this topic: to see some examples of what I need, so that I can study the regular expressions the experts write and learn from them.
 
LVL 51

Accepted Solution

by:
ahoffmann earned 500 total points
 (?=:62(F|M):)

is a positive lookahead, something many flavours (for example the POSIX ones used by sed, awk and egrep) do not support

  *?
is a non-greedy match, also something those flavours do not support

so you see how important it is to know which regex flavour you are using
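
In a flavour that does support both features, the difference between a greedy and a non-greedy quantifier in front of a lookahead is easy to see. A small sketch with invented sample text (the lines `:61:A`, `:61:B` and `:62F:END` are placeholders, not statement data), using Python as an example of a Perl-style flavour:

```python
import re

# Invented sample: two short :61: lines followed by a closing :62F: line.
text = ":61:A\n:61:B\n:62F:END\n"

# Greedy .* runs as far as possible, then backs up to the LAST place
# where the lookahead (?=:6) succeeds.
greedy = re.search(r":61:.*(?=:6)", text, re.DOTALL).group(0)

# Non-greedy .*? stops at the FIRST place where the lookahead succeeds.
lazy = re.search(r":61:.*?(?=:6)", text, re.DOTALL).group(0)

print(repr(greedy))
print(repr(lazy))
```

In both cases the text matched by the lookahead itself is not part of the result.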
 

Author Comment

by:AGION
Yes, I see; I think your explanation is quite clear. I didn't know that when I started trying to configure this. I will try to find out which particular language is used by the final program; that would make it a lot easier for you experts and for me.
 

Author Comment

by:AGION
I found out that the regular expressions should be in .NET.
I hope someone can now help me solve my problems.
 
LVL 51

Expert Comment

by:ahoffmann
.NET supports all suggested examples, IIRC
 

Author Comment

by:AGION
I already found the solutions myself (with help from SAP), but the points go to ahoffmann for his various attempts, so thanks for your effort.
 

Author Closing Comment

by:AGION
Hoffmann pointed out the different types of regular expressions, so I knew more precisely where to search (and where to get help from others), and I could solve my own question.
 
LVL 51

Expert Comment

by:ahoffmann
would you like to share the solution with us?