Solved

Parsing report using regular expressions

Posted on 2005-03-23
Last Modified: 2010-03-05
Treaty: ABC
Totals:  Transaction: New Business      

      Amount:             500.00
      Tax:                  30.00

Treaty: ABC
Totals:  Transaction: Terminations      

      Amount:            -200.00
      Tax:                    0.00

Treaty: ABC
Totals:  Total ABC      

      Amount:             300.00
      Tax:                  30.00

Treaty: XYZ
Totals:  Transaction: New Business      

      Amount:             600.00
      Tax:                  40.00

Treaty: XYZ
Totals:  Total XYZ      

      Amount:             600.00
      Tax:                  40.00


How can I parse this into the following format, keeping only the totals for each treaty?

Outcome Sample:

Treaty      Amount      Tax
ABC      300.00      30.00
XYZ      600.00      40.00
Question by:mbasov
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13613779
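use strict ;
use warnings ;
my %amount_hash = () ;
my %tax_hash = () ;
my $key = "" ;   # must be $key (not $cur_key), or strict rejects the script
# Note: this adds up every block of a treaty, the "Total" block included,
# so each figure is counted twice (transaction blocks plus the Total block).
while(<>) {
if (m/^Treaty:\s*(\w+)\s*$/) {$key = $1}
# the character class needs the dot so that values like 500.00 match
if (m/^\s*Amount:\s*([-\d.]+)\s*$/i) {$amount_hash{$key} += $1}
if (m/^\s*Tax:\s*([-\d.]+)\s*$/i) {$tax_hash{$key} += $1}
}
foreach (keys %amount_hash) {
print "$_ $amount_hash{$_} $tax_hash{$_}\n" ;
}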

 
LVL 18

Expert Comment

by:kandura
ID: 13613853
$t = do { local $/; <DATA> };
while($t =~
    /
        Treaty:\s+(\w+)
        \s+
        Totals:\s+Total\s+\1
        \s+
        Amount:\s+([\d.]+)
        \s+
        Tax:\s+([\d.]+)
    /gsx
    )
{
    push @tr, [ $1, $2, $3 ];
}

printf "%-10s |%-10s |%-10s \n", qw/ Treaty Amount Tax /;
foreach (@tr) {
    printf "%-10s |% 10.2f |% 10.2f \n", @$_;
}

__DATA__
Treaty: ABC
Totals:  Transaction: New Business    

     Amount:            500.00
     Tax:              30.00

Treaty: ABC
Totals:  Transaction: Terminations    

     Amount:           -200.00
     Tax:                0.00

Treaty: ABC
Totals:  Total ABC    

     Amount:            300.00
     Tax:              30.00

Treaty: XYZ
Totals:  Transaction: New Business    

     Amount:            600.00
     Tax:              40.00

Treaty: XYZ
Totals:  Total XYZ    

     Amount:            600.00
     Tax:              40.00
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13613897
This one won't read the whole file in a single go, but it is clumsier with the regex.

use strict ;
use warnings ;
my %amount_hash = () ;
my %tax_hash = () ;
my $key = "" ;
while(<DATA>) {
if (m/^Totals:\s*Transaction/i) {$key = ''}
if (m/^Totals:\s*Total\s*(\w+)\s*$/) {$key = $1}
if (m/^\s*Amount:\s*([-\d.]+)\s*$/i) {$amount_hash{$key} += $1}
if (m/^\s*Tax:\s*([-\d.]+)\s*$/i) {$tax_hash{$key} += $1}
}
foreach (grep{length($_)>0}keys %amount_hash) {   # skip only the empty key, keep one-letter treaty names
print "$_ $amount_hash{$_} $tax_hash{$_}\n" ;
}
__DATA__
Treaty: ABC
Totals:  Transaction: New Business

     Amount:            500.00
     Tax:                  30.00

Treaty: ABC
Totals:  Transaction: Terminations

     Amount:           -200.00
     Tax:                    0.00

Treaty: ABC
Totals:  Total ABC

     Amount:            300.00
     Tax:                  30.00

Treaty: XYZ
Totals:  Transaction: New Business

     Amount:            600.00
     Tax:                  40.00

Treaty: XYZ
Totals:  Total XYZ

     Amount:            600.00
     Tax:                  40.00
 
LVL 18

Accepted Solution

by:
kandura earned 672 total points
ID: 13613928
Oops! I missed the possibility of negative totals. Also, the /s modifier on the regexp is pointless.
Here's another version, same __DATA__ section:
   
    $t = do { local $/; <DATA> };
    while($t =~
        /
            Treaty:\s+(\w+)
            \s+
            Totals:\s+Total\s+\1
            \s+
            Amount:\s+([\d.-]+)
            \s+
            Tax:\s+([\d.-]+)
        /gx
        )
    {
        push @tr, [ $1, $2, $3 ];
    }
   
    printf "| %-10s | %-10s | %-10s |\n", qw/ Treaty Amount Tax /;
    foreach (@tr) {
        printf "| %-10s | % 10.2f | % 10.2f |\n", @$_;
    }


Output:

| Treaty     | Amount     | Tax        |
| ABC        |     300.00 |      30.00 |
| XYZ        |     600.00 |      40.00 |


(This is nicely formatted in a fixed font... which I really wish EE would use, since so many of our comments consist of code)
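
To illustrate why /s makes no difference here: /s only changes what a bare `.` matches (it lets it match a newline), and the pattern above has no bare dot. The dots sit inside `[\d.-]`, where they are literal, and `\s` matches newlines with or without /s. A minimal sketch, using a made-up sample string rather than the full report:

```perl
use strict;
use warnings;

# Made-up sample: the label and the value sit on different lines.
my $text = "Totals:  Total ABC\n\n      Amount:   300.00\n";

# \s+ crosses the newlines without any help from /s.
my ($plain)  = $text =~ /Total\s+\w+\s+Amount:\s+([\d.-]+)/;
my ($with_s) = $text =~ /Total\s+\w+\s+Amount:\s+([\d.-]+)/s;

# A bare dot is what /s affects: .+ stops at a newline unless /s is on.
my $dot_plain  = ($text =~ /Total.+Amount/)  ? 1 : 0;
my $dot_with_s = ($text =~ /Total.+Amount/s) ? 1 : 0;

print "plain: $plain, with /s: $with_s\n";
print "dot without /s: $dot_plain, dot with /s: $dot_with_s\n";
```

Both captures come out the same; only the bare-dot pattern behaves differently once /s is added.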
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13614008
This reduces the number of regex matches performed but still doesn't read in the whole file. Also, the formatting is nicer ;)

use strict ;
use warnings ;
my %amount_hash = () ;
my %tax_hash = () ;
my $key = "" ;
while(<DATA>) {
if (m/^Totals:\s*Total\s*(\w+)\s*$/) {$key = $1}
if (/^Totals:\s*Total\s*(\w+)\s*$/../^Treaty/) {
      if (m/^\s*Amount:\s*([-\d.]+)\s*$/i) {$amount_hash{$key} += $1}
      if (m/^\s*Tax:\s*([-\d.]+)\s*$/i) {$tax_hash{$key} += $1}
}
}
printf ("%10s%10s%10s\n", "Treaty","Amount","Tax") ;
foreach (grep{length($_)>0}keys %amount_hash) {   # keep one-letter treaty names too
printf ("%10s%10s%10s\n",$_,$amount_hash{$_},$tax_hash{$_}) ;
}
__DATA__
Treaty: ABC
Totals:  Transaction: New Business

     Amount:            500.00
     Tax:                  30.00

Treaty: ABC
Totals:  Transaction: Terminations

     Amount:           -200.00
     Tax:                    0.00

Treaty: ABC
Totals:  Total ABC

     Amount:            300.00
     Tax:                  30.00

Treaty: XYZ
Totals:  Transaction: New Business

     Amount:            600.00
     Tax:                  40.00

Treaty: XYZ
Totals:  Total XYZ

     Amount:            600.00
     Tax:                  40.00
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13614016
In all the above solutions, replace <DATA> with <> and then supply the name of your input file as the command-line argument to the script, as follows:

./your_script.pl /path/to/input_file
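
For reference, `<>` reads line by line from every file named in @ARGV and falls back to STDIN when no file is given. The sketch below is only an illustration of that: `treaty_totals` is a hypothetical helper, and it uses `local @ARGV` plus a temporary file to stand in for the real command-line argument so that it runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: collect [treaty, amount, tax] for each "Total" block
# in the named files. Inside the sub, <> walks the files listed in @ARGV.
sub treaty_totals {
    local @ARGV = @_;    # stand-in for the real command-line arguments
    my ($key, @rows) = ('');
    while (my $line = <>) {
        if    ($line =~ /^Totals:\s*Transaction/)   { $key = '' }
        elsif ($line =~ /^Totals:\s*Total\s*(\w+)/) { $key = $1 }
        elsif ($key ne '' && $line =~ /^\s*Amount:\s*([-\d.]+)/) {
            push @rows, [ $key, $1 ];
        }
        elsif ($key ne '' && @rows && $line =~ /^\s*Tax:\s*([-\d.]+)/) {
            push @{ $rows[-1] }, $1;
        }
    }
    return @rows;
}

# Write a shortened copy of the report to a temporary file for the demo.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} <<'REPORT';
Treaty: ABC
Totals:  Transaction: New Business

      Amount:             500.00
      Tax:                  30.00

Treaty: ABC
Totals:  Total ABC

      Amount:             300.00
      Tax:                  30.00
REPORT
close $fh;

printf "%-10s %10s %10s\n", @$_ for treaty_totals($path);
```

In a real script you would drop the `local @ARGV` line and the temporary file, and simply run it with the input file as its argument.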
 
LVL 84

Assisted Solution

by:ozo
ozo earned 664 total points
ID: 13614451
my($treaty,$totals);
printf"%10s%10s%10s\n",'Treaty','Amount','Tax';
while( <> ){
    $treaty = $1 if /Treaty:\s*(\S+)/;
    $totals = $1 if /Totals:\s*Total\s*(\S+)/;
    printf"%10s%10s",$treaty,$1 if $totals eq $treaty && /Amount:\s*(\S+)/;
    printf"%10s\n",$1 if $totals eq $treaty && /Tax:\s*(\S+)/;
}
__DATA__
Treaty: ABC
Totals:  Transaction: New Business

     Amount:            500.00
     Tax:                  30.00

Treaty: ABC
Totals:  Transaction: Terminations

     Amount:           -200.00
     Tax:                    0.00

Treaty: ABC
Totals:  Total ABC

     Amount:            300.00
     Tax:                  30.00

Treaty: XYZ
Totals:  Transaction: New Business

     Amount:            600.00
     Tax:                  40.00

Treaty: XYZ
Totals:  Total XYZ

     Amount:            600.00
     Tax:                  40.00
 

Author Comment

by:mbasov
ID: 13615692
Manav Mathur,
Can you please comment this section:

while(<DATA>) {
if (m/^Totals:\s*Total\s*(\w+)\s*$/) {$key = $1}
if (/^Totals:\s*Total\s*(\w+)\s*$/../^Treaty/) {
     if (m/^\s*Amount:\s*([-\d.]+)\s*$/i) {$amount_hash{$key} += $1}
     if (m/^\s*Tax:\s*([-\d.]+)\s*$/i) {$tax_hash{$key} += $1}
}

I am not sure what it does.
 
LVL 16

Assisted Solution

by:manav_mathur
manav_mathur earned 664 total points
ID: 13619217
> if (m/^Totals:\s*Total\s*(\w+)\s*$/) {$key = $1}
This will extract the ABC part from
Totals:  Total ABC

and store it into $key

> if (/^Totals:\s*Total\s*(\w+)\s*$/../^Treaty/) {

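This becomes true on a line matching /^Totals:\s*Total\s*(\w+)\s*$/, i.e. Totals:  Total ABC,
and stays true up to and including the next line starting with Treaty, after which it is false again. That closing Treaty line does no harm, since it matches neither Amount nor Tax.

Hence it will be true for blocks like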

Totals:  Total ABC

     Amount:            300.00
     Tax:                  30.00
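
Here is a small sketch of the range (flip-flop) operator on its own, assuming a few made-up input lines held in an array instead of being read from a file. The `@inside` array is just for illustration and is not part of the script above:

```perl
use strict;
use warnings;

my @lines = (
    "Treaty: ABC",
    "Totals:  Transaction: New Business",
    "      Amount:   500.00",
    "Treaty: ABC",
    "Totals:  Total ABC",
    "      Amount:   300.00",
    "Treaty: XYZ",
    "Totals:  Transaction: New Business",
    "      Amount:   600.00",
);

my @inside;
for (@lines) {
    # In scalar context, X .. Y turns true on a line matching X and stays
    # true up to and including the next line matching Y.
    if (/^Totals:\s*Total\s/ .. /^Treaty/) {
        push @inside, $_;
    }
}
print "$_\n" for @inside;
```

Only the "Totals:  Total ABC" line, the Amount line after it and the closing "Treaty: XYZ" line end up in @inside; the transaction blocks before and after are skipped.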


Manav



 

