simadownnow
asked on
How to parse file with Perl
Hi,
I am trying to parse a message file and extract the messages that meet certain criteria. The messages I am looking for are specific types that are logged in the file, for example those with the codes CA, DC, OP, etc. The code is the first field after the ORC segment name. I only want to pull out a message when the two-letter code appears between the pipes, usually on the ORC line. It should match the |CA| in ORC and not treat the CA in CAMPING as a match.
2nd part: How can I pull out a message that falls between two Ignored: lines if the line after Ignored: is *** Message Ignored, Incorrect Order Type? I want everything after the first Ignored: up to the next Ignored:. Hopefully this isn't too confusing. My code can pull out info, but it also matches the CA in CAMPING. I am also counting each occurrence, and the count should only include the |CA| instances, so that does not work correctly either.
Ignored:
*** Message Ignored, Incorrect Order Type
MSH|^~\&|RP|GJH|ALL|ASD|20080922001707||RDE^O01|20080922001707008697|P|2.3|||||||||
PID|||1234567^^^^^||Person||number|L||||||||||||||||||||||||||||||
PV1||H|B12^B1266^A^K||||100022^James^James,|||||||||||I|947258034|||||||||||||||||||||||||||||||||
ORC|CA|In63708710|947258034-24-1|20315933|||1^BID&0800,2000^INDEF^200809220010^^R^^11111110^||200809220017|1965435^FFR^SHN|1965435^FFR^SHN|100022^James^James,|||||||196u86735^FFR^SHN
RXO|5132^Product^SEQNO||||||||||||||||
RXE|1^BID&0800,2000^INDEF^200809220010^^R^^11111110^|8246300^CAMPING|50||mg|TAP|^|||1|EACH||100022||||||||VEND|||||||||F|24240000^Description|||||||||||||M|^
RXG|1||1^20080922001000
RXG|2||1^20080922080000
Ignored:
*** Message Ignored, Incorrect Order Type:
MSH|^~\&|RX|GJH|ALL_...... ....
Next message in similar structure as above....
Any help would be really appreciated.
#! /usr/bin/perl
use warnings;
use strict;
use diagnostics;
open(INFILE, "rxp.ign") or die "Can't open input.txt: $!";
open(OUTFILE, ">resultsCA.txt") or die "Can't open output.txt: $!";
while (<INFILE>) {
if( /\bCA\b/ig ) {
print OUTFILE $_;
}
}
close OUTFILE;
close INFILE;
my $val = <rxp2.ign>;
chomp ($val);
my $cnt=0;
open (HNDL, "$val") || die "wrong filename";
while ($val = <HNDL>)
{
while ($val =~ /\bCA\b/ig)
{
++$cnt;
}
}
close HNDL;
print "Number of instances of 'CA's' found: $cnt\n\n";
awk '/^Ignored:/{if(f==1){print x};f==0;}/^ORC\|(CA|DC|OP)\|/{f=1}{x=sprintf("%s\n%s",x,$0)}END{if(f==1){print x}}' you-file
ASKER
I've never used the awk command before because I'm new to Perl. Could you explain a little better what this is doing? Do I need to declare any variables, awk? Also when you write you-file, does that mean my file name that I want to parse? rxp.ign? in quotes or anything? Does this also count the instances of each type of message CA, DC, OP?
> .. explain a little
clear the flag when the string Ignored: is found at the beginning of a line, and set it when an ORC line with one of the codes is found
collect every line into x
print the collected lines when the next Ignored: line is found and the flag is set (and once more after reading the file, since no Ignored: line follows the last message)
> Do I need to declare any variables,
no, as all variables are 0 (integer) or '' (empty string) by default
> does that mean my file name that I want to parse?
yes
> in quotes or anything?
depends on your shell (e.g. without quotes if the filename does not contain meta characters)
> Does this also count the instances of each type of message CA, DC, OP?
no
to do that, use something like:
awk '/^Ignored:/{if(f==1){print x};f=0;}/^ORC\|(CA|DC|OP)\|/{f=1}{x=sprintf("%s\n%s",x,$0);if(/\|CA\|/){c++};if(/\|DC\|/){d++};if(/\|OP\|/){o++}}END{if(f==1){print x};print "CA: ",c;print "DC: ",d;print "OP: ",o}' rxp.ign
(note that my first post contains an error: f==0 must be f=0)
---
that's quick&dirty with awk; if you need more text processing it's probably better to start with perl right away
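For comparison, the flag-and-collect idea described above could be sketched in Perl roughly as follows. This is only a sketch under assumptions: the helper name wanted_messages and the shortened sample lines are made up for illustration, and unlike the awk one-liner it starts a fresh buffer at every Ignored: line so that each message is kept or dropped on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): group lines into messages,
# starting a new message at every line that begins with "Ignored:", and
# return only the messages whose ORC line carries one of the wanted codes.
sub wanted_messages {
    my ($lines, @codes) = @_;
    my $alt = join '|', map { quotemeta($_) } @codes;

    my (@messages, @current);
    for my $line (@$lines) {
        if ($line =~ /^Ignored:/ && @current) {
            push @messages, [@current];   # the previous message is complete
            @current = ();
        }
        push @current, $line;
    }
    push @messages, [@current] if @current;   # last message in the input

    my @hits;
    for my $msg (@messages) {
        # Anchoring on ^ORC| means "CA" inside CAMPING on an RXE line is never tested.
        push @hits, $msg if grep { /^ORC\|(?:$alt)\|/ } @$msg;
    }
    return @hits;
}

# Shortened, made-up sample in the shape of the log from the question.
my @sample = (
    "Ignored:\n",
    "*** Message Ignored, Incorrect Order Type\n",
    "ORC|CA|In63708710|\n",
    "RXE|1^BID|8246300^CAMPING|50\n",
    "Ignored:\n",
    "*** Message Ignored, Incorrect Order Type\n",
    "ORC|NW|In63708711|\n",
    "RXE|1^BID|8246300^CAMPING|50\n",
);

print @$_ for wanted_messages(\@sample, qw(CA DC OP));
```

The second sample message contains CAMPING but has ORC|NW|, so it should be skipped; only the anchored ORC field decides.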
ASKER
This log file is created on a windows box so I am using active perl for winXP which is what my workstation is.
Thanks for the quick reply and explanation. I don't know how to insert this in my perl code and execute the awk command. I've tried but I get errors. Should there be a BEGIN statement to go with the END that is in the code?
same as perl code (quick&dirty converted from awk)
perl -ane 'm/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print "CA: ",$c;print ", DC: ",$d;print ", OP: ",$o;}' rxp.ign
ASKER
Hoffman,
What is the -ane in the command line? Usually when I run a Perl script I run it from the command line, i.e. perl whatever.pl. Have you got this to parse the example I put up? You can duplicate the message back to back to increase the message instances. I can't get this to run, and I am probably executing it wrong. Man I am a layman with PERL, I need to pick up a book. Sorry to keep asking, this must seem mundane to you.
> Have you got this to parse the example I put up?
simply stuff anything between the single quotes ' in your .pl file and execute it
> I can't get this to run ..
are you on unreliable systems like windoze? bad luck, you have to use a file for the script or fiddle around with M$'s strange handling of any kind of quotes.
Get any reliable shell and it works as posted, or use a script file. Sorry, I'm not responsible for stupid systems :)
> Man I am a layman with PERL,
we fix that ;-)
> What is the -ane ...
man perl
man perlrun
-a awk mode
-n no print
-e execute these commands
or in more detail (shamelessly stolen from perl's man pages):
-a turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
-e commandline
may be used to enter one line of program. If -e is given, Perl will not look for a filename in the argument list. Multiple -e commands may be given to build up a multi-line script. Make sure to use semicolons where you would in a normal program.
-n causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
Note that the lines are not printed by default. See -p to have lines printed. If a file named by an argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.
Here is an efficient way to delete all files older than a week:
find . -mtime +7 -print | perl -nle unlink
This is faster than using the -exec switch of find because you don't have to start a process on every filename found. It does suffer from the bug of mishandling newlines in pathnames, which you can fix if you follow the example under -0.
"BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just as in awk.
ASKER
I am still confused about how to get this to run. I am using Windows XP, and that is what I need to run the script on. Unfortunately I cannot use a more reliable shell. I want to run the command against a file; I don't want to have to copy and paste data between quotes. So I would like to run a command line such as perl extract.pl and have the script execute the commands from within extract.pl, which will then open the data file and parse it. This is how I usually get commands to run.
So am I supposed to put 'm/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print "CA: ",$c;print ", DC: ",$d;print ", OP: ",$o;}' rxp.ign in extract.pl so that it can run?
> .. would like to run a cmd line such as perl (extract.pl) ..
simply write the code between the single quotes in your file (extract.pl) and run it like
perl extract.pl rxp.ign
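If the one-liner is hard to run from the Windows command prompt, a script-file version might look roughly like the sketch below. The helper name scan_log is hypothetical, and two things differ from the one-liner on purpose: the buffer is reset at every Ignored: line, and only the code on the ORC line is counted. Both are assumptions about what is wanted, not something confirmed in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one log from a filehandle and return the text of
# every message that has ORC|CA|, ORC|DC| or ORC|OP|, plus per-code counts.
sub scan_log {
    my ($fh) = @_;
    my %count = (CA => 0, DC => 0, OP => 0);
    my ($out, $msg, $keep) = ('', '', 0);

    while (my $line = <$fh>) {
        if ($line =~ /^Ignored:/) {        # a new message starts here
            $out .= $msg if $keep;         # flush the previous one if it matched
            ($msg, $keep) = ('', 0);
        }
        $msg .= $line;
        if ($line =~ /^ORC\|(CA|DC|OP)\|/) {
            $keep = 1;
            $count{$1}++;
        }
    }
    $out .= $msg if $keep;                 # the last message has no trailing "Ignored:"
    return ($out, \%count);
}

# Run as: perl extract.pl rxp.ign
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my ($text, $count) = scan_log($in);
    close $in;
    print $text;
    print "$_: $count->{$_}\n" for sort keys %$count;
}
```

Because every variable is declared with my, this version should also run cleanly under use strict and use warnings.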
ASKER
I did and I got this
C:\Logs>perl extract.pl rxp.ign
Useless use of a constant in void context at extract.pl line 6 (#1)
(W void) You did something without a side effect in a context that does
nothing with the return value, such as a statement that doesn't return a
value from a block, or the left side of a scalar comma operator. Very
often this points not to stupidity on your part, but a failure of Perl
to parse your program the way you thought it would. For example, you'd
get this if you mixed up your C precedence with Python precedence and
said
$one, $two = 1, 2;
when you meant to say
($one, $two) = (1, 2);
Another common error is to use ordinary parentheses to construct a list
reference when you should be using square or curly brackets, for
example, if you say
$array = (1,2);
when you should have said
$array = [1,2];
The square brackets explicitly turn a list value into a scalar value,
while parentheses do not. So when a parenthesized list is evaluated in
a scalar context, the comma is treated like C's comma operator, which
throws away the left argument, which is not what you want. See
perlref for more on this.
This warning will not be issued for numerical constants equal to 0 or 1
since they are often used in statements like
1 while sub_with_side_effects();
String constants that would normally evaluate to 0 or 1 are warned
> I did and I got this
use the following as the first line of your script
my ($c, $d, $o, $f) = (0, 0, 0, 0); my $x = '';
ASKER
Okay, I ran the exact command that ozo posted and added the declarations that Ahoff. put up. It counts the instances correctly, but I also want to extract each of those message types once the locator field is found, as I mentioned in the question, and put them in a file for each type: CA.txt, DC.txt, and so on. If a CA occurs, I want the entire message from the end of the first Ignored: to the beginning of the next Ignored:, if that makes it easy enough. I also don't want it to pull out CA when it finds it inside a word within the message, like "CALL", which I believe this is doing, just like the script I wrote and pasted. Hopefully we can make the searching and matching a little stricter. Almost there....
ASKER
What if I wanted what was just on the second line for each message and possibly count those i.e
*** Message Ignored, Incorrect Order Type:
*** Message Ignored, Multi-component order not supported.
*** Message Ignored, Incorrect Order Type: OP
*** Message Ignored, Incorrect Order Type: NW
*** Message Ignored, Incorrect Order Type: DC
# of OP messages = $OP
SO would it be if(/\|***Message Ignored,Incorrect Order Type: OP\|/){$OP++};
I think the *** creates issues because * is a wildcard; I'm not sure how to use it as a literal search character within Perl.
I would like it to capture anything that appears after the *** on that line after the Ignored and before the MSH next line. If these could also be totalled, that would be great as well. This may help me to be able to change the script to what I need in other cases.
if(/\*\*\*\s*Message Ignored,\s*Incorrect Order Type: OP/){$OP++};
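To capture whatever follows the *** and total every distinct reason, rather than hard-coding one pattern per type, a hash keyed by the captured text is one option. The following is a sketch only; tally_reasons is a made-up name and the sample lines are copied from the list in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the text that follows "***" on each reason line.
sub tally_reasons {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # \* matches a literal asterisk; \s* tolerates optional spaces after it.
        # The capture stops before any trailing whitespace or newline.
        if ($line =~ /^\*\*\*\s*(.*?)\s*$/) {
            $count{$1}++;
        }
    }
    return \%count;
}

my @sample = (
    "*** Message Ignored, Incorrect Order Type: OP\n",
    "*** Message Ignored, Multi-component order not supported.\n",
    "*** Message Ignored, Incorrect Order Type: OP\n",
    "*** Message Ignored, Incorrect Order Type: DC\n",
);

my $count = tally_reasons(@sample);
printf "%3d  %s\n", $count->{$_}, $_ for sort keys %$count;
```

A new reason text that shows up later in the log gets its own counter automatically, so the script does not need to change when another order type appears.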
ASKER
Okay, I am going to work this in to see if it helps. So let's say this line matches, the IF statement is TRUE, and it increments OP by one. Can I put the message that follows into an OP file and append all other OP messages when they match that statement, so from the MSH to the end of that message or the next Ignored:?
> .. can I put the message that follows into a OP file ..
assuming you opened that file at the beginning, somehow like:
open(OP, ">>path/to/OP-file") || die "ERROR: cannot open OP-file: $!";
then you can write to that file like:
print OP $_; # assuming that $_ contains your current line
ASKER
But how does it know where to start the message and end the message for output into the OP file after it matches the current line I am searching on?
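One way to decide where a message starts and ends is to treat every Ignored: line as the boundary: it closes the previous message and opens the next. The sketch below assumes that layout and assumes the order type is the word after "Incorrect Order Type:" on the reason line. The helper name messages_by_type and the TYPE.txt output names are illustrative, and messages whose reason line names no type are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref of order type => concatenated messages.
sub messages_by_type {
    my ($fh) = @_;
    my %by_type;
    my ($type, $msg) = (undef, '');

    while (my $line = <$fh>) {
        if ($line =~ /^Ignored:/) {
            # The next "Ignored:" line ends the previous message.
            $by_type{$type} .= $msg if defined $type;
            ($type, $msg) = (undef, '');
            next;
        }
        if ($line =~ /^\*\*\*\s*Message Ignored,\s*Incorrect Order Type:\s*(\w+)/) {
            $type = $1;          # remember which file this message belongs to
            next;
        }
        $msg .= $line;           # MSH, PID, ORC, ... segments of the message
    }
    $by_type{$type} .= $msg if defined $type;   # last message in the file
    return \%by_type;
}

# Run as: perl split.pl rxp.ign   (appends to OP.txt, DC.txt, ...)
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my $by_type = messages_by_type($in);
    close $in;
    for my $type (sort keys %$by_type) {
        open(my $out, '>>', "$type.txt") or die "Can't open $type.txt: $!";
        print {$out} $by_type->{$type};
        close $out;
    }
}
```

Collecting into a hash first keeps the file handling in one place; for a very large log it may be preferable to open the output files up front and print each message as soon as it is complete.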
ASKER
thanks for the help!