Tolgar
asked on
How to fix this parser bug in my PERL code?
This is the code that I use:
sub submitFileParser ($) {
    my $filename = shift;
    my @paragraphs;
    # local($/) = '';
    open( FILE, "< $filename" ) or die "Can't open $filename : $!";
    @paragraphs = <FILE>;
    close FILE;
    return read_paragraphs (@paragraphs);
}
sub read_paragraphs (@) {
    # read lines as parameters
    my @rippedParagraphs = @_;
    my @submits = ();
    # Storage for all sections
    # Temporary storages for single section of each type
    my (@Files, @CR, @RR, @CS, @Options, @Mailsent, @file_info);
    # Flags for file traversal logic
    my ($opt_flag, $file_flag, $mail_sent_to_flag);
    my $submitFileExist = 0;
    #read the file
    for ( @rippedParagraphs ) {
        s/\r//g;
        if (/^USER=(\S+)\,/) {
            #obtain the login from USER=
            $geckLogin = $1;
        }
        if (/^\s*Submit\s+file\s*$/) {
            # We record the accumulated data:
            push(
                @submits,
                {
                    "Options" => [@Options],
                    "FileInfo" => [@file_info],
                    "GeckLogin" => $geckLogin,
                    "SubmitFileExistFlag" => $submitFileExist,
                    "Mail sent to" => [@Mailsent],
                }
            ) if @file_info;
            @Options = @Mailsent = @file_info = ();
            $submitFileExist = 1;
            next;
        }
        if ($submitFileExist == 1) {
            if (/^\s*\=+\s*$/) {
                $submitFileExist++;
                $mail_sent_to_flag = 0;
            }
            next;
        }
        if ($submitFileExist == 2) {
            if ($mail_sent_to_flag) {
                push(@Mailsent, $_);
                next;
            }
            #if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
            # # Match the login name in the submit file - if it has not
            # # already been done
            # $geckLogin ||= $1;
            #}
            # If we encounter a comment or empty string
            if (/^\#/ || !/\S/) {
                # we haven't encountered an option to start doing anything
                next unless $opt_flag || $file_flag;
                # If we're done with options, let's start reading file sections
                if ($opt_flag == 1) {
                    $opt_flag = 0;
                    $file_flag = 1;
                }
                elsif ($opt_flag > 1) {
                    # Addresses the empty line within Options:
                    $opt_flag--;
                }
                next;
            }
            if (/^Options/) {
                # We start reading options
                $opt_flag = 2;
                next;
            }
            if (/^Mail sent to/) {
                $mail_sent_to_flag = 1;
                push(@Mailsent, $_);
                next;
            }
            # Matching beginning of the line to determine the type of the string
            # and placing it in temporary storage
            /^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && goto CHECK;
            /^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && goto CHECK;
            /^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && goto CHECK;
            # General text is either files or options info, depending on the
            # value of the option flag
            $opt_flag ? push(@Options, $_) : push(@Files, $_);
            CHECK:
            if (@RR && @CR && @CS) {
                push(
                    @file_info,
                    {
                        "Files" => [@Files],
                        "Comments" => [@CS],
                        "RelatedRecords" => [@RR],
                        "CodeReviewers" => [@CR],
                        "SubmitFileExistFlag" => $submitFileExist,
                    },
                );
                @Files = @CS = @RR = @CR = ();
            }
        }
    }
    if ($submitFileExist == 0) {
        my @noSubmitFileSubmits = ();
        $submitFileExist = 0; # two-line grammar didn't hold
        my $parsedData = parseWithoutSubmitFile(@rippedParagraphs);
        #submit file does not exist flag
        $submitFileExist = 0;
        push(
            @noSubmitFileSubmits,
            {
                "GeckLogin" => $geckLogin,
                "ParsedData" => $parsedData,
                "SubmitFileExistFlag" => $submitFileExist,
                "Cluster" => $parsedData->{t},
                "JobID" => $parsedData->{dollar_},
                "gLogFilesOption" => exists $parsedData->{GLOGFILES},
                "gLogSbcheckOption" => exists $parsedData->{GLOGSBCHECK},
            }
        ) if $parsedData;
        return \@noSubmitFileSubmits;
    }
    push(
        @submits,
        {
            "Options" => [@Options],
            "Mail sent to" => [@Mailsent],
            "FileInfo" => [@file_info],
            "GeckLogin" => $geckLogin,
        }
    ) if @file_info;
    return \@submits;
}
sub parseWithoutSubmitFile (@) {
    my $arg_flag = 0;
    my $parsedData = {};
    my $current_option = '';
    while (my $line = shift @_) {
        if ($arg_flag == 1) {
            if ($line =~ /^Currently (\$\_=.*)/) {
                local $_;
                eval "$1;";
                $parsedData->{dollar_} = $_;
                $arg_flag = 0;
            }
            elsif ($line =~ /^\s+\-(.*)/) {
                $current_option = $1;
                $parsedData->{$current_option} = undef;
                next;
            }
            elsif ($current_option && $line =~ /^\s+(.*)/) {
                $parsedData->{$current_option} = $1;
                $current_option = undef;
            }
        }
        else {
            if ($line =~ /^Original arguments:/) {
                $arg_flag = 1;
                next;
            }
        }
    }
    return $parsedData;
}
This is the text file that I parse:
USER=testman, HOST=testman-deb6-64, ARCH=glnxa64
Revisions: /st/hub/share/apps/bat//share/mmit: 07/26-09:48:58; csubmitItem.pm: 2011/07/26-09:48:56
Original arguments:
-t
Atk
-F
20110914.submit
Currently $_='154551'
main:/st/hub/share/apps/bat/bat2.15.17/share/../lib/csubmitCache.pm:44 called main::submissionHistory
main:/st/hub/share/apps/bat/bat2.15.17/share/submit:3871 called main::CreateCacheFile
Current directory ($PWD) = /st/devel/sandbox/testman/Aslrtw
Submit file
===========================
# Component : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for : 2000
#
# Description:
# Unlocking making changes
#
# Documentation impact:
# None
#
# QE items:
# None
#
# Type of change:
# Unlocking making changes
#
# submit file for use with msubmit. To use run the command
# submit -F 24.submit
# or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk/glnxa64'>/sandbox/testman/Atk_ests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2
st/ert/variants/variants4.c
CR: testman2
RR: 123456
CS: Unlocking before making changes
and adding this line for this case
Mail sent to:
st.devel.submit: Unlocking making changes
Files:
st/ert/variants/variants5.c
Submit file
===========================
# Component : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for : 2000
#
# Description:
# Unlocking making changes
#
# Documentation impact:
# None
#
# QE items:
# None
#
# Type of change:
# Unlocking making changes
#
# submit file for use with msubmit. To use run the command
# submit -F 14.submit
# or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2
st/ert/variants/variants5.c
st/ert/variants/variants6.c
CS: Unlocking before making changes
and adding this line for this case
CR: testman2
RR: 333333, 444444
st/ert/variants/variants7.c
CR: testman2
Comments: Unlocking before making changes
and adding this line for this case
RR: 555555, 666666
Mail sent to:
st.devel.submit: Unlocking making changes
Files:
st/ert/variants/variants5.c
The problem is that this code only detects the first line of a CS (or Comments) entry.
The expected output is:
Unlocking before making changes
and adding this line for this case
How can I change this code to detect any number of lines belonging to a CS or Comments tag?
This will wreck everything since you will lose RRs and CRs.
You asked a misleading question in the other thread, because the answer there solves part of the problem at the expense of creating another one.
As I said, we need to expand our flag set: add a flag indicating that we are parsing a CS entry, and proceed according to it.
ASKER
@parparov:
Why do we lose RRs and CRs? I think it is because we parse the file line by line. Am I right?
On the other hand, are you gonna be able to expand the flag set?
Thanks,
Because you're pushing into @CS once you match either RR, CS, or CR without distinguishing what you actually matched.
ASKER
What if we push into CS any line that comes after the CS line, as long as that line is not a CR: (or Code Reviewer:), RR: (or Related Records:), or Mail sent to: line?
Does this work?
What do you think?
Thanks,
ASKER
@parparov: I wonder if you will be able to work on it. I would appreciate it if you could let me know.
Thanks,
Still thinking
ASKER
ok. Thanks.
I will wait for your reply.
There is a problem with the grammar.
Consider a multiline CS statement at the end of a CS/CR/RR block.
Consider also an absent Mail sent to block.
Now, how would we distinguish a continuing CS statement from another file entry within the same upper Submit file block?
Like this:
file/path1
CR: cr1 cr1
RR: rr1 rr1
CS: cs1
cs1
cs1
file/path2
file/path3
CR: cr1 cr2
RR: rr2 rr2
CS:cs2
cs/in/form/of/file/path/to/make/life/harder
...
ASKER
file/path1
CR: cr1 cr1
RR: rr1 rr1
CS: cs1
cs1
cs1
file/path2
file/path3
CR: cr1 cr2
RR: rr2 rr2
CS:cs2
cs2
There is always a single blank line (containing only one newline character) between these blocks.
The newline character should cover both Unix and Windows line endings.
Can we use this as an indicator?
Thanks,
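A blank separator line can be recognised regardless of the line ending. The following is only a minimal sketch; `is_separator_line` is a hypothetical helper name, not part of the code above. It treats a line as a separator when it contains nothing but CR and/or LF characters.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a line holds nothing but its line ending,
# whether that ending is LF (Unix), CRLF (Windows), or a bare CR.
sub is_separator_line {
    my ($line) = @_;
    return $line =~ /\A[\r\n]*\z/ ? 1 : 0;
}

for my $line ("\n", "\r\n", "cs1\n", "   \n") {
    # Make the invisible characters readable before printing.
    (my $shown = $line) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    printf "%-8s => %s\n", "'$shown'",
        is_separator_line($line) ? 'separator' : 'content';
}
```

Note that this strict test counts a whitespace-only line as content. The posted parser uses `!/\S/` instead, which also accepts lines made of spaces or tabs; which of the two is right depends on how the submit files are actually written.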
But a CS can contain a blank line, can't it?
In the worst case, couldn't it even mimic a whole entry?
ASKER
I didn't really understand your point here. Why would it mimic a whole entry in the worst case?
On the other hand, we can assume that the CS line will always contain some information.
CS: this
comment
will make us headache
because after this blank line
file/path1
is actually part of the comment
and only here comment ends.
file/path2
ASKER
Why is file/path1 part of the comment?
It comes before the comment.
Can't we push every line we find after the CS line, as long as it does not start with RR or CR, until we reach a line that contains only a newline?
If the line starts with RR or CR, or contains only a single newline character, then we stop.
Does it sound reasonable?
Why?
Because the comment is "FREE TEXT", it can contain anything, even the string file/path1.
Or are you saying that comments CANNOT contain a blank line? That would simplify things.
ASKER
Absolutely... Comments cannot contain blank lines.
How this helps...
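If comments cannot contain blank lines, a single extra flag is enough: set it on a CS:/Comments: line, keep appending lines while it is set, and clear it on a blank line or on another tag. The following is only a sketch of that idea, not the accepted solution from this thread; `collect_comments` and `$in_cs` are hypothetical names, and the sketch collects comments only rather than replacing the full parser.

```perl
use strict;
use warnings;

# Hypothetical helper: gathers the text of every CS:/Comments: entry,
# including continuation lines, as a list of array references.
sub collect_comments {
    my @lines = @_;
    my (@comments, $in_cs);
    for my $line (@lines) {
        (my $text = $line) =~ s/[\r\n]+\z//;    # tolerate LF and CRLF endings
        if ($text =~ /^C(?:S|omments):\s*(.*)/) {
            push @comments, [$1];               # start a new comment entry
            $in_cs = 1;
        }
        elsif ($text !~ /\S/
            || $text =~ /^(?:RR|Related Records|CR|Code Reviewer|Mail sent to):/)
        {
            $in_cs = 0;                         # blank line or another tag ends it
        }
        elsif ($in_cs) {
            push @{ $comments[-1] }, $text;     # continuation of the open comment
        }
    }
    return @comments;
}

my @sample = (
    "st/ert/variants/variants7.c\n",
    "CR: testman2\n",
    "Comments: Unlocking before making changes\n",
    "and adding this line for this case\n",
    "RR: 555555, 666666\n",
    "\n",
    "st/ert/variants/variants5.c\n",
);
print join("\n", @$_), "\n" for collect_comments(@sample);
```

In the posted `read_paragraphs`, the same flag would presumably have to be checked before the line that falls through to `@Files` or `@Options`, so that continuation lines are not stored as file names.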
ASKER CERTIFIED SOLUTION
ASKER
Perfect !!!! It worked..
ASKER
@parparov: While I was testing the code, I realized that if I create the text file on Unix the code does not work, or vice versa. I checked the line endings: in one case the lines end with LF and in the other with CRLF.
How can I cover both cases in this code?
Thanks,
ASKER
Note: In one way, I need to do this conversion:
<CR><LF> --> <LF>
<LF><CR> --> <LF>
<CR> --> <LF>
Thanks,
ASKER
Note 2: I found something like this:
my $rippedText = scalar(grep(s/\015+$// | s/\015/\n/g, @r));
but how am I gonna apply it to the entire file in my code?
Thanks,
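One possible way to apply the conversion to the entire file is to work on the whole content as a single string before splitting it into lines. This is a minimal sketch under that assumption; `normalize_lines` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file content as one string and
# returns its lines with every ending rewritten to a single LF.
sub normalize_lines {
    my ($content) = @_;
    # The two-character endings come first in the alternation so that
    # CRLF and LFCR are not rewritten as two separate line breaks.
    $content =~ s/\r\n|\n\r|\r/\n/g;
    return split /^/, $content;    # split before each line start, keeping the LF
}

my @lines = normalize_lines("CS: first\r\nsecond\rRR: 123456\n");
print scalar(@lines), " lines\n";
```

In `submitFileParser`, the file could be read in one piece (for example with `local $/;` before `<FILE>`) and passed through such a helper in place of `@paragraphs = <FILE>;`. A bare CR can only be handled this way, because reading line by line with the default `$/` never splits on it.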
You can do a 'dos2unix' on the input file before you run your perl script. But I guess this deserves a separate question. :)
ASKER
@parparov: I created a new question. Can you please reply ASAP?
ID: 27393665
Thanks,
ASKER
If we look at line 92:
Each match runs up to the newline character.
If, for this CS line, we match up to one of the following terminators instead of the newline character, the problem can be solved:
(RR: or Related Records:) or (CR: or Code Reviewer:) or (Mail sent to:)
So I propose something like this:
How can I code this in Perl?
Thanks,
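One way to express "match up to the next tag" in Perl is a non-greedy capture followed by a lookahead, applied to a whole entry held in one string rather than to single lines. This is only a sketch: `extract_comment` is a hypothetical helper, and treating a blank line as an additional terminator is an assumption taken from the statement in this thread that comments cannot contain blank lines.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls the first CS:/Comments: text out of one entry
# held as a single multi-line string.
sub extract_comment {
    my ($entry) = @_;
    my $stop = qr/(?:RR|Related Records|CR|Code Reviewer|Mail sent to):/;
    # .*? is non-greedy and, under /s, crosses newlines; the lookahead ends
    # the capture at the next tag line, a blank line, or the end of input.
    return unless $entry =~ /^C(?:S|omments):[ \t]*(.*?)(?=^$stop|^\r?\n|\z)/ms;
    (my $text = $1) =~ s/\s+\z//;    # drop the trailing line ending
    return $text;
}

my $entry = "CR: testman2\nCS: Unlocking before making changes\n"
          . "and adding this line for this case\nRR: 123456\n";
print extract_comment($entry), "\n";
```

This only works when the input is not split into lines first, so it would mean changing how `read_paragraphs` receives its data.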