asked on

How to fix this parser bug in my PERL code?

This is the code that I use:

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
#	local($/) = '';
	open( FILE, "< $filename" ) or die "Can't open $filename : $!";
	@paragraphs = <FILE>;
	close FILE;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options, @Mailsent, @file_info);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag, $mail_sent_to_flag);

	my $submitFileExist = 0;
	#read the file
	for ( @rippedParagraphs ) {
	    s/\r//g;
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			# We record the accumulated data:
			push(
				@submits,
				{
					"Options"              => [@Options],
					"FileInfo"             => [@file_info],
					"GeckLogin" 		   => $geckLogin,
					"SubmitFileExistFlag"     => $submitFileExist,
					"Mail sent to"         => [@Mailsent],
				}
			) if @file_info;
			@Options = @Mailsent = @file_info = ();
			$submitFileExist = 1;
			next;
		}
		
		if ($submitFileExist == 1) {
			if (/^\s*\=+\s*$/) {
				$submitFileExist++;
				$mail_sent_to_flag = 0;
			} 
			next;
		}
		if ($submitFileExist == 2) {
			if ($mail_sent_to_flag) {
				push(@Mailsent, $_);
				next;
			}
			#if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
			#	# Match the login name in the submit file - if it has not
			#	# already been done
			#	$geckLogin ||= $1;
			#}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			}
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			if (/^Mail sent to/) {
				$mail_sent_to_flag = 1;
				push(@Mailsent, $_);
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && goto CHECK;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && goto CHECK;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && goto CHECK;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
			CHECK:
			if (@RR && @CR && @CS) {
				push(
					@file_info,
					{
						"Files"                => [@Files],
						"Comments"             => [@CS],
						"RelatedRecords"       => [@RR],
						"CodeReviewers"        => [@CR],
						"SubmitFileExistFlag"  => $submitFileExist,
					},
				);
				@Files = @CS = @RR = @CR = ();
			}
        }
	}
	
	if ($submitFileExist == 0) {
			my @noSubmitFileSubmits = ();
			$submitFileExist = 0; # two-line grammar didn't hold
			my $parsedData = parseWithoutSubmitFile(@rippedParagraphs);
			#submit file does not exist flag
			$submitFileExist = 0;
			push(
				@noSubmitFileSubmits,
				{
				"GeckLogin" 		   => $geckLogin,
				"ParsedData"		   => $parsedData,
				"SubmitFileExistFlag"  => $submitFileExist,
				"Cluster"              => $parsedData->{t},
				"JobID"                => $parsedData->{dollar_},
				"gLogFilesOption"      => exists $parsedData->{GLOGFILES},
				"gLogSbcheckOption"    => exists $parsedData->{GLOGSBCHECK},
				}
			) if $parsedData;
				return \@noSubmitFileSubmits;
		}
	push(
		@submits,
		{
			"Options"              => [@Options],
			"Mail sent to"         => [@Mailsent],
			"FileInfo"             => [@file_info],
			"GeckLogin" 		   => $geckLogin,
		}
	) if @file_info;
	return \@submits;
}


sub parseWithoutSubmitFile (@) {
	my $arg_flag = 0;
	my $parsedData = {};
	my $current_option = '';
		while (my $line = shift @_) {
		if ($arg_flag == 1) {
		if ($line =~ /^Currently (\$\_=.*)/) {
		local $_;
		eval "$1;";
		$parsedData->{dollar_} = $_;
		$arg_flag = 0;
		}
		elsif ($line =~ /^\s+\-(.*)/) {
		$current_option = $1;
		$parsedData->{$current_option} = undef;
		next;
		}
		elsif ($current_option && $line =~ /^\s+(.*)/) {
		$parsedData->{$current_option} = $1;
		$current_option = undef;
		}
		}
		else {
		if ($line =~ /^Original arguments:/) {
		$arg_flag = 1;
		next;
		}
		}
	}
return $parsedData;
}

Open in new window

This is the text file that I parse:

USER=testman, HOST=testman-deb6-64, ARCH=glnxa64
Revisions: /st/hub/share/apps/bat//share/mmit: 07/26-09:48:58; csubmitItem.pm: 2011/07/26-09:48:56
Original arguments:
        -t
        Atk
        -F
        20110914.submit
Currently $_='154551'

        main:/st/hub/share/apps/bat/bat2.15.17/share/../lib/csubmitCache.pm:44 called main::submissionHistory
        main:/st/hub/share/apps/bat/bat2.15.17/share/submit:3871 called main::CreateCacheFile

Current directory ($PWD) = /st/devel/sandbox/testman/Aslrtw
                Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 24.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk/glnxa64'>/sandbox/testman/Atk_ests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants4.c
CR: testman2
RR: 123456
CS: Unlocking before making changes
and adding this line for this case

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c

	
				Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 14.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants5.c
st/ert/variants/variants6.c
CS: Unlocking before making changes
and adding this line for this case
CR: testman2
RR: 333333, 444444

st/ert/variants/variants7.c
CR: testman2
Comments: Unlocking before making changes
and adding this line for this case
RR: 555555, 666666

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c

Open in new window

The problem is; this code only detects the first line of CS (or Comments) line.

The expected output is:

Unlocking before making changes
and adding this line for this case

Open in new window

How can I change this code to detect any number of lines belongs of CS or Comments tag?

Tolgar

ASKER

I think I have an idea but I don't know how to code it correctly:

If we look at line 92:

/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && goto CHECK;

Open in new window

Each match is upto the new line character.

If we change the new line character for this CS line with the following list then the problem can be solved:

(RR: or Related Records:) or (CR: or Code Reviewer:) or (Mail sent to:)

So I propose something like this:

/^C(S|omments):\s*(.{*(RR: or Related Records:) or (CR: or Code Reviewer:) or (Mail sent to:)})/ && push(@CS, $2) && goto CHECK;

Open in new window

How can I code this in Perl?

Thanks,

parparov

This will wreck everything since you will lose RRs and CRs.
You asked a misleading question in the other thread, because it solves a partial problem on expense of creating another one.

As I said, we need to expand our flag set, install one indicating parsing a CS entry and proceed according to it.

Tolgar

ASKER

@parparov:
Why do we lose RRs and CRs? I think it is because, we parse it line by line. Am I right?

On the other hand, are you gonna be able to expand the flag set?

Thanks,

parparov

Because you're pushing into @CS once you match either RR, CS, or CR without distinguishing what you actually matched.

Tolgar

ASKER

What if we push something into CS that comes after CS line, if this line is not a CR: (or Code Reviewer:) or RR: (Related Records: ) or Mail sent to: line

Does this work?

What do you think?

Thanks,

Tolgar

ASKER

@parparov: I wonder, if you will be able to work on it. I would appreciate if you can let me know.

Thanks,

parparov

Still thinking

Tolgar

ASKER

ok. Thanks.

I will wait for your reply.

parparov

There is a problem with the grammar.

Consider a multiline CS statement at the end of CS/CR/RR block.
Consider an absent Mail sent to block
Now, how would we distinguish a continuing CS statement from another file entry within the same Submit File upper block.
Like this:

file/path1
CR: cr1 cr1
RR: rr1 rr1
CS: cs1
cs1
cs1
file/path2
file/path3
CR: cr1 cr2
RR: rr2 rr2
CS:cs2
cs/in/form/of/file/path/to/make/life/harder
...

Open in new window

Tolgar

ASKER

file/path1
CR: cr1 cr1
RR: rr1 rr1
CS: cs1
cs1
cs1

file/path2
file/path3
CR: cr1 cr2
RR: rr2 rr2
CS:cs2
cs2

There is always a single blank line (with only one new line character.) between these blocks.

This is a line with only one new line character. (But new line character should cover both unix and windows new line)

Can we detect this an indicator?

Thanks,

parparov

But a CS can contain a blank line, can't it?
It can worst case even emulate a whole entry?

Tolgar

ASKER

I didn't really understand your point in here. Why does it emulate the whole entry in the worst case?

On the other hand, we can assume that the CS line will always contain some information.

parparov

CS: this
comment

will make us headache
because after this blank line

file/path1
is actually part of the comment
and only here comment ends.

file/path2

Open in new window

Tolgar

ASKER

Why is file/path1 part of the comment?

It comes before the comment.

Can't we push anything we find after CS line until a single new line line if they don't start with RR or CR?
If the line starts with RR or CR or only contains a single new line character, then we stop.

Does it sound reasonable?

parparov

Why?
Because the comment is "FREE TEXT" so it can contain anything, even string file/path1

Or, you are saying comments CANNOT contain a blank line? That would simplify things.

Tolgar

ASKER

Absolutely... Comments cannot contain blank lines.

How this helps...

ASKER CERTIFIED SOLUTION

parparov

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Tolgar

ASKER

Perfect !!!! It worked..

Tolgar

ASKER

@parparov: While I was testing the code, I realized that if I create the the text file in in unix the code does not work or vice versa. I checked the line endings. In one case it ends with LF and the other ends with CRLF.

How can I cover both cases in this code?

Thanks,

Tolgar

ASKER

Note: In one way, I need to do this conversion:

  <CR><LF>   -->   <LF>
  <LF><CR>   -->   <LF>
  <CR>           -->   <LF>

Open in new window

Thanks,

Tolgar

ASKER

Note: 2: I found something like this :

my $rippedText = scalar(grep(s/\015+$// | s/\015/\n/g, @r));

Open in new window

but how am I gonna apply it to the entire file in my code?

Thanks,

parparov

You can do a 'dos2unix' on the input file before you run your perl script. But I guess this deserves a separate question. :)

Tolgar

ASKER

@parparov: I created a new question. Can you please reply ASAP?

ID: 27393665

Thanks,