  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 353

How to break down information in an array which we read from a text file in PERL?

Follow up question for ID: 27312349

As I explained in the last post of the referenced question, I would like to break out the following fields for each entry:

Options
Files
Comments
Related_Records
Code_Reviewers

The Geck_Login will be the same for all entries because we capture it at the beginning of the file.

If we look at the text file again:

USER=testman, HOST=testman-deb6-64, ARCH=glnxa64
Revisions: /st/hub/share/apps/bat//share/mmit: 07/26-09:48:58; csubmitItem.pm: 2011/07/26-09:48:56
Original arguments:
        -t
        Atk
        -F
        20110914.submit
Currently $_='154551'

        main:/st/hub/share/apps/bat/bat2.15.17/share/../lib/csubmitCache.pm:44 called main::submissionHistory
        main:/st/hub/share/apps/bat/bat2.15.17/share/submit:3871 called main::CreateCacheFile

Current directory ($PWD) = /st/devel/sandbox/testman/Aslrtw
                Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 24.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk/glnxa64'>/sandbox/testman/Atk_ests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "locking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants5.c
CR: testman2
RR: 987654
CS: locking before making changes

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c

	
				Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 14.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants6.c
CR: testman3
RR: 123456
CS: Unlocking before making changes

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c



I would like to have the following in arrays (From Submit File 1 and From Submit File 2 texts are only for clarification. We don't actually need them.):
@Options:
From Submit File 1:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

From Submit File 2:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2


@Files:
From Submit File 1:
st/ert/variants/variants5.c

From Submit File 2:
st/ert/variants/variants6.c

@Comments:
From submit File 1:
locking before making changes

From submit file 2:
Unlocking before making changes

@Related_Records:
From submit File 1:
987654

From submit File 2:
123456

@Code_Reviewers:
From submit File 1:
testman2

From submit File 2:
testman3

Geck_login: testman




Once we have this output, I will pass it to another piece of code and log the broken-down information to a separate file. That's why I need to know which information belongs to which submit file.

Note: There can be any number of Submit Files in one text file.

I would prefer to have only one array each for Options, Files, Comments, Related_Records and Code_Reviewers, and to keep the data for the different submit files inside these arrays.

Let's say:
The first element of Options should only include options from Submit File 1.
The second element of Options should only include options from Submit File 2.

The same goes for the list of files and the other fields.

What I mean is that we don't need to put every option, file or other item in its own array element. The group of information from one submit file should sit in a single array element. Then I can dump this information anywhere I want without causing confusion.

I hope this explains everything clearly.

Thanks,


0
Asked by: Tolgar
2 Solutions
 
parparovCommented:
Allow me to clarify:

Is there one Files, Comments, Options, etc. entry per 'submit file' section?
Should the user be read from USER= at the beginning of the file, or from the sandbox location?
I ask because the sandbox location could, in theory, yield different users.

0
 
TolgarAuthor Commented:
Yes, there is one Files, Comments, Options, etc. entry per submit file section.

Reading the user from the beginning of the file is more reliable, but I would prefer to keep the sandbox location for now. If possible, please also read it from the beginning of the file, but comment that part out for now. I guess both will be in the same section of the code.

thanks,

0
 
parparovCommented:
Here is the reworked code.
The returned data structure has changed. Please study the examples of accessing the data.
#!/usr/bin/perl

use strict;
use warnings;

our @HEADERS = ("GeckLogin", "Options", "Files", "Comments", "RelatedRecords", "CodeReviewers");
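# Prototypes, declared up front so the calls below are checked against them
sub print_data1 ($);
sub print_data2 ($);
sub submitFileParser ($);
sub read_paragraphs (@);

use Data::Dumper;

# Declared before the parser runs, because read_paragraphs assigns to it
my $geckLogin;
my $data = submitFileParser(shift @ARGV);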
# A look at the data
print Dumper $data;

# Examples of accessing data
print_data1($data);
print_data2($data);

sub print_data1 ($) {
	my $data = shift;

	for my $submit (@{$data}) {
		for my $header (@HEADERS) {
			print "$header:\n";
			if ($header eq 'GeckLogin') {
				print "$submit->{$header}\n";
			}
			else {
				print @{$submit->{$header}};
			}
			print "\n";
		}
		print "\n";
	}
}

sub print_data2 ($) {
	my $data = shift;

	for my $header (@HEADERS) {
		if ($header eq 'GeckLogin') {
			print "GeckLogin: $data->[0]{GeckLogin}\n";
			next;
		}
		print "$header:\n";
		for my $i (1..@{$data}) {
			print "From submit file $i\n";
			print @{$data->[$i-1]{$header}};
			print "\n";
		}
		print "\n";
	}
}

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
	# Read the whole file as a list of lines (three-argument open, lexical handle)
	open( my $fh, '<', $filename ) or die "Can't open $filename : $!";
	@paragraphs = <$fh>;
	close $fh;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag);

	my $submit_file = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			$submit_file = 1;
			next;
		}
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
			} else {
				$submit_file = 0; # two-line grammar didn't hold
			}
			next;
		}
		if ($submit_file == 2) {
			if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
				# Match the login name in the submit file - if it has not
				# already been done
				$geckLogin ||= $1;
			}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			}
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			if (/^C(S|omments):\s*(.*\n)/) {
				push(@CS, $2);
				# CS record is the last one, we commit after it
				push(
					@submits,
					{
						"Options"              => [@Options],
						"Files"                => [@Files],
						"Comments"             => [@CS],
						"RelatedRecords"       => [@RR],
						"CodeReviewers"        => [@CR],
						"GeckLogin"            => $geckLogin,
					}
				);
				# Reset the temporary storage and leave the submit section
				@Options = @Files = @CR = @CS = @RR = ();
				$submit_file = 0;
				next;
			}

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
                }
	}
	return \@submits;
}
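
To make the shape of the returned structure easier to see, here is a minimal sketch with hand-built sample data (not produced by the parser above). It assumes the same layout: a reference to an array with one hash per submit file. The helper name field_per_submit is hypothetical; it only illustrates how to get "one array element per submit file" for a single field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built sample with the same shape as the parser's return value:
# a reference to an array holding one hash per submit file.
my $data = [
    {
        GeckLogin      => 'testman',
        Options        => [ "-nowrap\n", "-KEYWORD1\n" ],
        Files          => [ "st/ert/variants/variants5.c\n" ],
        Comments       => [ "locking before making changes\n" ],
        RelatedRecords => [ "987654\n" ],
        CodeReviewers  => [ "testman2\n" ],
    },
    {
        GeckLogin      => 'testman',
        Options        => [ "-nowrap\n", "-KEYWORD2\n" ],
        Files          => [ "st/ert/variants/variants6.c\n" ],
        Comments       => [ "Unlocking before making changes\n" ],
        RelatedRecords => [ "123456\n" ],
        CodeReviewers  => [ "testman3\n" ],
    },
];

# Hypothetical helper: collect one field across all submit files.
# Each returned element is an array ref with the values of one submit file.
sub field_per_submit {
    my ($submits, $field) = @_;
    return map { [ @{ $_->{$field} } ] } @{$submits};
}

my @files = field_per_submit($data, 'Files');
print "Submit file ", $_ + 1, ": @{ $files[$_] }" for 0 .. $#files;
```

Element 0 of @files then holds only the files of the first submit file, element 1 only those of the second, which is the grouping asked for in the question.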


0
 
TolgarAuthor Commented:
Hi,
Thank you for your prompt reply.

Can you please explain what these lines mean?

Line 127
Line 142
Line 144

Note: The order of CR, CS and RR can be anything in the text. You know that right?

Thanks,
0
 
parparovCommented:
No, I assumed the CS: is the last section of a submit. Otherwise I don't see how to get rid of the trailing "Mail sent to:"

Hope this explains lines 127, 142 and 144
0
 
TolgarAuthor Commented:
OK, let's leave this question for a later discussion.

I have another question. When I debugged the code, I did the following:

231:                            my @Options = @{$cache_data}[0]->{Options};
  DB<3> x @Options
  empty array
  DB<4> x @{$cache_data}[0]->{Options}
0  ARRAY(0x15961d0)
   0  "-CJ \"<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>\"\cM\cJ"
   1  "-nowrap\cM\cJ"
   2  "-subject \"Unlocking making changes\"\cM\cJ"
   3  "-KEYWORD1\cM\cJ"
   4  "-KEYWORD2\cM\cJ"



@Options was empty in my first attempt. But when I printed the right-hand side directly, it worked. So how can I assign the right-hand side, which is an array, to a new array like @Options?
0
 
parparovCommented:
You did the dereferencing the wrong way.
You need to write:
my @Options = @{$cache_data->[0]{Options}};
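
As a small illustration of the difference, the sketch below uses a hypothetical $cache_data with the same shape as in the thread. It compares copying the array reference itself with dereferencing it through @{ ... }.

```perl
use strict;
use warnings;

# Hypothetical sample data with the same shape as $cache_data in the thread.
my $cache_data = [ { Options => [ "-nowrap\n", "-KEYWORD1\n" ] } ];

# Copying the reference itself gives a one-element array holding an ARRAY ref.
my @wrong = ( $cache_data->[0]{Options} );

# Dereferencing with @{ ... } copies the elements.
my @options = @{ $cache_data->[0]{Options} };

printf "without deref: %d element(s), ref type %s\n", scalar @wrong, ref $wrong[0];
printf "with deref:    %d element(s)\n", scalar @options;
```

A separate point that may also matter when inspecting variables: the Perl debugger shows the line that is about to run, so a `my` array declared on that line is still empty until the line has been stepped over.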


0
 
TolgarAuthor Commented:
ok.

I have 2 questions:

1-
It worked for all of them except for Related_Records.

This returns the correct data:

@{$cache_data->[0]{RelatedRecords}}



but this one returns an empty array:

my @Related_Records = @{$cache_data->[0]{RelatedRecords}}



What am I doing wrong? The only difference is that this data is an integer.

2- Why do I get \cM\cJ at the end of all array elements?

e.g.

 DB<9> x @Comments
0  "Unlocking before making changes\cM\cJ"




Thanks,



0
 
TolgarAuthor Commented:
Hi,
For the question which I have asked in ID: 36569304:

Can we change the code so that it makes no assumption about which field (CR, RR or CS) comes last, and treats "Mail sent to" as just another field like RR, CS or CR?

Then I can just ignore that one when I pass them to another code.

Can we do that?

Thanks,

0
 
parparovCommented:
Yes, we can do that. I'll post the updated code later.
\cM\cJ is how a carriage return followed by a newline is displayed.
0
 
TolgarAuthor Commented:
Hi,
When I log this information to a text file, will \cM\cJ show up literally, or will they be written as actual line-ending characters?

I am waiting for your updated code.

Thanks,
0
 
TolgarAuthor Commented:
hi,
I wonder if you would be able to post the updated code by Sunday morning.

Thanks,

0
 
parparovCommented:
This code works with your example input, including related records I am testing explicitly:
#!/usr/bin/perl

use strict;
use warnings;

our @HEADERS = ("GeckLogin", "Options", "Files", "Comments", "RelatedRecords", "CodeReviewers", "Mail sent to");
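# Prototypes, declared up front so the calls below are checked against them
sub print_data1 ($);
sub print_data2 ($);
sub submitFileParser ($);
sub read_paragraphs (@);

use Data::Dumper;

# Declared before the parser runs, because read_paragraphs assigns to it
my $geckLogin;
my $data = submitFileParser(shift @ARGV);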
# A look at the data
print Dumper $data;

# Examples of accessing data
print_data1($data);
print "++++++++++++++++++++\n";
print_data2($data);
print "++++++++++++++++++++\n";

my @rr = @{$data->[0]{RelatedRecords}};
print Dumper \@rr;
print Dumper $data->[0]{RelatedRecords};

sub print_data1 ($) {
	my $data = shift;

	for my $submit (@{$data}) {
		for my $header (@HEADERS) {
			print "$header:\n";
			if ($header eq 'GeckLogin') {
				print "$submit->{$header}\n";
			}
			else {
				print @{$submit->{$header}};
			}
			print "\n";
		}
		print "\n";
	}
}

sub print_data2 ($) {
	my $data = shift;

	for my $header (@HEADERS) {
		if ($header eq 'GeckLogin') {
			print "GeckLogin: $data->[0]{GeckLogin}\n";
			next;
		}
		print "$header:\n";
		for my $i (1..@{$data}) {
			print "From submit file $i\n";
			print @{$data->[$i-1]{$header}};
			print "\n";
		}
		print "\n";
	}
}

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
	# Read the whole file as a list of lines (three-argument open, lexical handle)
	open( my $fh, '<', $filename ) or die "Can't open $filename : $!";
	@paragraphs = <$fh>;
	close $fh;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options, @Mailsent);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag, $mail_sent_to_flag);

	my $submit_file = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			# We record the accumulated data:
			push(
				@submits,
				{
					"Options"              => [@Options],
					"Files"                => [@Files],
					"Comments"             => [@CS],
					"RelatedRecords"       => [@RR],
					"CodeReviewers"        => [@CR],
					"GeckLogin" 		   => $geckLogin,
					"Mail sent to"         => [@Mailsent],
				}
			) if @Files;
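			# Reset all temporary storage, including @Mailsent, so that the
			# next record does not inherit lines from the previous one
			@Options = @Files = @CR = @CS = @RR = @Mailsent = ();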
			$submit_file = 1;
			next;
		}
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
				$mail_sent_to_flag = 0;
			} else {
				$submit_file = 0; # two-line grammar didn't hold
			}
			next;
		}
		if ($submit_file == 2) {
			if ($mail_sent_to_flag) {
				push(@Mailsent, $_);
				next;
			}
			if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
				# Match the login name in the submit file - if it has not
				# already been done
				$geckLogin ||= $1;
			}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			}
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			if (/^Mail sent to/) {
				$mail_sent_to_flag = 1;
				push(@Mailsent, $_);
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && next;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
        }
	}
	push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
		}
	) if @Files;
	return \@submits;
}


0
 
TolgarAuthor Commented:
Hi,
This works perfectly.

I remember an earlier discussion about line endings on Windows and Unix, but I couldn't find the answer in it.

My question is:

Can this code parse text files that are created both in Unix and Windows? Because they will have different line endings.

Thanks,



0
 
TolgarAuthor Commented:
Hi,
How can I get the length of $data in your code?

I want to loop through its contents using that length.

Thanks,

0
 
TolgarAuthor Commented:
Let me clarify the last question:

$data in our case has two parts. One is from the first submit file group and the second one is from the second submit file group.

So I should get "2" as result of this command.

Thanks,

0
 
parparovCommented:
The length of the data is
my $data_length = scalar @{$data};

which gives the number of elements in the list, and
my $data_largest_index = $#{$data};

gives the last index ($data_length-1) in the list.
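
A short sketch of both expressions in a loop, using a hypothetical two-record stand-in for the parser's result (the real $data would come from submitFileParser):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's result: two submit-file records.
my $data = [
    { Files => [ "st/ert/variants/variants5.c\n" ] },
    { Files => [ "st/ert/variants/variants6.c\n" ] },
];

my $data_length        = scalar @{$data};   # number of records
my $data_largest_index = $#{$data};         # last valid index

print "records: $data_length, last index: $data_largest_index\n";

# Index-based loop, useful when the submit-file number is needed.
for my $i (0 .. $#{$data}) {
    print "Submit file ", $i + 1, ": ", @{ $data->[$i]{Files} };
}

# When the index is not needed, iterate over the records directly.
for my $submit (@{$data}) {
    print "Files: ", @{ $submit->{Files} };
}
```

For the two-record sample, the first print reports 2 records and a last index of 1.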

This code preserves the line endings as they are; they do not affect the parsing.
Files created on Windows usually have a carriage return ("\r" or ^M) at the end of each line in addition to the newline. You can get rid of these characters, for example, with the dos2unix utility (or add them with unix2dos) on Linux.
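
If converting the file beforehand is not convenient, the endings can also be stripped inside Perl. This is only a sketch; strip_eol is a hypothetical helper and not part of the code posted above. It removes a trailing LF or CRLF from copies of the lines it is given.

```perl
use strict;
use warnings;

# Hypothetical helper: return copies of the lines with a trailing
# LF or CRLF removed. The caller's data is left untouched.
sub strip_eol {
    my @lines = @_;
    s/\r?\n\z// for @lines;
    return @lines;
}

my @cleaned = strip_eol("-nowrap\r\n", "-KEYWORD1\n", "no newline");
print join('|', @cleaned), "\n";
```

Plain chomp would not be enough here on Unix, because by default it removes only the trailing "\n" and leaves the "\r" in place.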
0
 
TolgarAuthor Commented:
perfect solution!!!
0
 
TolgarAuthor Commented:
I have a follow up question:

ID:27331899


Thanks,
0
 
TolgarAuthor Commented:
@parparov:

Can you please explain what this means? In particular, why do we say if @Files; at the end?

push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
			"NoSubmitFileFlag"     => $noSubmitFileFlag,
		}
	) if @Files;
	return \@submits;




Thanks,
0
 
parparovCommented:
It means the record is pushed only if some actual files were encountered. Otherwise it would unconditionally push empty arrays into the resulting data structure.
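
The same guard in isolation, as a minimal sketch: commit_submit is a hypothetical helper, used only to show that an array in boolean context is true when it has at least one element.

```perl
use strict;
use warnings;

my @submits;

# Hypothetical helper: record a submit only when it lists at least one file.
sub commit_submit {
    my ($files) = @_;
    # An array in boolean context is true when it has one or more elements.
    push @submits, { Files => [ @{$files} ] } if @{$files};
    return scalar @submits;
}

commit_submit([]);                                   # skipped: no files
commit_submit(["st/ert/variants/variants5.c\n"]);    # recorded

print "recorded submits: ", scalar @submits, "\n";
```

Only the second call adds a record, so no empty entry ends up in @submits.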
0
 
TolgarAuthor Commented:
Thanks for the clarification
0
Question has a verified solution.
