Tolgar

How to break down information in an array which we read from a text file in Perl?

Follow up question for ID: 27312349

As I explained in the last post of the reference post, I would like to break the following per entry:

Options
Files
Comments
Related_Records
Code_Reviewers

The Geck_Login will be the same for all entries because we capture it at the beginning of the file.

If we look at the text file again:

USER=testman, HOST=testman-deb6-64, ARCH=glnxa64
Revisions: /st/hub/share/apps/bat//share/mmit: 07/26-09:48:58; csubmitItem.pm: 2011/07/26-09:48:56
Original arguments:
        -t
        Atk
        -F
        20110914.submit
Currently $_='154551'

        main:/st/hub/share/apps/bat/bat2.15.17/share/../lib/csubmitCache.pm:44 called main::submissionHistory
        main:/st/hub/share/apps/bat/bat2.15.17/share/submit:3871 called main::CreateCacheFile

Current directory ($PWD) = /st/devel/sandbox/testman/Aslrtw
                Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 24.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk/glnxa64'>/sandbox/testman/Atk_ests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "locking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants5.c
CR: testman2
RR: 987654
CS: locking before making changes

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c

	
				Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 14.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants6.c
CR: testman3
RR: 123456
CS: Unlocking before making changes

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c



I would like to have the following in arrays (From Submit File 1 and From Submit File 2 texts are only for clarification. We don't actually need them.):
@Options:
From Submit File 1:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

From Submit File 2:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2


@Files:
From Submit File 1:
st/ert/variants/variants5.c

From Submit File 2:
st/ert/variants/variants6.c

@Comments:
From submit File 1:
locking before making changes

From submit file 2:
Unlocking before making changes

@Related_Records:
From submit File 1:
987654

From submit File 2:
123456

@Code_Reviewers:
From submit File 1:
testman2

From submit File 2:
testman3

Geck_login: testman




When we have this output, I will pass them to another code and log this broken down information into another file separately. That's why I need to know which information belongs to which submit file.

Note: There can be any number of Submit Files in one text file.

I would prefer to have only one Options, Files, Comments, Related_Records, Code_Reviewers arrays and manipulate this data inside these arrays for different information.

Let's say:
The first element of Options should only include options from Submit File 1.
The second element of Options should only include options from Submit File 2.

same thing for the list of files and others.

What I mean is that we don't need to put every option, file or other item in its own array element. The same group of information from the same Submit file should be in the same array element. Then I can dump this information anywhere I want without causing confusion.

I hope this explains everything clearly.

Thanks,


parparov

Allow me to clarify:

There's one Files, Comments, Options etc. entry per 'submit file' section?
Is the user read from USER= at the beginning, or from the sandbox location?
Because sandbox location may return, theoretically, different users.

Tolgar

ASKER

Yes, there is one Files, Comments, Options etc. entry per submit file section.

Reading the user from the beginning of the file is more reliable, but I would prefer to keep the sandbox location for now. If possible, please also read it from the beginning, but comment that part out for now. I guess both will be in the same section of the code.

thanks,

Here is the reworked code.
The return data structure has been changed. Please study the examples of data accessing.
#!/usr/bin/perl

use strict;
use warnings;

our @HEADERS = ("GeckLogin", "Options", "Files", "Comments", "RelatedRecords", "CodeReviewers");
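# Forward declarations, so the prototypes are known before the first call
sub print_data1 ($);      sub print_data2 ($);
sub submitFileParser ($); sub read_paragraphs (@);

my $geckLogin;	# file-scoped; filled in by read_paragraphs()
my $data = submitFileParser(shift @ARGV);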
use Data::Dumper;
# A look at the data
print Dumper $data;

# Examples of accessing data
print_data1($data);
print_data2($data);

sub print_data1 ($) {
	my $data = shift;

	for my $submit (@{$data}) {
		for my $header (@HEADERS) {
			print "$header:\n";
			if ($header eq 'GeckLogin') {
				print "$submit->{$header}\n";
			}
			else {
				print @{$submit->{$header}};
			}
			print "\n";
		}
		print "\n";
	}
}

sub print_data2 ($) {
	my $data = shift;

	for my $header (@HEADERS) {
		if ($header eq 'GeckLogin') {
			print "GeckLogin: $data->[0]{GeckLogin}\n";
			next;
		}
		print "$header:\n";
		for my $i (1..@{$data}) {
			print "From submit file $i\n";
			print @{$data->[$i-1]{$header}};
			print "\n";
		}
		print "\n";
	}
}

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
#	local($/) = '';
	open( my $fh, '<', $filename ) or die "Can't open $filename : $!";
	@paragraphs = <$fh>;
	close $fh;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag);

	my $submit_file = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			$submit_file = 1;
			next;
		}
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
			} else {
				$submit_file = 0; # two-line grammar didn't hold
			}
			next;
		}
		if ($submit_file == 2) {
			if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
				# Match the login name in the submit file - if it has not
				# already been done
				$geckLogin ||= $1;
			}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			}
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) &&
				# CS record is the last one, we commit after it
				push(
					@submits,
					{
						"Options"              => [@Options],
						"Files"                => [@Files],
						"Comments"             => [@CS],
						"RelatedRecords"       => [@RR],
						"CodeReviewers"        => [@CR],
						"GeckLogin" 		   => $geckLogin,
					}
				) &&
				((@Options = @Files = @CR = @CS = @RR = ()) || ($submit_file = 0) || 1)
			&& next;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
		}
	}
	return \@submits;
}
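
To make the returned structure easier to picture: read_paragraphs returns a reference to an array that holds one hash per submit section, which matches the "one element per submit file" layout asked for above. The following is only a sketch of that shape. The sample values are typed in by hand from the example file, not produced by the parser, and the variable names @files_of_second and $reviewer_first are made up for the illustration.

```perl
use strict;
use warnings;

# Hand-written sample in the same shape that read_paragraphs() returns:
# an array reference holding one hash per "Submit file" section.
my $data = [
    {
        GeckLogin      => 'testman',
        Options        => [ "-nowrap\n", "-KEYWORD1\n" ],
        Files          => [ "st/ert/variants/variants5.c\n" ],
        Comments       => [ "locking before making changes\n" ],
        RelatedRecords => [ "987654\n" ],
        CodeReviewers  => [ "testman2\n" ],
    },
    {
        GeckLogin      => 'testman',
        Options        => [ "-nowrap\n", "-KEYWORD2\n" ],
        Files          => [ "st/ert/variants/variants6.c\n" ],
        Comments       => [ "Unlocking before making changes\n" ],
        RelatedRecords => [ "123456\n" ],
        CodeReviewers  => [ "testman3\n" ],
    },
];

# Everything that belongs to the second submit file lives in $data->[1].
my @files_of_second = @{ $data->[1]{Files} };
my $reviewer_first  = $data->[0]{CodeReviewers}[0];

print "Second submit, files: @files_of_second";
print "First submit, reviewer: $reviewer_first";
```

Because each submit section is a separate hash, the fields of one submit file can never be mixed up with those of another.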


Tolgar

ASKER

Hi,
Thank you for your prompt reply.

Can you please explain what you mean in these lines?

Line 127
Line 142
Line 144

Note: The order of CR, CS and RR can be anything in the text. You know that right?

Thanks,
No, I assumed the CS: is the last section of a submit. Otherwise I don't see how to get rid of the trailing "Mail sent to:"

Hope this explains lines 127, 142 and 144
Tolgar

ASKER

OK, let's leave this question for a later discussion.

I have another question. When I debugged the code, I did the following.

231:                            my @Options = @{$cache_data}[0]->{Options};
  DB<3> x @Options
  empty array
  DB<4> x @{$cache_data}[0]->{Options}
0  ARRAY(0x15961d0)
   0  "-CJ \"<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>\"\cM\cJ"
   1  "-nowrap\cM\cJ"
   2  "-subject \"Unlocking making changes\"\cM\cJ"
   3  "-KEYWORD1\cM\cJ"
   4  "-KEYWORD2\cM\cJ"



@Options was empty on my first attempt. But when I printed the right-hand side directly, it worked. So how can I assign the right-hand side, which is an array, to a new array like @Options?
You did the dereferencing the wrong way.
You need to:
231:                            my @Options = @{$cache_data->[0]{Options}};
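
To illustrate the difference: the debugger output above shows that the original right-hand side evaluates to a single array reference, whereas the corrected form dereferences it into a list. A small sketch, assuming a made-up $cache_data with one submit and two options (the names @one_ref and @options are only for the illustration):

```perl
use strict;
use warnings;

# Hypothetical stand-in for $cache_data: one submit with two options.
my $cache_data = [ { Options => [ "-nowrap", "-KEYWORD1" ] } ];

# Without the outer @{ ... } the right-hand side is a single array
# reference, so the new array holds one element (the reference itself).
my @one_ref = ( $cache_data->[0]{Options} );

# Wrapping the whole expression in @{ ... } dereferences it into a list.
my @options = @{ $cache_data->[0]{Options} };

print scalar(@one_ref), " element, a ", ref( $one_ref[0] ), " reference\n";
print scalar(@options), " elements: @options\n";
```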


Tolgar

ASKER

ok.

I have 2 questions:

1-
It worked for all of them except for Related_Records.

This returns the correct data:

@{$cache_data->[0]{RelatedRecords}}



but this one returns empty array:

my @Related_Records = @{$cache_data->[0]{RelatedRecords}}



What am I doing wrong? The only difference is, this data is an integer.

2- Why do I get \cM\cJ at the end of all array elements?

e.g.

 DB<9> x @Comments
0  "Unlocking before making changes\cM\cJ"




Thanks,



Tolgar

ASKER

Hi,
For the question which I have asked in ID: 36569304:

Can we change the code so that it makes no assumption about which field (CR, RR or CS) comes last, and treats "Mail sent to" as just another field like RR, CS or CR?

Then I can just ignore that one when I pass them to another code.

Can we do that?

Thanks,
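
One way such an order-independent parser could look is sketched below. This is not the code that was eventually posted in this thread (that solution is not visible here); it is a simplified illustration that handles only CR, RR, CS and "Mail sent to", and the names parse_fields and MailSentTo are hypothetical. The idea is to commit a record when the next "Submit file" header is seen, or at the end of the input, instead of committing after a particular field.

```perl
use strict;
use warnings;

# Simplified sketch: only CR/RR/CS and "Mail sent to" are handled here.
sub parse_fields {
    my @lines = @_;
    my ( @records, %current, $in_mail );

    for my $line (@lines) {
        ( my $text = $line ) =~ s/\r?\n\z//;    # drop LF or CRLF

        if ( $text =~ /^\s*Submit\s+file\s*$/ ) {
            # A new section starts: commit whatever was collected so far.
            push @records, +{%current} if %current;
            %current = ();
            $in_mail = 0;
        }
        elsif ( $text =~ /^(CR|RR|CS):\s*(.*)$/ ) {
            push @{ $current{$1} }, $2;         # any order is fine
            $in_mail = 0;
        }
        elsif ( $text =~ /^Mail sent to:/ ) {
            $current{MailSentTo} = [];
            $in_mail = 1;
        }
        elsif ( $in_mail && $text =~ /\S/ ) {
            # Non-empty lines after "Mail sent to:" belong to that field.
            $text =~ s/^\s+//;
            push @{ $current{MailSentTo} }, $text;
        }
    }
    # Commit the last section, which has no following header.
    push @records, +{%current} if %current;
    return \@records;
}

my $records = parse_fields(
    "   Submit file\n", "RR: 987654\n", "CS: locking\n", "CR: testman2\n",
    "\n", "Mail sent to:\n", "    st.devel.submit: Unlocking\n",
    "\tSubmit file\n", "CS: Unlocking\r\n", "CR: testman3\r\n", "RR: 123456\r\n",
);
print scalar( @{$records} ), " records\n";
```

The "Mail sent to" lines end up in their own field, so the calling code can simply ignore that key.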

Yes, we can do that. I'll post updated code later.
\cM\cJ is how the debugger displays a carriage return followed by a newline (CRLF), i.e. a Windows-style line ending.
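
If the line endings should not end up in the stored values, they can be stripped from every element. A minimal sketch, assuming an array named @Comments that holds one element with a Windows line ending, as in the debugger output above:

```perl
use strict;
use warnings;

# Sample element as it appears when the input file has Windows line endings.
my @Comments = ("Unlocking before making changes\r\n");

# Remove a trailing LF or CRLF from every element. chomp alone would
# only remove the "\n" (with the default $/) and leave the "\r" behind.
s/\r?\n\z// for @Comments;

print "[$Comments[0]]\n";
```

The same substitution works for files with Unix line endings, because the carriage return is optional in the pattern.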
Tolgar

ASKER

Hi,
When I log this information to a text file, is \cM\cJ going to be seen or are they gonna be processed?

I am waiting for your updated code.

Thanks,
Tolgar

ASKER

hi,
I wonder if you would be able to post the updated code by Sunday morning.

Thanks,

ASKER CERTIFIED SOLUTION
parparov

This solution is only available to members.
Tolgar

ASKER

Hi,
This works perfectly.

I remember an earlier discussion about this, but I couldn't find the answer. The discussion was about line endings in Windows and in Unix.

My question is:

Can this code parse text files that are created both in Unix and Windows? Because they will have different line endings.

Thanks,



Tolgar

ASKER

Hi,
How can I get the length of $data in your code?

I need it because I will loop through its contents.

Thanks,

Tolgar

ASKER

Let me clarify the last question:

$data in our case has two parts. One is from the first submit file group and the second one is from the second submit file group.

So I should get "2" as result of this command.

Thanks,

SOLUTION
This solution is only available to members.
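
The answer given here is not visible, so the following is only a sketch of one common way to count the elements behind an array reference, not necessarily what was suggested. It assumes $data is the array reference returned by the parser; here it is replaced by a stand-in with two empty records.

```perl
use strict;
use warnings;

# Stand-in for the parser result: two (empty) submit records.
my $data = [ {}, {} ];

# An array dereferenced in scalar context yields its element count.
my $count = scalar @{$data};
print "Number of submit sections: $count\n";

# Index-based loop over the records, if the position is needed.
for my $i ( 0 .. $#{$data} ) {
    print "Submit file ", $i + 1, "\n";
}
```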
Tolgar

ASKER

perfect solution!!!
Tolgar

ASKER

I have a follow up question:

ID:27331899


Thanks,
Tolgar

ASKER

@parparov:

Can you please explain what this means? In particular, why do we say "if @Files;" at the end?

push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
			"NoSubmitFileFlag"     => $noSubmitFileFlag,
		}
	) if @Files;
	return \@submits;




Thanks,
It means the record is pushed only if some actual files were encountered. Otherwise the code would unconditionally push empty arrays into the resulting data structure.
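
The effect of the trailing "if @Files" can be seen in isolation in the sketch below. The helper name commit is hypothetical and the record is reduced to the Files key; the point is only that an array in boolean context is true when it has at least one element.

```perl
use strict;
use warnings;

my @submits;

# Hypothetical helper: commits a record only when files were collected.
sub commit {
    my @Files = @_;
    # In boolean context an array is true only if it has elements,
    # so nothing is pushed for an empty @Files.
    push( @submits, { Files => [@Files] } ) if @Files;
    return;
}

commit();                                  # empty list: skipped
commit("st/ert/variants/variants5.c");     # one file: stored

print scalar(@submits), " record(s) stored\n";
```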
Tolgar

ASKER

Thanks for the clarification