Solved

How do I break down information in an array that we read from a text file in Perl?

Posted on 2011-09-19
Last Modified: 2012-05-12
Follow up question for ID: 27312349

As I explained in the last post of the referenced question, I would like to break out the following fields per entry:

Options
Files
Comments
Related_Records
Code_Reviewers

The Geck_Login will be the same for all entries because we capture it at the beginning of the file.

If we look at the text file again:

USER=testman, HOST=testman-deb6-64, ARCH=glnxa64
Revisions: /st/hub/share/apps/bat//share/mmit: 07/26-09:48:58; csubmitItem.pm: 2011/07/26-09:48:56
Original arguments:
        -t
        Atk
        -F
        20110914.submit
Currently $_='154551'

        main:/st/hub/share/apps/bat/bat2.15.17/share/../lib/csubmitCache.pm:44 called main::submissionHistory
        main:/st/hub/share/apps/bat/bat2.15.17/share/submit:3871 called main::CreateCacheFile

Current directory ($PWD) = /st/devel/sandbox/testman/Aslrtw
                Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 24.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk/glnxa64'>/sandbox/testman/Atk_ests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "locking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants5.c
CR: testman2
RR: 987654
CS: locking before making changes

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c

	
				Submit file
        ===========================
# Component        : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for   : 2000
#
# Description:
#   Unlocking making changes
#
# Documentation impact:
#   None
#
# QE items:
#   None
#
# Type of change:
#   Unlocking making changes
#

# submit file for use with msubmit.  To use run the command
#      submit -F 14.submit
#   or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:

-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

st/ert/variants/variants6.c
CR: testman3
RR: 123456
CS: Unlocking before making changes

Mail sent to:
    st.devel.submit: Unlocking making changes
    Files:
    st/ert/variants/variants5.c


I would like to have the following in arrays (the "From Submit File 1" and "From Submit File 2" labels are only for clarification; we don't actually need them):
@Options:
From Submit File 1:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2

From Submit File 2:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2


@Files:
From Submit File 1:
st/ert/variants/variants5.c

From Submit File 2:
st/ert/variants/variants6.c

@Comments:
From submit File 1:
locking before making changes

From submit file 2:
Unlocking before making changes

@Related_Records:
From submit File 1:
987654

From submit File 2:
123456

@Code_Reviewers:
From submit File 1:
testman2

From submit File 2:
testman3

Geck_login: testman


Once we have this output, I will pass it to other code and log this broken-down information to another file separately. That's why I need to know which information belongs to which submit file.

Note: There can be any number of Submit Files in one text file.

I would prefer to have only one array each for Options, Files, Comments, Related_Records and Code_Reviewers, and to keep the data for the different submit files inside these arrays.

Let's say:
The first element of Options should only include options from Submit File 1.
The second element of Options should only include options from Submit File 2.

The same goes for the list of files and the other fields.

What I mean is that we should not put every option, file or other value into its own array element. The group of information from one submit file should be in a single array element. Then I can dump this information anywhere I want without causing confusion.
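
One way to picture the layout described above is a minimal sketch like the following. It assumes each field array holds one array reference per submit file; the values are shortened sample data taken from the text file, and the printing loop is purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data only: one array per field, where element N is an array
# reference holding just the values from submit file N+1.
my @Options = (
    [ '-nowrap', '-subject "locking making changes"',   '-KEYWORD1' ],   # submit file 1
    [ '-nowrap', '-subject "Unlocking making changes"', '-KEYWORD1' ],   # submit file 2
);
my @Files = (
    [ 'st/ert/variants/variants5.c' ],   # submit file 1
    [ 'st/ert/variants/variants6.c' ],   # submit file 2
);

# Walk the submit files by index; the same index selects the matching
# group in every field array.
for my $i (0 .. $#Options) {
    printf "Submit file %d\n", $i + 1;
    print "  option: $_\n" for @{ $Options[$i] };
    print "  file:   $_\n" for @{ $Files[$i] };
}
```

With this shape, the number of submit files is simply the number of elements in any of the field arrays.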

I hope this explains everything clearly.

Thanks,


Question by:Tolgar

22 Comments

Expert Comment

by:parparov
Allow me to clarify:

There's one Files, Comments, Options etc. entry per 'submit file' section?
Is the user read from USER= at the beginning or from the sandbox location?
Because the sandbox location may, theoretically, return different users.

Author Comment

by:Tolgar
Yes, there is one Files, Comments, Options etc. entry per submit file section.

Well, reading the user from the beginning of the file is more reliable, but I would prefer to keep the sandbox location for now. If possible, please also read it from the beginning, but I would like you to comment that out for now. I guess both will be in the same section of the code.

thanks,

Expert Comment

by:parparov
Here is the reworked code.
The returned data structure has changed. Please study the examples of accessing the data.
#!/usr/bin/perl

use strict;
use warnings;

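use Data::Dumper;

our @HEADERS = ("GeckLogin", "Options", "Files", "Comments", "RelatedRecords", "CodeReviewers");
# Prototypes, declared up front so that later calls are checked against them
sub print_data1 ($);
sub print_data2 ($);
sub submitFileParser ($);
sub read_paragraphs (@);

# Declared before the parser runs, because read_paragraphs() assigns to it
my $geckLogin;
my $data = submitFileParser(shift @ARGV);
# A look at the data
print Dumper $data;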

# Examples of accessing data
print_data1($data);
print_data2($data);

sub print_data1 ($) {
	my $data = shift;

	for my $submit (@{$data}) {
		for my $header (@HEADERS) {
			print "$header:\n";
			if ($header eq 'GeckLogin') {
				print "$submit->{$header}\n";
			}
			else {
				print @{$submit->{$header}};
			}
			print "\n";
		}
		print "\n";
	}
}

sub print_data2 ($) {
	my $data = shift;

	for my $header (@HEADERS) {
		if ($header eq 'GeckLogin') {
			print "GeckLogin: $data->[0]{GeckLogin}\n";
			next;
		}
		print "$header:\n";
		for my $i (1..@{$data}) {
			print "From submit file $i\n";
			print @{$data->[$i-1]{$header}};
			print "\n";
		}
		print "\n";
	}
}

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
#	local($/) = '';
	# Three-argument open with a lexical filehandle
	open( my $fh, '<', $filename ) or die "Can't open $filename : $!";
	@paragraphs = <$fh>;
	close $fh;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag);

	my $submit_file = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			$submit_file = 1;
			next;
		}
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
			} else {
				$submit_file = 0; # two-line grammar didn't hold
			}
			next;
		}
		if ($submit_file == 2) {
			if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
				# Match the login name in the submit file - if it has not
				# already been done
				$geckLogin ||= $1;
			}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			}
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) &&
				# CS record is the last one, we commit after it
				push(
					@submits,
					{
						"Options"              => [@Options],
						"Files"                => [@Files],
						"Comments"             => [@CS],
						"RelatedRecords"       => [@RR],
						"CodeReviewers"        => [@CR],
						"GeckLogin" 		   => $geckLogin,
					}
				) &&
				((@Options = @Files = @CR = @CS = @RR = ()) || ($submit_file = 0) || 1)
			&& next;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
		}
	}
	return \@submits;
}

Author Comment

by:Tolgar
Hi,
Thank you for your prompt reply.

Can you please explain to me what you mean in these lines?

Line 127
Line 142
Line 144

Note: The order of CR, CS and RR can be anything in the text. You know that right?

Thanks,

Expert Comment

by:parparov
No, I assumed the CS: is the last section of a submit. Otherwise I don't see how to get rid of the trailing "Mail sent to:"

Hope this explains lines 127, 142 and 144

Author Comment

by:Tolgar
OK, let's leave this question for a later discussion.

I have another question. When I debugged the code, I did the following.

231:                            my @Options = @{$cache_data}[0]->{Options};
  DB<3> x @Options
  empty array
  DB<4> x @{$cache_data}[0]->{Options}
0  ARRAY(0x15961d0)
   0  "-CJ \"<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>\"\cM\cJ"
   1  "-nowrap\cM\cJ"
   2  "-subject \"Unlocking making changes\"\cM\cJ"
   3  "-KEYWORD1\cM\cJ"
   4  "-KEYWORD2\cM\cJ"

And @Options is empty on my first attempt. But then, when I print the right-hand side directly, it works. So how can I assign the right-hand side (which is an array) to a new array like @Options?

Expert Comment

by:parparov
You dereferenced it the wrong way.
You need:
my @Options = @{$cache_data->[0]{Options}};
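
To make the difference visible, here is a small sketch. The $cache_data value is a hypothetical stand-in for the parser's return value (a reference to an array of hashes, one hash per submit file), not the real cache.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's return value.
my $cache_data = [
    {
        Options        => [ "-nowrap\n", "-KEYWORD1\n" ],
        RelatedRecords => [ "987654\n" ],
    },
];

# Without the outer @{ ... } you get the array reference itself (a scalar).
my $ref = $cache_data->[0]{Options};
print ref($ref), "\n";           # ARRAY

# Wrapping the whole expression in @{ ... } copies out the elements.
my @Options         = @{ $cache_data->[0]{Options} };
my @Related_Records = @{ $cache_data->[0]{RelatedRecords} };
print scalar(@Options), "\n";    # 2
print @Related_Records;          # 987654
```

The same pattern applies to every field that is stored as an array reference; only GeckLogin is a plain scalar in the structure above.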

Author Comment

by:Tolgar
ok.

I have 2 questions:

1-
It worked for all of them except for Related_Records.

This returns the correct data:

@{$cache_data->[0]{RelatedRecords}}

but this one returns an empty array:

my @Related_Records = @{$cache_data->[0]{RelatedRecords}};

What am I doing wrong? The only difference is that this data is an integer.

2- Why do I get \cM\cJ at the end of all array elements?

e.g.

 DB<9> x @Comments
0  "Unlocking before making changes\cM\cJ"


Thanks,



Author Comment

by:Tolgar
Hi,
For the question I asked in ID: 36569304:

Can we change the code so that it makes no assumption about which field (CR, RR or CS) comes last, and so that "Mail sent to" is treated as just another field like RR, CS or CR?

Then I can just ignore that one when I pass them to another code.

Can we do that?

Thanks,

Expert Comment

by:parparov
Yes, we can do that. I'll post updated code later.
\cM\cJ is how the debugger displays a carriage return followed by a line feed.
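
As a quick check of what those two escapes stand for, here is a minimal sketch. The sample string is copied from the debugger output earlier in the thread, and the comparison assumes an ASCII-based platform such as Linux or Windows.

```perl
use strict;
use warnings;

# A value as the debugger displayed it: text followed by \cM\cJ.
my $comment = "Unlocking before making changes\cM\cJ";

# \cM is Control-M (carriage return) and \cJ is Control-J (line feed).
print "same as \\r\\n\n" if "\cM\cJ" eq "\r\n";

# The last two characters are code points 13 and 10, not the literal
# characters backslash, c, M and so on.
my @tail = map { ord($_) } split //, substr($comment, -2);
print "@tail\n";    # 13 10
```

So when such a string is written to a log file, the two characters go out as line-ending bytes rather than as the text \cM\cJ; the escaped form is only how the debugger prints them.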

Author Comment

by:Tolgar
Hi,
When I log this information to a text file, will \cM\cJ show up as text or will it be processed as a line ending?

I am waiting for your updated code.

Thanks,

Author Comment

by:Tolgar
hi,
I wonder if you would be able to post the updated code by Sunday morning.

Thanks,

Accepted Solution

by:parparov
parparov earned 500 total points
This code works with your example input, including the related records, which I am testing explicitly:
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

our @HEADERS = ("GeckLogin", "Options", "Files", "Comments", "RelatedRecords", "CodeReviewers", "Mail sent to");
# Prototypes, declared up front so that later calls are checked against them
sub print_data1 ($);
sub print_data2 ($);
sub submitFileParser ($);
sub read_paragraphs (@);

# Declared before the parser runs, because read_paragraphs() assigns to it
my $geckLogin;
my $data = submitFileParser(shift @ARGV);
# A look at the data
print Dumper $data;

# Examples of accessing data
print_data1($data);
print "++++++++++++++++++++\n";
print_data2($data);
print "++++++++++++++++++++\n";

my @rr = @{$data->[0]{RelatedRecords}};
print Dumper \@rr;
print Dumper $data->[0]{RelatedRecords};

sub print_data1 ($) {
	my $data = shift;

	for my $submit (@{$data}) {
		for my $header (@HEADERS) {
			print "$header:\n";
			if ($header eq 'GeckLogin') {
				print "$submit->{$header}\n";
			}
			else {
				print @{$submit->{$header}};
			}
			print "\n";
		}
		print "\n";
	}
}

sub print_data2 ($) {
	my $data = shift;

	for my $header (@HEADERS) {
		if ($header eq 'GeckLogin') {
			print "GeckLogin: $data->[0]{GeckLogin}\n";
			next;
		}
		print "$header:\n";
		for my $i (1..@{$data}) {
			print "From submit file $i\n";
			print @{$data->[$i-1]{$header}};
			print "\n";
		}
		print "\n";
	}
}

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
#	local($/) = '';
	# Three-argument open with a lexical filehandle
	open( my $fh, '<', $filename ) or die "Can't open $filename : $!";
	@paragraphs = <$fh>;
	close $fh;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options, @Mailsent);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag, $mail_sent_to_flag);

	my $submit_file = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			# We record the accumulated data:
			push(
				@submits,
				{
					"Options"              => [@Options],
					"Files"                => [@Files],
					"Comments"             => [@CS],
					"RelatedRecords"       => [@RR],
					"CodeReviewers"        => [@CR],
					"GeckLogin" 		   => $geckLogin,
					"Mail sent to"         => [@Mailsent],
				}
			) if @Files;
			# Reset every per-submit buffer, including @Mailsent, so nothing carries over
			@Options = @Files = @CR = @CS = @RR = @Mailsent = ();
			$submit_file = 1;
			next;
		}
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
				$mail_sent_to_flag = 0;
			} else {
				$submit_file = 0; # two-line grammar didn't hold
			}
			next;
		}
		if ($submit_file == 2) {
			if ($mail_sent_to_flag) {
				push(@Mailsent, $_);
				next;
			}
			if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
				# Match the login name in the submit file - if it has not
				# already been done
				$geckLogin ||= $1;
			}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			}
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			if (/^Mail sent to/) {
				$mail_sent_to_flag = 1;
				push(@Mailsent, $_);
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && next;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
		}
	}
	push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
		}
	) if @Files;
	return \@submits;
}

Author Comment

by:Tolgar
Hi,
This works perfectly.

I remember a discussion about this before, but I couldn't find the answer in it. The discussion was about line endings in Windows and in Unix.

My question is:

Can this code parse text files created in both Unix and Windows, given that they have different line endings?

Thanks,



Author Comment

by:Tolgar
Hi,
How can I get the length of $data in your code?

I will use that length to loop through its contents.

Thanks,

Author Comment

by:Tolgar
Let me clarify the last question:

$data in our case has two parts. One is from the first submit file group and the second one is from the second submit file group.

So I should get "2" as result of this command.

Thanks,

Assisted Solution

by:parparov
parparov earned 500 total points
The length of the data is
my $data_length = scalar @{$data};

which gives the number of elements in the list, while
my $data_largest_index = $#{$data};

gives the last index ($data_length-1) in the list.
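
Put together, looping over the submit files could look like this sketch. The $data value here is a hypothetical two-entry stand-in for the parser's return value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's return value with two submit files.
my $data = [
    { Files => [ "st/ert/variants/variants5.c\n" ] },
    { Files => [ "st/ert/variants/variants6.c\n" ] },
];

my $data_length = scalar @{$data};    # number of submit files
print "count: $data_length\n";        # count: 2

# Index-based loop, running from 0 to the last index
for my $i (0 .. $#{$data}) {
    print "submit file ", $i + 1, ": ", @{ $data->[$i]{Files} };
}

# Or iterate over the elements directly when the index is not needed
for my $submit (@{$data}) {
    print @{ $submit->{Files} };
}
```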

This code preserves the line endings as they are; they do not affect the parsing.
Files created on Windows usually have a carriage return ("\r" or ^M) before the newline at the end of each line. You can get rid of these characters, for example, with the dos2unix utility on Linux (or add them with unix2dos).
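
If running an external utility is not convenient, the line endings can also be removed inside Perl. This is only a sketch: strip_eol is a hypothetical helper, not part of the code posted above, and it works on copies so the parsed data stays untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing LF or CR+LF from a copy of
# each string that is passed in.
sub strip_eol {
    my @lines = @_;
    s/\r?\n\z// for @lines;
    return @lines;
}

my @clean = strip_eol("-nowrap\r\n", "-KEYWORD1\n", "no newline");
print join('|', @clean), "\n";    # -nowrap|-KEYWORD1|no newline
```

It could be applied to any of the per-submit arrays, for example strip_eol(@{ $data->[0]{Options} }), before the values are logged.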

Author Closing Comment

by:Tolgar
perfect solution!!!

Author Comment

by:Tolgar
I have a follow up question:

ID:27331899


Thanks,

Author Comment

by:Tolgar
@parparov:

Can you please explain to me what this means? Especially, why do we say if @Files; at the end?

push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
			"NoSubmitFileFlag"     => $noSubmitFileFlag,
		}
	) if @Files;
	return \@submits;

Thanks,

Expert Comment

by:parparov
It means to push something only if some actual files were encountered. Otherwise it would unconditionally push empty arrays into the resulting data structure.
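
The guard can be tried in isolation with a small sketch. Here record_submit is a hypothetical helper written only for this illustration; the real code does the push inline.

```perl
use strict;
use warnings;

my @submits;

# Hypothetical helper showing the guard on its own: an array in boolean
# context is true only when it has at least one element.
sub record_submit {
    my ($files, $options) = @_;
    push @submits, { Files => [@$files], Options => [@$options] } if @$files;
    return;
}

record_submit([], []);                                          # skipped: no files
record_submit(['st/ert/variants/variants5.c'], ['-nowrap']);    # recorded
print scalar(@submits), "\n";    # 1
```

Note that [@$files] stores a copy of the list, so emptying the temporary arrays afterwards does not change what was already recorded.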

Author Comment

by:Tolgar
Thanks for the clarification