Tolgar
asked on
How to break down information in an array read from a text file in Perl?
Follow up question for ID: 27312349
As I explained in the last post of the reference post, I would like to break the following per entry:
Options
Files
Comments
Related_Records
Code_Reviewers
The Geck_Login will be the same for all entries because we capture it at the beginning of the file.
If we look at the text file again:
USER=testman, HOST=testman-deb6-64, ARCH=glnxa64
Revisions: /st/hub/share/apps/bat//share/mmit: 07/26-09:48:58; csubmitItem.pm: 2011/07/26-09:48:56
Original arguments:
-t
Atk
-F
20110914.submit
Currently $_='154551'
main:/st/hub/share/apps/bat/bat2.15.17/share/../lib/csubmitCache.pm:44 called main::submissionHistory
main:/st/hub/share/apps/bat/bat2.15.17/share/submit:3871 called main::CreateCacheFile
Current directory ($PWD) = /st/devel/sandbox/testman/Aslrtw
Submit file
===========================
# Component : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for : 2000
#
# Description:
# Unlocking making changes
#
# Documentation impact:
# None
#
# QE items:
# None
#
# Type of change:
# Unlocking making changes
#
# submit file for use with msubmit. To use run the command
# submit -F 24.submit
# or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk/glnxa64'>/sandbox/testman/Atk_ests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "locking making changes"
-KEYWORD1
-KEYWORD2
st/ert/variants/variants5.c
CR: testman2
RR: 987654
CS: locking before making changes
Mail sent to:
st.devel.submit: Unlocking making changes
Files:
st/ert/variants/variants5.c
Submit file
===========================
# Component : Coder
# Sandbox location : /st/devel/sandbox/testman/Atk
# Submission for : 2000
#
# Description:
# Unlocking making changes
#
# Documentation impact:
# None
#
# QE items:
# None
#
# Type of change:
# Unlocking making changes
#
# submit file for use with msubmit. To use run the command
# submit -F 14.submit
# or use C-c C-c from emacs to run this command.
# "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
# "No need for sbruntests: Interactive Tests Update"
Options:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2
st/ert/variants/variants6.c
CR: testman3
RR: 123456
CS: Unlocking before making changes
Mail sent to:
st.devel.submit: Unlocking making changes
Files:
st/ert/variants/variants5.c
I would like to have the following in arrays (From Submit File 1 and From Submit File 2 texts are only for clarification. We don't actually need them.):
@Options:
From Submit File 1:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2
From Submit File 2:
-CJ "<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>"
-nowrap
-subject "Unlocking making changes"
-KEYWORD1
-KEYWORD2
@Files:
From Submit File 1:
st/ert/variants/variants5.c
From Submit File 2:
st/ert/variants/variants6.c
@Comments:
From submit File 1:
locking before making changes
From submit file 2:
Unlocking before making changes
@Related_Records:
From submit File 1:
987654
From submit File 2:
123456
@Code_Reviewers:
From submit File 1:
testman2
From submit File 2:
testman3
Geck_login: testman
Once we have this output, I will pass it to another piece of code and log the broken-down information to another file, entry by entry. That's why I need to know which information belongs to which submit file.
Note: There can be any number of Submit Files in one text file.
I would prefer to have only one array each for Options, Files, Comments, Related_Records and Code_Reviewers, and to keep the data for the different submit files inside these arrays.
Let's say:
The first element of Options should only include options from Submit File 1.
The second element of Options should only include options from Submit File 2.
The same goes for the list of files and the others.
What I mean is that we shouldn't put every option, file, etc. into one element of an array. The group of information from one Submit file should be in the same array element. Then I can dump this information anywhere I want without causing confusion.
I hope this explains everything clearly.
Thanks,
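One way to picture the requested layout is an array of array references, where index 0 holds everything from Submit File 1, index 1 everything from Submit File 2, and so on. The following is only a minimal sketch with made-up values, not the parser itself:

```perl
use strict;
use warnings;

# Illustrative values only; each inner array holds one submit file's data.
my @Options = (
    [ '-nowrap', '-KEYWORD1' ],    # submit file 1
    [ '-nowrap', '-KEYWORD2' ],    # submit file 2
);
my @Files = (
    [ 'st/ert/variants/variants5.c' ],    # submit file 1
    [ 'st/ert/variants/variants6.c' ],    # submit file 2
);

# The same index in every array refers to the same submit file.
for my $i ( 0 .. $#Options ) {
    printf "Submit file %d: options [%s], files [%s]\n",
        $i + 1,
        join( ' ', @{ $Options[$i] } ),
        join( ' ', @{ $Files[$i] } );
}
```

Because the index ties the arrays together, any number of submit files can be handled without adding new variables.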
ASKER
Yes, there is one Files, Comments, Options, etc. entry per submit file section.
Reading the user from the beginning of the file is more reliable, but I would prefer to keep the sandbox location for now. If possible, please also read it from the beginning of the file, but I would like you to comment it out for now. I guess both will be in the same section of the code.
Thanks,
Here is the reworked code.
The returned data structure has changed. Please study the examples of accessing the data.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

our @HEADERS = ("GeckLogin", "Options", "Files", "Comments", "RelatedRecords", "CodeReviewers");

# Prototypes, declared up front so every call is checked against them
sub print_data1 ($);
sub print_data2 ($);
sub submitFileParser ($);
sub read_paragraphs (@);

# Login name shared by all submits; filled in while parsing
my $geckLogin;

my $data = submitFileParser(shift @ARGV);

# A look at the data
print Dumper $data;

# Examples of accessing data
print_data1($data);
print_data2($data);
sub print_data1 ($) {
    my $data = shift;
    for my $submit (@{$data}) {
        for my $header (@HEADERS) {
            print "$header:\n";
            if ($header eq 'GeckLogin') {
                print "$submit->{$header}\n";
            }
            else {
                print @{$submit->{$header}};
            }
            print "\n";
        }
        print "\n";
    }
}
sub print_data2 ($) {
    my $data = shift;
    for my $header (@HEADERS) {
        if ($header eq 'GeckLogin') {
            print "GeckLogin: $data->[0]{GeckLogin}\n";
            next;
        }
        print "$header:\n";
        for my $i (1..@{$data}) {
            print "From submit file $i\n";
            print @{$data->[$i-1]{$header}};
            print "\n";
        }
        print "\n";
    }
}
sub submitFileParser ($) {
    my $filename = shift;
    # local($/) = '';
    # Three-argument open with a lexical filehandle
    open( my $fh, '<', $filename ) or die "Can't open $filename : $!";
    my @paragraphs = <$fh>;
    close $fh;
    return read_paragraphs(@paragraphs);
}
sub read_paragraphs (@) {
    # read lines as parameters
    my @rippedParagraphs = @_;
    # Storage for all sections
    my @submits = ();
    # Temporary storages for single section of each type
    my (@Files, @CR, @RR, @CS, @Options);
    # Flags for file traversal logic
    my ($opt_flag, $file_flag);
    my $submit_file = 0;
    # read the file
    for ( @rippedParagraphs ) {
        if (/^USER=(\S+)\,/) {
            # obtain the login from USER=
            $geckLogin = $1;
        }
        if (/^\s*Submit\s+file\s*$/) {
            $submit_file = 1;
            next;
        }
        if ($submit_file == 1) {
            if (/^\s*\=+\s*$/) {
                $submit_file++;
            } else {
                $submit_file = 0; # two-line grammar didn't hold
            }
            next;
        }
        if ($submit_file == 2) {
            if (m|^\#\s*Sandbox\s+location\s*\:\s*\S*/sandbox/(.*?)/|) {
                # Match the login name in the submit file - if it has not
                # already been done
                $geckLogin ||= $1;
            }
            # If we encounter a comment or empty string
            if (/^\#/ || !/\S/) {
                # we haven't encountered an option to start doing anything
                next unless $opt_flag || $file_flag;
                # If we're done with options, let's start reading file sections
                if ($opt_flag == 1) {
                    $opt_flag = 0;
                    $file_flag = 1;
                }
                elsif ($opt_flag > 1) {
                    # Addresses the empty line within Options:
                    $opt_flag--;
                }
                next;
            }
            if (/^Options/) {
                # We start reading options
                $opt_flag = 2;
                next;
            }
            # Matching beginning of the line to determine the type of the string
            # and placing it in temporary storage
            if (/^R(R|elated\sRecords):\s*(.*\n)/) {
                push(@RR, $2);
                next;
            }
            if (/^C(R|ode\sReviewer):\s*(.*\n)/) {
                push(@CR, $2);
                next;
            }
            if (/^C(S|omments):\s*(.*\n)/) {
                push(@CS, $2);
                # CS record is the last one, we commit after it
                push(
                    @submits,
                    {
                        "Options"        => [@Options],
                        "Files"          => [@Files],
                        "Comments"       => [@CS],
                        "RelatedRecords" => [@RR],
                        "CodeReviewers"  => [@CR],
                        "GeckLogin"      => $geckLogin,
                    }
                );
                # Reset the temporary storage and wait for the next submit header
                @Options = @Files = @CR = @CS = @RR = ();
                $submit_file = 0;
                next;
            }
            # General text is either files or options info, depending on the
            # value of the option flag
            $opt_flag ? push(@Options, $_) : push(@Files, $_);
        }
    }
    return \@submits;
}
ASKER
Hi,
Thank you for your prompt reply.
Can you please explain to me what you mean in these lines?
Line 127
Line 142
Line 144
Note: CR, CS and RR can appear in any order in the text. You know that, right?
Thanks,
No, I assumed the CS: is the last section of a submit. Otherwise I don't see how to get rid of the trailing "Mail sent to:"
Hope this explains lines 127, 142 and 144
ASKER
OK, let's leave this question for a later discussion.
I have another question. When I debugged the code, I did the following:
231: my @Options = @{$cache_data}[0]->{Options};
DB<3> x @Options
empty array
DB<4> x @{$cache_data}[0]->{Options}
0 ARRAY(0x15961d0)
0 "-CJ \"<a href='http://www-sandbox/testman/Atk_tests/glnxa64'>/sandbox/testman/Atk_tests/glnxa64</a>\"\cM\cJ"
1 "-nowrap\cM\cJ"
2 "-subject \"Unlocking making changes\"\cM\cJ"
3 "-KEYWORD1\cM\cJ"
4 "-KEYWORD2\cM\cJ"
@Options was empty on my first attempt. But when I printed the right-hand side directly, it worked. So how can I assign the right-hand side, which is an array, to a new array like @Options?
You dereferenced it the wrong way.
You need:
231: my @Options = @{$cache_data->[0]{Options}};
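As the debugger output above shows, the expression without the outer dereference evaluates to a single ARRAY reference, not to the list of options. A minimal sketch of the difference, using a made-up `$cache_data` shaped like the parser's return value:

```perl
use strict;
use warnings;

# Hypothetical data shaped like the parser's return value.
my $cache_data = [
    { Options => [ "-nowrap\n", "-KEYWORD1\n" ] },
    { Options => [ "-KEYWORD2\n" ] },
];

# Without the outer @{ ... } we get the reference itself: one element.
my @as_ref = ( $cache_data->[0]{Options} );

# Wrapping the whole expression in @{ ... } dereferences it into a list.
my @Options = @{ $cache_data->[0]{Options} };

printf "as_ref holds %d element of type %s\n", scalar(@as_ref), ref( $as_ref[0] );
printf "Options holds %d elements\n", scalar(@Options);
```

The key point is that `@{ ... }` must wrap the complete path down to the array reference, not only `$cache_data`.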
ASKER
ok.
I have 2 questions:
1-
It worked for all of them except for Related_Records.
This returns the correct data:
@{$cache_data->[0]{RelatedRecords}}
but this one returns empty array:
my @Related_Records = @{$cache_data->[0]{RelatedRecords}}
What am I doing wrong? The only difference is that this data is an integer.
2- Why do I get \cM\cJ at the end of all array elements?
e.g.
DB<9> x @Comments
0 "Unlocking before making changes\cM\cJ"
Thanks,
ASKER
Hi,
Regarding the question I asked in ID: 36569304:
Can we change the code so that it makes no assumption about which field (CR, RR or CS) comes last, and so that "Mail sent to" is treated as just another field like RR, CS or CR?
Then I can simply ignore that one when I pass the data to the other code.
Can we do that?
Thanks,
Yes, we can do that. I'll post updated code later.
\cM\cJ is how the carriage return + newline pair is displayed.
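In other words, the debugger only displays the two control characters in that notation; the strings themselves contain a carriage return followed by a line feed, and those two characters are what would be written to a log file. A small sketch that inspects the last two characters of one element (the element value is copied from the debugger output above; the character codes assume an ASCII platform):

```perl
use strict;
use warnings;

# One array element as shown by the debugger.
my $element = "-nowrap\cM\cJ";

# Character codes of the last two characters of the string.
my @codes = map { ord } split //, substr( $element, -2 );

print "last two character codes: @codes\n";
print "same as \\r\\n\n" if substr( $element, -2 ) eq "\r\n";
```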
ASKER
Hi,
When I log this information to a text file, is \cM\cJ going to be visible, or will it be processed as a line ending?
I am waiting for your updated code.
Thanks,
ASKER
hi,
I wonder if you would be able to post the updated code by Sunday morning.
Thanks,
ASKER CERTIFIED SOLUTION
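The accepted code is not reproduced in this thread. What follows is only a minimal sketch of the idea discussed above, not the expert's implementation: it makes no assumption about the order of CR, RR and CS, collects the lines after "Mail sent to:" as one more field, and commits a record when the next "Submit file" header or the end of the input is reached. The helper name `parse_fields` and the sample lines are hypothetical, and Options and Files are deliberately left out.

```perl
use strict;
use warnings;

# Single-line fields and the keys they are stored under.
my %FIELD_FOR = ( CR => 'CodeReviewers', RR => 'RelatedRecords', CS => 'Comments' );

# Hypothetical helper: returns a reference to an array with one hash per
# "Submit file" block, regardless of the order of CR, RR and CS.
sub parse_fields {
    my @lines = @_;
    my ( @submits, $current, $in_mail );
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;    # accept both Unix and Windows endings
        if ( $line =~ /^\s*Submit\s+file\s*$/ ) {
            push @submits, $current if $current;    # commit previous block
            $current = {
                CodeReviewers  => [],
                RelatedRecords => [],
                Comments       => [],
                'Mail sent to' => [],
            };
            $in_mail = 0;
            next;
        }
        next unless $current;    # still in the preamble of the file
        if ( $line =~ /^(CR|RR|CS):\s*(.*)$/ ) {
            push @{ $current->{ $FIELD_FOR{$1} } }, $2;
            $in_mail = 0;
        }
        elsif ( $line =~ /^Mail sent to:\s*$/ ) {
            $in_mail = 1;        # following lines belong to this field
        }
        elsif ( $line =~ /^\w+:\s*$/ || $line !~ /\S/ ) {
            $in_mail = 0;        # a header such as "Files:" or a blank line
        }
        elsif ($in_mail) {
            push @{ $current->{'Mail sent to'} }, $line;
        }
    }
    push @submits, $current if $current;    # commit the last block
    return \@submits;
}

# Made-up sample with the fields in a different order in each block.
my @sample = (
    "Submit file\n", "=====\n", "CS: first comment\n", "CR: testman2\n",
    "RR: 987654\n", "Mail sent to:\n", "st.devel.submit: Unlocking making changes\n",
    "Files:\n", "st/ert/variants/variants5.c\n",
    "Submit file\n", "=====\n", "RR: 123456\n", "CR: testman3\n", "CS: second comment\n",
);
my $parsed = parse_fields(@sample);
print "Parsed ", scalar( @{$parsed} ), " submit blocks\n";
```

Committing on the next header instead of on a particular field is what removes the dependency on CS being last.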
ASKER
Hi,
This works perfectly.
I remember an earlier discussion about line endings on Windows and on Unix, but I couldn't find the answer in it.
My question is:
Can this code parse text files created on both Unix and Windows, given that they have different line endings?
Thanks,
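One common way to make stored values independent of the platform that produced the file is to strip either kind of line ending before keeping a line. The sketch below uses a hypothetical helper, `strip_eol`, and made-up input lines; it is not part of the posted parser:

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing CRLF or LF from a line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# The same two lines as written on Unix (LF) and on Windows (CRLF).
my @unix    = map { strip_eol($_) } ( "CR: testman2\n",   "RR: 987654\n" );
my @windows = map { strip_eol($_) } ( "CR: testman2\r\n", "RR: 987654\r\n" );

print "identical after stripping\n" if "@unix" eq "@windows";
```

With the endings removed at read time, the caller decides which line ending to add when the data is logged.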
ASKER
Hi,
How can I get the length of $data in your code?
I need it so that I can loop through its contents.
Thanks,
ASKER
Let me clarify the last question:
$data in our case has two parts. One is from the first submit file group and the second one is from the second submit file group.
So I should get "2" as result of this command.
Thanks,
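Since $data is a reference to an array with one element per submit file group, its length is the element count of the dereferenced array. A minimal sketch, using a made-up `$data` with two groups in place of the real parser output:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's return value: a reference to an
# array with one hash per submit file group.
my $data = [
    { Files => ["st/ert/variants/variants5.c\n"] },
    { Files => ["st/ert/variants/variants6.c\n"] },
];

# An array in scalar context yields its number of elements.
my $count = scalar @{$data};
print "Number of submit files: $count\n";

# $#{$data} is the last valid index, which is convenient for index loops.
for my $i ( 0 .. $#{$data} ) {
    print "Submit file ", $i + 1, ": ", @{ $data->[$i]{Files} };
}
```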
SOLUTION
ASKER
perfect solution!!!
ASKER
I have a follow up question:
ID:27331899
Thanks,
ASKER
@parparov:
Can you please explain to me what this means? In particular, why do we say if @Files; at the end?
push(
    @submits,
    {
        "Options"          => [@Options],
        "Files"            => [@Files],
        "Comments"         => [@CS],
        "RelatedRecords"   => [@RR],
        "CodeReviewers"    => [@CR],
        "Mail sent to"     => [@Mailsent],
        "GeckLogin"        => $geckLogin,
        "NoSubmitFileFlag" => $noSubmitFileFlag,
    }
) if @Files;
return \@submits;
Thanks,
It means: push the record only if some actual files were encountered. Without the condition, it would unconditionally push empty arrays into the resulting data structure.
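The `if @Files` part is a statement modifier: an array in boolean context is true only when it has at least one element. A small sketch with a hypothetical helper, `commit_submit`, that mirrors the final push:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the final push in the snippet above.
sub commit_submit {
    my ( $submits, @Files ) = @_;
    # The push runs only when @Files has at least one element.
    push @{$submits}, { "Files" => [@Files] } if @Files;
    return scalar @{$submits};
}

my @submits;
commit_submit( \@submits );                                   # no files: nothing pushed
commit_submit( \@submits, "st/ert/variants/variants5.c" );    # one record pushed
print "Records stored: ", scalar(@submits), "\n";
```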
ASKER
Thanks for the clarification
Is there one Files, Comments, Options, etc. entry per 'submit file' section?
Should the user be read from USER= at the beginning of the file, or from the sandbox location?
I ask because the sandbox location could, in theory, give different users.