Solved

How to parse a text file for specific arguments in Perl?

Posted on 2011-09-26
Last Modified: 2012-06-27
Hi,
This is a follow-up question to ID: 27316317.

At the beginning of my text file, there is a section that looks like this:

Original arguments:
        -t
        Atk
        -F
        20110914.submit
        -KEYWORD1
Currently $_='154551'



In this part of the file I would like to detect the following:

1- What comes after the line "-t"?
Answer: Atk

2- What is $_ equal to?
Answer: 154551

3- Check whether the word "KEYWORD1" is present. If it is, assign true to a variable.

How can I do that?

Thanks,
Question by:Tolgar
 
LVL 9

Expert Comment

by:parparov
ID: 36654373
Is "Original arguments:" fixed grammar?
 

Author Comment

by:Tolgar
ID: 36667118
Yes,
and also

Currently $_

is fixed grammar.

Thanks,
 
LVL 84

Expert Comment

by:ozo
ID: 36707744
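my ($after, $equal, $keyword1);
while( <DATA> ){
    # The flip-flop never closes (its right operand is never true), so its
    # sequence number is exactly 2 on the line right after "-t".
    ($after) = /(\S+)/ if (/^\s*-t\s*$/..0)==2;
    $equal = $1 if /\$_='?(\w+)/;           # keep the captured value, not the whole line
    $keyword1 = 1 if /^\s*-KEYWORD1\s*$/;   # true once the keyword has been seen
}
print "Answer: $after\n";
print "Answer: $equal\n";
print "KEYWORD1 found\n" if $keyword1;
__DATA__
Original arguments:
        -t
        Atk
        -F
        20110914.submit
        -KEYWORD1
Currently $_='154551'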
 

Author Comment

by:Tolgar
ID: 36709651
@parparov: I am still waiting for your approach, combined with your previous complete code.

@ozo: thanks.


 

Author Comment

by:Tolgar
ID: 36711129
@parparov: One quick question regarding my previous posts:

How can I assign this to a string rather than an array?

my @Code_Reviewers = @{$cache_data->[$i]{CodeReviewers}};

 

Author Comment

by:Tolgar
ID: 36711135
@parparov: btw, I am still waiting for your reply for the main question of this post.

Thanks,
 
LVL 9

Expert Comment

by:parparov
ID: 36711908
A single code reviewer can be accessed as:
my $Code_Reviewer = $cache_data->[0]{CodeReviewers}[0];



Indexes in the example are arbitrary.
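
If the goal is a single string holding all reviewers rather than one element, `join` can flatten the array reference. This is a minimal sketch, assuming `$cache_data` is an array of hashes as in the snippet above; the sample names and the `', '` separator are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real $cache_data structure.
my $cache_data = [
    { CodeReviewers => [ 'alice', 'bob' ] },
];

my $i = 0;    # index of the record of interest (arbitrary here)

# join() concatenates the list elements using the given separator.
my $code_reviewers = join ', ', @{ $cache_data->[$i]{CodeReviewers} };

print "$code_reviewers\n";    # prints "alice, bob"
```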
 
LVL 9

Accepted Solution

by:
parparov earned 500 total points
ID: 36711912
Code for the original question:
#!/usr/bin/perl

use strict;
use warnings;

my @read_data = <DATA>;	# read every line of the sample input

sub parse_data (@) {
	my $arg_flag = 0;	# 1 while inside the "Original arguments:" block
	my $parsed_data = {};	# option name => value (undef for a bare flag)
	my $current_option;	# option still waiting for its value line

	while (my $line = shift @_) {
		if ($arg_flag == 1) {
			if ($line =~ /^Currently (\$\_=.*)/) {	# $1 holds the text after "Currently "
				local $_;
				eval "$1;";	# runs the captured text as Perl code, which assigns $_
				$parsed_data->{dollar_} = $_;
				$arg_flag = 0;	# the block ends here
			}
			elsif ($line =~ /^\s+\-(.*)/) {	# indented line starting with "-": an option name
				$current_option = $1;
				$parsed_data->{$current_option} = undef;
				next;
			}
			elsif ($current_option && $line =~ /^\s+(.*)/) {	# value for the pending option
				$parsed_data->{$current_option} = $1;
				$current_option = undef;
			}
		}
		else {
			if ($line =~ /^Original arguments:/) {	# start of the block
				$arg_flag = 1;
				next;
			}
		}
		
	}
	return $parsed_data;
}

my $dollar_;
my $parsed_data = parse_data(@read_data);

print "KEYWORD1 exists\n" if exists $parsed_data->{KEYWORD1};
print "t is $parsed_data->{t}\n" if $parsed_data->{t};
print "\$\_ is $parsed_data->{dollar_}\n" if defined $parsed_data->{dollar_};

__DATA__
Original arguments:
        -t
        Atk
        -F
        20110914.submit
        -KEYWORD1
Currently $_='154551'

 

Author Comment

by:Tolgar
ID: 36713318
@parparov: In line 46, you used "t" but I don't see any t in the code.

Is that right?

Thanks,
 
LVL 9

Expert Comment

by:parparov
ID: 36713346
I don't parse for specific option keywords, but for any sequence starting with - which is considered a key.
So, you're right.
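
To illustrate: because any indented line starting with "-" becomes a key, the option names are only known after parsing. The sketch below uses a hand-built hash that mirrors what `parse_data` should return for the sample input (the literal is an assumption written out by hand, not output captured from a run), and shows how to walk it and how to turn the presence of KEYWORD1 into a true/false variable:

```perl
use strict;
use warnings;

# Assumed result of parse_data() for the sample input.
my $parsed_data = {
    t        => 'Atk',
    F        => '20110914.submit',
    KEYWORD1 => undef,        # a flag that has no value line
    dollar_  => '154551',
};

# Every option found in the file shows up as a hash key.
for my $key (sort keys %$parsed_data) {
    my $value = defined $parsed_data->{$key} ? $parsed_data->{$key} : '(flag only)';
    print "$key => $value\n";
}

# exists() is true even though the stored value is undef.
my $has_keyword1 = exists $parsed_data->{KEYWORD1} ? 1 : 0;
print "has KEYWORD1: $has_keyword1\n";
```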
 

Author Comment

by:Tolgar
ID: 36713445
@parparov: I am a little confused.

So, when I insert this code into my other code, will I be able to get the following?

Atk
KEYWORD1
154551




Thanks,
 
LVL 9

Expert Comment

by:parparov
ID: 36713463
Yes.
Lines 45-47 demonstrate how.
 

Author Comment

by:Tolgar
ID: 36818223
@parparov: I combined the previous code and this one.

Now I have two issues.

1- While debugging, I realized that at line 30 we check for a submit file on every paragraph. So even though the "Submit File" keyword appears in the document, we fail to detect it, because we never check the entire document for this string at once. The code then assumes there is no submit file and goes to line 97. Instead, we should do one global check: if the keyword "Submit File" is followed by the "=" signs anywhere in the document, there is no need to check again and again for the same document.

2- The other issue is that $parsedData is not reachable at line 114. How can I make it reachable at that point in the code?

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
#	local($/) = '';
	open( FILE, "< $filename" ) or die "Can't open $filename : $!";
	@paragraphs = <FILE>;
	close FILE;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options, @Mailsent);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag, $mail_sent_to_flag);

	my $submit_file = 0;
	my $nosubmitFileFlag = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			# We record the accumulated data:
			push(
				@submits,
				{
					"Options"              => [@Options],
					"Files"                => [@Files],
					"Comments"             => [@CS],
					"RelatedRecords"       => [@RR],
					"CodeReviewers"        => [@CR],
					"GeckLogin" 		   => $geckLogin,
					"NoSubmitFileFlag"     => $nosubmitFileFlag,
					"Mail sent to"         => [@Mailsent],
				}
			) if @Files;
			@Options = @Files = @CR = @CS = @RR = ();
			$submit_file = 1;
			next;
		}
		
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
				$mail_sent_to_flag = 1;
			} 
			# if ($submit_file == 2) {
				
				#if ($mail_sent_to_flag) {
				push(@Mailsent, $_);
				#next;
				#}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			} 
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			if (/^Mail sent to/) {
				$mail_sent_to_flag = 1;
				push(@Mailsent, $_);
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && next;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
			#}
		}
		else {
			$submit_file = 0; # two-line grammar didn't hold
			my $parsedData = parseWithoutSubmitFile(@rippedParagraphs);
			#submit file does not exist flag
			$nosubmitFileFlag = 1;
		}
		}
	
	push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
			"ParsedData"		   => $parsedData,
			"NoSubmitFileFlag"     => $nosubmitFileFlag,
			"Cluster"              => $parsedData->{t},
			"JobID"                => $parsedData->{dollar_},
			"gLogFiles" 		   => $parsedData->{GLOGFILES},
			"gLogSbcheck"          => $parsedData->{GLOGSBCHECK},
		}
	) if @Files;
	return \@submits;
}


# we parse token differently if user makes the submission without submit file
sub parseWithoutSubmitFile (@) {
	my $arg_flag = 0;
	my $parsedData = {};
	my $current_option;
		while (my $line = shift @_) {
		if ($arg_flag == 1) {
		if ($line =~ /^Currently (\$\_=.*)/) {
		local $_;
		eval "$1;";
		$parsedData->{dollar_} = $_;
		$arg_flag = 0;
		}
		elsif ($line =~ /^\s+\-(.*)/) {
		$current_option = $1;
		$parsedData->{$current_option} = undef;
		next;
		}
		elsif ($current_option && $line =~ /^\s+(.*)/) {
		$parsedData->{$current_option} = $1;
		$current_option = undef;
		}
		}
		else {
		if ($line =~ /^Original arguments:/) {
		$arg_flag = 1;
		next;
		}
		}
	}
return $parsedData;
}




Please let me know ASAP if the questions are not clear.

Thanks,
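
Both issues can be sketched in isolation. The snippet below is not the code above but a reduced, hypothetical version (the sub name `has_submit_file`, the `user` key, and the sample lines are invented for illustration). It shows (1) one up-front scan for a "Submit file" line followed by a line of "=" signs, and (2) declaring `$parsedData` with `my` before the loop so it is still in scope afterwards. In the code above, `my $parsedData` is declared inside the `else` block, so the variable disappears at that block's closing brace.

```perl
use strict;
use warnings;

# Returns 1 if a "Submit file" line is immediately followed by a line of "=" signs.
sub has_submit_file {
    my @lines = @_;
    for my $n (0 .. $#lines - 1) {
        return 1
            if $lines[$n] =~ /^\s*Submit\s+file\s*$/
            && $lines[$n + 1] =~ /^\s*=+\s*$/;
    }
    return 0;
}

# Invented sample input.
my @lines = (
    "USER=someone,\n",
    "Submit file\n",
    "===========\n",
    "foo.c\n",
);

# One global check, done once before the per-line loop.
my $no_submit_file_flag = has_submit_file(@lines) ? 0 : 1;

# Declared outside the loop, so it stays visible after the loop ends.
my $parsedData = {};

for my $line (@lines) {
    # Assigning without a new "my" updates the outer variable.
    $parsedData->{user} = $1 if $line =~ /^USER=(\S+),/;
}

print "no submit file: $no_submit_file_flag\n";
print "user: $parsedData->{user}\n";
```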
 

Author Comment

by:Tolgar
ID: 36818566
@parparov: I think this opens a new topic, so I created a new question as a follow-up:

ID: 27362613

Thanks,
 

Author Comment

by:Tolgar
ID: 36819041
OK, I resolved the issue I asked about.

Thanks,
 

Author Comment

by:Tolgar
ID: 37007867
@parparov: Can you please add comments to each line of the code that I accepted as the solution?

Thanks,
 

Author Comment

by:Tolgar
ID: 37018163
@Parparov: Hi, are you gonna be able to put some comments for your code that I accepted as the solution?

Thanks,
 

Author Comment

by:Tolgar
ID: 37018678
@parparov: Especially, what does this line do?

eval "$1;";



Thanks,
 

Author Comment

by:Tolgar
ID: 37018745
@parparov: According to the following link, this usage is not recommended.

http://cpan.uwinnipeg.ca/htdocs/Perl-Critic/Perl/Critic/Policy/BuiltinFunctions/ProhibitStringyEval.pm.html

But, when I use

eval {$1;};



It does not do what I want.

Do you have any idea?

Thanks,
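
For context on the difference: the string form `eval "$1;"` compiles and runs the text held in `$1` (here `$_='154551'`) as Perl code, which is how `$_` ends up assigned. The block form `eval { $1; }` only evaluates `$1` as an ordinary expression inside an exception-catching block, so nothing is executed or assigned. Since only the value is needed, one possible alternative is to capture it with a regex and never execute text from the file. This is a sketch, not the accepted solution; the helper name `extract_current_value` is hypothetical, and it assumes the value is either unquoted or wrapped in single quotes with no embedded quotes or whitespace:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value from a "Currently $_=..." line,
# or undef when the line has a different shape.
sub extract_current_value {
    my ($line) = @_;
    # Optional opening quote, the value itself, optional closing quote.
    return $line =~ /^Currently \$_='?([^'\s]*)'?\s*$/ ? $1 : undef;
}

my $value = extract_current_value("Currently \$_='154551'\n");
print "\$_ is $value\n" if defined $value;    # prints "$_ is 154551"
```

In the accepted code, this would stand in for the `local $_; eval "$1;";` pair, with the returned value stored in `$parsed_data->{dollar_}`.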