Solved

How to parse a text file for specific arguments in Perl?

Posted on 2011-09-26
Last Modified: 2012-06-27
Hi,
This is a follow-up to question ID: 27316317.

At the beginning of my text file there is a section that looks like this:

Original arguments:
        -t
        Atk
        -F
        20110914.submit
        -KEYWORD1
Currently $_='154551'


In this part of the file I would like to detect the following:

1- What comes after the line "-t"?
Answer: Atk

2- What is $_ equal to?
Answer: 154551

3- Is the word "KEYWORD1" present? If it is, set a variable to true.

How can I do that?

Thanks,
Question by:Tolgar
 
LVL 9

Expert Comment

by:parparov
ID: 36654373
Is "Original arguments:" a fixed part of the grammar?
 

Author Comment

by:Tolgar
ID: 36667118
Yes,
and also

Currently $_

is fixed grammar.

Thanks,
 
LVL 84

Expert Comment

by:ozo
ID: 36707744
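while( <DATA> ){
    # (/-t/..0) numbers the lines starting at the first -t match: 1 on that line, 2 on the next
    ($after) = /(\S+)/ if (/-t/..0)==2;   # keep only the token, without indentation or newline
    $equal = $1 if /\$_='?(\w+)/;         # take the captured value, not the whole line
}
print "Answer: $after\n";
print "Answer: $equal\n";
__DATA__
Original arguments:
        -t
        Atk
        -F
        20110914.submit
        -KEYWORD1
Currently $_='154551'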
 

Author Comment

by:Tolgar
ID: 36709651
@parparov: I am still waiting for your approach, combined with your previous complete code.

@ozo: thanks.


 

Author Comment

by:Tolgar
ID: 36711129
@parparov: One quick question about my previous posts:

How can I assign this to a string rather than an array?

my @Code_Reviewers = @{$cache_data->[$i]{CodeReviewers}};

 

Author Comment

by:Tolgar
ID: 36711135
@parparov: By the way, I am still waiting for your reply to the main question of this post.

Thanks,
 
LVL 9

Expert Comment

by:parparov
ID: 36711908
A single code reviewer can be accessed as:
my $Code_Reviewer = $cache_data->[0]{CodeReviewers}[0];


Indexes in the example are arbitrary.
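
If the goal is one string holding all reviewers rather than a single element, join is one option. The following is only a sketch: the contents of $cache_data, the reviewer names, and the helper reviewers_as_string are made up for illustration and are not taken from the earlier question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the $cache_data structure from the earlier question.
my $cache_data = [
    { CodeReviewers => [ 'alice', 'bob' ] },
];

# Hypothetical helper: join every reviewer of one entry into a single string.
# The separator defaults to ", "; a missing reviewer list yields an empty string.
sub reviewers_as_string {
    my ($entry, $separator) = @_;
    $separator = ', ' unless defined $separator;
    return join $separator, @{ $entry->{CodeReviewers} || [] };
}

my $code_reviewers = reviewers_as_string($cache_data->[0]);
print "$code_reviewers\n";    # prints "alice, bob"
```

Whether a comma-separated string is the right shape depends on how the value is used afterwards; the separator is passed in so it can be changed easily.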
 
LVL 9

Accepted Solution

by:
parparov earned 500 total points
ID: 36711912
Code for the original question:
#!/usr/bin/perl

use strict;
use warnings;

my @read_data = <DATA>;

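sub parse_data (@) {
	my $arg_flag = 0;          # 1 while we are inside the "Original arguments:" section
	my $parsed_data = {};      # option name => value (undef for options without a value)
	my $current_option;        # option still waiting for its value line, if any

	while (my $line = shift @_) {
		if ($arg_flag == 1) {
			if ($line =~ /^Currently (\$\_=.*)/) {   # e.g. Currently $_='154551'
				local $_;                           # keep the caller's $_ untouched
				eval "$1;";                         # string eval: runs the captured text as Perl, assigning $_ (trusted input only)
				$parsed_data->{dollar_} = $_;       # store the assigned value
				$arg_flag = 0;                      # the arguments section ends here
			}
			elsif ($line =~ /^\s+\-(.*)/) {          # indented line starting with "-": an option name
				$current_option = $1;
				$parsed_data->{$current_option} = undef;   # key exists even if no value follows
				next;
			}
			elsif ($current_option && $line =~ /^\s+(.*)/) {   # indented line after an option: its value
				$parsed_data->{$current_option} = $1;
				$current_option = undef;
			}
		}
		else {
			if ($line =~ /^Original arguments:/) {   # start of the arguments section
				$arg_flag = 1;
				next;
			}
		}
		
	}
	return $parsed_data;
}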

my $dollar_;
my $parsed_data = parse_data(@read_data);

print "KEYWORD1 exists\n" if exists $parsed_data->{KEYWORD1};
print "t is $parsed_data->{t}\n" if $parsed_data->{t};
print "\$\_ is $parsed_data->{dollar_}\n" if defined $parsed_data->{dollar_};

__DATA__
Original arguments:
        -t
        Atk
        -F
        20110914.submit
        -KEYWORD1
Currently $_='154551'

 

Author Comment

by:Tolgar
ID: 36713318
@parparov: In line 46, you used "t" but I don't see any t in the code.

Is that right?

Thanks,
 
LVL 9

Expert Comment

by:parparov
ID: 36713346
I don't parse for specific option keywords, but for any sequence starting with - which is considered a key.
So, you're right.
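
To make this concrete: tracing the accepted code by hand over the sample input should give a hash like the one below. This is a sketch, not output copied from a run, and flag_is_set is a hypothetical helper added only for illustration. An option with no value line, such as KEYWORD1, is stored with an undef value, which is why the accepted code tests it with exists rather than defined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hash shape expected from parse_data for the sample input (traced by hand).
my $parsed_data = {
    t        => 'Atk',
    F        => '20110914.submit',
    KEYWORD1 => undef,        # option without a value line
    dollar_  => '154551',
};

# Hypothetical helper: true if the option appeared at all, with or without a value.
sub flag_is_set {
    my ($data, $name) = @_;
    return exists $data->{$name} ? 1 : 0;
}

my $has_keyword1 = flag_is_set($parsed_data, 'KEYWORD1');
print "KEYWORD1 present: $has_keyword1\n";
print "t is $parsed_data->{t}\n";
```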
 

Author Comment

by:Tolgar
ID: 36713445
@parparov: I am a little confused.

So, when I insert this code into my other code, will I be able to get the following?

Atk
KEYWORD1
154551

Thanks,
 
LVL 9

Expert Comment

by:parparov
ID: 36713463
Yes.
Lines 45-47 demonstrate how.
 

Author Comment

by:Tolgar
ID: 36818223
@parparov: I combined the previous code and this one.

Now I have two issues.

1- While debugging, I realized that at line 30 we check for a submit file for every paragraph. So even though the "Submit File" keyword appears somewhere in the document, we do not detect it, because we never check the entire document for this string at once. This causes confusion: the code assumes there is no submit file and goes to line 97. Instead, we should do one global check, and if there is a match for the keyword "Submit File" followed by the "=" signs, we do not need to check again and again for the same document.

2- Another issue is that $parsedData is not reachable at line 114. How can I make it reachable at that point in the code?

sub submitFileParser ($) {
	my $filename = shift;
	my @paragraphs;
#	local($/) = '';
	open( FILE, "< $filename" ) or die "Can't open $filename : $!";
	@paragraphs = <FILE>;
	close FILE;
	return read_paragraphs (@paragraphs);
}

sub read_paragraphs (@) {
	# read lines as parameters
	my @rippedParagraphs = @_;
	my @submits = ();
	# Storage for all sections
	# Temporary storages for single section of each type
	my (@Files, @CR, @RR, @CS, @Options, @Mailsent);
	# Flags for file traversal logic
	my ($opt_flag, $file_flag, $mail_sent_to_flag);

	my $submit_file = 0;
	my $nosubmitFileFlag = 0;
	#read the file
	for ( @rippedParagraphs ) {
		if (/^USER=(\S+)\,/) {
			#obtain the login from USER=
			$geckLogin = $1;
		}
		if (/^\s*Submit\s+file\s*$/) {
			# We record the accumulated data:
			push(
				@submits,
				{
					"Options"              => [@Options],
					"Files"                => [@Files],
					"Comments"             => [@CS],
					"RelatedRecords"       => [@RR],
					"CodeReviewers"        => [@CR],
					"GeckLogin" 		   => $geckLogin,
					"NoSubmitFileFlag"     => $nosubmitFileFlag,
					"Mail sent to"         => [@Mailsent],
				}
			) if @Files;
			@Options = @Files = @CR = @CS = @RR = ();
			$submit_file = 1;
			next;
		}
		
		if ($submit_file == 1) {
			if (/^\s*\=+\s*$/) {
				$submit_file++;
				$mail_sent_to_flag = 1;
			} 
			# if ($submit_file == 2) {
				
				#if ($mail_sent_to_flag) {
				push(@Mailsent, $_);
				#next;
				#}
			# If we encounter a comment or empty string
			if (/^\#/ || !/\S/) {
				# we haven't encountered an option to start doing anything
				next unless $opt_flag || $file_flag;
				# If we're done with options, let's start reading file sections
				if ($opt_flag == 1) {
					$opt_flag = 0;
					$file_flag = 1;
				}
				elsif ($opt_flag > 1) {
					# Addresses the empty line within Options:
					$opt_flag--;
				}
				next;
			} 
			if (/^Options/) {
				# We start reading options
				$opt_flag = 2;
				next;
			}
			if (/^Mail sent to/) {
				$mail_sent_to_flag = 1;
				push(@Mailsent, $_);
				next;
			}
			# Matching beginning of the line to determine the type of the string
			# and placing it in temporary storage
			/^R(R|elated\sRecords):\s*(.*\n)/ && push(@RR, $2) && next;
			/^C(R|ode\sReviewer):\s*(.*\n)/ && push(@CR, $2) && next;
			/^C(S|omments):\s*(.*\n)/ && push(@CS, $2) && next;

			# General text is either files or options info, depending on the
			# value of the option flag
			$opt_flag ? push(@Options, $_) : push(@Files, $_);
			#}
		}
		else {
			$submit_file = 0; # two-line grammar didn't hold
			my $parsedData = parseWithoutSubmitFile(@rippedParagraphs);
			#submit file does not exist flag
			$nosubmitFileFlag = 1;
		}
		}
	
	push(
		@submits,
		{
			"Options"              => [@Options],
			"Files"                => [@Files],
			"Comments"             => [@CS],
			"RelatedRecords"       => [@RR],
			"CodeReviewers"        => [@CR],
			"Mail sent to"         => [@Mailsent],
			"GeckLogin" 		   => $geckLogin,
			"ParsedData"		   => $parsedData,
			"NoSubmitFileFlag"     => $nosubmitFileFlag,
			"Cluster"              => $parsedData->{t},
			"JobID"                => $parsedData->{dollar_},
			"gLogFiles" 		   => $parsedData->{GLOGFILES},
			"gLogSbcheck"          => $parsedData->{GLOGSBCHECK},
		}
	) if @Files;
	return \@submits;
}


# we parse token differently if user makes the submission without submit file
sub parseWithoutSubmitFile (@) {
	my $arg_flag = 0;
	my $parsedData = {};
	my $current_option;
	while (my $line = shift @_) {
		if ($arg_flag == 1) {
			if ($line =~ /^Currently (\$\_=.*)/) {
				local $_;
				eval "$1;";
				$parsedData->{dollar_} = $_;
				$arg_flag = 0;
			}
			elsif ($line =~ /^\s+\-(.*)/) {
				$current_option = $1;
				$parsedData->{$current_option} = undef;
				next;
			}
			elsif ($current_option && $line =~ /^\s+(.*)/) {
				$parsedData->{$current_option} = $1;
				$current_option = undef;
			}
		}
		else {
			if ($line =~ /^Original arguments:/) {
				$arg_flag = 1;
				next;
			}
		}
	}
	return $parsedData;
}

Please let me know ASAP if the questions are not clear.

Thanks,
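
Both issues can be approached the same way: decide once, before the loop, whether the document contains a submit file, and declare the result variable in the enclosing scope instead of with "my" inside the else block. The following is a minimal sketch under those assumptions. The names has_submit_file and read_lines_sketch are hypothetical, and the placeholder hash stands in for the call to parseWithoutSubmitFile from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if any line is a "Submit file" header that is
# immediately followed by a line made only of "=" signs.
sub has_submit_file {
    my @lines = @_;
    for my $i (0 .. $#lines - 1) {
        return 1 if $lines[$i]     =~ /^\s*Submit\s+file\s*$/
                 && $lines[$i + 1] =~ /^\s*=+\s*$/;
    }
    return 0;
}

# Hypothetical skeleton of the reading routine, reduced to the scoping question.
sub read_lines_sketch {
    my @lines = @_;

    # Declared here, outside any inner block, so it is still visible
    # when the final record is built at the end of the sub.
    my $parsed_data = {};

    # The global check runs once per document instead of once per line.
    my $no_submit_file = has_submit_file(@lines) ? 0 : 1;

    if ($no_submit_file) {
        # In the real code this would be: $parsed_data = parseWithoutSubmitFile(@lines);
        $parsed_data = { note => 'parsed without submit file' };   # placeholder
    }

    return {
        NoSubmitFileFlag => $no_submit_file,
        ParsedData       => $parsed_data,
    };
}

my $result = read_lines_sketch("Original arguments:\n", "        -t\n");
print "NoSubmitFileFlag: $result->{NoSubmitFileFlag}\n";
```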
 

Author Comment

by:Tolgar
ID: 36818566
@parparov: I think this opens a new topic. So I created a new question as a follow up

ID: 27362613

Thanks,
 

Author Comment

by:Tolgar
ID: 36819041
OK, I resolved the issue I asked about.

Thanks,
 

Author Comment

by:Tolgar
ID: 37007867
@parparov: Can you please add a comment to each line of the code that I accepted as the solution?

Thanks,
 

Author Comment

by:Tolgar
ID: 37018163
@parparov: Hi, will you be able to add some comments to your code that I accepted as the solution?

Thanks,
 

Author Comment

by:Tolgar
ID: 37018678
@parparov: Especially, what does this line do?

eval "$1;";


Thanks,
 

Author Comment

by:Tolgar
ID: 37018745
@parparov: The following link recommends against this usage.

http://cpan.uwinnipeg.ca/htdocs/Perl-Critic/Perl/Critic/Policy/BuiltinFunctions/ProhibitStringyEval.pm.html

But, when I use

eval {$1;};


It does not do what I want.

Do you have any idea?

Thanks,
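
For context on the two forms: in the accepted code $1 holds the text $_='154551', and the string form eval "$1;" compiles and runs that text as Perl code, which assigns to the localized $_. The block form eval {$1;} only evaluates the value of $1 as an ordinary expression and runs nothing, so $_ is never set. One way to avoid string eval altogether is to capture the value with a regular expression. The sketch below assumes the value is either quoted or a single bare token; the helper name parse_currently_line is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the value assigned to $_ on a
# "Currently $_=..." line without executing any of the input as code.
sub parse_currently_line {
    my ($line) = @_;
    # Accept a single-quoted, double-quoted, or bare (unquoted) value.
    if ($line =~ /^Currently \$_=(?:'([^']*)'|"([^"]*)"|(\S+))/) {
        return defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return;    # undef when the line does not match
}

my $value = parse_currently_line("Currently \$_='154551'\n");
print "\$_ is $value\n";    # prints: $_ is 154551
```

In the accepted code, the three lines under the "Currently" match (local, eval, and the assignment) could then be replaced by storing the helper's return value in $parsed_data->{dollar_}.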