Solved

How to search multiple criteria in a record using Perl regex?

Posted on 2011-02-20
Last Modified: 2012-05-11
I would like to search multiple criteria in a record using Perl regular expression.
For example, I have the following record

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd


I would like to find, in every record, the line that starts with @test and the numbered line that starts with 1. The problem is that a variable number of lines separates the @test line from the 1. line.

I started with the following code, but it is not working:

#!/usr/bin/perl

use strict;
use warnings;

# make it easy to change delimiters to whatever you want
my $delim = '|';

open FH, '<', 'scsi_test' or die $!;

$/='__Data__';


while(<FH>)
{

            if(/(^\@..*[^\n]*)\n(.*[^\n]*)\n(.*[^\n]*)\n(^1\..*[^\n]*)/ms)
            {
               print $1,$4,"\n";
            }
}

The problem: it's only printing the first line and not going through the entire log file.
Question by:areyouready344
13 Comments
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34937892
Is there a __Data__ tag around every set of lines?  If not, one problem is that you are slurping in more than one set on each pass through the loop.

Will there always be two lines between?  You say "variable lines" but don't specify if that is variable content or variable number of lines (I'm assuming you mean the latter).
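To make the record-separator point concrete, here is a minimal sketch of how setting $/ to '__Data__' changes what each read returns. The sample text, the in-memory filehandle, and the read_records helper are assumptions made for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: two records, each introduced by a __Data__ line.
my $input = <<'END';
__Data__
@test_scsi
passed
stop
1. first
__Data__
@test_scsi
passed
1. second
END

# Split the text into chunks the same way <FH> does when $/ = '__Data__'.
sub read_records {
    my ($text) = @_;
    local $/ = '__Data__';            # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;                 # chomp removes a trailing $/, i.e. the tag
        next unless $chunk =~ /\S/;   # skip the empty chunk before the first tag
        push @records, $chunk;
    }
    close $fh;
    return @records;
}

my @records = read_records($input);
print scalar(@records), " records read\n";
```

If only some sets of lines carried the tag, a single chunk would contain several sets, which is the situation the question above is checking for.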
 

Author Comment

by:areyouready344
ID: 34937932
Yes wilcoxon, there will be variable lines between the criteria lines of ^@ and ^1\. , and yes, each data record now has __Data__
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34937948
If each set of lines now has __Data__, that makes it much easier.

Changing your regex to the below should work:

if(/(^\@..*[^\n]*).*(^1\..*[^\n]*)/ms)
 

Author Comment

by:areyouready344
ID: 34938093
Tried it, but it does not print only the @test and 1. lines.

it prints out everything...

#!/usr/bin/perl

use strict;
use warnings;

open FH, '<', 'dd' or die $!;


$/='__Data__';


while(<FH>)
{

           if(/(^\@..*[^\n]*).*(^1\..*[^\n]*)/ms)
            {
               print $1,$2,"\n";
            }
}


The output is below; it never filters.

@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd


Here's what I would like the output to look like:

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd
 
LVL 16

Expert Comment

by:sjklein42
ID: 34938250
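my $testLine = '';    # most recent line that starts with @test

while ( <> )
{
	if ( /^\@test/ ) { $testLine = $_; }
	elsif ( /^1\./ )
	{
		# print the remembered @test line, then the "1." line
		print $testLine;
		print $_;
		print "\n";
	}
}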



C:\temp>perl foo.pl foo.txt
@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

 
LVL 26

Expert Comment

by:wilcoxon
ID: 34938354
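I missed that you had .* as well as [^\n]* - they are redundant (and the first is causing the problem). With the /s modifier, . also matches newlines, so the greedy .* inside each capture group runs past the end of its line.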

if(/(^\@.[^\n]*).*(^1\.[^\n]*)/ms)

will fix the regex and

print $1,"\n",$2,"\n\n";

will make the output match the format you want.
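A small sketch of why the extra .* matters, comparing the two patterns on one record. The sample record string and the variable names are made up for illustration; the patterns are the ones discussed in this thread, with the second reduced to [^\n]* only.

```perl
use strict;
use warnings;

# Hypothetical one-record sample.
my $record = "\@test_scsi\npassed\nstop\n1. aaa\n2. bbb\n";

# Original pattern: under /s the .* inside each group can cross newlines,
# so group 1 runs up to the "1." line and group 2 runs to the end of the record.
my ($g1, $g2) = $record =~ /(^\@..*[^\n]*).*(^1\..*[^\n]*)/ms;

# Pattern with only [^\n]* inside the groups: each capture stops at end of line.
my ($f1, $f2) = $record =~ /(^\@[^\n]*).*(^1\.[^\n]*)/ms;

print "greedy captures cover the whole record\n" if $g1 . $g2 eq $record;
print "fixed: $f1 / $f2\n";    # fixed: @test_scsi / 1. aaa
```

Only the .* between the two groups needs to span lines; inside the groups, [^\n]* keeps each capture on a single line.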
 

Author Comment

by:areyouready344
ID: 34938471
I was hoping it would work with the input record separator; your solution works without using it.
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 34938516
My last comment should work (tested) and still uses the input record separator.

I've included a full copy of the code below (rather than the previous comments on how to change it).
#!/usr/bin/perl

use strict;
use warnings;

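# 'dd' is the input file; its records are separated by __Data__ tags
open FH, '<', 'dd' or die $!;

# read one __Data__-delimited record per iteration instead of one line
$/='__Data__';

while(<FH>)
{
    # $1 = the line starting with @, $2 = the line starting with "1."
    # /m lets ^ match at each line start; /s lets .* span the lines between
    if(/(^\@.[^\n]*).*(^1\.[^\n]*)/ms)
    {
        print $1,"\n",$2,"\n\n";
    }
}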

 
LVL 26

Expert Comment

by:wilcoxon
ID: 34938522
Oops.  One more minor change.  It looks like there's an extra . in the regex (which shouldn't cause any issues) but it should be:

if(/(^\@[^\n]*).*(^1\.[^\n]*)/ms)
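For reference, a self-contained sketch that puts this final regex together with the record separator. The extract_pairs helper, the lexical filehandle, the in-memory sample, and the use of local on $/ are my own assumptions for illustration, not part of the code posted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [header, item] pairs, one per record that contains both lines.
sub extract_pairs {
    my ($fh) = @_;
    local $/ = '__Data__';    # record separator, limited to this sub
    my @pairs;
    while (my $record = <$fh>) {
        # $1: line starting with @, $2: line starting with "1."
        if ($record =~ /(^\@[^\n]*).*(^1\.[^\n]*)/ms) {
            push @pairs, [$1, $2];
        }
    }
    return @pairs;
}

# Hypothetical sample data with a different number of lines in each record.
my $sample = <<'END';
__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
__Data__
@test_scsi
passed
1. ddkdkdkdkdkkdkdkd
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @pairs = extract_pairs($fh);
close $fh;

print "$_->[0]\n$_->[1]\n\n" for @pairs;
```

Because the middle .* is greedy, a record with more than one line starting with "1." would yield the last such line; a non-greedy .*? would yield the first one.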
 
LVL 16

Expert Comment

by:sjklein42
ID: 34938565
I retract my solution - wilcoxon's is much better.
 

Author Comment

by:areyouready344
ID: 34939593
Right on the money, wilcoxon. How do you know this? Thanks for understanding and solving this issue. Now I know how to filter certain lines in a multi-line record, and I can build any type of HTML table from any type of record line. This is powerful.

Thanks again...
 

Author Closing Comment

by:areyouready344
ID: 34939598
Solution worked great, no problems.
 
LVL 26

Expert Comment

by:wilcoxon
ID: 34939689
It's just a matter of experience.  You'll get there someday if you keep programming in Perl.
Question has a verified solution.