
How to search multiple criteria in a record using Perl regex?

Posted on 2011-02-20
Medium Priority
Last Modified: 2012-05-11
I would like to search for multiple criteria in a record using a Perl regular expression.
For example, I have the following record:

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd


I would like to find all records that start with @test and contain the line beginning with "1.". The problem is that there are variable lines between the two criteria, @test and 1.

I started out with the following code, but it is not working:

#!/usr/bin/perl

use strict;
use warnings;

# make it easy to change delimiters to whatever you want
my $delim = '|';

open FH, '<', 'scsi_test' or die $!;

$/='__Data__';


while(<FH>)
{
    if(/(^\@..*[^\n]*)\n(.*[^\n]*)\n(.*[^\n]*)\n(^1\..*[^\n]*)/ms)
    {
        print $1,$4,"\n";
    }
}

Problem: it's only printing out the first line and not going through the entire log file.
Question by:areyouready344
13 Comments
 

Expert Comment

by:wilcoxon
ID: 34937892
Is there a __Data__ tag around every set of lines? If not, one problem is that you are slurping in more than one set on each pass through the loop.

Will there always be two lines between? You say "variable lines" but don't specify whether that means variable content or a variable number of lines (I'm assuming you mean the latter).
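wilcoxon's point about slurping more than one set per read suggests another angle: if one string may hold several records, a /g loop with a non-greedy .*? can pull out every header/"1." pair regardless of how many lines sit between them. This is a sketch that was not posted in the thread; the sub name extract_pairs and the sample text are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every "@..." line paired
# with the first "1." line that follows it, from a string that may contain
# several records.
sub extract_pairs {
    my ($text) = @_;
    my @pairs;
    # /m: ^ matches at the start of every line; /s: . also matches "\n".
    # [^\n]* keeps each capture on one line, and the non-greedy .*? stops at
    # the nearest "1." line instead of the last one in the string.
    while ($text =~ /^(\@[^\n]*)\n.*?^(1\.[^\n]*)/msg) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Hypothetical sample: two records with different numbers of middle lines.
my $sample = <<'EOT';
__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
__Data__
@test_scsi
passed
1. ddkdkdkdkdkkdkdkd
EOT

for my $pair (extract_pairs($sample)) {
    print "$pair->[0]\n$pair->[1]\n\n";
}
```

One caveat of this design: if a record could lack its "1." line, the non-greedy match would run on into the next record, which is one reason a __Data__ separator per record makes the job easier.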

Author Comment

by:areyouready344
ID: 34937932
Yes, wilcoxon, there will be a variable number of lines between the criteria lines ^@ and ^1\., and yes, each data record now has a __Data__ tag.

Expert Comment

by:wilcoxon
ID: 34937948
If each set of lines now has __Data__, that makes it much easier.

Changing your regex to the following should work:

if(/(^\@..*[^\n]*).*(^1\..*[^\n]*)/ms)

Author Comment

by:areyouready344
ID: 34938093
Tried it, but it does not print only the @test and 1. lines.

It prints out everything:

#!/usr/bin/perl

use strict;
use warnings;

open FH, '<', 'dd' or die $!;


$/='__Data__';


while(<FH>)
{
    if(/(^\@..*[^\n]*).*(^1\..*[^\n]*)/ms)
    {
        print $1,$2,"\n";
    }
}


The output shows that it never filters:

@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd

__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
3. dkdkdkdkdkdkdkd


Here's what I would like the output to look like:

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

Expert Comment

by:sjklein42
ID: 34938250
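my $testLine = '';   # most recent "@test..." line seen

while ( <> )
{
    # remember the header line of the current record
    if ( /^\@test/ ) { $testLine = $_; }
    # on the "1." line, print the remembered header followed by this line
    elsif ( /^1\./ )
    {
        print $testLine;
        print $_;
        print "\n";
    }
}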



C:\temp>perl foo.pl foo.txt
@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd

@test_scsi
1. ddkdkdkdkdkkdkdkd


Expert Comment

by:wilcoxon
ID: 34938354
I missed that you had .* as well as [^\n]*. They are redundant, and the .* is what causes the problem: with /s, . also matches newlines, so each capture runs past the end of its line.

if(/(^\@.[^\n]*).*(^1\.[^\n]*)/ms)

will fix the regex and

print $1,"\n",$2,"\n\n";

will make the output match the format you want.
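To see what the .* inside a capture does under /ms, the small check below (not from the thread; the sample record and the capture_one_line helper are hypothetical) runs the original and the corrected capture for the "1." line against the same record.

```perl
use strict;
use warnings;

# Hypothetical sample record with two numbered lines.
my $record = "\@test_scsi\npassed\nstop\n1. first\n2. second\n";

# Hypothetical helper: return capture group 1 of $re, or undef on no match.
sub capture_one_line {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

# Original style: under /s, .* also matches "\n", so it runs to the end of the record.
my $greedy   = capture_one_line($record, qr/(^1\..*[^\n]*)/ms);
# Corrected style: [^\n]* cannot cross a newline, so the capture stops at the line end.
my $one_line = capture_one_line($record, qr/(^1\.[^\n]*)/ms);

print "greedy:   [$greedy]\n";
print "one line: [$one_line]\n";
```

The greedy capture returns the "1." line plus everything after it, which matches the earlier output that printed whole records instead of single lines.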

Author Comment

by:areyouready344
ID: 34938471
I was hoping for a version that works with the input record separator; sjklein42's solution works, but without using the input record separator.

Accepted Solution

by:wilcoxon
ID: 34938516
My last comment should work (tested) and still uses the input record separator.

I've included a full copy of the code below (rather than the previous comments on how to change it).
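#!/usr/bin/perl

use strict;
use warnings;

open my $fh, '<', 'dd' or die "Cannot open dd: $!";

# read one __Data__-delimited record at a time
$/ = '__Data__';

while (<$fh>)
{
    # $1: the "@..." line, $2: the "1." line ([^\n]* keeps each capture on one line)
    if (/(^\@.[^\n]*).*(^1\.[^\n]*)/ms)
    {
        print $1, "\n", $2, "\n\n";
    }
}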


Expert Comment

by:wilcoxon
ID: 34938522
Oops. One more minor change: there's an extra . in the regex. It shouldn't cause any issues, but the regex should be:

if(/(^\@[^\n]*).*(^1\.[^\n]*)/ms)
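For readers who want to try the accepted approach without the dd file, here is a sketch that wraps it in a sub and reads from an in-memory filehandle. It assumes Perl 5.8 or later (for open on a scalar reference); filter_records and the sample data are hypothetical, and local $/ is used so the separator change does not leak out of the sub.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one "header\n1.-line\n" string per matching record.
sub filter_records {
    my ($fh) = @_;
    local $/ = '__Data__';   # read one __Data__-terminated record at a time
    my @out;
    while (my $record = <$fh>) {
        # final regex from the thread: "@..." line, anything, then the "1." line
        if ($record =~ /(^\@[^\n]*).*(^1\.[^\n]*)/ms) {
            push @out, "$1\n$2\n";
        }
    }
    return @out;
}

# Hypothetical sample data in the same layout as the question.
my $data = <<'EOT';
__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
2. dkdkdkdkdkdkdkd
__Data__
@test_scsi
passed
stop
1. ddkdkdkdkdkkdkdkd
3. dkdkdkdkdkdkdkd
EOT

open my $fh, '<', \$data or die "Cannot open in-memory string: $!";
print join "\n", filter_records($fh);
close $fh;
```

The first read returns only the leading "__Data__" separator, which does not match and is skipped; each later record yields one header/"1." pair, separated by a blank line as in the desired output.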

Expert Comment

by:sjklein42
ID: 34938565
I retract my solution - wilcoxon's is much better.

Author Comment

by:areyouready344
ID: 34939593
Right on the money, wilcoxon! How do you know this? Thanks for understanding and solving this issue. Now I know how to filter certain lines in a multi-line record, and I can build any type of HTML table from any type of record line. This is powerful.

Thanks again...

Author Closing Comment

by:areyouready344
ID: 34939598
Solution worked great, no problems.

Expert Comment

by:wilcoxon
ID: 34939689
It's just a matter of experience. You'll get there someday if you keep programming in Perl.