Solved

Regular Expression

Posted on 2004-04-26
Last Modified: 2010-03-04
Hello,

I have the problem that the optional group (word1|word2|word3)? in a regular expression is not captured, so nothing is available when I later reference it using $1.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/ print($1 . "\n")/iesg;

I would expect this to print "simple" followed by ", the second up to the last word of this sentence should be " to the console.

However, this does not happen. I used $& to look at what gets matched, and it seems to be nothing at all.

If I take the question mark out, the expression matches. The problem is that I also want the following variation of the above sentence to match:

$context = "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!";

The reason is that I want to capture any expression that matches the regular expression:

Always(.*)variations!

Now, if this regular expression is preceded by one of the words in the list, I would like to know about it and capture/print it.

Do you know a regular expression that achieves this?

Thanks,
Tim




Question by:tequilla
19 Comments
 
LVL 6

Expert Comment

by:christopher sagayam
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!"


$context =~ s/.*?(simple|easy|plain|simplitic).*?Always(.*?)captured\!/ print($2 . "\n")/iesg;
 
LVL 11

Expert Comment

by:lbertacco
$1, $2, ... $9 are set to the text matched by the first, second, ... parenthesized group, so in your code
$1 is set to the string matched by (simple|easy|plain|simplitic)
and
$2 is set to the string matched by (.*)

If you want to print both, do
print("$1 - $2\n")

Also, escape the exclamation mark with a backslash, as chris18 has done (\!)
 

Author Comment

by:tequilla
@chris18
The problem with your regular expression is that it does not match sentences which do NOT contain one of the words in the list.

That's why I tried (word1|word2|word3)? with the question mark at the end. However, this does not work.

@lbertacco
Sorry, there is a mistake in my example. In my real code I'm of course using $1 and $2, and \! rather than a bare !.
 
LVL 6

Expert Comment

by:christopher sagayam
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!"


$context =~ s/.*?[simple,easy,plain,simplitic]+.*?Always(.*?)captured\!/ print($1-$2 . "\n")/iesg;
 
LVL 84

Expert Comment

by:ozo
for $context (
"This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!",
 "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!",
){
  print "$1\n$2\n" if $context =~ m/(?:.*?(simple|easy|plain|simplitic).*?|)Always(.*?)captured!/;
}
 
LVL 84

Expert Comment

by:ozo
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*|)Always(.*?)captured!/
 

Author Comment

by:tequilla
@chris18

The expression [simple,easy,plain,simplitic]+ matches not only the words in the list but other strings as well. It also requires a match at least once, which is not what I want. Once or not at all would be OK.

@ozo
I don't want another construct. Just one expression, without additional if statements and so on.

I think the real question is: why does (word1|word2|word3)? not work?
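
A minimal sketch of what seems to be going on, assuming the sample sentence from the question; the variable names `$loose` and `$tight` are made up for illustration. With the original pattern, the overall match already succeeds at offset 0: the optional group matches nothing there and the greedy .* carries the match forward to "Always", so $1 is left undefined. Putting the word and the filler together inside one optional non-capturing group, as in the replies, forces the engine to advance until it reaches a listed word (or "Always" itself).

```perl
use strict;
use warnings;

my $context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

# Original shape: the match succeeds starting at offset 0, where the optional
# group matches nothing; the greedy .* then swallows "simple" on the way to "Always".
my ($loose) = $context =~ /(simple|easy|plain|simplitic)?.*Always(.*)captured!/is;

# Shape from the replies: the optional part holds the word AND the filler, so
# the pattern can only begin at a listed word or directly at "Always".
my ($tight) = $context =~ /(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/is;

print "original pattern: ", (defined $loose ? $loose : "(undef)"), "\n";
print "grouped pattern:  ", (defined $tight ? $tight : "(undef)"), "\n";
```

In list context the match returns the captures in order, so each variable above receives what would otherwise be in $1.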
 
LVL 6

Expert Comment

by:christopher sagayam
"Once or not at all would be ok."

then this should be fine

 [simple,easy,plain,simplitic]+  

 
LVL 6

Expert Comment

by:christopher sagayam
$context = "This is a very interesting sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/.*?(simple|easy|plain|simplitic|[a-z]+).*?Always(.*)captured!/ print($2 . "\n")/iesg;
 
LVL 84

Expert Comment

by:ozo
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/
 

Author Comment

by:tequilla
@chris18

[simple,easy,plain,simplitic]+ will, for example, also capture:

elpmis or
ysea or
im or
pldmc

and the + sign means at least once, not zero or once.

(simple|easy|plain|simplitic|[a-z]+) I have tried myself. The problem is that it now prints out words that are not in the list.

@ozo
I'm looking for a single regular expression /..../ which will do the job.
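
The point about the bracket expression can be checked with a short sketch; the names `$class` and `$words` are illustrative only. A character class is a set of single characters, so any run of those characters matches, while an alternation group followed by ? matches one of the listed words or nothing.

```perl
use strict;
use warnings;

# A bracket expression is a set of single characters, not a list of words.
my $class = qr/^[simple,easy,plain,simplitic]+$/;

# An alternation group matches only the listed words; ? makes it optional.
my $words = qr/^(?:simple|easy|plain|simplitic)?$/;

for my $candidate ('simple', 'elpmis', 'ysea', 'im', '') {
    printf "%-8s class=%-8s words=%s\n",
        "'$candidate'",
        ($candidate =~ $class ? 'match' : 'no match'),
        ($candidate =~ $words ? 'match' : 'no match');
}
```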
 
LVL 11

Expert Comment

by:lbertacco
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/print("$1 $2\n")/iesg;
 
LVL 84

Expert Comment

by:ozo
The print in s//print/e will replace "simple sentence. Always, the second up to the last word of this sentence should be captured!" with "1" (the return value of print, if the print succeeds), giving "This is a very 1".
Is that what you really want? I thought not, so I replaced the s with m, but the regular expression I gave will do the job you described.
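
A small sketch of that side effect, assuming the sample sentence from the question; `$copy` and `@found` are illustrative names. The substitution runs on a copy so the two results can be compared.

```perl
use strict;
use warnings;

my $context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

# s///e evaluates the replacement as code; print returns 1 on success,
# and that return value is what gets substituted into the string.
my $copy = $context;
$copy =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/print("$1 $2\n")/ies;
print "after s///e: $copy\n";

# m// only inspects the string; the captures come back as a list.
my @found = $context =~ /(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/is;
print "after m//:   $context\n";
```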
 
LVL 84

Expert Comment

by:ozo
#could this be what you wanted to do?
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/$1$2\n/isg;
print $context;
 

Author Comment

by:tequilla
@ozo, @lbertacco

This already looks better. But strangely, it does not work either. I think it should, but it doesn't.

So let me try to explain the problem again in words. What I'm trying to
achieve can be summarized as follows:

1. I'm looking for a pattern (let's call it A), which I want to capture
and print to standard output. It recurs several times in the
text.

2. If I find pattern A in the text, I want to know whether
or not a certain word (let's call it B) out of a word list (let's call
it C) precedes pattern A.

3. In the end I want the following result: If A is found, preceded by
a word B out of C, then print A;B;. If A is found, but none of the words
out of C precedes A, then just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
$_ = "This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)";
while( /(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/gis ){
    print "$2;$1;\n";
}
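
The same loop can be sketched as a function that returns the lines instead of printing them; `price_lines` is a hypothetical helper name, and the explicit `defined` check is an addition that avoids an "uninitialized value" warning under `use warnings` when no listed word precedes the amount.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "amount;word;" string per price found.
sub price_lines {
    my ($text) = @_;
    my @lines;
    while ($text =~ /(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/gis) {
        # $1 is undef when no listed word preceded the amount.
        my $word = defined $1 ? $1 : '';
        push @lines, "$2;$word;";
    }
    return @lines;
}

print "$_\n" for price_lines("I would buy a used computer for 50 Dollars but not for 1000 Dollars.");
```

Note that \D* lets the listed word sit any distance before the amount, as long as no digit comes in between.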
 
LVL 11

Assisted Solution

by:lbertacco
lbertacco earned 250 total points
And again with just 1 statement:

$context =~ s/(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/print("$2;$1;\n")/iesg;
