Solved

Regular Expression

Posted on 2004-04-26
Last Modified: 2010-03-04
Hello,

I have a problem: the group in the regular expression (word1|word2|word3)? is not captured, so referencing it later with $1 gives nothing.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/ print($1 . "\n")/iesg;

Now, I would expect this to print "simple" followed by ", the second up to the last word of this sentence should be " to the console.

However, this does not happen. I used $& to look at what gets matched, and it seems nothing at all.

If I take the question mark out, then the expression matches, but the problem is that I also want the following variation of the above sentence to match:

$context = "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!";

The reason is that I want to capture any expression which matches the regular expression:

Always(.*)variations!

Now, if this regular expression is preceded by one of the words in the list, then I would like to know about it and capture/print it out.

Do you know a regular expression to achieve this?

Thanks,
Tim

Question by:tequilla
19 Comments
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10916806
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";


$context =~ s/.*?(simple|easy|plain|simplitic).*?Always(.*?)captured\!/ print($2 . "\n")/iesg;
 
LVL 11

Expert Comment

by:lbertacco
ID: 10916900
$1, $2, ... $9 are set to the text matched by the first, second, ... parenthesized expression, so in your code
$1 is set to the string matching (simple|easy|plain|simplitic)
and
$2 is set to the string matching (.*)

If you want to print both, do
print("$1 - $2\n")

Also, escape the exclamation mark with a backslash, as chris18 has done (\!)
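
A quick way to check what each numbered variable holds is to print it in brackets. The following is only an illustrative sketch, not part of the original comment; the shortened sentence and the helper name `show_groups` are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text held by $1 and $2 after a match.
sub show_groups {
    my ($text) = @_;
    if ($text =~ /(simple|easy|plain|simplitic).*Always(.*)captured!/i) {
        # Only text inside the parentheses is captured; the .* between
        # the two groups is matched but not stored in any variable.
        my ($first, $second) = ($1, $2);
        return ($first, $second);
    }
    return;
}

my ($first, $second) = show_groups("A very simple sentence. Always, some words captured!");
print "1=[$first]\n";    # the word from the list
print "2=[$second]\n";   # the text between "Always" and "captured!"
```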
 

Author Comment

by:tequilla
ID: 10916938
@chris18
The problem with your regular expression is that it does not match sentences which do NOT contain one of the words in the list.

That's why I tried (word1|word2|word3)? with the question mark at the end. However, this does not work.

@lbertacco
Sorry, there is a mistake in my example. In my real example I'm using $1 and $2 of course and also not a single ! but a \!.
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10916957
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";


$context =~ s/.*?[simple,easy,plain,simplitic]+.*?Always(.*?)captured\!/ print($1 . "\n")/iesg;
 
LVL 84

Expert Comment

by:ozo
ID: 10917005
for $context (
  "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!",
  "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!",
) {
  print "$1\n$2\n" if $context =~ m/(?:.*?(simple|easy|plain|simplitic).*?|)Always(.*?)captured!/;
}
 
LVL 84

Expert Comment

by:ozo
ID: 10917208
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*|)Always(.*?)captured!/;
 

Author Comment

by:tequilla
ID: 10917215
@chris18

The expression [simple,easy,plain,simplitic]+ does not match only the words in the list; it also matches other strings. It also requires one of the words to appear at least once, which is not what I want. Once or not at all would be OK.

@ozo
I don't want another construct. Just one expression, without extra if statements and so on.

I think the real question is: why does (word|word2|word3)? not work?
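
On the "why" question, one likely explanation is that the regex engine tries start positions from left to right and stops at the first one that succeeds. At offset 0 the optional group is allowed to match nothing, and the following .* can then run all the way to "Always", so the match succeeds before the engine ever reaches the list word. The sketch below illustrates this with the pattern from the question, using m// so nothing is replaced; the helper name `probe_original` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reports where the original pattern matches
# and what the two groups contain.
sub probe_original {
    my ($text) = @_;
    if ($text =~ /(simple|easy|plain|simplitic)?.*Always(.*)captured!/is) {
        # $-[0] is the offset at which the whole match begins.
        my ($start, $word, $rest) = ($-[0], $1, $2);
        return ($start, $word, $rest);
    }
    return;
}

my ($start, $word, $rest) = probe_original(
    "This is a very simple sentence. Always, the second up to the last "
  . "word of this sentence should be captured!");

print "match starts at offset $start\n";
print "group 1 is ", (defined $word ? "'$word'" : "undefined"), "\n";
print "group 2 is '$rest'\n";
```

The match begins at offset 0 and group 1 stays undefined, because the first successful attempt did not need it.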
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10917230
"Once or not at all would be ok."

then this should be fine

 [simple,easy,plain,simplitic]+  
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10917262
$context = "This is a very interesting sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/.*?(simple|easy|plain|simplitic|[a-z]+).*?Always(.*)captured!/ print($2 . "\n")/iesg;
 
LVL 84

Expert Comment

by:ozo
ID: 10917302
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/;
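
For illustration, here is a small sketch of how this pattern behaves on both kinds of sentence. Because the list word and the following .* are inside one optional group, a match that starts at the list word is found first when such a word exists; otherwise the match starts at "Always" and group 1 is undefined. The helper name `match_optional_word` and the shortened sentences are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (word, rest); word is '' when no list word precedes.
sub match_optional_word {
    my ($text) = @_;
    if ($text =~ /(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/i) {
        my $word = defined $1 ? $1 : '';   # avoid an "uninitialized" warning
        my $rest = $2;
        return ($word, $rest);
    }
    return;
}

for my $sentence (
    "This is a very simple sentence. Always, the rest is captured!",
    "This is a very STUPID sentence.  Always, the rest is captured!",
) {
    my ($word, $rest) = match_optional_word($sentence);
    print "word=[$word] rest=[$rest]\n";
}
```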
 

Author Comment

by:tequilla
ID: 10917957
@chris18

[simple,easy,plain,simplitic]+ will for example also capture:

elpmis or
ysea or
im or
pldmc

and the + sign means at least once, not zero or once.

I have tried (simple|easy|plain|simplitic|[a-z]+) myself. The problem is that it now prints out non-matching characters/words.

@ozo
I'm looking for a single regular expression /..../ which will do the job.
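
The character-class point above can be checked directly. This is a minimal sketch, with hypothetical helper names, comparing the bracketed class against a real alternation on a few test strings:

```perl
use strict;
use warnings;

# A bracketed class matches any run of the listed characters
# (including the comma), not the listed words.
sub class_matches {
    my ($candidate) = @_;
    return $candidate =~ /^[simple,easy,plain,simplitic]+$/ ? 1 : 0;
}

# A grouped alternation matches only the listed words.
sub alternation_matches {
    my ($candidate) = @_;
    return $candidate =~ /^(?:simple|easy|plain|simplitic)$/ ? 1 : 0;
}

for my $candidate (qw(simple elpmis ysea im)) {
    printf "%-7s class=%d alternation=%d\n",
        $candidate, class_matches($candidate), alternation_matches($candidate);
}
```

Only "simple" passes the alternation, while all four strings pass the character class.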
 
LVL 11

Expert Comment

by:lbertacco
ID: 10922157
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/print("$1 $2\n")/iesg;
 
LVL 84

Expert Comment

by:ozo
ID: 10922997
The print in s//print/e will replace "simple sentence. Always, the second up to the last word of this sentence should be captured!" with "1" (if the print succeeds), giving "This is a very 1".
Is that what you really want? I thought not, so I replaced the s with m, but the regular expression I gave will do the job you described.
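
The side effect described here can be reproduced with a short sketch. The helper name `substitute_with_print` is hypothetical, and it works on a copy so the caller's string is left alone:

```perl
use strict;
use warnings;

# Hypothetical helper: runs the substitution on a copy and returns the result.
sub substitute_with_print {
    my ($text) = @_;
    # With /e the replacement is Perl code, and its return value becomes
    # the replacement text. print returns 1 on success, so "1" is inserted.
    $text =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/print("$1$2\n")/ie;
    return $text;
}

my $after = substitute_with_print(
    "This is a very simple sentence. Always, the second up to the last "
  . "word of this sentence should be captured!");
print "$after\n";   # the matched text has been replaced by print's return value
```

If the goal is only to print the captures, matching with m// (as in the earlier comments) avoids modifying the string at all.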
 
LVL 84

Expert Comment

by:ozo
ID: 10925109
#could this be what you wanted to do?
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/$1$2\n/isg;
print $context;
 

Author Comment

by:tequilla
ID: 10925623
@ozo, @lbertacco

This already looks better. But strangely, it does not work either. I think it should, but it doesn't.

So let me try to explain the problem again in words. What I'm trying to
achieve can be summarized as follows:

1. I'm looking for a pattern (let's call it A), which I want to capture
and print to standard output. It recurs several times in the
text.

2. If I find pattern A in the text, then I want to know whether
or not a certain word (let's call it B) out of a word list (let's call
it C) precedes pattern A.

3. In the end I want the following result: if A is found, preceded by
a word B out of C, then print A;B;. If A is found but none of the words
out of C precedes A, then just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
ID: 10925794
$_ = "This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)";
while( /(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/gis ){
    print "$2;$1;\n";
}
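
A variation on the loop above, offered only as a sketch: under `use warnings`, interpolating $1 when no list word preceded the amount produces an "uninitialized value" warning, so this version substitutes an empty string. It also builds the alternation from a word list. The helper name `price_lines` and the shortened sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "amount;word;" string per amount found.
sub price_lines {
    my ($text, @words) = @_;
    # Build the alternation from the word list; quotemeta guards against
    # regex metacharacters appearing in the words.
    my $alternation = join '|', map { quotemeta } @words;
    my @lines;
    while ($text =~ /(?:($alternation)\D*)?(\d+)\s+Dollar/gis) {
        # Group 1 is undefined when no list word preceded the amount.
        my $word = defined $1 ? $1 : '';
        push @lines, "$2;$word;";
    }
    return @lines;
}

my $sample = "I would buy a used computer for 50 Dollars but I wouldn' buy it for 1000 Dollars.";
print "$_\n" for price_lines($sample, qw(computer apartment Miro));
```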
 
LVL 11

Assisted Solution

by:lbertacco
lbertacco earned 250 total points
ID: 10927751
And again with just 1 statement:

$context =~ s/(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/print("$2;$1;\n")/iesg;
