tequilla

Regular Expression

Hello,

My problem is that the optional group in the regular expression (word1|word2|word3)? comes back empty when I later reference it using $1.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/ print($1 . "\n")/iesg;

Now, I would expect this to print "simple" followed by ", the second up to the last word of this sentence should be " to the console.

However, this does not happen. I used $& to look at what gets matched and it seems nothing at all.

If I take the question mark out, then the expression matches. The problem is that I also want the following variation of the above sentence to match:

$context = "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!";

The reason is that I want to capture any expression which matches the regular expression:

Always(.*)variations!

Now, if this regular expression is preceded by one of the words in the list, then I would like to know about it and capture/print that word.

Do you know a regular expression that achieves this?

Thanks,
Tim




Chris S

$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";


$context =~ s/.*?(simple|easy|plain|simplitic).*?Always(.*?)captured\!/ print($2 . "\n")/iesg;
lbertacco

$1, $2, ... $9 are set to the text matched by the first, second, ... parenthesized group, so in your code
$1 is set to the string matching (simple|easy|plain|simplitic) (the trailing ?.* is outside the parentheses and is not captured)
and
$2 is set to the string matching (.*)

If you want to print both, do
print("$1 - $2\n")

Also, escape the exclamation mark with a backslash, as chris18 has done (\!)
tequilla

ASKER

@chris18
The problem with your regular expression is that it does not match sentences which do NOT contain one of the words in the list.

That's why I tried (word1|word2|word3)? with the question mark at the end. However, this does not work.

@lbertacco
Sorry, there is a mistake in my example. In my real example I'm using $1 and $2 of course and also not a single ! but a \!.
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";


$context =~ s/.*?[simple,easy,plain,simplitic]+.*?Always(.*?)captured\!/ print($1-$2 . "\n")/iesg;
ozo
for$context(
"This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!",
 "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!",
){
  print "$1\n$2\n" if $context =~ m/(?:.*?(simple|easy|plain|simplitic).*?|)Always(.*?)captured!/;
}


;
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*|)Always(.*?)captured!/
@chris18

The expression [simple,easy,plain,simplitic]+ does NOT capture only the words in the list; it also matches other strings. It also requires a match at least once, which is not what I want. Once or not at all would be ok.

@ozo
I don't want another construct. Just one expression, no additional if statements and so on.

I think the question really is, why (word|word2|word3)? does not work?
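
A minimal sketch of why the optional group comes back empty, assuming the example sentence from the question. The helper `probe` and the variable names are made up for illustration. The engine tries the leftmost start position first; at offset 0 none of the listed words begins, so `(...)?` matches the empty string and the free-standing `.*` carries the match forward to "Always". Moving the `.*` inside the optional part, as suggested elsewhere in this thread, forces a failed attempt at offset 0 and lets the match start at the word instead.

```perl
use strict;
use warnings;

my $context = "This is a very simple sentence. Always, the second up to the last "
            . "word of this sentence should be captured!";

# Returns the match start offset and the first capture group ("(undef)" if unset).
sub probe {
    my ($re, $text) = @_;
    return unless $text =~ $re;
    return ($-[0], defined $1 ? $1 : '(undef)');
}

# Shape from the question: the optional group sits in front of a free-standing .*
my $original = qr/(simple|easy|plain|simplitic)?.*Always(.*)captured!/is;
# Shape from the replies: the .* lives inside the optional part
my $grouped  = qr/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/is;

printf "original: start=%d group1=%s\n", probe($original, $context);
printf "grouped:  start=%d group1=%s\n", probe($grouped,  $context);
```

With the sentence above, the first pattern reports a start offset of 0 with group 1 unset, while the second starts at the word "simple" and captures it.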
"Once or not at all would be ok."

then this should be fine

 [simple,easy,plain,simplitic]+  
$context = "This is a very interesting sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/.*?(simple|easy|plain|simplitic|[a-z]+).*?Always(.*)captured!/ print($2 . "\n")/iesg;
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/
@chris18

[simple,easy,plain,simplitic]+ will for example also capture:

elpmis or
ysea or
im or
pldmc

and the + sign means at least once, not zero or once.
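
To make the difference concrete, here is a small sketch comparing the bracketed character class with a real alternation. Both patterns are anchored so that the whole candidate string has to match; the helper names and the candidate words are chosen for illustration only.

```perl
use strict;
use warnings;

# Character class: any run of the letters (and commas) listed between the brackets.
my $class = qr/^[simple,easy,plain,simplitic]+$/i;
# Alternation: exactly one of the four words.
my $alt   = qr/^(?:simple|easy|plain|simplitic)$/i;

sub matches_class { my ($s) = @_; return $s =~ $class ? 1 : 0 }
sub matches_alt   { my ($s) = @_; return $s =~ $alt   ? 1 : 0 }

for my $candidate (qw(simple elpmis ysea im stupid)) {
    printf "%-7s class=%d alternation=%d\n",
        $candidate, matches_class($candidate), matches_alt($candidate);
}
```

Only "simple" passes the alternation, while the class also accepts "elpmis", "ysea" and "im" because each consists solely of listed letters.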

(simple|easy|plain|simplitic|[a-z]+) I have tried myself. The problem is, that now I will print out mismatching characters/words.

@ozo
I'm looking for a single regular expression /..../ which will do the job.
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/print("$1 $2\n")/iesg;
The print in s//print/e will replace "simple sentence. Always, the second up to the last word of this sentence should be captured!" with "1" (if the print succeeds), giving "This is a very 1".
Is that what you really want?  I thought not, so I replaced the s with m, but the regular expression I gave will do the job you described.
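
A short sketch of that side effect, assuming the example sentence from the question (the sub names are made up for illustration). Under /e the replacement is evaluated as Perl code and its return value is spliced into the string; print returns 1 on success. A plain match leaves the string alone and only sets the captures.

```perl
use strict;
use warnings;

my $context = "This is a very simple sentence. Always, the second up to the last "
            . "word of this sentence should be captured!";
my $re = qr/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/is;

# s///e: the replacement is code; its return value (1 from print) replaces the match.
sub substitute_with_print {
    my ($text) = @_;
    $text =~ s/$re/print("$1 $2\n")/e;
    return $text;
}

# m//: only inspects the string; the captures are returned, the text is untouched.
sub capture_only {
    my ($text) = @_;
    return $text =~ $re ? ($1, $2) : ();
}

print "after s///e: ", substitute_with_print($context), "\n";
my ($word, $rest) = capture_only($context);
print "after m//:   word=$word rest=$rest\n";
```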
#could this be what you wanted to do?
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/$1$2\n/isg;
print $context;
@ozo, @lbertacco

This already looks better. But strangely it does not work either. I think it should, but it doesn't.

So let me try to explain the problem again using words. What I'm trying to
achieve can be summarized as follows:

1. I'm looking for a pattern (let's call it A), which I want to capture
and print out to standard output. It recurs several times in the
text.

2. If I find the pattern A in the text, then I want to know whether
or not a certain word (let's call it B) out of a word list (let's call
it C) precedes pattern A.

3. In the end I want the following result: If A is found, preceded by
a word B out of C, then print A;B;. If A is found, but none of the words
out of C precedes A, then just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn't buy
it for 1000 Dollars. I definitely would buy an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;
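
One way to get this output is sketched below. It is not the accepted answer from this thread, only an illustration under stated assumptions: pattern A is taken to be "for <number> Dollars", the word list C (computer, apartment, Miro) is inferred from the expected output, and a listed word only counts when it stands directly before "for". The sub name `price_lines` is made up. The optional non-capturing group wraps the word and its trailing whitespace, so group 1 is simply undefined when no listed word precedes the amount.

```perl
use strict;
use warnings;

# Hypothetical word list C, inferred from the expected output above.
my @words = qw(computer apartment Miro);

# Returns one "amount;word;" line per occurrence of "for <number> Dollars".
sub price_lines {
    my ($text) = @_;
    my $list = join '|', map { quotemeta } @words;
    my @lines;
    while ($text =~ /(?:\b($list)\s+)?\bfor\s+(\d+)\s+Dollars\b/gi) {
        my $word = defined $1 ? $1 : '';   # group 1 is unset when no listed word precedes
        push @lines, "$2;$word;";
    }
    return @lines;
}

my $text = "This is an example, which hopefully helps me and you to solve my "
         . "problem. I would buy a used computer for 50 Dollars but I wouldn't buy "
         . "it for 1000 Dollars. I definitely would buy an apartment for 3000 "
         . "Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50 "
         . "Dollars you can hire me as a perl programmer - but I guess I'm not "
         . "worth the Dollar:)";

print "$_\n" for price_lines($text);
```

The loop with /g walks through the text one match at a time, so the captures are reset for every occurrence.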
ASKER CERTIFIED SOLUTION
ozo