Solved

Regular Expression

Posted on 2004-04-26
19
280 Views
Last Modified: 2010-03-04
Hello,

I have the problem that the regular expression (word1|word2|word3)? is not being recalled when later being referenced using $1.

Here a very simple example:

$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!"

$content =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/ print($1 . "\n")/iesg;

Now, I would expect this to be printing out "simple" followed by ", the second up to the last word of this sentence should be " to the console.

However, this does not happen. I used $& to look at what gets matched and it seems nothing at all.

If I take the question mark out - then the expresson matches, but the problem is, that I also want the the following variation of the above sentence to match:

"$context = "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!"

The reason is, that I want to capture any expressoin which matches the regular expression:

Always(.*)variations!

Now, if this regular expression is precedded by one of the words in the list, then I would like to know about it and capture/print it out.

Do you know the a regular expression to achieve this?

Thanks,
Tim




0
Comment
Question by:tequilla
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 4
  • 4
  • +1
19 Comments
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10916806
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!"


$context =~ s/.*?(simple|easy|plain|simplitic).*?Always(.*?)captured\!/ print($2 . "\n")/iesg;
0
 
LVL 11

Expert Comment

by:lbertacco
ID: 10916900
$1, $2,..$9 are set to the first, second,..parenthesized expression, so in you code
$1 is set to the string matching (simple|easy|plain|simplitic)?.*
and
$2 is set to the string matching (.*)

If you want to print both, do
print("$1 - $2\n")

Also, escape with a backslash the exclamation mark as chris18 has done (\!)
0
 

Author Comment

by:tequilla
ID: 10916938
@chris18
The problem with your regular expression is, that it does not capture sentences which do NOT contain one of the words in the list.

That's why I tried (word1|word2|word3)? with the question mark at the end. However, this does not work.

@lbertacco
Sorry, there is a mistake in my example. In my real example I'm using $1 and $2 of course and also not a single ! but a \!.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10916957
$context = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!"


$context =~ s/.*?[simple,easy,plain,simplitic]+.*?Always(.*?)captured\!/ print($1-$2 . "\n")/iesg;
0
 
LVL 84

Expert Comment

by:ozo
ID: 10917005
for$context(
"This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!",
 "This is a very STUPID sentence.  Always, the second up to the last word of this sentence should be captured!",
){
  print "$1\n$2\n" if $context =~ m/(?:.*?(simple|easy|plain|simplitic).*?|)Always(.*?)captured!/;
}


;
0
 
LVL 84

Expert Comment

by:ozo
ID: 10917208
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*|)Always(.*?)captured!/
0
 

Author Comment

by:tequilla
ID: 10917215
@chris18

The expression [simple,easy,plain,simplitic]+ does NOT ONLY capture the words in the list but others. It also requires one of the words to appear at least once - which is not what I want. Once or not at all would be ok.

@ozo
I don't want another construct. Just one expression not other if statements and so on.

I think the question really is, why (word|word2|word3)? does not work?
0
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10917230
"Once or not at all would be ok."

then this should be fine

 [simple,easy,plain,simplitic]+  
0
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 10917262
$context = "This is a very interesting sentence. Always, the second up to the last word of this sentence should be captured!";

$context =~ s/.*?(simple|easy|plain|simplitic|[a-z]+).*?Always(.*)captured!/ print($2 . "\n")/iesg;
0
 
LVL 84

Expert Comment

by:ozo
ID: 10917302
print "$1$2\n" if $context =~ m/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured!/
0
 

Author Comment

by:tequilla
ID: 10917957
@chris18

[simple,easy,plain,simplitic]+ will for example also capture:

elpmis or
ysea or
im or
pldmc

and the + signs means at least once and not not zero ore once.

(simple|easy|plain|simplitic|[a-z]+) I have tried myself. The problem is, that now I will print out mismatching characters/words.

@ozo
I'm looking for a single regular expression /..../ which will do the job.
0
 
LVL 11

Expert Comment

by:lbertacco
ID: 10922157
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/print("$1 $2\n")/iesg;
0
 
LVL 84

Expert Comment

by:ozo
ID: 10922997
the print in s//print/e will replace  "simple sentence. Always, the second up to the last word of this sentence should be captured!" with "1" the (if the print succeeds) giving "This is a very 1"
Is that what you really want?  I thought not, so I replaced the s with m, but the regular expression I gave will do the job you described.
0
 
LVL 84

Expert Comment

by:ozo
ID: 10925109
#could this be what you wanted to do?
$context =~ s/(?:(simple|easy|plain|simplitic).*)?Always(.*?)captured\!/$1$2\n/isg;
print $context;
0
 

Author Comment

by:tequilla
ID: 10925623
@ozo, @lbertacco

This already looks better. But strangely it does not work neither. I think it should, but it doesn't.

So let me try to explain the problem again using words. What I'm trying to
achieve can be summarized as follows:

1. I'm looking for a pattern (lets call it A), which I want to capture
and print out to standard output. It reocurs several times in the
text.

2. If I find the pattern A in the text, then I want to know, whether
or not a certain word (lets call it B) out of a word list (lets call
it C) preceeds pattern A.

3. In the end I want the following result: If A is found, preceeded by
a word out of C, then print A;C;. If A is found, but none of the words
out of C preceed A, the just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;
0
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
ID: 10925794
$_ = "This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)";
while( /(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/gis ){
    print "$2;$1;\n";
}
0
 
LVL 11

Assisted Solution

by:lbertacco
lbertacco earned 250 total points
ID: 10927751
And again with just 1 statement:

$context =~ s/(?:(computer|apartment|Miro)\D*)?(\d+)\s+Dollar/print("$2;$1;\n")/iesg;

0

Featured Post

[Live Webinar] The Cloud Skills Gap

As Cloud technologies come of age, business leaders grapple with the impact it has on their team's skills and the gap associated with the use of a cloud platform.

Join experts from 451 Research and Concerto Cloud Services on July 27th where we will examine fact and fiction.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

617 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question