
How can I extract this using Regex in Perl?

Posted on 2012-03-27
Last Modified: 2012-03-28
I have strings like the following:
1.2x...
super...

For both strings, I want to extract the part before the ... so for the first one I want the string split into 1.2x and ...

For the second, I want to split the string into super and ...

How should I do this?  Thanks.
Question by:thomaszhwang
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37774945
Something like this?

my $text = "super...";
if ($text =~ /(.*?)(\.{3})/s) {
    print "I want $1 and $2\n";
}

Output (after I fixed a bug from my initial post):
I want super and ...
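For reference, here is a self-contained sketch of the same approach applied to both strings from the question. The helper name `split_before_dots` is hypothetical; it wraps the pattern above and only reads `$1` and `$2` when the match actually succeeded, returning an empty list otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string at the first run of three dots.
# Returns (text_before, "...") on success, or an empty list on no match.
sub split_before_dots {
    my ($text) = @_;
    # (.*?) is non-greedy, so it stops at the first "..." it can reach;
    # /s lets . also match newline characters.
    return $text =~ /(.*?)(\.{3})/s ? ($1, $2) : ();
}

for my $text ('1.2x...', 'super...') {
    my ($before, $dots) = split_before_dots($text);
    print "I want $before and $dots\n";
}
```

This should print `I want 1.2x and ...` followed by `I want super and ...`.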
 

Author Comment

by:thomaszhwang
ID: 37774958
Does this work for 1.2x...... as well?

I can't test right now, but will do soon.  Thanks.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37774998
Yes.
 

Author Comment

by:thomaszhwang
ID: 37777266
$text =~ /(.*?)(\.{3})/s;



What does the s at the end do?

I tried to do a match on sto... and the results are sto.. and .

This is my code.

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]+)$/;



Any idea?  Thanks.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37778923
You've changed the pattern so that only one . character is captured in the 2nd group. Try:

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]{3})$/;
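A quick side-by-side run makes the difference visible. This sketch applies the asker's pattern and the suggested fix to the `sto...` input mentioned above; the `use strict`/`use warnings` scaffolding and the `$q1`/`$q2` names are additions for illustration.

```perl
use strict;
use warnings;

my $x = 'sto...';

# Asker's pattern: the greedy (.+) grabs as much as it can, so ([.,:;]+)
# is left with only the final ".".
my ($p1, $p2) = lc($x) =~ /^(.+)([.,:;]+)$/;
print "greedy, +  : '$p1' and '$p2'\n";    # 'sto..' and '.'

# Suggested fix: {3} forces the second group to take exactly three characters.
my ($q1, $q2) = lc($x) =~ /^(.+)([.,:;]{3})$/;
print "greedy, {3}: '$q1' and '$q2'\n";    # 'sto' and '...'
```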



Because you've changed the \. to [.,:;], it will not just match ... but it will also match the following:
.,:
;;;
:,:
etc.
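To illustrate the character-class point, this sketch checks a few three-character tails against both the original `\.{3}` and the broader `[.,:;]{3}`. The helper `tail_matches` is hypothetical, and the sample tails are the ones listed above plus a non-matching `..x`.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether a three-character tail is matched by
# the original \.{3} and by the broader [.,:;]{3} character class.
sub tail_matches {
    my ($tail) = @_;
    my $dots_only = $tail =~ /^\.{3}$/     ? 1 : 0;  # literal dots only
    my $any_punct = $tail =~ /^[.,:;]{3}$/ ? 1 : 0;  # any mix of . , : ;
    return ($dots_only, $any_punct);
}

for my $tail ('...', '.,:', ';;;', ':,:', '..x') {
    my ($dots_only, $any_punct) = tail_matches($tail);
    printf "%-4s  \\.{3}: %d   [.,:;]{3}: %d\n", $tail, $dots_only, $any_punct;
}
```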

The s is a modifier which means that the . wildcard will match newline characters. This means that if your text was this:

Some text over
multiple lines... and some more



The results would be:
Some text over
multiple lines


and
...
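A small sketch of the effect of /s on the two-line example above, running the same pattern with and without the modifier. Only the variable names are new; the text and pattern come from this thread.

```perl
use strict;
use warnings;

my $text = "Some text over\nmultiple lines... and some more";

# With /s, . also matches "\n", so the non-greedy first group can start at
# the very beginning of the string and run across the line break.
my ($with_s) = $text =~ /(.*?)(\.{3})/s;

# Without /s, . never matches "\n"; no match can cross the line break, so
# the first successful match begins at the start of the second line.
my ($without_s) = $text =~ /(.*?)(\.{3})/;

print "with /s:    [$with_s]\n";
print "without /s: [$without_s]\n";
```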



You've also removed the ? from my pattern; that ? is what makes the .+ (or .*) non-greedy. Without it, from the text:

Some text over
multiple lines... and some more::: and yet more



The results would be:
Some text over
multiple lines... and some more


and
:::
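The greedy/non-greedy difference can be reproduced with a short sketch. It assumes the example is matched without the ^ and $ anchors and with /s (the sample text spans two lines); the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "Some text over\nmultiple lines... and some more::: and yet more";

# Non-greedy: (.+?) grows one character at a time, so the match ends at the
# FIRST run of three punctuation characters ("...").
my ($lazy_text, $lazy_punct) = $text =~ /(.+?)([.,:;]{3})/s;

# Greedy: (.+) first swallows the whole string, then backs off only until
# three punctuation characters follow, i.e. the LAST such run (":::").
my ($greedy_text, $greedy_punct) = $text =~ /(.+)([.,:;]{3})/s;

print "non-greedy: [$lazy_text] and [$lazy_punct]\n";
print "greedy:     [$greedy_text] and [$greedy_punct]\n";
```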


 

Author Comment

by:thomaszhwang
ID: 37778996
Yes, that's actually what I want. I don't know exactly how many dots will follow, and it would be nice to also match characters such as : and ;

So in general, I want to capture as many punctuation marks as possible from the end of the string.

abc.:;;;;;.......                      ->        abc and .:;;;;;.......
1.4323.........................     ->        1.4323 and .........................
abc.                                   ->        abc and .
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37779161
I think you need that ? to make the .+ non-greedy.

To pick up the extra punctuation characters you'll want one more slight adjustment:
($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]{3,})$/;
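Running the {3,} pattern on the three examples shows where it works and where it does not. The helper `split_trailing3` is a hypothetical wrapper; the regex itself is the one suggested just above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the suggested pattern. In list context the
# match returns ($1, $2) on success and an empty list on failure.
sub split_trailing3 {
    my ($x) = @_;
    return lc($x) =~ /^(.+?)([.,:;]{3,})$/;
}

for my $x ('abc.:;;;;;.......', '1.4323.........................', 'abc.') {
    my ($p1, $p2) = split_trailing3($x);
    if (defined $p2) {
        print "$x -> $p1 and $p2\n";
    }
    else {
        print "$x -> no match\n";
    }
}
```

The first two inputs split as requested; `abc.` reports no match because a single trailing `.` cannot satisfy `{3,}`.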

However, your 3rd case creates a new problem. If you were to capture the text before just 1 punctuation character (using + instead of {3,}) without anchoring the pattern with $, as in:
abc.                                   ->        abc and .

Then you would get this result:
1.4323.........................     ->        1 and .

If you can think of some rule that would consistently differentiate between those 2 cases (such as always ignoring a single . if there's a number straight after it), then we could potentially resolve that, e.g.:

($p1, $p2) = lc($x) =~ /^(.+?)((?!\.\d)[.,:;]+)/;
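To compare, this sketch runs an unanchored single-character pattern and the lookahead pattern side by side on the asker's three examples. `split_plain` is an assumed variant (non-greedy, + quantifier, no $ anchor) illustrating the problem described above; `split_lookahead` wraps the lookahead regex. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers. Neither pattern is anchored with $, and (.+?) stops
# at the first place where the punctuation group can match.
sub split_plain {
    my ($x) = @_;
    return lc($x) =~ /^(.+?)([.,:;]+)/;
}

# The (?!\.\d) lookahead refuses to start the punctuation run at a "."
# that is immediately followed by a digit, such as a decimal point.
sub split_lookahead {
    my ($x) = @_;
    return lc($x) =~ /^(.+?)((?!\.\d)[.,:;]+)/;
}

for my $x ('abc.:;;;;;.......', '1.4323.........................', 'abc.') {
    my ($plain_text, $plain_punct) = split_plain($x);
    my ($look_text,  $look_punct)  = split_lookahead($x);
    print "$x\n";
    print "  plain:     $plain_text and $plain_punct\n";
    print "  lookahead: $look_text and $look_punct\n";
}
```

Only the decimal example differs: the plain pattern stops at the decimal point, while the lookahead skips it.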

Note you've also got a $ at the end of the pattern, which forces the punctuation characters to be at the end of the line. Is that really what you want?
 

Author Comment

by:thomaszhwang
ID: 37779192
Do you think this is going to work? I do have a word boundary at the end, so as long as I make it non-greedy, it should work fine, right?

($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]+)$/;


 
LVL 35

Accepted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37779225
Yes, I think that will work. To be pedantic, a word boundary is a \b whereas the $ matches the end of the string (or line, if you use the m modifier).
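A sketch confirming the accepted pattern on all three examples, plus a check of the $ versus \b distinction. The helper `split_trailing` and the two flag variables are hypothetical; the regex is the asker's final version.

```perl
use strict;
use warnings;

# Hypothetical helper around the final pattern: non-greedy (.+?), one or
# more trailing punctuation characters, anchored with $ so the run must
# reach the end of the string.
sub split_trailing {
    my ($x) = @_;
    return lc($x) =~ /^(.+?)([.,:;]+)$/;
}

for my $x ('abc.:;;;;;.......', '1.4323.........................', 'abc.') {
    my ($p1, $p2) = split_trailing($x);
    print "$x -> $p1 and $p2\n";
}

# $ versus \b: after a trailing "." there is no word boundary, because a
# boundary needs a word character on one side, and "." and the end of the
# string both count as non-word.
my $b_matches = 'abc.' =~ /[.,:;]+\b/ ? 1 : 0;
my $d_matches = 'abc.' =~ /[.,:;]+$/  ? 1 : 0;
print "\\b: $b_matches, \$: $d_matches\n";
```

Because of the $ anchor, the regex engine keeps extending (.+?) past the decimal point in 1.4323 until the punctuation run reaches the end of the string, which is why this version handles all three cases.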
 

Author Comment

by:thomaszhwang
ID: 37779235
Oh ok, thanks.  The end of the string is what I want.
 

Author Closing Comment

by:thomaszhwang
ID: 37779240
Thanks.