• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 275
  • Last Modified:

How can I extract this using Regex in Perl?

I have strings like the following:
1.2x...
super...

For both string, I want to extract the part before the ... so, for the first one I want the string splitted into 1.2x and ...

For the second I want to split the string into super and ...

How should I do this?  Thanks.
0
thomaszhwang
Asked:
thomaszhwang
  • 6
  • 5
5 Solutions
 
Terry WoodsIT GuruCommented:
Something like this?

$text = "super...";
$text =~ /(.*?)(\.{3})/s;
print "I want $1 and $2";

Output (after I fixed a bug from my initial post):
I want super and ...
0
 
thomaszhwangAuthor Commented:
Does this work for 1.2x...... as well?

I can't test right now, but will do soon.  Thanks.
0
 
Terry WoodsIT GuruCommented:
Yes.
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
thomaszhwangAuthor Commented:
$text =~ /(.*?)(\.{3})/s;

Open in new window


What does the s at the end do?

I tried to do a match on sto... and the result are sto.. and .

This is my code.

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]+)$/;

Open in new window


Any idea?  Thanks.
0
 
Terry WoodsIT GuruCommented:
You've changed the pattern so that only one . character is captured in the 2nd group. Try:

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]{3})$/;

Open in new window


Because you've changed the \. to [.,:;], it will not just match ... but it will also match the following:
.,:
;;;
:,:
etc

The s is a modifier which means that the . wildcard will match newline characters. This means that if your text was this:

Some text over
multiple lines... and some more

Open in new window


The results would be:
Some text over
multiple lines

Open in new window

and
...

Open in new window


You've also removed the ? from my pattern, which makes the .+ (or .*) non-greedy. Without it, from the text:

Some text over
multiple lines... and some more::: and yet more

Open in new window


The results would be:
Some text over
multiple lines... and some more

Open in new window

and
:::

Open in new window

0
 
thomaszhwangAuthor Commented:
Yes, that's actually what I want.  I don't know the exact number of the following dots and it would be nice to match things such as : and ;

So in general, I want to include as many marks as possible that are at the end of the string.

abc.:;;;;;.......                      ->        abc and .:;;;;;.......
1.4323.........................     ->        1.4323 and .........................
abc.                                   ->        abc and .
0
 
Terry WoodsIT GuruCommented:
I think you need that ? to make the .+ non-greedy.

To pick up the extra punctuation characters you'll want one more slight adjustment:
($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]{3,})$/;

However, your 3rd case creates a new problem - if you were to capture text prior to just 1 punctuation character, as in:
abc.                                   ->        abc and .

Then you would get this result:
1.4323.........................     ->        1 and .

If you can think of some rule that would consistently differentiate between those 2 cases (such as always ignoring a single . if there's a number straight after it), then we could potentially resolve that. eg

($p1, $p2) = lc($x) =~ /^(.+?)((?!\.\d)[.,:;]+)/;

Note you've also got a $ at the end of the pattern, which forces the punctuation characters to be at the end of the line. Is that really what you want?
0
 
thomaszhwangAuthor Commented:
Do you think this gonna work since I do have a word boundary at the end, so as long as I make it non-greedy, it should work fine, right?

($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]+)$/;

Open in new window

0
 
Terry WoodsIT GuruCommented:
Yes, I think that will work. To be pedantic, a word boundary is a \b whereas the $ matches the end of the string (or line, if you use the m modifier).
0
 
thomaszhwangAuthor Commented:
Oh ok, thanks.  The end of the string is what I want.
0
 
thomaszhwangAuthor Commented:
Thanks.
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 6
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now