How can I extract this using Regex in Perl?

I have strings like the following:
1.2x...
super...

For both strings, I want to extract the part before the "...". So for the first one, I want the string split into 1.2x and ...

For the second, I want to split the string into super and ...

How should I do this?  Thanks.
thomaszhwang asked:

Terry Woods (IT Guru) commented:
Something like this?

my $text = "super...";
if ($text =~ /(.*?)(\.{3})/s) {
    print "I want $1 and $2\n";
}

Output (after I fixed a bug from my initial post):
I want super and ...
thomaszhwang (Author) commented:
Does this work for 1.2x...... as well?

I can't test right now, but will do soon.  Thanks.
Terry Woods (IT Guru) commented:
Yes.
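A minimal Perl sketch for checking this, including the six-dot string from the question. The helper name split_at_ellipsis is hypothetical; it simply wraps the /(.*?)(\.{3})/s pattern from the first answer:

```perl
use strict;
use warnings;

# split_at_ellipsis is a hypothetical helper name used only for this sketch.
sub split_at_ellipsis {
    my ($text) = @_;
    # The non-greedy (.*?) stops at the FIRST run of three dots;
    # an empty list is returned if there is no "..." at all.
    return $text =~ /(.*?)(\.{3})/s ? ($1, $2) : ();
}

for my $s ("1.2x...", "super...", "1.2x......") {
    my ($before, $dots) = split_at_ellipsis($s);
    print "$s: [$before] and [$dots]\n";
}
```

Because (.*?) is non-greedy and \.{3} matches exactly three dots, the extra dots in 1.2x...... are left outside both groups.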

thomaszhwang (Author) commented:
$text =~ /(.*?)(\.{3})/s;

What does the s at the end do?

I tried a match on sto... and the results are sto.. and .

This is my code.

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]+)$/;

Any idea?  Thanks.
Terry Woods (IT Guru) commented:
You've changed \.{3} to [.,:;]+, so the 2nd group only has to match one character, and the greedy .+ in the 1st group takes everything else, leaving a single . for the 2nd group. Try:

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]{3})$/;

Because you've changed the \. to [.,:;], it will not just match ... but it will also match the following:
.,:
;;;
:,:
etc
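A small sketch comparing the two patterns on sto..., plus a mixed-punctuation string (abc.,: is an illustrative input, not one from the question):

```perl
use strict;
use warnings;

my $x = "sto...";

# The attempt from the question: the greedy (.+) takes as much as it can,
# and ([.,:;]+) only needs one character, so it gets just the last ".".
my ($p1, $p2) = lc($x) =~ /^(.+)([.,:;]+)$/;
print "with +:   [$p1] and [$p2]\n";    # [sto..] and [.]

# The suggested fix: exactly three punctuation characters at the end.
($p1, $p2) = lc($x) =~ /^(.+)([.,:;]{3})$/;
print "with {3}: [$p1] and [$p2]\n";    # [sto] and [...]

# [.,:;] is a character class, so any mix of these marks counts.
my ($q1, $q2) = "abc.,:" =~ /^(.+)([.,:;]{3})$/;
print "mixed:    [$q1] and [$q2]\n";    # [abc] and [.,:]
```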

The s is a modifier which means that the . wildcard will match newline characters. This means that if your text was this:

Some text over
multiple lines... and some more

The results would be:
Some text over
multiple lines

and
...
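A minimal sketch that runs the /(.*?)(\.{3})/ pattern on that text with and without /s (the line break is written as an explicit \n):

```perl
use strict;
use warnings;

my $text = "Some text over\nmultiple lines... and some more";

# Without /s, "." does not match "\n", so the match can only begin
# on the second line.
my ($without_s) = $text =~ /(.*?)(\.{3})/;

# With /s, "." also matches "\n", so group 1 spans both lines.
my ($with_s) = $text =~ /(.*?)(\.{3})/s;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```

Without /s the match still succeeds, but only because the pattern is unanchored and can start on the second line, so group 1 is just "multiple lines".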

You've also removed the ? from my pattern; that ? is what makes the .+ (or .*) non-greedy. Without it, from the text:

Some text over
multiple lines... and some more::: and yet more

The results would be:
Some text over
multiple lines... and some more

and
:::
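A sketch showing the difference, assuming an unanchored pattern with {3} and the /s modifier (the form of the first answer, with the [.,:;] character class swapped in):

```perl
use strict;
use warnings;

my $text = "Some text over\nmultiple lines... and some more::: and yet more";

# Greedy (.*) grabs everything, then backtracks to the LAST place
# where three punctuation characters follow.
my ($g1, $g2) = $text =~ /(.*)([.,:;]{3})/s;

# Non-greedy (.*?) stops at the FIRST such place.
my ($n1, $n2) = $text =~ /(.*?)([.,:;]{3})/s;

print "greedy:     [$g1] and [$g2]\n";
print "non-greedy: [$n1] and [$n2]\n";
```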

thomaszhwang (Author) commented:
Yes, that's actually what I want. I don't know exactly how many dots will follow, and it would be nice to also match characters such as : and ;

So in general, I want to capture as many punctuation marks as possible at the end of the string.

abc.:;;;;;.......                      ->        abc and .:;;;;;.......
1.4323.........................     ->        1.4323 and .........................
abc.                                   ->        abc and .
Terry Woods (IT Guru) commented:
I think you need that ? to make the .+ non-greedy.

To pick up the extra punctuation characters you'll want one more slight adjustment:
($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]{3,})$/;
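A quick check of this pattern against the three examples from the question (the long dot run in the second example is shortened to 10 dots here for readability):

```perl
use strict;
use warnings;

# Example inputs from the question (second one shortened to 10 dots).
my @inputs = ("abc.:;;;;;.......", "1.4323" . ("." x 10), "abc.");

my @results;
for my $x (@inputs) {
    # Non-greedy (.+?) plus three or more marks that must reach the end ($).
    my ($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]{3,})$/;
    push @results, defined $p1 ? "$p1 and $p2" : "no match";
    print "$x: $results[-1]\n";
}
```

The third example, abc., does not match at all because it has only one trailing mark.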

However, your 3rd case creates a new problem - if you were to capture text prior to just 1 punctuation character, as in:
abc.                                   ->        abc and .

Then, unless the pattern is anchored with $ at the end, you would get this result:
1.4323.........................     ->        1 and .
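A minimal sketch of that failure, assuming an unanchored pattern that accepts a single mark (the input is shortened to five trailing dots):

```perl
use strict;
use warnings;

# Allowing a single mark ([.,:;]+) WITHOUT the $ anchor: the non-greedy
# (.+?) stops at the first punctuation character it meets.
my ($p1, $p2) = "1.4323....." =~ /^(.+?)([.,:;]+)/;
print "[$p1] and [$p2]\n";    # [1] and [.]
```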

If you can think of some rule that would consistently differentiate between those 2 cases (such as always ignoring a single . if there's a digit straight after it), then we could potentially resolve that, e.g.:

($p1, $p2) = lc($x) =~ /^(.+?)((?!\.\d)[.,:;]+)/;
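A sketch running this lookahead pattern on the examples from the question (dot runs shortened):

```perl
use strict;
use warnings;

my @inputs = ("abc.:;;;;;.......", "1.4323.....", "abc.");

my @results;
for my $x (@inputs) {
    # (?!\.\d) refuses to start group 2 at a "." that is followed by a digit,
    # so the decimal point in 1.4323 is skipped. There is no $ anchor here.
    my ($p1, $p2) = lc($x) =~ /^(.+?)((?!\.\d)[.,:;]+)/;
    push @results, defined $p1 ? "$p1 and $p2" : "no match";
    print "$x: $results[-1]\n";
}
```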

Note you've also got a $ at the end of the pattern, which forces the punctuation characters to be at the end of the line. Is that really what you want?
thomaszhwang (Author) commented:
Do you think this is going to work? Since I do have a word boundary at the end, as long as I make it non-greedy it should work fine, right?

($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]+)$/;
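A minimal sketch checking this pattern against the examples from the question plus sto... (long dot runs shortened):

```perl
use strict;
use warnings;

my @inputs = ("abc.:;;;;;.......", "1.4323.....", "abc.", "sto...");

my @results;
for my $x (@inputs) {
    # The non-greedy (.+?) grows one character at a time until the rest of
    # the string is made up only of [.,:;] characters; $ forces that run to
    # reach the end, so the decimal point in 1.4323 cannot end group 1.
    my ($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]+)$/;
    push @results, defined $p1 ? "$p1 and $p2" : "no match";
    print "$x: $results[-1]\n";
}
```

Because of the $ anchor, group 2 must run all the way to the end of the string, which is why abc. and 1.4323..... are both handled without the lookahead.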

Terry Woods (IT Guru) commented:
Yes, I think that will work. To be pedantic, a word boundary is a \b whereas the $ matches the end of the string (or line, if you use the m modifier).
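A short sketch of the difference the m modifier makes, using a hypothetical two-line input:

```perl
use strict;
use warnings;

my $text = "abc...\nsto::\n";

# Without /m, ^ only matches at the start and $ only at the very end
# (or just before a final newline); (.+?) cannot cross the first "\n",
# so nothing matches.
my @without_m = $text =~ /^(.+?)([.,:;]+)$/;

# With /m, ^ and $ also match at the start and end of every line;
# /g returns the captures from every line in list context.
my @with_m = $text =~ /^(.+?)([.,:;]+)$/mg;

print "without /m: ", scalar(@without_m), " captures\n";   # 0 captures
print "with /m:    @with_m\n";                              # abc ... sto ::
```

With /m each line is split separately; without it, the pattern has to match the string as a whole.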
thomaszhwang (Author) commented:
Oh ok, thanks.  The end of the string is what I want.
thomaszhwang (Author) commented:
Thanks.