Solved

How can I extract this using Regex in Perl?

Posted on 2012-03-27
Last Modified: 2012-03-28
I have strings like the following:
1.2x...
super...

For both strings, I want to extract the part before the ... so, for the first one, I want the string split into 1.2x and ...

For the second I want to split the string into super and ...

How should I do this?  Thanks.
Question by:thomaszhwang

Assisted Solution

by:Terry Woods
Terry Woods earned 500 total points
ID: 37774945
Something like this?

$text = "super...";
$text =~ /(.*?)(\.{3})/s;
print "I want $1 and $2";

Output (after I fixed a bug from my initial post):
I want super and ...
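
As a quick check of the pattern above against both sample strings from the question, here is a minimal sketch. The helper name split_ellipsis is hypothetical (it does not appear in this thread); only the regex itself comes from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string at the first run of three dots.
sub split_ellipsis {
    my ($text) = @_;
    # (.*?) is non-greedy, so it stops at the first "..." it can reach.
    if ($text =~ /(.*?)(\.{3})/s) {
        return ($1, $2);
    }
    return;    # empty list when the string contains no "..."
}

for my $text ("1.2x...", "super...") {
    my ($head, $dots) = split_ellipsis($text);
    print "I want $head and $dots\n";
}
```

This should print "I want 1.2x and ..." followed by "I want super and ...".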

Author Comment

by:thomaszhwang
ID: 37774958
Does this work for 1.2x...... as well?

I can't test right now, but will do soon.  Thanks.

Assisted Solution

by:Terry Woods
ID: 37774998
Yes.

Author Comment

by:thomaszhwang
ID: 37777266
$text =~ /(.*?)(\.{3})/s;

What does the s at the end do?

I tried to match sto... and the results were sto.. and .

This is my code:

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]+)$/;

Any idea?  Thanks.

Assisted Solution

by:Terry Woods
ID: 37778923
You've changed the pattern so that only one . character is captured in the 2nd group. Try:

($p1, $p2) = lc($x) =~ /^(.+)([.,:;]{3})$/;

Because you've changed the \. to [.,:;], it will match not just ... but also any other run of three of those characters, such as:
.,:
;;;
:,:
etc.

The s is a modifier which means that the . wildcard will also match newline characters. This means that if your text was this:

Some text over
multiple lines... and some more

The results would be:
Some text over
multiple lines

and
...

You've also removed the ? from my pattern. The ? is what makes the .+ (or .*) non-greedy. Without it, from the text:

Some text over
multiple lines... and some more::: and yet more

The results would be:
Some text over
multiple lines... and some more

and
:::
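
The difference between the greedy and non-greedy forms, and the effect of the s modifier, can be checked with a short sketch like the one below. It assumes the unanchored patterns /(.*?)([.,:;]{3})/ and /(.*)([.,:;]{3})/; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Some text over\nmultiple lines... and some more::: and yet more";

# Non-greedy: the first group stops at the first run of three marks.
my ($lazy_head, $lazy_marks) = $text =~ /(.*?)([.,:;]{3})/s;

# Greedy: the first group takes as much as it can, so the last run wins.
my ($greedy_head, $greedy_marks) = $text =~ /(.*)([.,:;]{3})/s;

# Without /s the dot cannot cross the newline, so the match
# can only begin on the second line.
my ($no_s_head, $no_s_marks) = $text =~ /(.*?)([.,:;]{3})/;

printf "non-greedy, /s: <%s> and <%s>\n", $lazy_head,   $lazy_marks;
printf "greedy, /s:     <%s> and <%s>\n", $greedy_head, $greedy_marks;
printf "non-greedy:     <%s> and <%s>\n", $no_s_head,   $no_s_marks;
```

The first two lines of output correspond to the two results described above; the third shows that dropping the s leaves only "multiple lines" in the first group.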

 

Author Comment

by:thomaszhwang
ID: 37778996
Yes, that's actually what I want.  I don't know the exact number of the following dots and it would be nice to match things such as : and ;

So in general, I want to include as many marks as possible that are at the end of the string.

abc.:;;;;;.......                  ->  abc and .:;;;;;.......
1.4323.........................    ->  1.4323 and .........................
abc.                               ->  abc and .

Assisted Solution

by:Terry Woods
ID: 37779161
I think you need that ? to make the .+ non-greedy.

To pick up the extra punctuation characters you'll want one more slight adjustment:
($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]{3,})$/;

However, your 3rd case creates a new problem. If the pattern accepts just 1 punctuation character (and is not anchored to the end of the string), as needed for:
abc.                                   ->        abc and .

then you would also get this result:
1.4323.........................     ->        1 and .

If you can think of some rule that would consistently differentiate between those 2 cases (such as always ignoring a single . if there's a number straight after it), then we could potentially resolve that, e.g.:

($p1, $p2) = lc($x) =~ /^(.+?)((?!\.\d)[.,:;]+)/;

Note you've also got a $ at the end of the pattern, which forces the punctuation characters to be at the end of the line. Is that really what you want?
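
To compare the behaviour on the three sample strings from the previous comment, here is a rough sketch that runs an unanchored pattern, the lookahead pattern above, and a $-anchored pattern side by side. The helper split_marks and the hash labels are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# The three variants under discussion, keyed by a short label.
my %patterns = (
    unanchored => qr/^(.+?)([.,:;]+)/,
    lookahead  => qr/^(.+?)((?!\.\d)[.,:;]+)/,
    anchored   => qr/^(.+?)([.,:;]+)$/,
);

my @samples = ('abc.:;;;;;.......', '1.4323.........................', 'abc.');

# Hypothetical helper: return the two captured parts, or an empty list.
sub split_marks {
    my ($pattern, $text) = @_;
    return lc($text) =~ $pattern;
}

for my $label (qw(unanchored lookahead anchored)) {
    for my $text (@samples) {
        my ($p1, $p2) = split_marks($patterns{$label}, $text);
        printf "%-10s %-32s -> %s and %s\n", $label, $text, $p1, $p2;
    }
}
```

Only the unanchored variant splits 1.4323......................... into 1 and a single dot; the lookahead and anchored variants both keep 1.4323 together. Which one is appropriate depends on whether the punctuation must sit at the very end of the string.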

Author Comment

by:thomaszhwang
ID: 37779192
Do you think this is going to work? I do have a word boundary at the end, so as long as I make it non-greedy, it should work fine, right?

($p1, $p2) = lc($x) =~ /^(.+?)([.,:;]+)$/;


Accepted Solution

by:Terry Woods
ID: 37779225
Yes, I think that will work. To be pedantic, a word boundary is a \b whereas the $ matches the end of the string (or line, if you use the m modifier).
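
The distinction can be seen in a small sketch like this one. The sample text and variable names are invented for illustration; note also that without the m modifier, $ matches just before a final trailing newline as well as at the very end of the string.

```perl
use strict;
use warnings;

my $text = "abc...\ndef;;";

# Without /m, $ only matches at the end of the string
# (or just before a final trailing newline).
my @end_of_string = $text =~ /([.,:;]+)$/;

# With /m, $ also matches before each embedded newline;
# /g in list context collects every match.
my @end_of_lines = $text =~ /([.,:;]+)$/mg;

# \b is a word boundary: here it sits between "c" and the first dot.
my ($word) = $text =~ /^(\w+)\b/;

print "end of string: @end_of_string\n";
print "end of lines:  @end_of_lines\n";
print "word:          $word\n";
```

Here the first match finds only ;; while the m version finds both ... and ;; and the \b example captures abc.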

Author Comment

by:thomaszhwang
ID: 37779235
Oh ok, thanks.  The end of the string is what I want.

Author Closing Comment

by:thomaszhwang
ID: 37779240
Thanks.