Solved

Perl Remove Portion of String

Posted on 2011-09-22
Last Modified: 2012-05-12
Given the portion of the Perl script below, if the string is found in $variable, the output is this:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==

If it is not found, $variable will contain a single field, something like this:
{MD5}lL5Bo6QZUIcsEuxuABCDEFG1YAgCAZF==

What I'd like to do if the string is found is to remove it so $variable only contains:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==

I'm looking for suggestions on the best way to handle this, since it needs to be pretty robust and a wide range of characters could be in there.

Currently, the second field of $variable ({MD5}BY2I75mxmgp1DPwsTYKiargac7==) is a constant value, so using that as a match will work. So for now I'm just looking to remove it if it's there.

In the future it may not be constant, and in that case I would need to check whether $variable contains two values/fields and remove the second one. The first one is all that is needed.

elsif ($variable =~ /\{MD5\}BY2I75mxmgp1DPwsTYKiargac7==/i) {
      print "Variable is $variable\n";
}
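
For the future case described above, where the second field is no longer a known constant, one option is to extract the first field instead of deleting the second. The following is only a sketch: the helper name first_field is made up for illustration, and it assumes a field is "{MD5}" followed by characters that are neither whitespace nor "{" (true for Base64 text, but an assumption about the real data).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return only the
# first {MD5} field, whether or not a second field follows it.
# Assumption: a field is "{MD5}" plus characters that are neither
# whitespace nor "{".
sub first_field {
    my ($value) = @_;
    return $value =~ /^\s*(\{MD5\}[^\s{]+)/ ? $1 : undef;
}

for my $input (
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==',
    '{MD5}lL5Bo6QZUIcsEuxuABCDEFG1YAgCAZF==',
) {
    my $first = first_field($input);
    print defined $first ? "$first\n" : "no {MD5} field found\n";
}
```

Because the pattern stops at the first whitespace or "{", it should behave the same whether the two fields are separated by spaces or run together.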

Question by:credog
8 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36583925
This appears to work for me:


#!/usr/bin/perl

use strict;
use warnings;

my $variable = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==";
print "Variable before: $variable\n";
$variable =~ s/\s\{MD5\}.*//;
print "Variable after: $variable\n";

OUTPUT:
Variable before: {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==
Variable after: {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==

 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 400 total points
ID: 36583930
I've assumed the second value will always have a space and then {MD5}.

If that's not the case, perhaps we could just remove everything after the space?
 
LVL 26

Expert Comment

by:wilcoxon
ID: 36584092
Will the fields always end with == or always start with {MD5}?

Here's code that will work for either (with comments).
#!/usr/local/bin/perl

# always use strict and warnings
use strict;
use warnings;

my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# you probably want to comment out all of these except one

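# if always ending in == (the look-behind must cover both "=" characters;
# with a single "=" the match starts after the first one and removes the second)
$variable =~ s{(?<===).*$}{};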

# if always starts with {MD5} - make sure it's not the first one
# I couldn't figure out how to get a look-behind to match any char
$variable =~ s{(.){MD5}.*$}{$1};

# if there is always a space between items
$variable =~ s{\s.*$}{};

print $variable, "\n";
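
A note on the look-behind variant: the number of "=" characters inside the look-behind matters. The sketch below compares a one-character and a two-character look-behind on the sample value from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $input = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# One "=" in the look-behind: the match starts right after the FIRST "=",
# so the second "=" of the first field is deleted along with the rest.
(my $one_equals = $input) =~ s{(?<==).*$}{};

# Two "=" in the look-behind: the match starts only after "==".
(my $two_equals = $input) =~ s{(?<===).*$}{};

print "$one_equals\n";   # {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF=
print "$two_equals\n";   # {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==
```

The difference is a single trailing character, so it is easy to miss when comparing output by eye.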

 

Author Comment

by:credog
ID: 36589441
All options seem to work equally well, but I'm focusing on
$variable =~ s{(.){MD5}.*$}{$1};


since I think it will always start with {MD5}. I'm confused about how this works and would appreciate a detailed explanation. Here's my take, although most likely inaccurate:

1. The brackets (except those around MD5) are just the delimiters you chose to use instead of the standard /. I'm not sure why the brackets around MD5 do not need to be escaped, but it appears they do not.

2. The
(.){MD5}.*$}{$1}
section has me confused. I'm just not sure why this is working. In the brackets that have $1, I can put {$1} or {} and still get the desired result. I don't understand how this is removing the second field.

Also, I can remove the $, which I assume is an end-of-line anchor, and still achieve the desired results. I just want to make sure that it will always remove the second field.

This regex works even if the space is removed between the two fields, which is great, but I don't see how. The explanation that I have found on what (.) means doesn't seem to match how this is working. Thanks
 
LVL 26

Accepted Solution

by:wilcoxon
wilcoxon earned 1200 total points
ID: 36589624
1) That's exactly why I like using m{} or s{}{} - you only have to escape an unmatched { or } (whereas with /, you always have to escape /).  I also use vim for editing so {} makes for quick jumps around long regexes.

2) (.) captures one character (to make sure {MD5} isn't at the beginning of the string) and {MD5}.*$ matches (but does not capture) from {MD5} to the end of the string. The $1 simply puts the captured character (the one just before {MD5}) back into the original string. In this case, it is probably a space so it's not obvious. If it will always be a space just before the second {MD5} and you don't care about keeping it, then using s{\s{MD5}.*$}{} will work a tiny bit more efficiently. Yes, $ is the end-of-string marker - if the preceding part is .* or .+, it will (almost?) always match to the end of the string anyway - I include $ to make it explicit.
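
To make the role of (.) and $1 visible, here is a small sketch that runs the same pattern with and without putting the captured character back. The helper names and the shortened sample values are made up for illustration. The literal braces are escaped here as a precaution: recent Perl releases warn about, and in some contexts reject, an unescaped literal "{" in a pattern, so escaping is the safer habit.

```perl
use strict;
use warnings;

# Hypothetical helper names; the pattern is the one discussed above,
# with the literal braces escaped.
sub put_back { my ($s) = @_; $s =~ s{(.)\{MD5\}.*$}{$1}; return $s; }
sub discard  { my ($s) = @_; $s =~ s{(.)\{MD5\}.*$}{};   return $s; }

# Shortened, made-up sample values: with and without a separating space.
for my $input ('{MD5}AAAA== {MD5}BBBB==', '{MD5}AAAA=={MD5}BBBB==') {
    printf "put back: '%s'\n", put_back($input);
    printf "discard:  '%s'\n", discard($input);
}
```

With a space between the fields, putting $1 back leaves a trailing space and discarding it removes the space, which is why {$1} and {} look identical when printed. Without a space, the captured character is the final "=" of the first field, so discarding it would truncate the first field to a single "=".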
 

Author Comment

by:credog
ID: 36590013
OK.  Great explanation.  Now that I understand what this does I have added the following to make sure this behaves the same way if by chance $variable has a space in the front for some reason.
sub trim {
    $_[0]=~s/^\s+//;
    $_[0]=~s/\s+$//;
    $_[0]=~s/\s+//g;
    return;
.....
}
elsif ($variable =~ /\{MD5\}BY2I75mxmgp1DPwsTYKiargac7==/i) {
      print "Variable is $variable\n";
      trim ($variable);
      $variable=~ s{(.){MD5}.*$}{$1};
      print "Variable is $variable\n";
}


This seems to work pretty well. It removes any possible leading space, so I can be assured the regular expression is not grabbing the wrong field. It also gets rid of any space in the middle, which I do not need, and I'm not sure whether there could be none, one or several spaces. I don't think I'm missing anything?
 
LVL 84

Expert Comment

by:ozo
ID: 36590981
$_[0]=~s/^\s+//; # removes leading spaces
 
LVL 84

Assisted Solution

by:ozo
ozo earned 400 total points
ID: 36590990
and so does
  $_[0]=~s/\s+//g;
which makes
  $_[0]=~s/^\s+//;
redundant

  $_[0]=~s/\s+$//;
would also be redundant
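
Putting the redundancy point together with the accepted substitution, a reduced version might look like the sketch below. The helper names are made up for illustration, and unlike the trim in the thread (which edits the caller's variable through $_[0]) these helpers return a modified copy.

```perl
use strict;
use warnings;

# Hypothetical replacement for trim(): one global substitution removes
# leading, trailing and interior whitespace alike.
sub strip_whitespace {
    my ($value) = @_;
    $value =~ s/\s+//g;
    return $value;
}

# Hypothetical helper: strip whitespace, then drop everything from the
# second {MD5} onward. (.) restores the character just before it.
sub keep_first_field {
    my ($value) = @_;
    $value = strip_whitespace($value);
    $value =~ s/(.)\{MD5\}.*$/$1/;
    return $value;
}

my $variable = '  {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==   {MD5}BY2I75mxmgp1DPwsTYKiargac7== ';
my $result   = keep_first_field($variable);
print "$result\n";   # {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==
```

Stripping the whitespace first matters: with a leading space still present, (.) could match that space in front of the first {MD5}, and the substitution would then delete both fields.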