Solved

Perl Remove Portion of String

Posted on 2011-09-22
Last Modified: 2012-05-12
Given the portion of the Perl script below, if the string is found in $variable, the output is this:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==

If it is not found, $variable will contain a single field, something like this:
{MD5}lL5Bo6QZUIcsEuxuABCDEFG1YAgCAZF==

What I'd like to do if the string is found is to remove it so $variable only contains:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==

I'm looking for suggestions on the best way to handle this, since it needs to be pretty robust and a wide range of characters could be in there.

Currently, the second field of $variable ({MD5}BY2I75mxmgp1DPwsTYKiargac7==) is a constant value, so using that as a match will work. So for now I'm just looking to remove it if it's there.

In the future it may not be constant and in that case I would need to check if $variable contains two values/fields and remove the second one.  The first one is all that is needed.

elsif ($variable =~ /\{MD5\}BY2I75mxmgp1DPwsTYKiargac7==/i) {
      print "Variable is $variable\n";
}

Question by:credog
LVL 35

Expert Comment

by:Terry Woods
ID: 36583925
This appears to work for me:


#!/usr/bin/perl

use strict;
use warnings;

my $variable = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==";
print "Variable before: $variable\n";
$variable =~ s/\s\{MD5\}.*//;
print "Variable after: $variable\n";

OUTPUT:
Variable before: {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==
Variable after: {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==

LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 36583930
I've assumed the second value will always be preceded by a space and start with {MD5}.

If that's not the case, perhaps we could just remove everything after the space?
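A minimal sketch of that fallback, assuming the two values are separated by whitespace and the first value contains none. The helper name strip_after_space is hypothetical and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop everything from the first whitespace character on.
sub strip_after_space {
    my ($value) = @_;
    # \s matches the first whitespace character; .* with /s takes the rest,
    # including any embedded newlines, up to the very end of the string (\z).
    $value =~ s/\s.*\z//s;
    return $value;
}

print strip_after_space('{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7=='), "\n";
print strip_after_space('{MD5}lL5Bo6QZUIcsEuxuABCDEFG1YAgCAZF=='), "\n";
```

Note that a value with leading whitespace would come back empty with this approach, so it only fits if the first field really starts at the beginning of the string.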
LVL 26

Expert Comment

by:wilcoxon
ID: 36584092
Will the fields always end with == or always start with {MD5}?

Here's code that will work for either (with comments).
#!/usr/local/bin/perl

# always use strict and warnings
use strict;
use warnings;

my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# you probably want to comment out all of these except one

# if always ending in == (the look-behind covers both = characters so neither is removed)
$variable =~ s{(?<===).*$}{};

# if always starts with {MD5} - make sure it's not the first one
# I couldn't figure out how to get a look-behind to match any char
$variable =~ s{(.){MD5}.*$}{$1};

# if there is always a space between items
$variable =~ s{\s.*$}{};

print $variable, "\n";


Author Comment

by:credog
ID: 36589441
All options seem to work equally well, but I'm focusing on
$variable =~ s{(.){MD5}.*$}{$1};

since I think it will always start with {MD5}. I'm confused about how this works and would appreciate a detailed explanation. Here's my take, although it is most likely inaccurate:

1. The brackets (except those around MD5) are just the delimiters you chose to use instead of the standard /. I'm not sure why the brackets around MD5 do not need to be escaped, but it appears they do not.

2. The
(.){MD5}.*$}{$1}

section has me confused. I'm just not sure why this is working. In the brackets that hold $1, I can put {$1} or {} and still get the desired result. I don't understand how this removes the second field.

Also, I can remove the $, which I assume is an end-of-line anchor, and still achieve the desired results. I just want to make sure that it will always remove the second field.

This regex works even if the space is removed between the two fields, which is great, but I don't see how. The explanation that I have found of what (.) means doesn't seem to match how this is working. Thanks
LVL 26

Accepted Solution

by:wilcoxon
wilcoxon earned 300 total points
ID: 36589624
1) That's exactly why I like using m{} or s{}{} - you only have to escape an unmatched { or } (whereas with /, you always have to escape /).  I also use vim for editing, so {} makes for quick jumps around long regexes.

2) (.) captures one character (to make sure {MD5} isn't at the beginning of the string) and {MD5}.*$ matches (but does not capture) from {MD5} to the end of the string.  The $1 simply puts the captured character (the one just before {MD5}) back into the original string.  In this case, it is probably a space so it's not obvious.  If it will always be a space just before the second {MD5} and you don't care about keeping it, then using s{\s{MD5}.*$}{} will work a tiny bit more efficiently.  Yes, $ is the end-of-string marker - if the preceding part is .* or .+, it will (almost?) always match to the end of the string anyway (the exception being a string with an embedded newline, which . does not match without the /s modifier) - I include $ to make it explicit.
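To make the role of the captured character visible, here is a small sketch comparing a replacement of $1 with an empty replacement. The shortened sample values and the helper names keep_char and drop_char are hypothetical. The braces are escaped and / is used as the delimiter so the sketch does not depend on how a particular Perl version treats an unescaped {.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove the second field, putting the captured character back.
sub keep_char {
    my ($value) = @_;
    $value =~ s/(.)\{MD5\}.*$/$1/;
    return $value;
}

# Hypothetical helper: remove the second field AND the character before it.
sub drop_char {
    my ($value) = @_;
    $value =~ s/(.)\{MD5\}.*$//;
    return $value;
}

# Shortened, made-up sample values: one with a separating space, one without.
for my $input ('{MD5}AAAA== {MD5}BBBB==', '{MD5}AAAA=={MD5}BBBB==') {
    print "input:     '$input'\n";
    print "keep_char: '", keep_char($input), "'\n";
    print "drop_char: '", drop_char($input), "'\n";
}
```

With a space between the fields, keep_char leaves a trailing space and drop_char removes it. Without a space, the captured character is the final = of the first field, so drop_char would cut the first field short while keep_char leaves it intact.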

Author Comment

by:credog
ID: 36590013
OK.  Great explanation.  Now that I understand what this does I have added the following to make sure this behaves the same way if by chance $variable has a space in the front for some reason.
sub trim {
    $_[0]=~s/^\s+//;
    $_[0]=~s/\s+$//;
    $_[0]=~s/\s+//g;
    return;
.....
}
elsif ($variable =~ /\{MD5\}BY2I75mxmgp1DPwsTYKiargac7==/i) {
      print "Variable is $variable\n";
      trim ($variable);
      $variable=~ s{(.){MD5}.*$}{$1};
      print "Variable is $variable\n";
}

This seems to work pretty well.  It removes any possible leading space so I can be assured the regular expression is not grabbing the wrong field. Also it gets rid of any space in the middle which I do not need and I'm not sure if there could be none, 1 or several spaces.  I don't think I'm missing anything?
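For the case of none, one or several spaces, a different approach would be to match the first field and keep only that, rather than trimming and then deleting the rest. This is only a sketch: the helper name first_md5_field is hypothetical, and it assumes a field is {MD5} followed by characters that are neither whitespace nor {.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first {MD5} field, or undef if there is none.
sub first_md5_field {
    my ($value) = @_;
    # Assumption: the hash text after {MD5} never contains whitespace or "{",
    # so the match stops at a space or at the start of the next field.
    return $value =~ /(\{MD5\}[^\s{]+)/ ? $1 : undef;
}

# Made-up, shortened sample values covering the spacing cases.
for my $input (
    '{MD5}AAAA== {MD5}BBBB==',       # one space
    '  {MD5}AAAA==    {MD5}BBBB==',  # leading and repeated spaces
    '{MD5}AAAA=={MD5}BBBB==',        # no space
    '{MD5}AAAA==',                   # single field
) {
    print first_md5_field($input), "\n";
}
```

Because the value is extracted rather than edited, leading spaces and the amount of space between the fields do not matter.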
LVL 84

Expert Comment

by:ozo
ID: 36590981
$_[0]=~s/^\s+//; # removes leading spaces
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
ID: 36590990
and so does
  $_[0]=~s/\s+//g;
which makes
  $_[0]=~s/^\s+//;
redundant

  $_[0]=~s/\s+$//;
would also be redundant
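A quick way to check the redundancy is to compare the three-step version against the single global substitution on a few strings. This is only a sketch: the sub names and sample strings are made up, and unlike the original trim these return a copy instead of modifying $_[0] in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three substitutions from the posted trim, applied to a copy.
sub trim_three_steps {
    my ($value) = @_;
    $value =~ s/^\s+//;    # leading whitespace
    $value =~ s/\s+$//;    # trailing whitespace
    $value =~ s/\s+//g;    # all remaining whitespace
    return $value;
}

# Only the global substitution.
sub trim_one_step {
    my ($value) = @_;
    $value =~ s/\s+//g;    # removes leading, trailing and inner whitespace
    return $value;
}

# Made-up sample strings.
for my $sample ("  a b  ", "\tfoo   bar\n", "nospace", " {MD5}AAAA==  {MD5}BBBB== ") {
    my $same = trim_three_steps($sample) eq trim_one_step($sample) ? 'same' : 'different';
    print "'", trim_one_step($sample), "' => $same\n";
}
```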