Solved

Perl Remove Portion of String

Posted on 2011-09-22
Last Modified: 2012-05-12
Given the portion of the Perl script below, if the string is found in $variable, the output is this:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==

If it is not found, $variable will contain a single field, something like this:
{MD5}lL5Bo6QZUIcsEuxuABCDEFG1YAgCAZF==

What I'd like to do if the string is found is to remove it so $variable only contains:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==

I'm looking for suggestions on the best way to handle this, since it needs to be pretty robust and a wide range of characters could be in there.

Currently, the second field of $variable ({MD5}BY2I75mxmgp1DPwsTYKiargac7==) is a constant value, so using that as a match will work.  So for now I'm just looking to remove it if it's there.

In the future it may not be constant and in that case I would need to check if $variable contains two values/fields and remove the second one.  The first one is all that is needed.

elsif ($variable =~ /\{MD5\}BY2I75mxmgp1DPwsTYKiargac7==/i) {
      print "Variable is $variable\n";
}

Question by:credog
8 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36583925
This appears to work for me:


#!/usr/bin/perl

use strict;
use warnings;

my $variable = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==";
print "Variable before: $variable\n";
$variable =~ s/\s\{MD5\}.*//;
print "Variable after: $variable\n";

OUTPUT:
Variable before: {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==
Variable after: {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==

 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 36583930
I've assumed the second value will always have a space and then {MD5}

If that's not the case, perhaps we could just remove everything after the space?
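A minimal sketch of that "remove everything after the space" idea, assuming the fields are separated by at least one whitespace character and that $variable has no leading whitespace. The name $first_only is only illustrative:

```perl
use strict;
use warnings;

# Sample value from the question: two whitespace-separated fields.
my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# Copy the value, then drop the first whitespace character and everything
# after it. If there is no whitespace, the substitution does nothing.
(my $first_only = $variable) =~ s/\s.*//s;

print "$first_only\n";   # prints {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==
```

Note that a leading space would make this return an empty string, so it only fits if the first field really starts at the beginning of the value.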
 
LVL 26

Expert Comment

by:wilcoxon
ID: 36584092
Will the fields always end with == or always start with {MD5}?

Here's code that will work for either (with comments).
#!/usr/local/bin/perl

# always use strict and warnings
use strict;
use warnings;

my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF== {MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# you probably want to comment out all of these except one

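# if always ending in ==
# (look behind for both = signs; looking behind for a single = would
# start the match after the first one and strip the second)
$variable =~ s{(?<===).*$}{};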

# if always starts with {MD5} - make sure it's not the first one
# I couldn't figure out how to get a look-behind to match any char
$variable =~ s{(.){MD5}.*$}{$1};

# if there is always a space between items
$variable =~ s{\s.*$}{};

print $variable, "\n";
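For what it's worth, a look-behind can match any character: (?<=.) asserts that one character precedes the current position without consuming it. Here is a hedged sketch of the {MD5} variant written that way, using / delimiters and escaped braces. The sample value is the one from the question with the space removed:

```perl
use strict;
use warnings;

my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF=={MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# (?<=.) requires some character before {MD5} but does not consume it,
# so the leading {MD5} at offset 0 can never be the match and there is
# nothing to put back in the replacement.
$variable =~ s/(?<=.)\{MD5\}.*$//;

print $variable, "\n";   # prints {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==
```

If the fields are separated by a space, this version leaves that space at the end of the result, just as putting $1 back does.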

 

Author Comment

by:credog
ID: 36589441
All of the options seem to work equally well, but I'm focusing on
$variable =~ s{(.){MD5}.*$}{$1};

since I think the value will always start with {MD5}.  I'm confused about how this works and would appreciate a detailed explanation.  Here's my take, although it is most likely inaccurate:

1. The brackets (except those around MD5) are just the delimiters you chose to use instead of the standard /.  I'm not sure why the brackets around MD5 do not need to be escaped, but it appears they do not.

2. The
(.){MD5}.*$}{$1}

section has me confused.  I'm just not sure why this is working.  In the brackets that hold $1, I can put {$1} or {} and still get the desired result.  I don't understand how this is removing the second field.

Also, I can remove the $, which I assume is an end-of-line anchor, and still achieve the desired result.  I just want to make sure that it will always remove the second field.

This regex works even if the space between the two fields is removed, which is great, but I don't see how.  The explanations I have found of what (.) means don't seem to match how this is working.  Thanks
 
LVL 26

Accepted Solution

by:wilcoxon
wilcoxon earned 300 total points
ID: 36589624
1) That's exactly why I like using m{} or s{}{} - you only have to escape an unmatched { or } (whereas with /, you always have to escape /).  I also use vim for editing, so {} makes for quick jumps around long regexes.

2) (.) captures one character (to make sure {MD5} isn't at the beginning of the string) and {MD5}.*$ matches (but does not capture) from {MD5} to the end of the string.  The $1 simply puts the captured character (the one just before {MD5}) back into the original string.  In this case, it is probably a space so it's not obvious.  If it will always be a space just before the second {MD5} and you don't care about keeping it, then using s{\s{MD5}.*$}{} will work a tiny bit more efficiently.  Yes, $ is the end-of-string marker - if the preceding part is .* or .+, it will (almost?) always match to the end of the string anyway - I include $ to make it explicit.
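To make the role of (.) and $1 concrete, here is a small sketch that runs the same substitution over three made-up inputs (the short "first==" and "second==" values are placeholders, not real hashes). It uses / delimiters with escaped braces, which should behave the same as the s{}{} form above:

```perl
use strict;
use warnings;

my @results;
for my $input (
    '{MD5}first== {MD5}second==',   # fields separated by a space
    '{MD5}first=={MD5}second==',    # no separator at all
    '{MD5}only==',                  # single field
) {
    # (.) must consume one character before {MD5}, so the leading {MD5}
    # cannot match; $1 then puts that one character back.
    (my $result = $input) =~ s/(.)\{MD5\}.*$/$1/;
    push @results, $result;
    print "'$input' -> '$result'\n";
}
```

With a space between the fields, the captured character is the space, so the result keeps a trailing space. With no separator, the captured character is the final = of the first field, which is why replacing with {} instead of {$1} only looks harmless when a space is present: without one it would eat that last =. The single-field input is left unchanged because nothing precedes its {MD5}.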
 

Author Comment

by:credog
ID: 36590013
OK.  Great explanation.  Now that I understand what this does I have added the following to make sure this behaves the same way if by chance $variable has a space in the front for some reason.
sub trim {
    $_[0]=~s/^\s+//;
    $_[0]=~s/\s+$//;
    $_[0]=~s/\s+//g;
    return;
}
.....
elsif ($variable =~ /\{MD5\}BY2I75mxmgp1DPwsTYKiargac7==/i) {
      print "Variable is $variable\n";
      trim ($variable);
      $variable=~ s{(.){MD5}.*$}{$1};
      print "Variable is $variable\n";
}

This seems to work pretty well.  It removes any possible leading space so I can be assured the regular expression is not grabbing the wrong field. Also it gets rid of any space in the middle which I do not need and I'm not sure if there could be none, 1 or several spaces.  I don't think I'm missing anything?
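One alternative worth considering is to capture the first field directly instead of deleting the second, which sidesteps the question of how many spaces there are. This is only a sketch, and it assumes the hash is base64 text (A-Z, a-z, 0-9, +, / and =), so that a space or the next { ends the field:

```perl
use strict;
use warnings;

# Made-up spacing: a leading space and several spaces between the fields.
my $variable = ' {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==   {MD5}BY2I75mxmgp1DPwsTYKiargac7==';

# Capture the first {MD5} tag plus the run of base64 characters after it.
# Assumption: "{" and whitespace never occur inside the hash itself.
if ($variable =~ m/(\{MD5\}[A-Za-z0-9+\/=]+)/) {
    $variable = $1;
}

print "Variable is $variable\n";   # Variable is {MD5}lL5Bo6QZUIcsEuxusMBXMR1YAgCAZF==
```

If the value could ever use a different encoding, the character class would need to change to match.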
 
LVL 84

Expert Comment

by:ozo
ID: 36590981
$_[0]=~s/^\s+//; # removes leading spaces
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
ID: 36590990
and so does
  $_[0]=~s/\s+//g;
which makes
  $_[0]=~s/^\s+//;
redundant

  $_[0]=~s/\s+$//;
would also be redundant
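Put together, a reduced version might look like the sketch below. The sub name strip_whitespace and the short sample values are made up for illustration:

```perl
use strict;
use warnings;

# Removes every whitespace character from its argument in place.
# $_[0] is an alias to the caller's variable, as in the original trim.
sub strip_whitespace {
    $_[0] =~ s/\s+//g;   # covers leading, trailing and interior whitespace
    return;
}

my $variable = "  {MD5}first==   {MD5}second==  ";
strip_whitespace($variable);

# With all whitespace gone, (.) captures the last = of the first field
# and $1 puts it back.
$variable =~ s/(.)\{MD5\}.*$/$1/;

print "$variable\n";   # prints {MD5}first==
```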