  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 835

Perl Count Occurrences of string in Variable

This is a follow-up to a recent question (ID: 27323183). Given a variable that may contain either of the following:
$variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';
$variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==';

In my previous question I was looking for any occurrence of the second field ({MD5}BY2I75mxmgp1DPwsTYKiarga==) and, if it was there, removing it and keeping the first field. This is working fine.

But in the future the second field may not be a consistent value. The {MD5} portion will always be there, but the rest of the characters may change.

I would like to count the number of '{MD5}' occurrences and, if it is greater than 1, remove the second one. I have found a number of ways to do this. But since I'm unclear on exactly what the following options do, I am looking for recommendations on the best and most efficient one.
while ($variable =~ /\{MD5\}/g) { $count++ }
$count = @{[$variable =~ /\{MD5\}/g]};
$count = () = $variable =~ /\{MD5\}/g;



Not sure if it matters, but the chosen method will be within a couple of nested foreach loops.

Once the count is obtained it will be used in an if/elsif/else, and if the count is greater than 1 the second one will be removed, something like this:

if ($variable !~ /^\{/) {
        print "Not valid\n";
}
elsif ($count > 1) {
        print "More than one exists\n";
        # ... do some stuff to clean up and remove the second one
}
else {
        print "Valid\n";
}


Asked by: credog
 
käµfm³d 👽Commented:
You should be able to do something along the lines of a replacement:

$variable =~ s#(\{MD5\}[^{]+).*#$1#;

Which would essentially erase any trailing MD5 occurrences. Here's how:

s         - Regex replace
#         - Pattern delimiter
========  Find This ========
(         - Start of first capture group
\{MD5\}   - Literal {MD5}
[^{]+     - One or more ( + ) of any character NOT ( ^ ) an opening brace ( { )
)         - End of first capture group
.*        - Zero or more ( * ) of any character ( . )
#         - Pattern delimiter
========  Replace With This ========
$1        - The value that was captured by the first ( 1 ) capture group
#         - Pattern delimiter

 
käµfm³d 👽Commented:
P.S.

In short, what happens above is that you look for the first {MD5} occurrence, save it in a capture group for later use, then try to match anything that comes after it. The overall match will contain both the first occurrence and anything that may (or may not) come after it. You replace the overall match with what you stored in the first capture group--the first MD5 occurrence. Anything that came after the first occurrence goes bye-bye!

You can copy your initial string to another variable if you are concerned with data loss in the current variable. In the copy, you won't care what gets trimmed; you only care that you got the first MD5 occurrence.
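
A minimal sketch of the copy-then-trim idea described above, assuming the sample value from the question. The name $first is hypothetical, and the extra whitespace trim is an addition, needed because [^{]+ also consumes the space in front of the second {MD5}:

```perl
use strict;
use warnings;

# Sample value taken from the question.
my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';

# Copy first, then run the substitution on the copy only.
# $first is a hypothetical name; $variable itself is never modified.
(my $first = $variable) =~ s#(\{MD5\}[^{]+).*#$1#;

# [^{]+ also matched the space before the second {MD5}, so trim it.
$first =~ s/\s+\z//;

print "original: $variable\n";
print "first:    $first\n";
```

The original string stays available in $variable, so nothing is lost if the trimmed copy turns out to be wrong.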
 
parparovCommented:
If there are more than 2 occurrences should the third one be removed too?
 
credogAuthor Commented:
parparov: yes the third should be removed as well.

kaufmed:  I currently have a working way to remove all but the first MD5 string, but will give your code a try as well.  Thank you.

What I would like to do at this point is to not even manipulate $variable unless it contains at least two instances of {MD5}. I came up with the three options above and was curious whether one method has any advantages over the others, or whether there is a better way. I tried using tr, but that really was not suitable for this.
 
parparovCommented:
Here's a way. Please note that the trailing space is preserved; tell me if you want to get rid of it.
my $str = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga== {MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==";
$str =~ s/^(\{MD5\}.*?)\{MD5\}.*/$1/;    # keep only the text before the second {MD5}
print "STR $str!\n";
my $str1 = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==";
$str1 =~ s/^(\{MD5\}.*?)\{MD5\}.*/$1/;   # no second {MD5}, so nothing changes
print "STR $str1!\n";
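
If the trailing space is unwanted, one possible variation is to match the whitespace outside the capture group. This is a sketch under that assumption, not part of the answer above; @samples, @cleaned and $copy are hypothetical names:

```perl
use strict;
use warnings;

# The two sample values used in this thread.
my @samples = (
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga== {MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==',
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==',
);

my @cleaned;
for my $str (@samples) {
    # \s* sits outside the capture group, so whitespace before the
    # second {MD5} is discarded together with everything after it.
    (my $copy = $str) =~ s/^(\{MD5\}.*?)\s*\{MD5\}.*/$1/;
    push @cleaned, $copy;
    print "STR $copy!\n";
}
```

A value with only one {MD5} does not match the pattern at all, so it passes through unchanged.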

 
käµfm³d 👽Commented:
I currently have a working way to remove all but the first MD5 string

I based my suggestion on this statement:

I would like to search for the number of '{MD5}' occurrences and if it is greater than 1 remove the second one.



What I would like to do at this point is to not even manipulate $variable unless it contains at least two instances of {MD5}.
You can amend the pattern I suggested a tad:

$variable =~ m#(\{MD5\}[^{]+).*?\{MD5\}.*#;



With that, a match means at least two occurrences were found. I've left the capture group in place, so you would subsequently have access to the special Perl variable $1, which refers to the text captured by the first capture group, in this case the first MD5 occurrence.
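
A short sketch of how that match could gate the cleanup, assuming the sample value from the question. The trailing-whitespace trim is an addition, since [^{]+ also captures the space before the second {MD5}:

```perl
use strict;
use warnings;

# Sample value taken from the question.
my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';

if ($variable =~ m#(\{MD5\}[^{]+).*?\{MD5\}.*#) {
    # Matched, so at least two {MD5} markers exist; $1 holds the first one.
    $variable = $1;
    $variable =~ s/\s+\z//;   # drop the space captured by [^{]+
    print "Trimmed to: $variable\n";
}
else {
    # Zero or one {MD5}: leave the value untouched.
    print "Left alone: $variable\n";
}
```

This avoids a separate counting step when the only question is "one or more than one".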


I tried using tr, but that really was not sutable for this.
I don't see how tr would benefit you here.
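
For context, tr in counting mode tallies individual characters, not substrings. A minimal sketch of the difference, where $stray is a hypothetical malformed value invented for illustration:

```perl
use strict;
use warnings;

my $clean = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';
my $stray = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {oops';   # hypothetical malformed value

my @results;
for my $value ($clean, $stray) {
    my $braces = ($value =~ tr/{//);          # counts '{' characters only
    my $tags   = () = $value =~ /\{MD5\}/g;   # counts whole {MD5} substrings
    push @results, [$braces, $tags];
    print "braces=$braces tags=$tags\n";
}
```

The two counts agree only as long as every opening brace belongs to a {MD5} marker.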
 
credogAuthor Commented:
Sorry for the confusion, my question was not clear.  All the suggestions seem to work fine and I will be trying them out.  

For now, I just want to count the number of substrings {MD5} in the variable and save the number to another variable.  In my original post I had three options listed:
while ($variable =~ /\{MD5\}/g) { $count++ }
$count = @{[$variable =~ /\{MD5\}/g]};
$count = () = $variable =~ /\{MD5\}/g;



If the $variable contains:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==

Then all three options above will count two occurrences and store the number 2 in $count. I am curious whether one method is preferable over another for counting the substrings {MD5}. I made the tr comment because tr can be used to count characters, but I need to count the substrings.

Since this counting of substrings is used in a couple of nested foreach loops, I just wanted opinions on the best method.
 
AnacreoCommented:
Pardon me but... is this oversimplifying? if whitespace is reasonably expected between MD5 hashes?

#!/bin/perl
use strict;
use warnings;
my $variable = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== # this is a nice example of an MD5 hash {MD5}BY2I75mxmgp1DPwsTYKiarga== This is a hash too!";
my @md5s;
foreach my $hash ( split(/\s/,$variable) ) {
  # keep only tokens shaped like {MD5}...==
  next if $hash !~ /^\{MD5\}.*==$/;
  push @md5s, $hash;
}
# print the count first, then each hash on its own line
print join("\n", scalar(@md5s), @md5s) . "\n";

/ >./split_var.pl
2
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==
{MD5}BY2I75mxmgp1DPwsTYKiarga==

 
parparovCommented:
The third way looks like the most efficient one.
The first is definitely slower, and the second one uses more memory.
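
To check this on your own data, a rough sketch using the core Benchmark module is below. The sub names (while_loop, anon_array, empty_list) are hypothetical labels for the three idioms from the question, and timings will vary with Perl version and input, so treat the output as a local measurement only:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample value taken from the question.
my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';

# Hypothetical wrappers around the three idioms from the question.
my %counters = (
    # Scalar-context /g match: each pass finds the next match after pos().
    while_loop => sub { my $n = 0; $n++ while $variable =~ /\{MD5\}/g; $n },
    # List-context match collected into an anonymous array, then counted.
    anon_array => sub { my $n = @{[ $variable =~ /\{MD5\}/g ]}; $n },
    # List assignment evaluated in scalar context yields the match count.
    empty_list => sub { my $n = () = $variable =~ /\{MD5\}/g; $n },
);

# All three should agree before the timings mean anything.
my %counts = map { $_ => $counters{$_}->() } keys %counters;
printf "%s: %d\n", $_, $counts{$_} for sort keys %counts;

# Run each idiom for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%counters);
```

Note that the while form needs its counter reset to 0 on every pass when it runs inside the nested foreach loops, otherwise the count accumulates across iterations.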
Question has a verified solution.