Solved

Perl Count Occurrences of String in Variable

Posted on 2011-09-27
Last Modified: 2012-05-12
This is a followup to a recent question (ID: 27323183).  Given a variable that may contain any of the following:
$variable = {MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==
$variable = {MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==

In my previous question I was looking for any occurrence of the second field ({MD5}BY2I75mxmgp1DPwsTYKiarga==) and, if it was there, removing it and keeping the first field.  This is working fine.

But in the future the second field may not be a consistent value.  The {MD5} portion will always be there, but the rest of the characters may change.

I would like to count the number of '{MD5}' occurrences and, if it is greater than 1, remove the second one.  I have found a number of ways to do this, but since I'm unclear on exactly what the following options do, I am looking for recommendations on the best and most efficient approach.
while ($variable =~ /\{MD5\}/g) { $count++ }
$count = @{[$variable =~ /\{MD5\}/g]};
$count = () = $variable =~ /\{MD5\}/g;


Not sure if it matters, but the chosen method will be within a couple of nested foreach loops.

Once the count is obtained it will be in a if/elsif/else and if the count is greater than 1 the second one will be removed, something like this:

if ($variable !~ /^\{/) {
        print "Not valid\n";
}
elsif ($count > 1) {
        print "More than one exists\n";
        # ... do some stuff to clean up and remove the second
}
else {
        print "Valid\n";
}
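
A runnable sketch of this flow, for illustration only. It assumes the empty-list counting idiom from the options above and a substitution that keeps only the first field; the helper name clean_md5 and its status strings are hypothetical, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a status string plus the (possibly cleaned) value.
sub clean_md5 {
    my ($value) = @_;

    # Anything that does not start with an opening brace is rejected.
    return ('Not valid', $value) if $value !~ /^\{/;

    # Count the {MD5} markers (list assignment evaluated in scalar context).
    my $count = () = $value =~ /\{MD5\}/g;

    if ($count > 1) {
        # Keep the text before the second {MD5}, dropping the separator whitespace.
        $value =~ s/^(\{MD5\}.*?)\s*\{MD5\}.*/$1/;
        return ('Cleaned', $value);
    }
    return ('Valid', $value);
}

for my $input (
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==',
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==',
    'plain text',
) {
    my ($status, $result) = clean_md5($input);
    print "$status: $result\n";
}
```

Because the count is taken before any substitution runs, values with a single {MD5} are never modified.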

Question by:credog
9 Comments
 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 200 total points
ID: 36710915
You should be able to do something along the lines of a replacement:

$variable =~ s#(\{MD5\}[^{]+).*#\1#;


Which would essentially erase any trailing MD5 occurrences. Here's how:

s         - Regex replace
#         - Pattern delimiter
========  Find This ========
(         - Start of first capture group
\{MD5\}   - Literal {MD5}
[^{]+     - One or more ( + ) of any character NOT ( ^ ) an opening brace ( { )
)         - End of first capture group
.*        - Zero or more ( * ) of any character ( . )
#         - Pattern delimiter
========  Replace With This ========
\1        - The value that was captured by the first ( 1 ) capture group
#         - Pattern delimiter

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 36710933
P.S.

In short, what happens above is that you look for the first {MD5} occurrence, save it in a capture group for later use, then try to match anything that comes after it. The overall match will contain both the first occurrence and anything that may (or may not) come after it. You replace the overall match with what you stored in the first capture group--the first MD5 occurrence. Anything that came after the first occurrence goes bye-bye!

You can copy your initial string to another variable if you are concerned with data loss in the current variable. In the copy, you won't care what gets trimmed; you only care that you got the first MD5 occurrence.
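
A minimal sketch of that copy-then-trim idea, using the sample value from the question. The variable names $original and $first_only are illustrative assumptions, and the final whitespace trim is an optional extra step that is not part of the pattern above.

```perl
use strict;
use warnings;

# Sample value from the question; the variable names are hypothetical.
my $original = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';

# Copy first, then run the substitution on the copy only.
(my $first_only = $original) =~ s#(\{MD5\}[^{]+).*#$1#;

# Optional: drop the whitespace that separated the two fields.
$first_only =~ s/\s+$//;

print "original: $original\n";
print "trimmed:  $first_only\n";
```

The parenthesized assignment makes the substitution bind to the new variable, so $original keeps both fields.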
 
LVL 9

Expert Comment

by:parparov
ID: 36712248
If there are more than 2 occurrences should the third one be removed too?
 

Author Comment

by:credog
ID: 36712566
parparov: yes the third should be removed as well.

kaufmed:  I currently have a working way to remove all but the first MD5 string, but will give your code a try as well.  Thank you.

What I would like to do at this point is to not even manipulate $variable unless it contains at least two instances of {MD5}.  I came up with the three options above and was curious whether one method has any advantages over the others, or whether there is a better way.  I tried using tr, but that really was not suitable for this.
 
LVL 9

Expert Comment

by:parparov
ID: 36712756
Here's a way. Please note that the trailing space is preserved; tell me if you want to get rid of it.
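my $str = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga== {MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==";
# Keep only the text before the second {MD5}; everything after it is dropped.
$str =~ s/^(\{MD5\}.*?)\{MD5\}.*/$1/;
print "STR $str!\n";
my $str1 = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==";
# With a single {MD5} the pattern does not match, so $str1 is left unchanged.
$str1 =~ s/^(\{MD5\}.*?)\{MD5\}.*/$1/;
print "STR $str1!\n";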

 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 36712825
"I currently have a working way to remove all but the first MD5 string"

I based my suggestion on this statement:

"I would like to search for the number of '{MD5}' occurrences and if it is greater than 1 remove the second one."

"What I would like to do at this point is to not even manipulate $variable unless it contains at least two instances of {MD5}."
You can amend the pattern I suggested a tad:

$variable =~ m#(\{MD5\}[^{]+).*?\{MD5\}.*#;


If this matches, the string contains at least two occurrences. I've left the capture group in place, so after a successful match you have access to the special Perl variable $1, which holds the text captured by the first capture group: in this case, the first MD5 occurrence.
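
As a rough sketch of how that match could gate the cleanup, assuming the two sample values from the question. The helper name first_if_duplicated is hypothetical, and trimming the separator space from $1 is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first {MD5} field only when at least
# two {MD5} markers are present; otherwise returns undef.
sub first_if_duplicated {
    my ($value) = @_;
    if ($value =~ m#(\{MD5\}[^{]+).*?\{MD5\}.*#) {
        (my $first = $1) =~ s/\s+$//;   # copy $1, then trim the separator space
        return $first;
    }
    return;
}

for my $value (
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==',
    '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==',
) {
    my $first = first_if_duplicated($value);
    if (defined $first) {
        print "duplicate found, keep: $first\n";
    }
    else {
        print "single value, left as-is\n";
    }
}
```

$1 is only meaningful after a successful match, which is why it is read inside the if block and copied before the second substitution runs.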


"I tried using tr, but that really was not suitable for this."
I don't see how tr would benefit you here.
 

Author Comment

by:credog
ID: 36713268
Sorry for the confusion, my question was not clear.  All the suggestions seem to work fine and I will be trying them out.  

For now, I just want to count the number of substrings {MD5} in the variable and save the number to another variable.  In my original post I had three options listed:
while ($variable =~ /\{MD5\}/g) { $count++ }
$count = @{[$variable =~ /\{MD5\}/g]};
$count = () = $variable =~ /\{MD5\}/g;


If the $variable contains:
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==

Then all three options above will count two occurrences and store the number 2 in $count.   I am curious whether one method is preferable to another for counting the substrings {MD5}.  I made the tr comment because tr can be used to count characters, but I need to count substrings.

Since this counting of substrings is used in a couple of nested foreach loops, I just wanted opinions on the best method.
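
For reference, here is a small sketch of the three options side by side, with comments on what each one does. The variable names $count_loop, $count_array and $count_list are illustrative; the patterns are the ones from the question.

```perl
use strict;
use warnings;

my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';

# Option 1: in scalar context, m//g finds one match per evaluation and
# remembers where it stopped, so the loop body runs once per match.
my $count_loop = 0;
$count_loop++ while $variable =~ /\{MD5\}/g;

# Option 2: in list context, m//g returns every match at once. The list is
# stored in an anonymous array, and assigning that array to a scalar
# yields its element count.
my $count_array = @{[ $variable =~ /\{MD5\}/g ]};

# Option 3: the match runs in list context because it is assigned to the
# empty list; a list assignment in scalar context returns the number of
# elements on its right-hand side.
my $count_list = () = $variable =~ /\{MD5\}/g;

print "loop=$count_loop array=$count_array list=$count_list\n";
```

Options 2 and 3 count the returned list, so they only give the number of matches while the pattern has no capture groups; with groups, m//g in list context returns the captured pieces instead.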
 
LVL 4

Assisted Solution

by:Anacreo
Anacreo earned 75 total points
ID: 36713356
Pardon me, but is this oversimplifying? It assumes that whitespace can reasonably be expected between the MD5 hashes:

#!/bin/perl
use strict;
use warnings;

my $variable = "{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== # this is a nice example of an MD5 hash {MD5}BY2I75mxmgp1DPwsTYKiarga== This is a hash too!";
my @md5s;
foreach my $hash ( split(/\s/, $variable) ) {
  # Keep only the tokens that look like {MD5}...==
  next if $hash !~ /^\{MD5\}.*==$/;
  push @md5s, $hash;
}
# Print the count first, then one hash per line.
print join("\n", scalar(@md5s), @md5s) . "\n";

/ >./split_var.pl
2
{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC==
{MD5}BY2I75mxmgp1DPwsTYKiarga==

 
LVL 9

Accepted Solution

by:parparov
parparov earned 225 total points
ID: 36713374
The third way looks like the most efficient one.
The first is definitely slower, and the second uses more memory.
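
One way to check the speed part of this on your own system is the core Benchmark module. The harness below is only a sketch: the labels while_loop, anon_array and empty_list are made up for illustration, the input is the sample value from the question, and the rates will vary with the Perl version and the data. It does not measure memory use.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $variable = '{MD5}lL5Bo6QZUIcsEuxusMBXMR1YgC== {MD5}BY2I75mxmgp1DPwsTYKiarga==';

# The three counting options from the question, wrapped as subs that
# return the count (the labels are hypothetical).
my %counters = (
    while_loop => sub { my $n = 0; $n++ while $variable =~ /\{MD5\}/g; $n },
    anon_array => sub { my $n = @{[ $variable =~ /\{MD5\}/g ]}; $n },
    empty_list => sub { my $n = () = $variable =~ /\{MD5\}/g; $n },
);

# Sanity check: each option should report the same count before timing.
for my $name (sort keys %counters) {
    printf "%-10s counts %d\n", $name, $counters{$name}->();
}

# A negative count tells Benchmark to run each sub for at least that
# many CPU seconds, then print a comparison table of the rates.
cmpthese(-1, \%counters);
```

On a string this short the differences are likely to be small, so it is worth rerunning the comparison with data shaped like the real input.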