Solved

Check for all duplicates problem, Array and Hash HELP !

Posted on 2006-05-22
Last Modified: 2008-02-01
OK, day 3 now and I'm stuck again.
I have looked everywhere and tried quite a few examples, but I cannot count how many
duplicate values are in an array. Everywhere I looked said you should use a hash,
but I still can't get it.

I use one regexp to populate this list, like
@dupecheck1,"$2,$3,$6,$8,$1"; ###   $2= date $3=time $6=ip  $8=badword $1=sesionnum Regexp looks for incorrect!

The other regexp,
@dupecheck2,"$2,$3,$6,$8,$1"; #$2= date $3=time $6=ip  $8=badword $1=sesionnum Regexp looks for Bad Usernames

@dupcheck =(@dupecheck1,@dupcheck2);  ## Only did it this way for troubleshooting !!!
#  So now I can add to hash %user the badword from above ($8), now in the array as dupecheck[3],

foreach $item(@dupecheck){
if ($dupecheck[3]){
$user{$dupecheck[3]}++;
}
$counter++; #end foreach $item(@dupecheck)
}

$duplicates = 0;# declare var and set to 0

#for each name in the hash "user", if the count is greater than 1 it is a duplicate so print
foreach $a(keys %user){
if ($user{$a}>1){
print "count for duplicate user [$a] is [$user{$a}]\n";

$duplicates = ($duplicates+($user{$a}-1)); # add to total duplicate count (minus 1 because 3 entries means only 2 duplicates)
}#end of if
}#end of foreach

$percent = (100/$maincounter)*$duplicates;#calculate percentage from results above

#Lets print our results:
print "total count = [$maincounter]\n";
print "total number of duplicates = [$duplicates]\n";
print "percentage of duplicates = [$percent]\n";


Output is:

count for duplicate user [5/16/2006,9:34:49,221.141.0.194,Administrator,(000013)] is [122]
total count = [122]
total number of duplicates = [121]
percentage of duplicates = [99.1803278688525]

Now the problem is that in this instance incorrect! is ignored. If I
change @dupcheck =(@dupecheck1,@dupcheck2); to
           @dupcheck =(@dupecheck2,@dupcheck1);

then the opposite happens, Administrator is ignored ...

So I found that this example will only find matches on whatever filled $dupecheck[3] the first time.


OK, so if you have made it this far:
I need to check for all duplicates, see if the dupes are over an amount $trigger,
and if so, populate an array with unique IPs.

I don't care if we (you ;) use a %hash to begin with, but please show the right way to populate it using the regexp captures.

This one is worth the points to me because I need this yesterday; I have spent way too much time on it already.
If you can, please comment the code ;)

GHBoom



Question by:ghboom

Assisted Solution

by:ozo
ozo earned 1000 total points
ID: 16738418
I use one regexp to populate this list, like
@dupecheck1,"$2,$3,$6,$8,$1"; ###   $2= date $3=time $6=ip  $8=badword $1=sesionnum Regexp looks for incorrect!

I don't understand.  There is no regexp there, and a variable and a string in void context do not populate a list.
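
For illustration, a list is normally filled with push inside the match. This is only a minimal sketch: the real regexp and log format were not posted, so the sample lines, the pattern, and the capture numbering ($1 to $5 instead of $1 to $8) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical log lines; the real log format was not shown in the question.
my @lines = (
    '(000013) 5/16/2006 9:34:49 221.141.0.194 Administrator',
    '(000014) 5/16/2006 9:35:02 10.0.0.7 incorrect!',
    'a line that does not match',
);

my @dupecheck1;
for my $line (@lines) {
    # Assumed pattern: $1=session $2=date $3=time $4=ip $5=badword
    if ($line =~ /^(\(\d+\))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/) {
        # push appends one comma-joined record per matching line
        push @dupecheck1, "$2,$3,$4,$5,$1";
    }
}

print "$_\n" for @dupecheck1;
```

Lines that do not match are simply skipped, so @dupecheck1 ends up with one record per matching line.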

You can check for duplicates in @dupecheck with
foreach $item(@dupecheck){
  $user{$item}++;  
}

$duplicates = 0;# declare var and set to 0

#for each name in the hash "user", if the count is greater than 1 it is a duplicate so print
foreach $name (keys %user){
  if ($user{$name}>1){
    print "count for duplicate user [$name] is [$user{$name}]\n";

    $duplicates = ($duplicates+($user{$name}-1)); # add to total duplicate count (minus 1 because 3 entries means only 2 duplicates)
  }#end of if
}#end of foreach

Accepted Solution

by:ps15
ps15 earned 1000 total points
ID: 16738419
Maybe you wanted to do something like:

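@dupecheck = (\@dupecheck1, \@dupecheck2);  # array of array references

foreach $item (@dupecheck) {  # $item is a reference to one inner array
      if (${$item}[3]) {
            $user{${$item}[3]}++;  # count the badword held in element 3
      }
      $maincounter++;  # count every entry for the total
}  # end foreach $item (@dupecheck)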


Your example has $user{$dupecheck[3]}++; but $dupecheck[3] stays the same throughout the foreach loop, since @dupecheck isn't changed and the loop variable $item is never used!
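
To illustrate the point, here is a small sketch that assumes each element of @dupecheck is one comma-joined record string, as built in the question (the sample records are invented). Only $item changes on each pass, so the badword has to be taken from it.

```perl
use strict;
use warnings;

# Invented sample records in the question's layout: date,time,ip,badword,session
my @dupecheck = (
    '5/16/2006,9:34:49,221.141.0.194,Administrator,(000013)',
    '5/16/2006,9:34:50,221.141.0.194,Administrator,(000013)',
    '5/16/2006,9:35:02,10.0.0.7,incorrect!,(000014)',
    '5/16/2006,9:35:03,10.0.0.7,incorrect!,(000014)',
);

my %user;
my $maincounter = 0;
for my $item (@dupecheck) {
    # $dupecheck[3] would be the same fourth *record* on every pass;
    # the badword is the fourth *field* of the current $item.
    my $badword = (split /,/, $item)[3];
    $user{$badword}++ if defined $badword && length $badword;
    $maincounter++;
}

for my $name (sort keys %user) {
    print "count for [$name] is [$user{$name}]\n";
}
```

With this change both Administrator and incorrect! are counted, whichever order the records arrive in.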

Expert Comment

by:mjcoyne
ID: 16739776
I was going to suggest the method Ozo suggests.  Looking at your code a bit further, you must be parsing something (presumably like 5/16/2006,9:34:49,221.141.0.194,Administrator,(000013)) with a regular expression earlier on, and putting these values in an array, which you then want to check for duplicates?

Would it not be easier to put these pieces directly into a hash that could track duplicates for you, rather than putting them in an array first and then iterating over the array and into a hash?
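
Along those lines, a rough sketch of counting straight into a hash while parsing, then collecting the unique IPs whose count exceeds $trigger. The sample lines, the regexp, and the threshold value are all assumptions, and counting per IP-and-badword pair is only one possible reading of "duplicate".

```perl
use strict;
use warnings;

my $trigger = 2;    # assumed threshold: flag anything seen more than this many times

# Hypothetical log lines; the real format was not posted.
my @lines = (
    '(000013) 5/16/2006 9:34:49 221.141.0.194 Administrator',
    '(000013) 5/16/2006 9:34:50 221.141.0.194 Administrator',
    '(000013) 5/16/2006 9:34:51 221.141.0.194 Administrator',
    '(000014) 5/16/2006 9:35:02 10.0.0.7 incorrect!',
    '(000014) 5/16/2006 9:35:03 10.0.0.7 incorrect!',
);

my %count;
my $total = 0;
for my $line (@lines) {
    # Assumed pattern: session, date, time, then $1=ip and $2=badword
    next unless $line =~ /^\(\d+\)\s+\S+\s+\S+\s+(\S+)\s+(\S+)$/;
    my ($ip, $badword) = ($1, $2);
    $count{$ip}{$badword}++;    # nested hash: ip -> badword -> count
    $total++;
}

my $duplicates = 0;
my %flagged;    # hash keys keep the ip list unique
for my $ip (keys %count) {
    for my $badword (keys %{ $count{$ip} }) {
        my $n = $count{$ip}{$badword};
        $duplicates += $n - 1 if $n > 1;    # 3 entries means 2 duplicates
        $flagged{$ip} = 1 if $n > $trigger;
    }
}
my @unique_ips = sort keys %flagged;

printf "total = %d, duplicates = %d (%.1f%%)\n",
    $total, $duplicates, $total ? 100 * $duplicates / $total : 0;
print "over trigger: @unique_ips\n";
```

Using hash keys for %flagged means an IP is listed once no matter how many badwords push it over the threshold.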

Author Comment

by:ghboom
ID: 17076844
I found a solution that fits the bill, using both replies.

To be honest, I thought I had accepted this already.
My bad for the delay !!!

:)

Thanks for helping..
GHBoom