Link to home
Start Free TrialLog in
Avatar of Mike_Siegel
Mike_Siegel

asked on

Passing arrays in Perl

I'm trying to return an array of "shared" members between two arrays (strings).

Now my subroutine seems to work, but I can't read the return array.

I suspect I'm passing a pointer to an array rather than an array, or something similar... but this is practically right from the textbook:
http://www.unix.org.ua/orelly/perl/cookbook/ch10_06.htm 

Any advice would be great, not much of a programmer!

here's the call:
 
@sharemem = SharedMembers(\@grparray1,\@grparray2);
print "Shared mem @sharedmem[0]\n";

the subroutine:

sub SharedMembers {
   my($arr1, $arr2) = @_;
   my @arr3;
   my $n = 0;

   for ($i=0;$i < @$arr1; $i++)
   {
       
       for ($j=0; $j < @$arr2; $j++)
          {
      
            if($arr1->[$i] eq $arr2->[$j])
            {
                $arr3[$n] = ($arr3[$n],$arr1->[$i]);
                $n++;
               print "Match on $arr1->[$i] $arr2->[$j]\n";
            }
          }
   }
   print "arr3x0 @arr3[0]\n";
   return @arr3;
}

the output:

Match on Local Admins Local Admins
Match on XXX Class B XXX Class B
Match on Domain Users Domain Users
Match on XXX All XXX All
Match on xxx xxx
Match on xxxc xxxc
arr3x0 Local Admins
Shared mem

What I'm looking for is
Shared mem Local Admins :(
SOLUTION
Avatar of manav_mathur
manav_mathur

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Mike_Siegel
Mike_Siegel

ASKER

Wow, I'm an idiot.  Thanks!
Also, for your next assignment, I suggest looking into how the 'for' or 'foreach' construct can be used to avoid explicit manipulation of array indexes.

http://www.perl.com/doc/FMTEYEWTK/style/slide22.html
I hate B grades.

Manav
Avatar of ozo
use strict;
#and
use warnings;
#are both useful for catching errors like that.

              $arr3[$n] = ($arr3[$n],$arr1->[$i]);
              $n++;
looks strange.
why are you writng push@arr3,$arr1->[$i]; in such an unusual way?
For the record (and I add these notes after the question has been answered so other readers can learn from what's been given and then alternative (hopefully better) techniques), you can do cleaner eliminations/duplicates using hashes.

If you stick your array into a hash -- where each array element is a hash key (set the value to be 1 for simplicity) -- then you instantly remove duplicates from the array. Do this for both arrays and you can then loop through the keys of one hash and see if that key exists in the other hash. If it does that key exists in both hashes, store it.


sub find_dups($$){
  my %hash1 = map { $_=>1 } @{$_[0]};
  my %hash2 = map { $_=>1 } @{$_[1]};
  my @dups;
  for(keys(%hash1)){
    $hash2{$_} && push(@dups, $_);
  }

  return @dups;
}

my @shared = find_dups(\@arr1, \@arr2);

The @{$_[0]} means dereference the array-reference in the first (0th) parameter of the sub routine's parameter list. E.g. convert it back into an array.

The map { $_ => 1 } code is an in-line version of:

my %hash;
for(@array){
 $hash{$_} = 1;
}

It works because map() passes each element of the input array (the above is effectively map({$_ => 1}, @array); ) to the given anonymous sub-routine. Compare it to   sort { $b <=> $a }  for reverse numerical sorting if it helps to see how its used. $_ is set to each element of the array in turn and the output of the anonymous sub-routine (coderef) is stored in $_. Think of it as a pipeline - you put something in, it jiggles it around, and you get something out. The order doesn't change and for every element you put in you get one out, but you can change the contents of the element. In this case we're replacing $_ with $_ => 1 which Perl understands to mean  $_, 1 - I.e. two elements. This means that an input array of, say   a b c d becomes   a, 1, b, 1, c, 1, d, 1.

When you assign an array to a hash variable Perl takes pairs of elements from the array for the key - value. E.g.

 my %hash = qw(a b c d);

Is the same as
 my %hash = (a=>'b', c=>'d');

We want to have our array (a b c d) as the keys though so by inserting the '1's we end up with:

  my %hash = (a=>1, b=>1, c=>1, d=>1);

all in the compact form of   map { $_ => 1}  qw(a b c d);

Note you can also write the first two lines more compactly as:

  my %hash1 = map { $_=>1 } @{+shift};
  my %hash2 = map { $_=>1 } @{+shift};

Assuming your arrayrefs are in the parameter array - you need the + symbol or Perl will see @{shift} as @shift which is a variable which doesn't exist. The + forces it to see it as a function.

The for() loop itself can also be condensed from the above as you want to return the values you find as duplicates:

sub find_dups($$){
  my %hash1 = map { $_=>1 } @{+shift};
  my %hash2 = map { $_=>1 } @{+shift};
  return grep { $_ } map { exists($hash2{$_}) ? $_ : 0} keys(%hash1);
}

This again uses the map() function to alter the value of $_. This time $_ is each element from keys() of %hash1 - E.g. its each key in the list. This time the map is saying if I found $_ (each key from %hash1 in turn) in %hash2 then return $_. Otherwise return 0. This means that you'll get a list like:   a, 0, c, d  (assuming 'b' isn't in array #2). Since this is the array which comes out of map() we need to remove the 0's which is what the grep() function is doing. Its similar to map() and sort() only it will return the element if the return value of the anonymous sub-routine is 1 and nothing otherwise. Effectively we're using it to strip out the 0's and leave just the valid keys. This array is then returned.

You can compact this even more by replacing the map with a more complex grep():

  return grep { $hash2{$_} } keys(%hash1);

$hash2{$_} will return 1 for the key $_ if it exists in %hash2 (because that's its value, you can use exists($hash2{$_}) to just see if it exists irrespective of its value). This means that if it exists grep() will pass through the value that was passed into it (the key to look up) and if it doesn't grep won't return anything.

If you'e looking at this carefully you'll also notice that its wasting resources converting the first array into a hash only to take the keys() of it. Since what you're getting out are the keys you put in you could just avoid the conversion and simply do:

(I've flipped the order so we're now taking the elements of the second array and looking for them in the first)

sub find_dups($$){
  my %hash1 = map { $_=>1 } @{+shift};
  return grep { $hash1{$_} }  @{+shift};
}

Which is much more compact than any of the previous examples (including the one which was accepted).


% perl -Mstrict -Mwarnings
sub find_dups($$){
  my %hash1 = map { $_=>1 } @{+shift};
  return grep { $hash1{$_} }  @{+shift};
}

print join(', ', find_dups([qw(a c d e f g h)], [qw(g f b e)])), "\n";

Gives => g, f, e


NOTE: This will leave duplicates if the exist in the second array. E.g.;

print join(', ', find_dups([qw(a c d e f g h)], [qw(g f b e e)])), "\n";

Gives => g, f, e, e

If you're worried about that, use the two hashes method.

Note also, if you're doing the reverse of this - trying to find elements only in one array - then it becomes more cumbersome again. You can't simply reverse the lookup logic in the grep() because all that tells you is to return elements from array2 not in array1. E.g.
(WRONG EXAMPLE)

sub find_dups($$){
  my %hash1 = map { $_=>1 } @{+shift};
  return grep { !$hash1{$_} }  @{+shift};
}
print join(', ', find_dups([qw(a c d e f g h)], [qw(g f b e e)])), "\n";

Returns 'b'

Doing that you're only halfway there because while you do get the elements in array2 that aren't in array1, you don't get to know what elements exist in array1 which don't exist in array2 (in this case a, c, d, and h). Now since we're already making the comparison to see what is there we know that if we find the element from array2 in array1 (the hash) it exists in both and needs to be removed from both. We know how to ignore those elements (the grep()) from array2 so if we could just remove them from array1 (the hash) what we'd be left with is the elements of array2 which don't appear in the hash, and a hash containing the elements which don't appear in array2. I.e. the elements which don't appear in both arrays.

Fortunately we can do that pretty simply. If the element from array2 exists in the hash we delete it from the hash, because it appears in both places. We can do that using the delete() function. Happily this function returns the value of the key that was deleted so if we remove an element from our hash using delete() we'll get the value of that key - the value 1. E.g.

my %hash = (a=>1, b=>1, c=>1, d=>1);
print delete($hash{a});

Returns '1'

This is great because grep() allows us to filter values based on true/false values. And we know from our 'halfway' point that we can do:

  grep { !$hash1{$_} }

To filter out elements from array2 which DO exist in the hash (which was built from array1). So we can simply replace that with:

  grep { !delete($hash1{$_}) }

And if the element from array2 ($_) exists in the hash then it is deleted *and* it is filtered *out of* the list. This means that when we're finished filtering each elements of array2 through the grep to we have not only removed elements which appear in both lists from array2, but also from the hash. E.g. (the hash has a, b, d, and the array b, c and d, so unique elements are a (in the hash) and c (in array2)

my %hash1 = (a=>1, b=>1, d=>1);
print join(', ',   grep { !delete($hash1{$_}) } qw(b c d)), "\n",
        join(', ', keys(%hash1)), "\n";

Returns:
c
a


Which tells us we have the right information. All we need to do is collect it together and return it:

sub find_uniques($$){
  my %hash1 = map { $_=>1 } @{+shift};
  my @notin2 = grep { !delete($hash1{$_}) }  @{+shift};
  return @notin2, keys(%hash1);
}
print join(', ', find_uniques([qw(a c d e f g h)], [qw(g f b e)])), "\n";

Which returns:
b, h, a, c, d

As expected.

BUT NOTE: Duplicates in the second array screw things up. Take for example the following, with 2 e's:

print join(', ', find_uniques([qw(a c d e f g h)], [qw(g f b e e)])), "\n";

Returns:
b, e, h, a, c, d

Which seems wrong, but is actually right for the code we have. Why? Because on the first 'e' we delete 'e' from the hash. On the second 'e' we find that it doesn't exist in the hash and is therefore "unique" to array2. This is unlikely to be what you want so there are two fixes. You can go back to using two hashes which causes the array passed into the grep() to contain unique elements (because assigning 'e' => 1 to a hash assigns the value of 1 to the key 'e', it doesn't matter how many times you do it, the key is the same (the letter 'e') and so the value is whatever the last value is, since we're always using the same value it removes duplicates in the array - termed "uniquifying" the array):

sub find_uniques($$){
  my %hash1 = map { $_=>1 } @{+shift};
  my %hash2 = map { $_=>1 } @{+shift};
  my @notin2 = grep { !delete($hash1{$_}) } keys(%hash2);
  return @notin2, keys(%hash1);
}
print join(', ', find_uniques([qw(a c d e f g h)], [qw(g f b e e e e)])), "\n";

Which returns b, h, a, c, d as expected.


btw for any newbies reading this don't be like "wow I could never do that". This (hopefully) shows how Perl lets you use a little piece of knowledge and piece it together with other pieces of knowledge to make a better program (a better balance of speed/performance/compactness/readability).

And of course, there is no right way to do it though, as the saying goes. Using multiple hashes may look cleaner, but uses more memory. Deleting from a hash may not save memory and maybe slower because of the operations involved in seeking and rewriting the data (theoretically the program could require more memory to do the delete because it has to mark the old key-value pair as unused and then find enough space for the new key-value pair)

Wow, I really wish I had time to spout all of this stuff before someone else answered the post. I bet it would be worth some points or whatever we get here :) Shame I'm busy doing this for a living really, but hopefully it will help someone searching for an answer to this problem (or maybe just help you understand how things can be improved on - which isn't to say the original versions where bad)

Colin.