ggjones
asked on
How to create an array of uniques from two arrays, that also addresses transposed strings
I have two arrays with the following values:
Array 1:
Blue ball
Red ball
Small green ball
Big orange ball
Array 2:
Blue ball
Red ball
Small green ball
Ball red
Ball blue
Big orange ball
Blue ball
Purple ball
Pink Ball
I need to output an Array3 that includes only uniques (Purple ball, Pink Ball). This excludes dupes(eg: blue ball) AND transposed dupes (eg:ball blue)
many thanks,
GJ
array_merge will combine, array_unique will remove duplicates. However, this won't deal with the transposed strings. To do that, I think you could explode each value on spaces, and then I dunno. But that gets you pretty close. Maybe I'll think of the rest.
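One way the explode idea could be finished, as a minimal sketch: build a comparison key for each string by lower-casing it, splitting it on whitespace and sorting the words, then count how often each key occurs across the merged arrays. The helper name wordKey() is hypothetical, not something from this thread, and lower-casing is an assumption about how case should be treated.

```php
<?php
// Hypothetical helper: a key that ignores case and word order.
function wordKey(string $s): string
{
    $words = preg_split('/\s+/', strtolower(trim($s)), -1, PREG_SPLIT_NO_EMPTY);
    sort($words, SORT_STRING);
    return implode(' ', $words);
}

$array1 = ['Blue ball', 'Red ball', 'Small green ball', 'Big orange ball'];
$array2 = ['Blue ball', 'Red ball', 'Small green ball', 'Ball red', 'Ball blue',
           'Big orange ball', 'Blue ball', 'Purple ball', 'Pink Ball'];

// Combine both arrays, then count how often each normalised key appears.
$merged = array_merge($array1, $array2);
$keys   = array_map('wordKey', $merged);
$counts = array_count_values($keys);

// Keep only the values whose key occurs exactly once.
$array3 = [];
foreach ($merged as $i => $value) {
    if ($counts[$keys[$i]] === 1) {
        $array3[] = $value;
    }
}

print_r($array3); // Purple ball, Pink Ball
```

With the two arrays from the question this leaves only Purple ball and Pink Ball, because every other string shares its key with at least one other entry.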
ASKER
thanks for replying Aaron.
The "transposed issues" are addressed here:
https://www.experts-exchange.com/questions/27398480/How-to-use-'array-unique-on-transposed-strings.html
The challenge, essentially, is to modify this solution to include 2 arrays. array_merge makes sense... but I think something is missing....
GJ
.
Are you interested in case-sensitivity?
ASKER
Hi Ray... no, in most cases I'm insensitive, heh, heh.
Case issues are handled down-stream. No need to go there for this.
regards,
GJ
Thanks. How about permutations:
Small green ball
Small ball green
ball Small green
ball green Small
What are the rules you want to apply here?
Also, on the issue of case-sensitivity consider this..
Blue ball (vs) ball Blue -- these are simply rearranged. But the capitalization is questionable in the context of natural language.
Blue ball (vs) Ball blue -- these are rearranged and the capitalization is sensible in an English-language sort of way. But if case truly does not matter a better test would come from this, where everything is normalized to one case.
BLUE BALL (vs) BALL BLUE
Most of PHP's string and array functions are case-sensitive, so I think it is important to be clear on the rules about the case of the strings.
Thanks, ~Ray
ASKER
... now we get into an area of linguistic complexity that is several degrees beyond "ball blue" , and that I would dearly love to address - thank you for teasing out the problem.
Here are three examples that immediately jump out:
1) Contractors - Plumbers and Plumbing, Plumbers Plumbing Contractors, Plumbing Contractors
2) Flowers Plants and Trees Artificial, Flowers Plants Trees Artificial
3) Tanning Salons, Tanning Salon
Rules to apply? well, clearly these pairs are duplicates-in-meaning. But how to extrapolate pattern-recognition to the realm of meaning is a real challenge, isn't it...
ASKER
... thanks Ray.
regarding case, my data is of random case, so I apply this prior to output:
ucwords(strtolower($theString)).
GJ
OK, good. With that transformation we can make some progress. Plurals are a bit more complex. But case-sensitivity can be neutralized. I might skip the ucwords() and go with strtoupper(), since MySQL string comparisons are case-insensitive by default (but PHP is case-sensitive). In PHP, RAY is not the same as Ray, but in MySQL, Ray and RaY and rAY are the same unless you use the BINARY attribute.
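A quick sketch of the PHP side of that point, using only built-in string functions. The sample strings are made up for illustration.

```php
<?php
$a = 'Blue ball';
$b = 'BLUE BALL';

// The === operator compares strings byte for byte, so case matters.
$strictEqual = ($a === $b);

// strcasecmp() returns 0 when the strings match regardless of case.
$looseEqual = (strcasecmp($a, $b) === 0);

// Normalising both sides to one case before comparing also works.
$upperEqual = (strtoupper($a) === strtoupper($b));

// The asker's display transformation, applied only on output.
$display = ucwords(strtolower('bLUE bALL'));

var_dump($strictEqual, $looseEqual, $upperEqual);
echo $display, PHP_EOL;
```

Normalising to a single case for the comparison and applying ucwords() only when printing keeps the two concerns separate.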
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I think array_diff followed by array_merge is sufficient to get the uniques...
so first do array_diff with array1, array2 as input [sequence is important]
then array_diff with array2, array1 as input [sequence is important]
then array_merge the two results
Checkout these links before implementing it ...
http://php.net/manual/en/function.array-diff.php
http://in2.php.net/manual/en/function.array-merge.php
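The steps above can be sketched as follows, using the arrays from the original question. Note that array_diff compares the strings exactly, so on its own it does not treat transposed strings as duplicates.

```php
<?php
$array1 = ['Blue ball', 'Red ball', 'Small green ball', 'Big orange ball'];
$array2 = ['Blue ball', 'Red ball', 'Small green ball', 'Ball red', 'Ball blue',
           'Big orange ball', 'Blue ball', 'Purple ball', 'Pink Ball'];

// Values of $array1 that do not appear in $array2, and the reverse.
$onlyIn1 = array_diff($array1, $array2);
$onlyIn2 = array_diff($array2, $array1);

// array_diff preserves keys; array_merge renumbers them from zero.
$result = array_merge($onlyIn1, $onlyIn2);

print_r($result); // Ball red, Ball blue, Purple ball, Pink Ball
```

Ball red and Ball blue survive because they are not exact matches for Red ball and Blue ball, so this approach still needs a normalisation step before the comparison.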
ASKER
... thank you very much gentlemen; I will sift through this today, and figure out how best to apply the logic to my code.
Brian, the preg_replace statements you proposed to manage plurals - very elegant, by the way, in terms of coverage - should these be inserted immediately after line 33 - " $normalisedString = implode( " ", $exp );" ?
Ray, the introduction to me of metaphone() opens up all sorts of possibilities for other applications as well. In terms of managing plurals for this case, could you elaborate a bit more please? I assume the metaphone($string) call would be inserted in each of the initial for-loops, and then the returned value stored ?? ... or would the initial string simply be replaced, and then converted back for the new array of uniques?
regards,
GJ
"Brian, the preg_replace statements you proposed to manage plurals - very elegant, by the way, in terms of coverage - should these be inserted immediately after line 33 - " $normalisedString = implode( " ", $exp );" ?"
Yes. They go between the creation of the normalisedString and its insertion in the array.
Best of luck
BP
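Brian's actual preg_replace statements are not visible in this thread, so the rule below is a hypothetical stand-in: it only strips a single trailing "s" from each word, which is far cruder than real plural handling. The function name pluralKey() is also an assumption. This sketch applies the rule to each word before sorting rather than after implode(), because removing a letter after sorting can occasionally change what the sorted order should have been.

```php
<?php
// Hypothetical helper: a key that ignores case, word order and a trailing "s".
function pluralKey(string $s): string
{
    $words = preg_split('/\s+/', strtolower(trim($s)), -1, PREG_SPLIT_NO_EMPTY);

    // Naive plural rule (assumption): drop one final "s" unless the word
    // ends in "ss", so "salons" becomes "salon" but "glass" is left alone.
    $words = preg_replace('/(?<!s)s$/', '', $words);

    sort($words, SORT_STRING);
    return implode(' ', $words);
}

var_dump(pluralKey('Tanning Salons') === pluralKey('Tanning Salon'));
var_dump(pluralKey('restaurants pizza') === pluralKey('Pizza Restaurant'));
```

Both comparisons come out equal under this rule. Pairs such as "Plumbers and Plumbing" would need something beyond suffix stripping.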
Ray said "@Brian: I like your solution and had I stayed up last night I might have come up with something like that."
We all got to sleep sometime Ray...
:-D
ASKER
Hi Ray... I'm finding some anomalous behavior.
The output array does not include uniques from array2, if their indices are less than or equal to the highest Array1 index.
I can't figure out the reason, though...
GJ
Array1
(
[0] => Blue ball
[1] => Small green ball
[2] => Purple ball
[3] => Ball red
[4] => Ball blue
[5] => Big orange ball
[6] => Blue ball
[7] => Brown ball
)
Array2
(
[0] => Blue ball
[1] => Red ball
[2] => Pink Ball
[3] => white Ball
[4] => ball Small green
[5] => Small green ball
[6] => Ball red
[7] => Ball black
[8] => Big orange ball
[9] => Ball blue
[10] => Small ball green
[11] => Blue ball
)
ArrayOut_Actual
(
[0] => Purple ball
[1] => Brown ball
)
ArrayOut_Should_be
(
[0] => Purple ball
[1] => Brown ball
[] => Ball black
[] => Pink Ball
[] => white Ball
[] => Ball black
)
ASKER
Hi Brian...
I'm getting an odd result. I'm unclear what it represents; it certainly is not the uniques though!
Any ideas why this should be??
GJ
$myarray1 = array(
'Blue ball',
'Small green ball',
'Purple ball',
'Ball red',
'Ball blue',
'Big orange ball',
'Blue ball',
'Brown ball'
);
$myarray2 = array(
'Blue ball',
'Red ball',
'Pink Ball',
'white Ball',
'ball Small green',
'Small green ball',
'Ball red',
'Ball black',
'Big orange ball',
'Ball blue',
'Small ball green',
'Blue ball'
);
ArrayOut_Actual
(
[0] => Blue ball
[1] => Small green ball
[2] => Pink Ball
[3] => white Ball
[4] => Big orange ball
[5] => Ball black
[6] => Ball red
)
ArrayOut_Should_be
(
[0] => Purple ball
[1] => Brown ball
[] => Ball black
[] => Pink Ball
[] => white Ball
[] => Ball black
)
have you tried this ...
first do array_diff with array1, array2 as input [sequence is important]
then array_diff with array2, array1 as input [sequence is important]
then array_merge the two results
Checkout these links before implementing it ...
http://php.net/manual/en/function.array-diff.php
http://in2.php.net/manual/en/function.array-merge.php
ASKER
Hi Ray... Brian...
If you could spare a moment... this anomalous behavior has me stumped!
cheers,
GJ
I cannot see why you are expecting this result
ArrayOut_Should_be
(
[0] => Purple ball
[1] => Brown ball
[] => Ball black
[] => Pink Ball
[] => white Ball
[] => Ball black
)
Why should the blue, orange and green balls be omitted?
Also you have 'Ball black' twice...
ASKER
Brian... you are of course correct; I have much in common with a bag of hammers.
But that is not all, oh no, that is not all.
I have also failed to articulate the problem correctly.
The third Array is supposed to include the values of Array1 that are NOT in Array2. I think Ray got it correct after all; sorry Ray.
ArrayOut_Should_be
(
[0] => Purple ball
[1] => Brown ball
)
Talk about cognitive dissonance. I'm not even sure what I was thinking. A momentary lapse? Heh, probably insight is what is momentary!
Thanks for correcting me Brian, and for all of your effort.
cheers,
GJ
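For reference, the restated requirement (values of Array1 that are NOT in Array2, ignoring case and word order) can be sketched in a few lines. This is an illustration under the same normalisation idea discussed in the thread, not either expert's code, and the function names wordKey() and diffIgnoringWordOrder() are hypothetical.

```php
<?php
// Hypothetical helper: a key that ignores case and word order.
function wordKey(string $s): string
{
    $words = preg_split('/\s+/', strtolower(trim($s)), -1, PREG_SPLIT_NO_EMPTY);
    sort($words, SORT_STRING);
    return implode(' ', $words);
}

// Values of $keep whose key does not occur in $exclude, without repeats.
function diffIgnoringWordOrder(array $keep, array $exclude): array
{
    // Use the normalised keys of $exclude as a lookup set.
    $seen = array_fill_keys(array_map('wordKey', $exclude), true);

    $out = [];
    foreach ($keep as $value) {
        $key = wordKey($value);
        if (!isset($seen[$key])) {
            $seen[$key] = true; // also suppresses repeats within $keep
            $out[] = $value;
        }
    }
    return $out;
}

$myarray1 = ['Blue ball', 'Small green ball', 'Purple ball', 'Ball red',
             'Ball blue', 'Big orange ball', 'Blue ball', 'Brown ball'];
$myarray2 = ['Blue ball', 'Red ball', 'Pink Ball', 'white Ball',
             'ball Small green', 'Small green ball', 'Ball red', 'Ball black',
             'Big orange ball', 'Ball blue', 'Small ball green', 'Blue ball'];

$array3 = diffIgnoringWordOrder($myarray1, $myarray2);
print_r($array3); // Purple ball, Brown ball
```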
So where are we here? I am confused as to what (if anything) I need to be doing....
:-O
ASKER
Brian....
Your approach and Ray's appear to be quite different. I'm curious as to efficiency of the methods with respect to speed/performance.
In my testing, I'm looping through 100 records at a time.... so, 300 arrays each with 5 to 10 values.
I would be curious to try each of your respective approaches to see if there is a discernible performance difference.
Would you be able to tweak your output so that :
The third Array includes the values of Array1 that are NOT in Array2.
ArrayOut_Should_be
(
[0] => Purple ball
[1] => Brown ball
)
regards,
GJ
$myarray1 = array(
'Blue ball',
'Small green ball',
'Purple ball',
'Ball red',
'Ball blue',
'Big orange ball',
'Blue ball',
'Brown ball'
);
$myarray2 = array(
'Blue ball',
'Red ball',
'Pink Ball',
'white Ball',
'ball Small green',
'Small green ball',
'Ball red',
'Ball black',
'Big orange ball',
'Ball blue',
'Small ball green',
'Blue ball'
);
ArrayOut_Should_be
(
[0] => Purple ball
[1] => Brown ball
)
OK, modified code below. However, if you read my EE profile you will see that efficiency is the least of my concerns unless it makes itself a problem.
Output
Array
(
[0] => Purple ball
[1] => Brown ball
)
<?php
class CleanUp {
    protected $result;
    protected $originals;

    function __construct() {
        $this->result = array();
        $this->originals = array();
    }

    function add( $arr ) {
        // Process the array
        //
        foreach( $arr as $index => $string ) {
            // Convert the string to lower case and split it
            //
            $exp = explode(" ", strtolower( $string ) );

            // Sort the resulting array and convert it back to a string
            //
            asort( $exp );
            $normalisedString = implode( " ", $exp );

            // Check if the new string has already been seen before. If not record it and its index
            //
            $i = array_search( $normalisedString, $this->result );
            if ( $i === false ) {
                $this->result [$index] = $normalisedString;
                $this->originals [$index] = $string;
            }
        }
    }

    function remove( $arr ) {
        // Process the array
        //
        foreach( $arr as $string ) {
            // Convert the string to lower case and split it
            //
            $exp = explode(" ", strtolower( $string ) );

            // Sort the resulting array and convert it back to a string
            //
            asort( $exp );
            $normalisedString = implode( " ", $exp );

            // Check if the new string has already been recorded. If so, remove it from the results
            //
            $i = array_search( $normalisedString, $this->result );
            if ( $i !== false )
                unset( $this->result[$i] );
        }
    }

    function cleanArray() {
        // Array processed and all duplicates removed. Build the new results array
        //
        $newArr = array();
        foreach($this->result as $index => $string )
            $newArr [] = $this->originals [$index];

        return $newArr;
    }
}
$myarray1 = array(
'Blue ball',
'Small green ball',
'Purple ball',
'Ball red',
'Ball blue',
'Big orange ball',
'Blue ball',
'Brown ball'
);
$myarray2 = array(
'Blue ball',
'Red ball',
'Pink Ball',
'white Ball',
'ball Small green',
'Small green ball',
'Ball red',
'Ball black',
'Big orange ball',
'Ball blue',
'Small ball green',
'Blue ball'
);
// Instantiate the class
//
$clean = new CleanUp();
$clean->add( $myarray1 );
$clean->remove( $myarray2 );
echo "<pre>";
print_r( $clean->cleanArray() );
echo "</pre>";
Instantiating a class is a performance heavy operation, so I have modified the class so it only needs instantiating once and can then be reset with an initialisation method. Try the original (above) and then this (below).
I ran a little test of my own over 10,000 iterations and got this
10,000 instances - 1.93467 seconds
1 instance, 10,000 initialisations - 1.85216 seconds
so my second method saved 0.08251 seconds over 10,000 iterations or 8 microseconds per iteration.
This is why I do not worry much about efficiency.
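A rough sketch of how a comparison like this could be timed, plus the arithmetic behind the per-iteration figure. The helper timeIt() and the workload inside the closure are assumptions for illustration, not the test that was actually run, and absolute timings will differ from machine to machine.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Stand-in workload: normalise one string per iteration.
$elapsed = timeIt(function () {
    $words = explode(' ', strtolower('Small green ball'));
    sort($words);
    return implode(' ', $words);
}, 10000);

printf("10,000 iterations took %.5f seconds\n", $elapsed);

// The saving quoted above, worked through.
$savedSeconds     = 1.93467 - 1.85216;             // 0.08251 seconds in total
$microsecondsEach = $savedSeconds / 10000 * 1e6;   // about 8.25 per iteration
printf("Saved %.5f s, or %.2f microseconds per iteration\n",
       $savedSeconds, $microsecondsEach);
```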
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks Brian. Having tested it across a wide variety of scenarios, it has held up well. As you say, performance is not really an issue. The only things I have added so far are the code to handle plurals that you also contributed, and a check to ensure that none of the input arrays are null.
I did some comparisons with Ray's method, and his method choked... on pizza :-)
$array1 = array(
'pizza',
'pizza restaurants',
'restaurants pizza'
);
$array2 = array(
'Restaurants',
'Pizza Restaurant'
);
outputted:
Array
(
[0] => pizza
[1] => pizza
[2] => pizza
[3] => restaurants pizza
[4] => restaurants
[5] => restaurants
[6] => pizza restaurant
)
many thanks to both of you for helping me out on this task.
regards,
GJ
Glad you are sorted. I wonder if Ray likes Pizza???
:-)