# I need help finding the frequency of patterns in an array of arrays of number sets

I have worked out how to get the frequency of number patterns inside a single array, for example:

\$array1 =  array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13);

The patterns found are (3, 5) and (5, 48, 4, 7, 13), both with a frequency of 3.

I got that with the function attached.

What I want is how many times those patterns occur across several arrays, for example:

\$array2 = array(
array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
);

For each pattern size, starting with 2 numbers and going up to 10 numbers, I want to know which single pattern has the highest frequency across the arrays.

My result should yield 10 arrays in an array:

[0]=>array([0]=>3, [1]=>6)
[1]=>array([0]=>22, [1]=>31, [2]=>56)

...and so on, all the way to 10 arrays, the 10th having 10 positions.

I just can't seem to make it all work with multi-level arrays.

``````
function getFrequences2($input, $minSequenceSize = 2) {
    $sequences = array();

    $last_offset = 0;
    $last_offset_len = 0;

    $len = count($input);
    for ($i=0; $i<$len; $i++) {
        for ($j=$i+$minSequenceSize; $j<$len; $j++) {
            if ($input[$i] == $input[$j]) {
                $offset = 1;
                $sub = array($input[$i]);
                while ($i + $offset < $j && $j + $offset < $len) {
                    if ($input[$i + $offset] == $input[$j + $offset]) {
                        array_push($sub, $input[$i + $offset]);
                    } else {
                        break;
                    }
                    $offset++;
                }

                $sub_len = count($sub);
                if ($sub_len >= $minSequenceSize) {
                    // $sub must contain more elements than the last sequence found
                    // otherwise we will count the same sequence twice
                    if ($last_offset + $last_offset_len >= $i + $sub_len) {
                        // we already saw this sequence... ignore
                        continue;
                    } else {
                        // save offset and sub_len for future check
                        $last_offset = $i;
                        $last_offset_len = $sub_len;
                    }

                    foreach ($sequences as & $sequence) {
                        $sequence_len = count($sequence['values']);
                        if ($sequence_len == $sub_len && $sequence['values'] == $sub) {
                            //echo "Found add-full ".var_export($sub, true)." at $i and $j...\n";
                            $sequence['frequence']++;
                            break 2;
                        } else {
                            if ($sequence_len > $sub_len) {
                                $end = $sequence_len - $sub_len;
                                $values = $sequence['values'];
                                $slice_len = $sub_len;
                                $test = $sub;
                            } else {
                                $end = $sub_len - $sequence_len;
                                $values = $sub;
                                $slice_len = $sequence_len;
                                $test = $sequence['values'];
                            }
                            for ($k=0; $k<=$end; $k++) {
                                if (array_slice($values, $k, $slice_len) == $test) {
                                    //echo "Found add-part ".implode(',',$sub)." which is part of ".implode(',',$values)." at $i and $j...\n";
                                    $sequence['values'] = $values;
                                    $sequence['frequence']++;
                                    break 3;
                                }
                            }
                        }
                    }

                    //echo "Found new ".implode(',',$sub)." at $i and $j...\n";
                    array_push($sequences, array('values' => $sub, 'frequence' => 2));
                    break;
                }
            }
        }
    }

    return $sequences;
};
``````

Author Commented:
Basically, if you have 10 sets of 20 numbers each and (22, 56) comes up most frequently, that's what I want to know.

The same goes for 3 numbers, 4, 5, and so on, all the way to 10. The numbers in each set will not repeat, if that helps: 22 will not appear twice in the same set of integers.
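
A rough sketch of one way to count this, not taken from the thread: since the order inside a set does not matter and numbers never repeat, each set can be sorted and every k-number combination counted across all sets. The names `combinations` and `countCombinations` are made up for illustration, and the sets are assumed to be arrays of integers.

```php
<?php

/**
 * Yield every $k-element combination of $items, keeping the input order.
 * (Hypothetical helper, not part of the original function.)
 */
function combinations(array $items, int $k, int $start = 0, array $prefix = []): Generator
{
    if ($k === 0) {
        yield $prefix;
        return;
    }
    $n = count($items);
    // Stop early enough that $k items are still available from position $i.
    for ($i = $start; $i <= $n - $k; $i++) {
        yield from combinations($items, $k - 1, $i + 1, array_merge($prefix, [$items[$i]]));
    }
}

/**
 * Count in how many sets each $k-number combination occurs.
 * Returns "a-b-..." => count, highest count first.
 */
function countCombinations(array $sets, int $k): array
{
    $counts = [];
    foreach ($sets as $set) {
        // Sort so that (22, 56) and (56, 22) produce the same key.
        $set = array_values(array_unique($set));
        sort($set);
        foreach (combinations($set, $k) as $combo) {
            $key = implode('-', $combo);
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    arsort($counts);
    return $counts;
}

// Made-up sample sets, shortened to 5 numbers each for readability.
$sets = [
    [3, 5, 11, 22, 56],
    [5, 22, 31, 56, 77],
    [8, 22, 40, 56, 63],
];
$pairs = countCombinations($sets, 2);
echo array_key_first($pairs), ' => ', reset($pairs), "\n"; // prints "22-56 => 3"
```

Keep in mind that a set of 20 numbers has 190 pairs but 184,756 ten-number combinations, so the larger sizes get expensive quickly as the number of sets grows.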
Commented:
What is this application for?  I'm not exactly new to programming and I have not seen something like this question before.  Where does this data originate?
Author Commented:
The number sets are uploaded into the database manually, each with an ID key and a date/time.

The data will come from a query.

The table is called tblNumbers:
[id]
[date]
[numString] - varchar  40

[1] [date/time] [03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77]
[2] [date/time] [03 08 12 23 30 31 35 39 41 45 46 55 57 62 63 65 66 71 77 79]
[3] [date/time] [08 09 12 19 20 24 32 34 39 42 47 49 52 53 58 63 65 69 72 76]

So once the function can get the frequency of patterns, I will want to input a start date and an end date over which to calculate the pattern frequency.

So the question I want to ask the data set is: between January 1st and March 23rd, which 2-number combinations have the highest frequency? Which 3-number combinations, which 4-number combinations, and so on, all the way to 10 numbers?

If that helps, let me know. I will get back sooner than last time.

thanks
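
As a sketch of how rows like the ones above could be turned into arrays before counting: the function name `parseNumString` is hypothetical, and the SQL text is only an assumption built from the table and column names in this post (it is shown as a string and not executed here).

```php
<?php

/**
 * Turn a string such as "03 05 11 ..." into [3, 5, 11, ...].
 * Splits on any run of whitespace, commas, or semicolons.
 */
function parseNumString(string $numString): array
{
    $parts = preg_split('/[\s,;]+/', trim($numString), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('intval', $parts);
}

// Assumed query for the date range; :start and :end would be bound
// through a prepared statement in whatever database layer is in use.
$sql = 'SELECT id, numString FROM tblNumbers WHERE date BETWEEN :start AND :end';

// Two of the sample rows from the post, standing in for the query result.
$rows = [
    '03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77',
    '03 08 12 23 30 31 35 39 41 45 46 55 57 62 63 65 66 71 77 79',
];
$sets = array_map('parseNumString', $rows);
echo count($sets[0]), ' numbers, first is ', $sets[0][0], "\n"; // prints "20 numbers, first is 3"
```

Converting to integers early means "03" and "3" are treated as the same number later on.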
Author Commented:
Some constants:

All numbers in a number string will be between 1 and 80.
No number string will have a duplicate number.

This is a lottery application I am doing with a friend.
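
Those constraints could be checked before any counting is done. This is only a sketch: `isValidDraw` is a hypothetical name, the input is assumed to be an array of integers, and the default size of 20 comes from the "10 sets of 20 numbers each" mentioned earlier in the thread.

```php
<?php

/**
 * Check that a draw has the expected size, contains no duplicates,
 * and that every number is an integer between $min and $max.
 */
function isValidDraw(array $numbers, int $size = 20, int $min = 1, int $max = 80): bool
{
    if (count($numbers) !== $size) {
        return false;
    }
    if (count(array_unique($numbers)) !== $size) {
        return false; // at least one number appears twice
    }
    foreach ($numbers as $n) {
        if (!is_int($n) || $n < $min || $n > $max) {
            return false;
        }
    }
    return true;
}

// First sample row from the thread, already converted to integers.
$draw = [3, 5, 11, 14, 16, 22, 23, 27, 37, 42, 49, 50, 52, 54, 55, 56, 59, 63, 74, 77];
var_dump(isValidDraw($draw)); // bool(true)
```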
Commented:
OK, I think we can do something with that.  But in a VARCHAR(40) field, this will be truncated because it is longer than 40 characters.
03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77

Is the actual format of the numString field always a set of two digits separated by blanks?

In the test data posted above it looks like something with 63 would be one of the most frequent two-number combos, maybe 03-63.  But that would tie with 03-77, right?

Author Commented:
Sorry, that varchar was supposed to be larger; 100 will do it.

The actual numString field will always hold 20 two-digit numbers.

I can separate them with anything in the string before uploading to the database, if that helps matters: space, comma, semicolon, whatever is easiest.

You are correct about the test data.
I would say there will be ties in frequency, especially for the smaller combinations, so maybe return a list of all of the combinations with the highest frequency.

Example:
Both 03 77 and 03 63 happen twice.
If 2 is the highest frequency, then we would get an array with both combos.
If there is a combination with 3 instances, then we would just get the one with the frequency of 3.
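
The tie rule described above can be sketched like this, assuming the counts are already held in an array keyed by combination (the key format "03-63" and the name `topCombinations` are just for illustration):

```php
<?php

/**
 * Return the highest frequency and every combination that reaches it.
 */
function topCombinations(array $counts): array
{
    if ($counts === []) {
        return ['frequency' => 0, 'combos' => []];
    }
    $max = max($counts);
    // array_keys() with a search value returns all keys holding that value.
    return ['frequency' => $max, 'combos' => array_keys($counts, $max, true)];
}

// Two combos tie at 2, so both are returned.
print_r(topCombinations(['03-63' => 2, '03-77' => 2, '05-11' => 1]));

// One combo reaches 3, so only that one is returned.
print_r(topCombinations(['03-63' => 3, '03-77' => 2]));
```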

Author Commented:
Is this capability not possible with PHP?
Author Commented:
Anybody have any ideas on this?
Commented:
Can't you just "glue" the arrays together and work with the data just as you would with a one-dimensional array?
Commented:
Not tested, but try this:

``````
<?php

function getFrequences2($input, $minSequenceSize = 2) {

    $test = array();

    foreach($input as $key => $val){
        $test = array_merge($test,$val);
    }

    //print_r($test);
    $input = $test;

    $sequences = array();

    $last_offset = 0;
    $last_offset_len = 0;

    $len = count($input);
    for ($i=0; $i<$len; $i++) {
        for ($j=$i+$minSequenceSize; $j<$len; $j++) {
            if ($input[$i] == $input[$j]) {
                $offset = 1;
                $sub = array($input[$i]);
                while ($i + $offset < $j && $j + $offset < $len) {
                    if ($input[$i + $offset] == $input[$j + $offset]) {
                        array_push($sub, $input[$i + $offset]);
                    } else {
                        break;
                    }
                    $offset++;
                }

                $sub_len = count($sub);
                if ($sub_len >= $minSequenceSize) {
                    // $sub must contain more elements than the last sequence found
                    // otherwise we will count the same sequence twice
                    if ($last_offset + $last_offset_len >= $i + $sub_len) {
                        // we already saw this sequence... ignore
                        continue;
                    } else {
                        // save offset and sub_len for future check
                        $last_offset = $i;
                        $last_offset_len = $sub_len;
                    }

                    foreach ($sequences as & $sequence) {
                        $sequence_len = count($sequence['values']);
                        if ($sequence_len == $sub_len && $sequence['values'] == $sub) {
                            //echo "Found add-full ".var_export($sub, true)." at $i and $j...\n";
                            $sequence['frequence']++;
                            break 2;
                        } else {
                            if ($sequence_len > $sub_len) {
                                $end = $sequence_len - $sub_len;
                                $values = $sequence['values'];
                                $slice_len = $sub_len;
                                $test = $sub;
                            } else {
                                $end = $sub_len - $sequence_len;
                                $values = $sub;
                                $slice_len = $sequence_len;
                                $test = $sequence['values'];
                            }
                            for ($k=0; $k<=$end; $k++) {
                                if (array_slice($values, $k, $slice_len) == $test) {
                                    //echo "Found add-part ".implode(',',$sub)." which is part of ".implode(',',$values)." at $i and $j...\n";
                                    $sequence['values'] = $values;
                                    $sequence['frequence']++;
                                    break 3;
                                }
                            }
                        }
                    }

                    //echo "Found new ".implode(',',$sub)." at $i and $j...\n";
                    array_push($sequences, array('values' => $sub, 'frequence' => 2));
                    break;
                }
            }
        }
    }

    return $sequences;
}

$array1 =  array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13);
$array2 = array(
    array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
    array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
    array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
    array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
);

$test = getFrequences2($array2,4);
print_r($test);

?>
``````