• Status: Solved

I need help finding the frequency of patterns in an array of arrays of number sets.

I have worked out how to get the frequency of patterns inside a single array, for example:

$array1 =  array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13);

The patterns found are (3, 5) and (5, 48, 4, 7, 13), both with a frequency of 3.

I got that with the function attached below.

What I want is how many times those patterns occur across several arrays, for example:

$array2 = array(
    array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
    array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
    array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
    array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
);
       
I want to know the most frequent pattern across the arrays for each pattern length, starting with 2 numbers and going up to 10 numbers.

My result should yield 10 arrays in an array:

[0]=>array([0]=>3, [1]=>6)
[1]=>array([0]=>22, [1]=>31, [2]=>56)

...and so on, all the way to 10 arrays, the 10th having 10 positions.

I just can't seem to make it all work with multi-level arrays.

Please help!


         

function getFrequences2($input, $minSequenceSize = 2) {
  $sequences = array();

  $last_offset = 0;
  $last_offset_len = 0;

  $len = count($input);
  for ($i=0; $i<$len; $i++) {
     for ($j=$i+$minSequenceSize; $j<$len; $j++) {
        if ($input[$i] == $input[$j]) {
           $offset = 1;
           $sub = array($input[$i]);
           while ($i + $offset < $j && $j + $offset < $len) {
              if ($input[$i + $offset] == $input[$j + $offset]) {
                 array_push($sub, $input[$i + $offset]);
              } else {
                 break;
              }
              $offset++;
           }

           $sub_len = count($sub);
           if ($sub_len >= $minSequenceSize) {
              // $sub must contain more elements than the last sequence found
              // otherwise we will count the same sequence twice
              if ($last_offset + $last_offset_len >= $i + $sub_len) {
                 // we already saw this sequence... ignore
                 continue;
              } else {
                 // save offset and sub_len for future check
                 $last_offset = $i;
                 $last_offset_len = $sub_len;
              }

              foreach ($sequences as & $sequence) {
                 $sequence_len = count($sequence['values']);
                 if ($sequence_len == $sub_len && $sequence['values'] == $sub) {
                    //echo "Found add-full ".var_export($sub, true)." at $i and $j...\n";
                    $sequence['frequence']++;
                    break 2;
                 } else {
                    if ($sequence_len > $sub_len) {
                       $end = $sequence_len - $sub_len;
                       $values = $sequence['values'];
                       $slice_len = $sub_len;
                       $test = $sub;
                    } else {
                       $end = $sub_len - $sequence_len;
                       $values = $sub;
                       $slice_len = $sequence_len;
                       $test = $sequence['values'];
                    }
                    for ($k=0; $k<=$end; $k++) {
                       if (array_slice($values, $k, $slice_len) == $test) {
                          //echo "Found add-part ".implode(',',$sub)." which is part of ".implode(',',$values)." at $i and $j...\n";
                          $sequence['values'] = $values;
                          $sequence['frequence']++;
                          break 3;
                       }
                    }
                 }
              }

              //echo "Found new ".implode(',',$sub)." at $i and $j...\n";
              array_push($sequences, array('values' => $sub, 'frequence' => 2));
              break;
           }
        }
     }
  }

  return $sequences;
}

Asked by: bdgbrick

bdgbrick (Author) commented:
Basically, if you have 10 sets of 20 numbers each and (22, 56) comes up the most frequently, that's what I want to know.

And the same for 3 numbers, 4, 5... all the way to 10. The numbers in each set will not repeat, if that helps: 22 will not appear twice in the same set of integers.
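
Because no number repeats within a set, the requirement described above can be read as counting k-number combinations across sets rather than contiguous runs. The following is a minimal sketch under that assumption; `combinations()` and `countCombinations()` are hypothetical helper names, and the sample sets are made up and shortened to five numbers each.

```php
<?php
// Hypothetical helper: every $k-element combination of $items, keeping input order.
function combinations(array $items, $k) {
    if ($k === 0) {
        return array(array());
    }
    $result = array();
    $n = count($items);
    for ($i = 0; $i <= $n - $k; $i++) {
        // Fix $items[$i] as the first member, then pick the rest from what follows it.
        foreach (combinations(array_slice($items, $i + 1), $k - 1) as $tail) {
            array_unshift($tail, $items[$i]);
            $result[] = $tail;
        }
    }
    return $result;
}

// Hypothetical helper: how many sets contain each $k-number combination.
function countCombinations(array $sets, $k) {
    $counts = array();
    foreach ($sets as $set) {
        sort($set); // canonical order, so (56, 22) and (22, 56) share one key
        foreach (combinations($set, $k) as $combo) {
            $key = implode('-', $combo);
            $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
        }
    }
    arsort($counts); // most frequent first
    return $counts;
}

// Made-up sets, five numbers each for readability.
$sets = array(
    array(22, 56, 3, 41, 70),
    array(56, 9, 22, 41, 12),
    array(22, 8, 56, 77, 30),
);
$pairs = countCombinations($sets, 2);
reset($pairs);
echo key($pairs), ' => ', current($pairs), "\n"; // prints "22-56 => 3"
```

Calling `countCombinations($sets, $k)` for `$k` from 2 to 10 would give one result per combination size. Note that a 20-number set has 184,756 ten-number combinations, so the larger sizes may be slow over many sets.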

Ray Paseur commented:
What is this application for?  I'm not exactly new to programming and I have not seen something like this question before.  Where does this data originate?

bdgbrick (Author) commented:
The number sets are uploaded into the database manually, each with an ID key and a date/time.

The data will come from a query.

The table is called tblNumbers:
[id]
[date]
[numString] - varchar  40

[1] [date/time] [03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77]
[2] [date/time] [03 08 12 23 30 31 35 39 41 45 46 55 57 62 63 65 66 71 77 79]
[3] [date/time] [08 09 12 19 20 24 32 34 39 42 47 49 52 53 58 63 65 69 72 76]


So once the function can get the frequency of patterns, I will want to input a start date and an end date over which to calculate the pattern frequency.

So the question I want to ask of the data set is: between Jan 1st and March 23rd, which 2-number combinations have the highest frequency? Which 3-number combinations, which 4-number combinations, and so on, all the way to 10 numbers.

If that helps, great. If you need more info, let me know; I will get back sooner than last time.

Thanks
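
A minimal sketch of the date-range step described above, using plain arrays in place of a real tblNumbers query result. The function name `setsBetween()` and the dates are invented for illustration; the numStrings are the three sample rows from this comment. It assumes the date column is formatted as `YYYY-MM-DD HH:MM:SS`, so string comparison orders the values correctly.

```php
<?php
// Made-up rows standing in for a tblNumbers query result; the dates are invented.
$rows = array(
    array('id' => 1, 'date' => '2012-01-05 20:00:00',
          'numString' => '03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77'),
    array('id' => 2, 'date' => '2012-02-14 20:00:00',
          'numString' => '03 08 12 23 30 31 35 39 41 45 46 55 57 62 63 65 66 71 77 79'),
    array('id' => 3, 'date' => '2012-04-02 20:00:00',
          'numString' => '08 09 12 19 20 24 32 34 39 42 47 49 52 53 58 63 65 69 72 76'),
);

// Hypothetical helper: keep rows dated within [$start, $end] (inclusive, 'YYYY-MM-DD')
// and turn each numString into an array of integers keyed by row id.
function setsBetween(array $rows, $start, $end) {
    $from = $start . ' 00:00:00';
    $to   = $end . ' 23:59:59';
    $sets = array();
    foreach ($rows as $row) {
        // Fixed-width date strings compare correctly as plain strings.
        if (strcmp($row['date'], $from) >= 0 && strcmp($row['date'], $to) <= 0) {
            $sets[$row['id']] = array_map('intval', explode(' ', $row['numString']));
        }
    }
    return $sets;
}

$sets = setsBetween($rows, '2012-01-01', '2012-03-23');
echo implode(',', array_keys($sets)), "\n"; // prints "1,2"
```

In a real application the same filter would more likely be a `WHERE` clause on the date column, so that only the matching rows leave the database.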

bdgbrick (Author) commented:
Some constants:

All numbers in a number string will be between 1 and 80.
No number string will contain a duplicate number.

This is a lottery application I am doing with a friend.

Ray Paseur commented:
OK, I think we can do something with that.  But in a VARCHAR(40) field, this will be truncated because it is longer than 40 characters.
03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77

Is the actual format of the numString field always a set of two digits separated by blanks?

In the test data posted above it looks like something with 63 would be one of the most frequent two-number combos, maybe 03-63.  But that would be tied with 03-77, right?
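
The observation about the test data can be checked with a short pair count over the three sample rows. This is only a sketch; the variable names are arbitrary, and it relies on each row already being in ascending order so that a pair always produces the same key.

```php
<?php
// The three sample numStrings posted earlier in the thread.
$rows = array(
    '03 05 11 14 16 22 23 27 37 42 49 50 52 54 55 56 59 63 74 77',
    '03 08 12 23 30 31 35 39 41 45 46 55 57 62 63 65 66 71 77 79',
    '08 09 12 19 20 24 32 34 39 42 47 49 52 53 58 63 65 69 72 76',
);

$counts = array();
foreach ($rows as $row) {
    $nums = explode(' ', $row); // ascending in the sample data
    $n = count($nums);
    // Visit every unordered pair of numbers within the row exactly once.
    for ($i = 0; $i < $n - 1; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $key = $nums[$i] . '-' . $nums[$j];
            $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
        }
    }
}

echo '03-63: ', $counts['03-63'], "\n"; // prints "03-63: 2"
echo '03-77: ', $counts['03-77'], "\n"; // prints "03-77: 2"
echo 'max: ', max($counts), "\n";       // prints "max: 2"
```

With only three rows, no pair appears in all of them, so 03-63 and 03-77 are tied with every other pair that occurs twice.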


bdgbrick (Author) commented:
Sorry, that varchar was supposed to be longer; 100 will do it.

The actual numString field will always hold 20 two-digit numbers.

I can separate them with anything in the string before uploading to the database, if that helps matters: space, comma, semicolon, whatever is easiest.
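
Since the separator is still open, one option is to parse on anything that is not a digit. A minimal sketch, where `parseNumString()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: split a numString on any run of non-digit characters
// (space, comma, semicolon, ...) and return the numbers as integers.
function parseNumString($numString) {
    $parts = preg_split('/[^0-9]+/', $numString, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('intval', $parts);
}

print_r(parseNumString('03 05 11'));   // 3, 5, 11
print_r(parseNumString('03,05;11'));   // same result
```

Casting to integers drops the leading zeros, so `03` and `3` are treated as the same number.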


You are correct about the test data.
I would say there will be ties in frequency, especially for the smaller combinations, so maybe return a list of all of the combinations with the highest frequency.

Example:
Both 03 77 and 03 63 happen twice.
If 2 is the highest frequency, then we would get an array with both combos.
If there is a combination with 3 instances, then we would just get the one with the frequency of 3.
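
The tie rule described above can be sketched as follows, assuming the counts are already in an array keyed by combination. `mostFrequent()` is a hypothetical helper name and the counts are made up.

```php
<?php
// Hypothetical helper: return every combination key whose count equals the maximum.
function mostFrequent(array $counts) {
    if (count($counts) === 0) {
        return array();
    }
    $max = max($counts);
    // array_keys() with a search value returns all keys holding that value.
    return array_keys($counts, $max, true);
}

// Two combos tied at 2: both are returned.
$tied = mostFrequent(array('03-77' => 2, '03-63' => 2, '05-11' => 1));

// One combo at 3: only that one is returned.
$single = mostFrequent(array('03-77' => 2, '03-63' => 2, '08-12' => 3));

print_r($tied);
print_r($single);
```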




bdgbrick (Author) commented:
Is this capability not possible with PHP?

bdgbrick (Author) commented:
Anybody have any ideas on this?

Lukasz Chmielewski commented:
Can't you just "glue" the arrays together and work with the data just as you would with a one-dimensional array?
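
A minimal sketch of the gluing step on its own, with made-up three-number arrays. One thing to keep in mind: after gluing, the last number of one array sits directly before the first number of the next, so a pattern search over the flat array can also match runs that span two source arrays.

```php
<?php
// Made-up nested input, three short arrays.
$nested = array(
    array(3, 5, 1),
    array(4, 5, 7),
    array(3, 5, 1),
);

// Pass each inner array as a separate argument to array_merge().
$flat = call_user_func_array('array_merge', $nested);

echo implode(', ', $flat), "\n"; // prints "3, 5, 1, 4, 5, 7, 3, 5, 1"

// $flat[2] and $flat[3] (1 followed by 4) are adjacent only because of the glue;
// they come from two different source arrays.
```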

Lukasz Chmielewski commented:
Not tested, but try this:

<?php
    
function getFrequences2($input, $minSequenceSize = 2) {

  $test = array();

  foreach($input as $key => $val){
    $test = array_merge($test,$val);
  }
  
  //print_r($test);
  $input = $test;

  $sequences = array();

  $last_offset = 0;
  $last_offset_len = 0;

  $len = count($input);
  for ($i=0; $i<$len; $i++) {
     for ($j=$i+$minSequenceSize; $j<$len; $j++) {
        if ($input[$i] == $input[$j]) {
           $offset = 1;
           $sub = array($input[$i]);
           while ($i + $offset < $j && $j + $offset < $len) {
              if ($input[$i + $offset] == $input[$j + $offset]) {
                 array_push($sub, $input[$i + $offset]);
              } else {
                 break;
              }
              $offset++;
           }

           $sub_len = count($sub);
           if ($sub_len >= $minSequenceSize) {
              // $sub must contain more elements than the last sequence found
              // otherwise we will count the same sequence twice
              if ($last_offset + $last_offset_len >= $i + $sub_len) {
                 // we already saw this sequence... ignore
                 continue;
              } else {
                 // save offset and sub_len for future check
                 $last_offset = $i;
                 $last_offset_len = $sub_len;
              }

              foreach ($sequences as & $sequence) {
                 $sequence_len = count($sequence['values']);
                 if ($sequence_len == $sub_len && $sequence['values'] == $sub) {
                    //echo "Found add-full ".var_export($sub, true)." at $i and $j...\n";
                    $sequence['frequence']++;
                    break 2;
                 } else {
                    if ($sequence_len > $sub_len) {
                       $end = $sequence_len - $sub_len;
                       $values = $sequence['values'];
                       $slice_len = $sub_len;
                       $test = $sub;
                    } else {
                       $end = $sub_len - $sequence_len;
                       $values = $sub;
                       $slice_len = $sequence_len;
                       $test = $sequence['values'];
                    }
                    for ($k=0; $k<=$end; $k++) {
                       if (array_slice($values, $k, $slice_len) == $test) {
                          //echo "Found add-part ".implode(',',$sub)." which is part of ".implode(',',$values)." at $i and $j...\n";
                          $sequence['values'] = $values;
                          $sequence['frequence']++;
                          break 3;
                       }
                    }
                 }
              }

              //echo "Found new ".implode(',',$sub)." at $i and $j...\n";
              array_push($sequences, array('values' => $sub, 'frequence' => 2));
              break;
           }
        }
     }
  }

  return $sequences;
}


$array1 =  array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13);
$array2 = array(
                   array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
                    array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
                    array(3, 5, 1, 3, 5, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13, 32, 5, 48, 4, 7, 13),
                    array(4, 5, 7, 76, 34, 35, 48, 22, 80, 22, 15, 48, 4, 7, 13, 55, 3, 5, 65, 4, 7, 13),
);

$test = getFrequences2($array2,4);
print_r($test);

?>

bdgbrick (Author) commented:
Thanks, I glued the arrays together like you suggested.

Question has a verified solution.
