Link to home
Start Free TrialLog in
Avatar of 4_Dwarves
4_DwarvesFlag for Australia

asked on

pattern recognition

I have created what seems to be a random 'number' generater in MATLAB that creates a 15 digit number each time it runs.  Each digit of the 15 digit number has a possible 28 different characters (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c ......, r).  This means that the total number of possible combinations for the 15 digit number is 28^15 (at least from my logic its is, is this correct?).

Now what i am trying to achieve is to determine if there are any patterns present in the numbers that my program produces (note that I am not trying to determine if the same number is generated more than once as I know this will happen, I am trying to determine if there is a pattern in the sequence that the numbers are generated).  Does anyone know of any pattern detection algorithms that could do this or is there another way?

Thanks for any help
Dopey Dwarf.
Avatar of ozo
ozo
Flag of United States of America image

Avatar of phoffric
phoffric

>>  the total number of possible combinations for the 15 digit number is 28^15 (... is this correct?).

I believe you mean the total number of possible permutations (this is where ordering counts). Combinations is where order does not count. For permutations, where each character slot can have 28 values, and where you have 15 slots, then, yes, there are 28^15 permutations.

To better understand this, suppose instead that each slot allowed only 10 distinct characters, say 0,1,...9. Then if there were 15 slots, there would be 10^15 possible distinct permutations, which is one more than the maximum decimal number you can have for a 15 digit number.
How about assigning a positive number from -14 .. +13 for the 28 possible characters.
Let randNumArray be an array of your 15 slots.
Let rand2NumArray be randNumArray  ||  randNumArray (i.e., an array of 30 slots).

Slide the 15 slot randNumArray over the 30 slot rand2NumArray and compute the sum of products of the corresponding 15 slots in both arrays to produce 15 sums. Plot the 15 sums. Pick out the significant positive peaks; and check those for a possible pattern.
Instead of -14 .. +13, assign larger numbers leaving out 0 for the 28 chars; maybe -1400 ... + 1400 (incrementing by 100).
This sum of products probably works best with a larger number of slots, say 100 to 500. This is an O(N²) operation which is best suited for array processing DSP processors.
The above assumed that circular patterns are also acceptable.

Here is a video link on searching for patterns in data. Unfortunately, at the moment, the first of the two videos are without sound (but has slides). At least there was sound in the 2nd lecture.
    http://videolectures.net/aop05_mannila_ffpd/

You can define 14 pointers that point respectively to the first char (which is a string of 15 chars), the 2nd char, ..., up to the 14th char (which is a string of 2 chars). Each pointer represents a string. You can sort these 14 strings alphanumerically. Now compare adjacent strings to find the patterns of two or more characters.
Avatar of 4_Dwarves

ASKER

I appoligise for the late reaply.

phoffric:      I think i have confused you (and myself) a little bit with this.  I am generating a number with 15 slots (digits), each slot has a possible 28 characters.  Each slot is individual in the way that if a number is produced by the generator the more than one slot may contain the same character e.g.

no.1            29ajs5j9r0fkdln      (repeated characters)
no.2            0pq92jacb3m21kl      (no repeated characters)
no.3            882oliphamnc773
no.4            111111111111111      (in theory this number could be produced)

Therefore I dont think I was trying to explain the total number of permutations however in all honnesty I really don't know.
Also I dont really understand what you mean by circular (and non-circular) patterns, could you try and explain?

Thank you for your input.
OK, when you said combinations and I reacted by saying permutations, I should not have done that. We're not doing either. We're just filling up slots randomly. Sorry about the added confusion.

And we agreed that there are 28^15 distinct possible outcomes. We can think of the 15 slots as representing a number of radix 28. If the radix were 10 (our ordinary decimal system) and had 15 digits, then we would have 10^15 distinct possible decimal numbers that could fill those 15 slots.

I assumed that by a "pattern" you meant a sequence of two or more chars that repeat. But you annotate your example "no. 1" with "(repeated characters)". True, the '9' appears twice as well as the 'j', but do you consider that a pattern? And is that significant to you? You annotate your example "no. 2" with "(no repeated characters)"; however, the '2' appears twice, so I do not understand why the annotations are different between these two examples. Likewise, by my assumed definition of a pattern, there is also no pattern in

Do the patterns have to be mutually exclusive or can they overlap?

What I proposed in http:#32694060 assumed that the patterns were allowed to be taken from overlapping slots. It also only looked for non-circular patterns. In the box below shows the 14 strings (that the 14 pointers point to) for the "random" 15 char string: "123412341234abc". After sorting these 14 strings, the longest running string has 8 chars, "12341234", which begins in the first and 5th slot.

If the patterns must be mutually exclusive (i.e., no slots in one pattern are in another pattern), then the longest running sequence has only 4 chars (e.g., "1234" or "2341" or "3412" or "4123").
=================
UNSORTED STRINGS:
=================
123412341234abc
23412341234abc
3412341234abc
412341234abc
12341234abc
2341234abc
341234abc
41234abc
1234abc
234abc
34abc
4abc
abc
bc
=================
SORTED STRINGS:
=================
123412341234abc
12341234abc
1234abc
23412341234abc
2341234abc
234abc
3412341234abc
341234abc
34abc
412341234abc
41234abc
4abc
abc
bc

Open in new window

>> I don't really understand what you mean by circular (and non-circular) patterns

Here is a clarification by example. Consider the 15 char string: 12363412abcd134
Using the above pointer scheme, you will find that the longest running sequence pattern is of length 2 (e.g., "12" and "34", and they both appear twice in the string).

If you allow for a circular pattern sequence, think of a clock having 15 slots instead of 12. In each slot put a char from the string, and then scan for patterns in a clock-wise direction. By wrapping around from the end to the beginning of the string, the previous "34" pattern becomes longer - it is now "3412".

>> no.4            111111111111111
For patterns allowing overlapping slots, then the longest pattern sequence has a length of 14, where the two patterns start at the first and second slots (and the pattern is 14 1's).

For non-overlapping slot requirement, the sequence pattern length is half, only 7. One sequence starts at slot 1, and the other sequence starts at either slot 8 or 9.
FYI - please ignore my first four posts. Start with my post http:#32694060. Again, they assume that patterns found in overlapping slots are acceptable to you.
ok, first things first, i am an idiot.. example no.2 should not have had repetitions in it, correction: 0pq92jacb3m41kl.

the patterns that i am looking for are not within the 15 digit numbers themselve i dont think but more in a sense of comparing each 15 digit number that is produced.

it is not important for each 15 digit number to be different from the last however it is important that if a specific 15 digit number does appear twice that it does so in a random fashion and not in a pattern.  for example, if 1000 15 digit numbers are produced, the same number could appear in the following positions 1, 32, 90, 508, 510, 731, 999 i.e. randomly. however if the same number was repeated in positions 1, 100, 200, 300, 400, etc, this would be seen as a pattern. (just for clarity, by position 1 i mean the first number produced, position 100 would be the 100th numberr produced and so on).

does this make sense to you?
thank you again i really appreciate all of your help.
Dopey Dwarf

I misunderstood your OP - sorry.

Whether a string of 15 chars has repetitions in it has no consequence on anything as far as your issue is concerned, right?

>> the same number could appear in the following positions 1, 32, 90, 508, 510, 731, 999 i.e. randomly.
If I were to take you literally, then your "random number generator" isn't really random, since
P(a particular string) = 28^(-15) is a number that I don't know how to pronounce. If I thought I had a true "random number generator" or even a poor pseudo random number generator (prng), I would be very surprised to find the same occurrence of a string seven times in the first 1000 numbers generated. In fact I would be surprised if the same 28 char string occurred seven times in the first trillion numbers generated by a prng.

So, it doesn't look like you are working with even a poor prng. But maybe there is something about your string generator that you are not mentioning. From your single example of a what you call a bad pattern, you express a strong dislike of periodicity.

Taking you literally with your good a bad examples, then does a spectral analysis on your "random data" (not random if I take you literally) provide you with a spectrum that stands out when you have your periodic occurrence of your repeating string?

If there is more application information that you would like to share, then maybe a better test can be offered. Did you look at the links in the first post to see if any of them are useful?
>>Whether a string of 15 chars has repetitions in it has no consequence on anything as far as your issue is concerned, right?

Correct!

>> If I were to take you literally, then your "random number generator" isn't really random, since
P(a particular string) = 28^(-15) is a number that I don't know how to pronounce. If I thought I had a true "random number generator" or even a poor pseudo random number generator (prng), I would be very surprised to find the same occurrence of a string seven times in the first 1000 numbers generated.

This was simpley and example, the program has been run to produce over 50,000 15 digit numbers without any repeats.

>>Taking you literally with your good a bad examples, then does a spectral analysis on your "random data" (not random if I take you literally) provide you with a spectrum that stands out when you have your periodic occurrence of your repeating string?

I'm pretty sure this is what i want to do only i have no idea how to do it.
Yesterday i developed a way of plotting the frequency that each char appears (theoretically if generation is random then every char has equal likelyhood of appearing) however this is not really showing me anything since the number of 15 char strings that i'm producing (so far has been run to produce 200,000) is very small with respect to the total possible combinations (that number again 28^(15)).  also this is not plotting the likelyhood of each char but the actual occurances and so if truely random some numbers are bound to appear more times.

thanks
Dopey Dwarf
Did you look at the various references I gave for finding pattens in random sequence generators?
ASKER CERTIFIED SOLUTION
Avatar of phoffric
phoffric

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Provided a lot of help.  Helped me work out exactly what i am trying to achieve.
What exactly are you trying to achieve?