Hello, attached is a portion of a huge text file.
The file contains only four letters: A, G, C, T. I want to find the sequence in the file that has the maximum match rate. I define the match rate as the total number of letters covered by the occurrences of the sequence, divided by the total number of letters in the file.
Suppose the above portion is the whole file and AAG is the sequence. AAG occurs 4 times, so its occurrences cover 12 letters in total. The file has 28 letters, so the match rate is 12/28.
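To make the definition concrete, here is a minimal sketch of the rate calculation in Python. The function name match_rate and the 28-letter sample string are made up for illustration (the sample is not the attached file), and I am assuming occurrences are counted without overlap, which is what str.count does.

```python
from fractions import Fraction


def match_rate(text: str, pattern: str) -> Fraction:
    """Letters covered by non-overlapping occurrences of pattern, over len(text)."""
    if not text or not pattern:
        raise ValueError("text and pattern must be non-empty")
    # str.count scans left to right and counts non-overlapping occurrences.
    occurrences = text.count(pattern)
    return Fraction(occurrences * len(pattern), len(text))


# Hypothetical 28-letter sample in which AAG occurs 4 times.
sample = "AAGCTAAGGTCAAGTTCGAAGCTGACTG"
print(match_rate(sample, "AAG"))  # 3/7, i.e. 12/28 reduced
```

For a pattern such as AAA that can overlap itself, counting overlapping occurrences would give a different number, so that choice would need to be fixed first.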
The problem is that we don't know in advance which pattern has the maximum match rate, apart from the trivial answer of the entire file itself. For a single letter, the rate approaches about 1/4 as the file grows, assuming the four letters are roughly equally likely.
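One possible brute-force approach, sketched below, is to try every substring whose length lies in a fixed range and keep the one with the highest rate. The name best_pattern and the bounds min_k and max_k are my own assumptions, added to exclude the trivial cases of a single letter and the whole file. It rescans the text for every candidate, so it is only a starting point and would likely be too slow for a huge file.

```python
def best_pattern(text: str, min_k: int = 2, max_k: int = 10):
    """Return (pattern, rate) with the highest match rate for lengths min_k..max_k.

    Occurrences are counted without overlap. Ties keep the first pattern
    found (shorter length first, then alphabetical order).
    """
    best, best_covered = None, 0
    for k in range(min_k, min(max_k, len(text)) + 1):
        # Every distinct substring of length k is a candidate.
        candidates = {text[i:i + k] for i in range(len(text) - k + 1)}
        for pattern in sorted(candidates):
            covered = text.count(pattern) * k
            if covered > best_covered:
                best, best_covered = pattern, covered
    rate = best_covered / len(text) if text else 0.0
    return best, rate


pattern, rate = best_pattern("AAGAAGAAGC", min_k=2, max_k=3)
print(pattern, rate)  # AAG 0.9
```

With a small max_k the number of distinct candidates is at most 4**k per length, so counting all k-mers in a single pass (for example with collections.Counter) may scale better, at the cost of handling overlapping occurrences separately.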
I expect this involves regular expressions, some algorithm design, and so on. Thanks for any input.