How to find the longest common substing from 2 strings?
Posted on 2011-05-02
I was trying to find the longest common substring from 2 different strings, like...the longest common substring between AAAAAAAAAACCCTCCCCCC and TTTTTTTAAAAACCCCGGGG is AAAAACCC.
The first thing that comes to my mind, is, I take the shorter string, and keep comparing the first letter in the shorter string, to the first letter in the larger string, and if the letters don't match, we can simply remove that letter from the larger string, and try again, but if they do match, we move on to the next string and go on. At any point, if the n+1 letter doesn't match, we keep the string we found, and start fresh...
But... this seems to be a brute force method... is there a way we can do this in linear time?
One idea that comes to mind, is that we take the first letter in string1, and start from the first occurrence of that letter in string2, but a few problems that arise from that approach, is that I may accidentally delete the largest string, or, it might not help at all if both strings start with the same letter...
Another idea that comes to mind... would it be faster if I just cycled through all the permutations of the larger string, and just subtract the two strings and store the results in an integer array? Like...
a b c d d d d e f f
a d u d d d d e l f
0 !0 !0 0 0 0 0 0 !0 0
Basically, I'd just check for the longest string of 0's, and since I would keep one string constant, I would know exactly what the largest string is...
But I don't think either of these ways can get the job done in linear time... could anyone help me find an algorithm that performs this task in O(mn) time? Where m and n are the lengths of the two strings?
Thanks in advance!