How to find the longest common substring?

errang
errang used Ask the Experts™
on
Hey,

       I asked this question a little while back, and got a working solution with the conventional way... Which is, building a table and looking at a diagonal sequence of numbers.  But, I was wondering if it was possible to do it using a constant amount of memory.. as in, without the table.  Is that even possible?

The only way I can think of doing it is, take the shorter string of the 2, chop it up into all possible substrings, and then see if any of the substrings exist in the bigger of the 2 strings... but this method would take an enormous amount of time...

Is there a slightly more efficient method?  Google searches return methods that use a constant time, but not constant memory...

Appreciate any help on this!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Most Valuable Expert 2014
Top Expert 2015
Commented:
When building a suffix tree, multiple occurences of a substring in a text are caught and recorded. If the two input strings are concatentated, then when the tree of these two strings are

Here is a video lecture given by Esko Ukkonen, University of Helsinki. His algorithm builds the suffix tree in O(n) time.
    http://videolectures.net/aop05_ukkonen_sthmt/
Here are the slides for his video:  Esko-Ukkonen-Suffix-Tree-Erice20.pptx

His algorithm is an improved extension of Edward McCreight's 1976 algorithm. Here are slides on Edward McCreight's 1976 algorithm
     http://www.math.tau.ac.il/~haimk/seminar00/iddo-McCreight_slides.ppt

If you wish to get some background on tries and suffix trees, here are two course video lectures that should be useful (from the India Institute of Technology):

Lecture - 17 Case Study: Searching for Patterns
    http://www.youtube.com/watch?v=Zj_er99KMb8&feature=related

Lecture - 18 Tries
    http://www.youtube.com/watch?v=uhAUk63tLRM&p=BF3763AF2E1C572F
The previous lecture 18 on Tries gives good background on tries, compressed tries, and decent introduction to suffix trees (although some parts were not intelligible - either speaking too fast, or stopped speaking English). But that lecture should help in reading the article posted in the other Question.

Before considering longest common substring, consider string searching times. KMP finds a string in O(m+n) time, but if n is huge (e.g., complete works of Shakespeare), then O(m+n) is too long especially if there are multiple queries.

Using suffix tree, the search time for each query is O(m) after the tree is built. But the limiting factor is how long does it take to build the suffix tree. It was Esko Ukkonen who succeeded in building the tree in O(n) time, even though the number of substrings is O(n^2).

So, using the Ukkonen algorithm, then the first query to find a matching string takes O(m+n) time, but subsequent queries takes only O(m) times once the suffix tree representing the string of n symbols has been built.
First sentence got chopped off..

When building a suffix tree, multiple occurences of a substring in a text are caught and recorded. If the two input strings are concatentated (with you keeping track of the demarcation), then as you build the suffix tree from your two strings, then whenever a substring matches that is in both input strings, you keep track of the largest occurrence.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial