Link to home
Start Free TrialLog in
Avatar of 2ooth
2ooth

asked on

Using gapped suffix arrays for approximate string matching

Hi there,

I have been tasked with implementingin an algorithm that builds an approximate index on a set of text (eg: a collection of websites) and perform some operations related to it (eg: Search, Find & Replace).

The suggested technique to implement this is to use gapped suffix arrays. I have searched online for more detail but with no luck. I understand what suffix arrays are about, but not the "gapped" piece.

Can anyone kindly assist me or point me to a web link with more information on this topic?

Thanks very much,
C
ASKER CERTIFIED SOLUTION
Avatar of sarabande
sarabande
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of 2ooth
2ooth

ASKER

Thank you Sara and Tommy for your prompt responses.

I will have a read-through of the wiki links shortly.

Tommy,

Looking at the research paper, I agree that the gapped suffix arrays technique is probably not the path I should go down with, in the interest of time. Can you kindly mention the more standard methods that are definitely proven to work? Perhaps its worth my time doing my ground work on alternate "working" methods :)

Thanks again..

C
Regular Suffix arrays would work. The original paper (Suffix arrays: A new method for on-line string searches) was published in 1989 and has over 1000 citations. There is a good Wikipedia page for suffix arrays (it is listed in Sara's first link under indexing methods too). I would start there.
Avatar of 2ooth

ASKER

Ok fantastic, thanks so much!

Will give that a go and let you know if I need further input.