Try using diff_match() function from here:
http://shaunwagner.com/use
Main Topics
Browse All Topicshow can I get the longest matching substring from a large list?
string = "david letterman kiefer sutherland"
list:
------
david
late show with david letterman
... (> 10,000 names)
so I want to match "david letterman" to show in the list "late show with david letterman" then take off "kiefer sutherland" for the episode name.
any ideas?
This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.
Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.
If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.
Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.
Access the answers to your technology questions today.
30-day free trial. Register in 60 seconds.
Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Try it out and discover for yourself.
30-day free trial. Register in 60 seconds.
Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.
Try using diff_match() function from here:
http://shaunwagner.com/use
Assuming your substring always starts at the beginning of the string you're searching for.
Assuming you have your list in an array of strings, say @strings, and the string you're searching for in $search
Try this:
// this function does the substring calculation
function common_substring_size($str
{
// find length of longest common substring first
$L = array();
$length1 = strlen($string);
$length2 = strlen($search);
$longest = 0;
for ($i = $length1; $i >= 0; $i--)
{
$j = 0
while ($j < $length2 && $i + $j < $length1 && $string[$i + $j] == $search[$j]) $j++
$longest = max($longest, $j)
}
return $longest;
}
// the rest of the code
for( $i = 0; $i <= count($strings); $i ++) {
$lengths[$i] = common_substring_size($str
}
$longest_total = 0;
for( $i = 0; $i <= count($lengths); $i++) {
if ($longest_total < $lengths[i]) {
$longest_total = $lengths[i]
$selected_index = $i
}
}
// Insert a check if $longest_total is above your threshhold value for a non-match, say 5 symbols, if you want...
$leftover = substr($search, $longest_total])
// Now you have the rest of the search string in $leftover
// and the matching entry index in $selected_index
// But be warned, it will not be instant, especially with a large set of strings.
the show name should be at the begining. so if "kiefer sutherland" were in the list it shouldn't match, but if "david letterman kiefer" were then it would be better than just matching "david letterman"
this will be done across thousands of names, so yes if pre-processing is faster than we should do it.
itcdr, if the performance is really important and you will be searching over the same set of strings many times, consider building suffix trees (http://en.wikipedia.org/w
string="justice episode name"
1) look for any that contain "justice","episode", or "name"
2) look for ones that match the closest at the begining or have the most words in common. now left with
justice_2006
or
justice_league_unlimited
3) dates aren't too important plus the string doesn't contain "league" or "unlimited" so the winner is "justice_2006"
basically I have about 8,000 titles containing show name and episode name or number in no standard format. these need to filed under one of 12,000 show names. and new titles will come in each day. does that make sense?
The only thing i can think of exept from redesigning your database is doing a fulltext search and sort the results by relevancy and limit 1
something like ..
SELECT *, MATCH(columnname) AGAINST("david letterman kiefer sutherland" IN BOOLEAN MODE) as score FROM tablename WHERE MATCH(columnname) AGAINST("david letterman kiefer sutherland bla bla" IN BOOLEAN MODE) ORDER BY score DESC LIMIT 1
See
http://dev.mysql.com/doc/r
http://dev.mysql.com/doc/r
Business Accounts
Answer for Membership
by: JamesCsslPosted on 2007-07-02 at 14:39:39ID: 19407132
To get the longest string, you should probably sort your test strings from longest to shortest.