
PHP difference in strings

Posted on 2012-08-22
Last Modified: 2012-08-24
$string1='hello, how are you';
$string2='very different';
$string3='hi, how are you';
$string4='another string';


Using PHP, is there a way to compare $string1 against $string2, $string3 and $string4 and determine that $string1 is similar to $string3?
Question by:rgb192

Expert Comment

by:Slimshaneey
ID: 38320928
You can use the similar_text() function, or a more involved one such as levenshtein(), which computes the edit distance (the minimum number of single-character insertions, deletions or substitutions needed to turn one string into the other).
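As a minimal sketch of the levenshtein() approach, you can compute the edit distance from $string1 to each of the other strings and keep the smallest. The helper name closestByLevenshtein() is hypothetical, not part of PHP.

```php
<?php
// Sketch: return the candidate with the smallest levenshtein() distance to $target.
// closestByLevenshtein() is a hypothetical helper, not a built-in.
function closestByLevenshtein(string $target, array $candidates): array
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($candidates as $candidate) {
        // Counts single-character insertions, deletions and substitutions (case-sensitive)
        $distance = levenshtein($target, $candidate);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $candidate;
        }
    }
    return [$best, $bestDistance];
}

$string1 = 'hello, how are you';
$others = ['very different', 'hi, how are you', 'another string'];

[$match, $distance] = closestByLevenshtein($string1, $others);
echo "Closest to '$string1': '$match' (distance $distance)", PHP_EOL;
```

Here 'hi, how are you' is 4 edits away (substitute "e" with "i", then delete "l", "l", "o"), while the other two strings need many more edits. Note that before PHP 8.0, levenshtein() returned -1 for arguments longer than 255 characters.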


Others include soundex(), which returns the four-character soundex key of a word; that key should be the same as the key for any similar-sounding word.
metaphone() is similar to soundex() and possibly more effective for you. It is more accurate than soundex() because it knows the basic rules of English pronunciation. The keys it generates are of variable length.
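A small sketch comparing soundex() and metaphone() keys for word pairs that sound alike. The word pairs are illustrative choices, not taken from the question. Because soundex() always returns four characters built from the first letters, it is best suited to single words or names.

```php
<?php
// Sketch: compare the soundex() and metaphone() keys of word pairs that sound alike.
// The word pairs are illustrative choices, not taken from the question.
$pairs = [
    ['Euler', 'Ellery'],     // soundex() gives E460 for both
    ['Knuth', 'Kant'],
    ['Hilbert', 'Heilbronn'],
];

foreach ($pairs as [$first, $second]) {
    printf(
        "%-10s soundex=%s metaphone=%-6s | %-10s soundex=%s metaphone=%s\n",
        $first, soundex($first), metaphone($first),
        $second, soundex($second), metaphone($second)
    );
}

// soundex() keeps only four characters, so a whole sentence reduces to its first sounds.
echo soundex('hello, how are you'), ' vs ', soundex('hi, how are you'), PHP_EOL;
```

The last line prints H460 vs H600: two sentences a reader would call similar get different keys, so for the original question the phonetic functions are probably more useful applied word by word or to names.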

Expert Comment

by:Slimshaneey
ID: 38320929
Here is the PHP manual entry for similar_text():

http://php.net/manual/en/function.similar-text.php

Expert Comment

by:rinfo
ID: 38324262
First, we need to clarify what he means by "similar". Does he mean similar-sounding? Then soundex() should be used. Does he mean the string closely resembles the compared string? Then similar_text() would give a percentage of similarity. Or does he simply mean that $string1 is equal to $string3?
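To turn the "percentage of similarity" idea into code, here is a sketch that ranks the candidates by the percentage similar_text() reports through its third (by-reference) argument. similarityScores() is a hypothetical helper name, and the 75% cut-off is an arbitrary assumption, not a recommended value.

```php
<?php
// Sketch: score each candidate with similar_text() and sort by percentage.
// similarityScores() is hypothetical; the 75% threshold below is an arbitrary assumption.
function similarityScores(string $target, array $candidates): array
{
    $scores = [];
    foreach ($candidates as $candidate) {
        // Third argument receives the similarity as a percentage
        similar_text($target, $candidate, $percent);
        $scores[$candidate] = $percent;
    }
    arsort($scores); // highest percentage first, keys preserved
    return $scores;
}

$string1 = 'hello, how are you';
$candidates = ['very different', 'hi, how are you', 'another string'];

foreach (similarityScores($string1, $candidates) as $candidate => $percent) {
    $label = $percent >= 75 ? 'similar' : 'not similar';
    printf("%-16s %5.1f%% %s\n", $candidate, $percent, $label);
}
```

With these inputs, 'hi, how are you' shares 14 characters with $string1 (about 85%), while the other two strings stay well below the cut-off.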

Accepted Solution

by:
Ray Paseur
ID: 38324469
Using a combination of metaphone(), soundex() and levenshtein(), you can make computations that are going to be somewhat illuminating. similar_text() can be helpful, too. Please see:
http://www.laprbass.com/RAY_temp_rgb192.php

This is by no means an exhaustive treatise. A few years ago, I wrote a white paper for the US Justice Department on name identification after a suspected terrorist got past airport security because the security databases could not recognize his Arabic name on the no-fly list. In that exercise, I found that if you took the soundex() and metaphone() values of the name (even a phonetically spelled name from Arabic), you had a good indicator of the name. If you reversed the phonetic name and took the soundex() and metaphone() values, you had another good indicator. If you then applied the same process to the names on the bad-guy list and computed the levenshtein() distance between the soundex() and metaphone() values, you almost always got a very strong indicator of matching names. Try it with phonetic variants on this fellow:
http://en.wikipedia.org/wiki/Cat_Stevens

Some possible variants might include Yousef Islam, Yusuf Islam, Yusef Islaam, Yousif Isla'am, YUSUF Islam, etc.  Make up as many as you like, so long as there is a reasonable English-language pronunciation that sounds mostly like the name.  You may be surprised how strong the PHP language can be at identifying similar string values.
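The name-matching process described above can be sketched as follows. This is an interpretation, not Ray's original code: it assumes "reversed" means reversing the word order ("Yusuf Islam" becomes "Islam Yusuf"); reversing the characters with strrev() would be another reading. phoneticKeys(), phoneticDistance() and the $watchList entries are hypothetical.

```php
<?php
// Sketch of the soundex()/metaphone()/levenshtein() name-matching idea.
// ASSUMPTION: "reversed" means reversed word order, not strrev().
// phoneticKeys(), phoneticDistance() and $watchList are hypothetical names.
function phoneticKeys(string $name): array
{
    $words = preg_split('/\s+/', trim(strtoupper($name)));
    $forward = implode(' ', $words);
    $reversed = implode(' ', array_reverse($words));
    return [soundex($forward), metaphone($forward), soundex($reversed), metaphone($reversed)];
}

// Sum of levenshtein() distances between corresponding keys; 0 means every key matched.
function phoneticDistance(string $a, string $b): int
{
    $keysA = phoneticKeys($a);
    $keysB = phoneticKeys($b);
    $total = 0;
    foreach ($keysA as $i => $key) {
        $total += levenshtein($key, $keysB[$i]);
    }
    return $total;
}

$watchList = ['Yusuf Islam', 'John Smith'];
$variants  = ['Yousef Islam', 'Yusef Islaam', "Yousif Isla'am", 'YUSUF Islam'];

foreach ($variants as $variant) {
    foreach ($watchList as $listed) {
        printf("%-15s vs %-12s total key distance %d\n", $variant, $listed, phoneticDistance($variant, $listed));
    }
}
```

Lower totals suggest a likelier match; where to draw the line between "match" and "no match" would have to be tuned on real data.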

<?php // RAY_temp_rgb192.php
error_reporting(E_ALL);
echo "<pre>";

// COPIED FROM THE POST AT EE
$string[1]='hello, how are you';
$string[2]='very different';
$string[3]='hi, how are you';
$string[4]='another string';

$string[] = 'Yousef Islam';
$string[] = 'Yusuf Islam';
$string[] = 'Yusef Islaam';
$string[] = 'Yousif Isla\'am';
$string[] = 'YUSUF Islam';

// COMPARISONS USING SOUNDEX+LEVENSHTEIN
foreach ($string as $x)
{
    // COMPUTE THE SOUNDEX KEY
    $sx = soundex($x);
    echo PHP_EOL . "TESTING <b>$x</b> WITH SOUNDEX() $sx";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $sy = soundex($y);
        $sl = levenshtein($sx, $sy);
        echo PHP_EOL
        . "SOUNDEX() $sx"
        . " IS $sl DISTANCE FROM $sy";
    }
}
echo PHP_EOL;

// COMPARISONS USING METAPHONE+LEVENSHTEIN
foreach ($string as $x)
{
    // COMPUTE THE METAPHONE KEY
    $sx = metaphone($x);
    echo PHP_EOL . "TESTING <b>$x</b> WITH METAPHONE() $sx";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $sy = metaphone($y);
        $sl = levenshtein($sx, $sy);
        echo PHP_EOL
        . "METAPHONE() $sx"
        . " IS $sl DISTANCE FROM $sy";
    }
}
echo PHP_EOL;

// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
foreach ($string as $x)
{
    echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'
        ;
    }
}
echo PHP_EOL;


HTH, ~Ray

Expert Comment

by:Ray Paseur
ID: 38324502
Note the case-sensitive comparisons: levenshtein() and similar_text() treat "Y" and "y" as different characters. You might want to normalize to upper case first. Just a thought... ~Ray
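A sketch of that normalization: upper-case both strings before comparing. compareNormalized() is a hypothetical helper name. strtoupper() is not multibyte-aware, so for UTF-8 input mb_strtoupper() from the mbstring extension would be the safer choice.

```php
<?php
// Sketch: levenshtein() and similar_text() compare characters exactly, so "Y" and "y"
// count as different. Upper-casing both strings first removes that effect.
// compareNormalized() is a hypothetical helper name.
function compareNormalized(string $a, string $b): array
{
    $upperA = strtoupper($a);
    $upperB = strtoupper($b);
    similar_text($upperA, $upperB, $percent);
    return [
        'levenshtein'  => levenshtein($upperA, $upperB),
        'similar_text' => $percent,
    ];
}

$a = 'YUSUF Islam';
$b = 'Yusuf Islam';

echo 'Raw levenshtein(): ', levenshtein($a, $b), PHP_EOL; // 4: U, S, U, F differ only in case
print_r(compareNormalized($a, $b));                       // levenshtein 0, similar_text 100
```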

Author Closing Comment

by:rgb192
ID: 38328454
This code sample answered my question completely and taught me well.
Thanks!


I have a similar question about gathering the data from MySQL instead of from $string[]:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_27840917.html