Link to home
Start Free TrialLog in
Avatar of rgb192
rgb192Flag for United States of America

asked on

mysql similar text (character comparison)

COMPARISONS USING SIMILAR_TEXT()
 http://php.net/manual/en/function.similar-text.php



these are all the comparison methods that Ray found for me

<?php // RAY_temp_rgb192.html
error_reporting(E_ALL);
echo "<pre>";

// GETTING THE TITLES INTO THE ARRAY
$sql = 'SELECT title FROM my_table ORDER BY title';
$res = mysql_query($sql) or die("FAIL: $sql <br/> " . mysql_error());
while ($row = mysql_fetch_assoc($res))
{
    $string[] = $row['title'];
}

// SHOWING THE ARRAY OF TITLES
print_r($string);

// COMPARISONS USING SOUNDEX+LEVENSHTEIN
foreach ($string as $x)
{
    // COMPUTE THE SOUNDEX KEY
    $sx = soundex($x);
    echo PHP_EOL . "TESTING <b>$x</b> WITH SOUNDEX() $sx";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $sy = soundex($y);
        $sl = levenshtein($sx, $sy);
        echo PHP_EOL
        . "SOUNDEX() $sx"
        . " IS $sl DISTANCE FROM $sy";
    }
}
echo PHP_EOL;

// COMPARISONS USING METAPHONE+LEVENSHTEIN
foreach ($string as $x)
{
    // COMPUTE THE METAPHONE KEY
    $sx = metaphone($x);
    echo PHP_EOL . "TESTING <b>$x</b> WITH METAPHONE() $sx";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $sy = metaphone($y);
        $sl = levenshtein($sx, $sy);
        echo PHP_EOL
        . "METAPHONE() $sx"
        . " IS $sl DISTANCE FROM $sy";
    }
}
echo PHP_EOL;

// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
// COMPARISONS USING SIMILAR_TEXT
foreach ($string as $x)
{
    echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'
        ;
    }
}
echo PHP_EOL;

Open in new window




is there a way to do this in mysql
with less or no php
compare fields from the same column
similar by characters

I do not want SOUNDEX, METAPHONE, LEVENSHTEIN

for this question, I only want
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'

<?php // RAY_temp_rgb192.html
error_reporting(E_ALL);
echo "<pre>";

// GETTING THE TITLES INTO THE ARRAY
$sql = 'SELECT title FROM my_table ORDER BY title';
$res = mysql_query($sql) or die("FAIL: $sql <br/> " . mysql_error());
while ($row = mysql_fetch_assoc($res))
{
    $string[] = $row['title'];
}

// SHOWING THE ARRAY OF TITLES
print_r($string);

// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
// COMPARISONS USING SIMILAR_TEXT
foreach ($string as $x)
{
    echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'
        ;
    }
}
echo PHP_EOL;

Open in new window

Avatar of Jagadishwor Dulal
Jagadishwor Dulal
Flag of Nepal image

I think you want to check in database table you can use like operator for :

Like
SELECT title FROM my_table WHERE title LIKE='%string%' or title LIKE='%string%' or title LIKE='%string';

Open in new window


in each field you can check using LIKE operator

Find the reference here:

http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html
http://www.w3schools.com/sql/sql_like.asp
SOLUTION
Avatar of Cornelia Yoder
Cornelia Yoder
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I do not want SOUNDEX, METAPHONE, LEVENSHTEIN
Why not?
Avatar of rgb192

ASKER

>>I do not know of any equivalent to the php similar_text() function in MySQL.  LIKE only matches exact characters with a wildcard ability, but would not find  abba and baab at all.

I do not have exact matches



>>If your comparison field were very short, say 4 characters, you could write your own version of it using SUBSTRING() but I think it would be pretty difficult and cumbersome.

I am using longer than 4 characters















>>why not SOUNDEX, METAPHONE, LEVENSHTEIN

I am basing this question off of Rays example
php SOUNDEX, METAPHONE, LEVENSHTEIN were not relevent to what I was looking for
because in Ray's example I understand the output but it is not useful to this project

TESTING hello, how are you WITH SOUNDEX() H460
SOUNDEX() H460 IS 0 DISTANCE FROM H460

TESTING hello, how are you WITH METAPHONE() HLHRY
METAPHONE() HLHRY IS 0 DISTANCE FROM HLHRY



php similiar_text() is what I want

I want similarity percentage based on
 . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'



TESTING hello, how are you WITH SIMILAR_TEXT()
SIMILAR_TEXT() hello, how are you HAS 18 CHARACTERS IN COMMON WITH hello, how are you (100%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH very different (25%)
SIMILAR_TEXT() hello, how are you HAS 14 CHARACTERS IN COMMON WITH hi, how are you (85%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH another string (25%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousef Islam (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yusuf Islam (14%)
SIMILAR_TEXT() hello, how are you HAS 3 CHARACTERS IN COMMON WITH Yusef Islaam (20%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousif Isla'am (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH YUSUF Islam (14%)



maybe mysql can use SOUNDEX, METAPHONE, LEVENSHTEIN to get this output
Maybe if you can step back from the technical details and tell us what you are trying to achieve from a higher-level perspective -- then there might be a well-understood design pattern.  From this low level it seems like we're trying to use a wrench to drive screws.  In other words, nobody does what you're asking about.  There might be a better way if we could understand your objectives.
Avatar of rgb192

ASKER

I want to find if 2 fields from same column are >80 simlar
.... from same column are >80 simlar(sic) 

Open in new window

But under what criteria?  This is the thing we are trying to discern.  What similarities are you looking for?  Sound-alike?  Character count?  If you can articulate the rules we can almost certainly help.  If not, there is nothing we can do!
Avatar of rgb192

ASKER

Character count
We're not communicating here, but maybe we can get closer.  Character count is not a high-level problem definition.  It is a very low-level problem definition and it misses the big picture, which is what we are trying to discern.

What does your application do?  Who uses it?  How do they use it?  Those are the high-level problem definitions that, if understood, would probably enable us to help you.  Please tell us more about these things.  Maybe you have a web site with a FAQ page?  If so, please post a link so we can see what is really going on there.

Here is a link that is about character count.
http://www.artfulsoftware.com/infotree/queries.php#552
Avatar of rgb192

ASKER

we have products
product description column (50 characters or more)



many products that have similar product descriptions

some are exactly the same
and some are the same but have extra spaces
and some are very similar

from this we can suggest a similar product
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of rgb192

ASKER

Is there a way to complete a similar task using only mysql because i can modify the task.  Using php i think i am forced to create tables with flags
Avatar of rgb192

ASKER

I wish I could implement this plan.
seems like a good idea to explode, implode, sort