[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 4381
  • Last Modified:

mysql similar text (character comparison)

COMPARISONS USING SIMILAR_TEXT()
 http://php.net/manual/en/function.similar-text.php



these are all the comparison methods that Ray found for me

<?php // RAY_temp_rgb192.html
error_reporting(E_ALL);
echo "<pre>";

// GETTING THE TITLES INTO THE ARRAY
$sql = 'SELECT title FROM my_table ORDER BY title';
$res = mysql_query($sql) or die("FAIL: $sql <br/> " . mysql_error());
while ($row = mysql_fetch_assoc($res))
{
    $string[] = $row['title'];
}

// SHOWING THE ARRAY OF TITLES
print_r($string);

// COMPARISONS USING SOUNDEX+LEVENSHTEIN
foreach ($string as $x)
{
    // COMPUTE THE SOUNDEX KEY
    $sx = soundex($x);
    echo PHP_EOL . "TESTING <b>$x</b> WITH SOUNDEX() $sx";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $sy = soundex($y);
        $sl = levenshtein($sx, $sy);
        echo PHP_EOL
        . "SOUNDEX() $sx"
        . " IS $sl DISTANCE FROM $sy";
    }
}
echo PHP_EOL;

// COMPARISONS USING METAPHONE+LEVENSHTEIN
foreach ($string as $x)
{
    // COMPUTE THE METAPHONE KEY
    $sx = metaphone($x);
    echo PHP_EOL . "TESTING <b>$x</b> WITH METAPHONE() $sx";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $sy = metaphone($y);
        $sl = levenshtein($sx, $sy);
        echo PHP_EOL
        . "METAPHONE() $sx"
        . " IS $sl DISTANCE FROM $sy";
    }
}
echo PHP_EOL;

// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
// COMPARISONS USING SIMILAR_TEXT
foreach ($string as $x)
{
    echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'
        ;
    }
}
echo PHP_EOL;

Open in new window




is there a way to do this in mysql
with less or no php
compare fields from the same column
similar by characters

I do not want SOUNDEX, METAPHONE, LEVENSHTEIN

for this question, I only want
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'

<?php // RAY_temp_rgb192.html
error_reporting(E_ALL);
echo "<pre>";

// GETTING THE TITLES INTO THE ARRAY
$sql = 'SELECT title FROM my_table ORDER BY title';
$res = mysql_query($sql) or die("FAIL: $sql <br/> " . mysql_error());
while ($row = mysql_fetch_assoc($res))
{
    $string[] = $row['title'];
}

// SHOWING THE ARRAY OF TITLES
print_r($string);

// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
// COMPARISONS USING SIMILAR_TEXT
foreach ($string as $x)
{
    echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";

    // COMPARE TO THE OTHER STRINGS
    foreach ($string as $y)
    {
        $ss = similar_text($x, $y, $sp);
        echo PHP_EOL
        . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'
        ;
    }
}
echo PHP_EOL;

Open in new window

0
rgb192
Asked:
rgb192
2 Solutions
 
Jagadishwor DulalBraces MediaCommented:
I think you want to check in database table you can use like operator for :

Like
SELECT title FROM my_table WHERE title LIKE='%string%' or title LIKE='%string%' or title LIKE='%string';

Open in new window


in each field you can check using LIKE operator

Find the reference here:

http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html
http://www.w3schools.com/sql/sql_like.asp
0
 
Cornelia YoderArtistCommented:
I do not know of any equivalent to the php similar_text() function in MySQL.  LIKE only matches exact characters with a wildcard ability, but would not find  abba and baab at all.

If your comparison field were very short, say 4 characters, you could write your own version of it using SUBSTRING() but I think it would be pretty difficult and cumbersome.
0
 
Ray PaseurCommented:
I do not want SOUNDEX, METAPHONE, LEVENSHTEIN
Why not?
0
Restore individual SQL databases with ease

Veeam Explorer for Microsoft SQL Server delivers an easy-to-use, wizard-driven interface for restoring your databases from a backup. No expert SQL background required. Web interface provides a complete view of all available SQL databases to simplify the recovery of lost database

 
rgb192Author Commented:
>>I do not know of any equivalent to the php similar_text() function in MySQL.  LIKE only matches exact characters with a wildcard ability, but would not find  abba and baab at all.

I do not have exact matches



>>If your comparison field were very short, say 4 characters, you could write your own version of it using SUBSTRING() but I think it would be pretty difficult and cumbersome.

I am using longer than 4 characters















>>why not SOUNDEX, METAPHONE, LEVENSHTEIN

I am basing this question off of Rays example
php SOUNDEX, METAPHONE, LEVENSHTEIN were not relevent to what I was looking for
because in Ray's example I understand the output but it is not useful to this project

TESTING hello, how are you WITH SOUNDEX() H460
SOUNDEX() H460 IS 0 DISTANCE FROM H460

TESTING hello, how are you WITH METAPHONE() HLHRY
METAPHONE() HLHRY IS 0 DISTANCE FROM HLHRY



php similiar_text() is what I want

I want similarity percentage based on
 . "SIMILAR_TEXT() $x"
        . " HAS $ss CHARACTERS IN COMMON WITH $y "
        . '('
        . number_format($sp, 0)
        . '%)'



TESTING hello, how are you WITH SIMILAR_TEXT()
SIMILAR_TEXT() hello, how are you HAS 18 CHARACTERS IN COMMON WITH hello, how are you (100%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH very different (25%)
SIMILAR_TEXT() hello, how are you HAS 14 CHARACTERS IN COMMON WITH hi, how are you (85%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH another string (25%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousef Islam (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yusuf Islam (14%)
SIMILAR_TEXT() hello, how are you HAS 3 CHARACTERS IN COMMON WITH Yusef Islaam (20%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousif Isla'am (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH YUSUF Islam (14%)



maybe mysql can use SOUNDEX, METAPHONE, LEVENSHTEIN to get this output
0
 
Ray PaseurCommented:
Maybe if you can step back from the technical details and tell us what you are trying to achieve from a higher-level perspective -- then there might be a well-understood design pattern.  From this low level it seems like we're trying to use a wrench to drive screws.  In other words, nobody does what you're asking about.  There might be a better way if we could understand your objectives.
0
 
rgb192Author Commented:
I want to find if 2 fields from same column are >80 simlar
0
 
Ray PaseurCommented:
.... from same column are >80 simlar(sic) 

Open in new window

But under what criteria?  This is the thing we are trying to discern.  What similarities are you looking for?  Sound-alike?  Character count?  If you can articulate the rules we can almost certainly help.  If not, there is nothing we can do!
0
 
rgb192Author Commented:
Character count
0
 
Ray PaseurCommented:
We're not communicating here, but maybe we can get closer.  Character count is not a high-level problem definition.  It is a very low-level problem definition and it misses the big picture, which is what we are trying to discern.

What does your application do?  Who uses it?  How do they use it?  Those are the high-level problem definitions that, if understood, would probably enable us to help you.  Please tell us more about these things.  Maybe you have a web site with a FAQ page?  If so, please post a link so we can see what is really going on there.

Here is a link that is about character count.
http://www.artfulsoftware.com/infotree/queries.php#552
0
 
rgb192Author Commented:
we have products
product description column (50 characters or more)



many products that have similar product descriptions

some are exactly the same
and some are the same but have extra spaces
and some are very similar

from this we can suggest a similar product
0
 
Ray PaseurCommented:
OK, here is my thumbnail sketch of how you might do this.

1. Normalize the product description to all upper case and eliminate anything more than a single blank between words
2. explode() the product description into an array
3. sort() the array
4. implode() the array back into a string
5. With the normalized string, consider using soundex(), metaphone(), and levenshtein() to find similar strings.

There really isn't any direct, scientific way to win this battle -- you have to use some trial and error that considers the number of product you have and the names they bring to the table.
0
 
rgb192Author Commented:
Is there a way to complete a similar task using only mysql because i can modify the task.  Using php i think i am forced to create tables with flags
0
 
rgb192Author Commented:
I wish I could implement this plan.
seems like a good idea to explode, implode, sort
0

Featured Post

Configuration Guide and Best Practices

Read the guide to learn how to orchestrate Data ONTAP, create application-consistent backups and enable fast recovery from NetApp storage snapshots. Version 9.5 also contains performance and scalability enhancements to meet the needs of the largest enterprise environments.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now