rgb192
asked on
mysql similar text (character comparison)
COMPARISONS USING SIMILAR_TEXT()
http://php.net/manual/en/function.similar-text.php
these are all the comparison methods that Ray found for me
is there a way to do this in mysql
with less or no php
compare fields from the same column
similar by characters
I do not want SOUNDEX, METAPHONE, LEVENSHTEIN
for this question, I only want
$ss = similar_text($x, $y, $sp);
echo PHP_EOL
. "SIMILAR_TEXT() $x"
. " HAS $ss CHARACTERS IN COMMON WITH $y "
. '('
. number_format($sp, 0)
. '%)'
http://php.net/manual/en/function.similar-text.php
these are all the comparison methods that Ray found for me
<?php // RAY_temp_rgb192.html
error_reporting(E_ALL);
echo "<pre>";
// GETTING THE TITLES INTO THE ARRAY
$sql = 'SELECT title FROM my_table ORDER BY title';
$res = mysql_query($sql) or die("FAIL: $sql <br/> " . mysql_error());
while ($row = mysql_fetch_assoc($res))
{
$string[] = $row['title'];
}
// SHOWING THE ARRAY OF TITLES
print_r($string);
// COMPARISONS USING SOUNDEX+LEVENSHTEIN
foreach ($string as $x)
{
// COMPUTE THE SOUNDEX KEY
$sx = soundex($x);
echo PHP_EOL . "TESTING <b>$x</b> WITH SOUNDEX() $sx";
// COMPARE TO THE OTHER STRINGS
foreach ($string as $y)
{
$sy = soundex($y);
$sl = levenshtein($sx, $sy);
echo PHP_EOL
. "SOUNDEX() $sx"
. " IS $sl DISTANCE FROM $sy";
}
}
echo PHP_EOL;
// COMPARISONS USING METAPHONE+LEVENSHTEIN
foreach ($string as $x)
{
// COMPUTE THE METAPHONE KEY
$sx = metaphone($x);
echo PHP_EOL . "TESTING <b>$x</b> WITH METAPHONE() $sx";
// COMPARE TO THE OTHER STRINGS
foreach ($string as $y)
{
$sy = metaphone($y);
$sl = levenshtein($sx, $sy);
echo PHP_EOL
. "METAPHONE() $sx"
. " IS $sl DISTANCE FROM $sy";
}
}
echo PHP_EOL;
// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
// COMPARISONS USING SIMILAR_TEXT
foreach ($string as $x)
{
echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";
// COMPARE TO THE OTHER STRINGS
foreach ($string as $y)
{
$ss = similar_text($x, $y, $sp);
echo PHP_EOL
. "SIMILAR_TEXT() $x"
. " HAS $ss CHARACTERS IN COMMON WITH $y "
. '('
. number_format($sp, 0)
. '%)'
;
}
}
echo PHP_EOL;
is there a way to do this in mysql
with less or no php
compare fields from the same column
similar by characters
I do not want SOUNDEX, METAPHONE, LEVENSHTEIN
for this question, I only want
$ss = similar_text($x, $y, $sp);
echo PHP_EOL
. "SIMILAR_TEXT() $x"
. " HAS $ss CHARACTERS IN COMMON WITH $y "
. '('
. number_format($sp, 0)
. '%)'
<?php // RAY_temp_rgb192.html
error_reporting(E_ALL);
echo "<pre>";
// GETTING THE TITLES INTO THE ARRAY
$sql = 'SELECT title FROM my_table ORDER BY title';
$res = mysql_query($sql) or die("FAIL: $sql <br/> " . mysql_error());
while ($row = mysql_fetch_assoc($res))
{
$string[] = $row['title'];
}
// SHOWING THE ARRAY OF TITLES
print_r($string);
// COMPARISONS USING SIMILAR_TEXT() BUT SEE THE NOTES HERE BEFORE YOU USE IT!
// http://php.net/manual/en/function.similar-text.php#109507
// COMPARISONS USING SIMILAR_TEXT
foreach ($string as $x)
{
echo PHP_EOL . "TESTING <b>$x</b> WITH SIMILAR_TEXT()";
// COMPARE TO THE OTHER STRINGS
foreach ($string as $y)
{
$ss = similar_text($x, $y, $sp);
echo PHP_EOL
. "SIMILAR_TEXT() $x"
. " HAS $ss CHARACTERS IN COMMON WITH $y "
. '('
. number_format($sp, 0)
. '%)'
;
}
}
echo PHP_EOL;
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I do not want SOUNDEX, METAPHONE, LEVENSHTEINWhy not?
ASKER
>>I do not know of any equivalent to the php similar_text() function in MySQL. LIKE only matches exact characters with a wildcard ability, but would not find abba and baab at all.
I do not have exact matches
>>If your comparison field were very short, say 4 characters, you could write your own version of it using SUBSTRING() but I think it would be pretty difficult and cumbersome.
I am using longer than 4 characters
>>why not SOUNDEX, METAPHONE, LEVENSHTEIN
I am basing this question off of Rays example
php SOUNDEX, METAPHONE, LEVENSHTEIN were not relevent to what I was looking for
because in Ray's example I understand the output but it is not useful to this project
TESTING hello, how are you WITH SOUNDEX() H460
SOUNDEX() H460 IS 0 DISTANCE FROM H460
TESTING hello, how are you WITH METAPHONE() HLHRY
METAPHONE() HLHRY IS 0 DISTANCE FROM HLHRY
php similiar_text() is what I want
I want similarity percentage based on
. "SIMILAR_TEXT() $x"
. " HAS $ss CHARACTERS IN COMMON WITH $y "
. '('
. number_format($sp, 0)
. '%)'
TESTING hello, how are you WITH SIMILAR_TEXT()
SIMILAR_TEXT() hello, how are you HAS 18 CHARACTERS IN COMMON WITH hello, how are you (100%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH very different (25%)
SIMILAR_TEXT() hello, how are you HAS 14 CHARACTERS IN COMMON WITH hi, how are you (85%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH another string (25%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousef Islam (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yusuf Islam (14%)
SIMILAR_TEXT() hello, how are you HAS 3 CHARACTERS IN COMMON WITH Yusef Islaam (20%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousif Isla'am (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH YUSUF Islam (14%)
maybe mysql can use SOUNDEX, METAPHONE, LEVENSHTEIN to get this output
I do not have exact matches
>>If your comparison field were very short, say 4 characters, you could write your own version of it using SUBSTRING() but I think it would be pretty difficult and cumbersome.
I am using longer than 4 characters
>>why not SOUNDEX, METAPHONE, LEVENSHTEIN
I am basing this question off of Rays example
php SOUNDEX, METAPHONE, LEVENSHTEIN were not relevent to what I was looking for
because in Ray's example I understand the output but it is not useful to this project
TESTING hello, how are you WITH SOUNDEX() H460
SOUNDEX() H460 IS 0 DISTANCE FROM H460
TESTING hello, how are you WITH METAPHONE() HLHRY
METAPHONE() HLHRY IS 0 DISTANCE FROM HLHRY
php similiar_text() is what I want
I want similarity percentage based on
. "SIMILAR_TEXT() $x"
. " HAS $ss CHARACTERS IN COMMON WITH $y "
. '('
. number_format($sp, 0)
. '%)'
TESTING hello, how are you WITH SIMILAR_TEXT()
SIMILAR_TEXT() hello, how are you HAS 18 CHARACTERS IN COMMON WITH hello, how are you (100%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH very different (25%)
SIMILAR_TEXT() hello, how are you HAS 14 CHARACTERS IN COMMON WITH hi, how are you (85%)
SIMILAR_TEXT() hello, how are you HAS 4 CHARACTERS IN COMMON WITH another string (25%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousef Islam (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yusuf Islam (14%)
SIMILAR_TEXT() hello, how are you HAS 3 CHARACTERS IN COMMON WITH Yusef Islaam (20%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH Yousif Isla'am (13%)
SIMILAR_TEXT() hello, how are you HAS 2 CHARACTERS IN COMMON WITH YUSUF Islam (14%)
maybe mysql can use SOUNDEX, METAPHONE, LEVENSHTEIN to get this output
Maybe if you can step back from the technical details and tell us what you are trying to achieve from a higher-level perspective -- then there might be a well-understood design pattern. From this low level it seems like we're trying to use a wrench to drive screws. In other words, nobody does what you're asking about. There might be a better way if we could understand your objectives.
ASKER
I want to find if 2 fields from same column are >80 simlar
.... from same column are >80 simlar(sic)
But under what criteria? This is the thing we are trying to discern. What similarities are you looking for? Sound-alike? Character count? If you can articulate the rules we can almost certainly help. If not, there is nothing we can do!
ASKER
Character count
We're not communicating here, but maybe we can get closer. Character count is not a high-level problem definition. It is a very low-level problem definition and it misses the big picture, which is what we are trying to discern.
What does your application do? Who uses it? How do they use it? Those are the high-level problem definitions that, if understood, would probably enable us to help you. Please tell us more about these things. Maybe you have a web site with a FAQ page? If so, please post a link so we can see what is really going on there.
Here is a link that is about character count.
http://www.artfulsoftware.com/infotree/queries.php#552
What does your application do? Who uses it? How do they use it? Those are the high-level problem definitions that, if understood, would probably enable us to help you. Please tell us more about these things. Maybe you have a web site with a FAQ page? If so, please post a link so we can see what is really going on there.
Here is a link that is about character count.
http://www.artfulsoftware.com/infotree/queries.php#552
ASKER
we have products
product description column (50 characters or more)
many products that have similar product descriptions
some are exactly the same
and some are the same but have extra spaces
and some are very similar
from this we can suggest a similar product
product description column (50 characters or more)
many products that have similar product descriptions
some are exactly the same
and some are the same but have extra spaces
and some are very similar
from this we can suggest a similar product
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Is there a way to complete a similar task using only mysql because i can modify the task. Using php i think i am forced to create tables with flags
ASKER
I wish I could implement this plan.
seems like a good idea to explode, implode, sort
seems like a good idea to explode, implode, sort
Like
Open in new window
in each field you can check using LIKE operator
Find the reference here:
http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html
http://www.w3schools.com/sql/sql_like.asp