[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 440
  • Last Modified:

Distinct Records

Hi;
This is the query I use to search my table and return distinct titles but I still have titles which look exact the same or are 99.9% the same.  Is there anyway to eliminate the 100% duplicate records and address records which are 99.9% close?

SELECT DISTINCT a.title AS title, s.type AS type, a.id AS id
FROM rss_articles a LEFT JOIN rss_sources s ON (s.id = source_id)
WHERE a.type!='trashed' AND ((a.`timestamp` > UNIX_TIMESTAMP(NOW())) - 86400) AND (MATCH (title,description) AGAINST ('keyword' IN BOOLEAN MODE))
ORDER BY a.id DESC
SELECT DISTINCT a.title AS title, s.type AS type, a.id AS id
FROM rss_articles a LEFT JOIN rss_sources s ON (s.id = source_id)
WHERE a.type!='trashed' AND ((a.`timestamp` > UNIX_TIMESTAMP(NOW())) - 86400) AND (MATCH (title,description) AGAINST ('keyword' IN BOOLEAN MODE)) 
ORDER BY a.id DESC

Open in new window

0
babak62
Asked:
babak62
  • 7
  • 4
  • 2
  • +1
1 Solution
 
tigin44Commented:


you should use the SOUNDEX function to eliminate the closely matching duplicates.
0
 
babak62Author Commented:
Can you please tell me how to incorporate that in this query?
0
 
SharathData EngineerCommented:
Can you post such duplicate records from your query.
0
Restore individual SQL databases with ease

Veeam Explorer for Microsoft SQL Server delivers an easy-to-use, wizard-driven interface for restoring your databases from a backup. No expert SQL background required. Web interface provides a complete view of all available SQL databases to simplify the recovery of lost database

 
babak62Author Commented:
yes I can but in my table there are more than 14000 rescords and among them maybe 20% of different combinations then do I have to comeup with all the combinations for all the queries?  One more thing this would eliminate the like type sentences but how about the exact duplicates which I can not eliminate with DISTINCT?

These sentences are in persian and I do not know if they comeup on your browser with DISTINCT I get 2 of these

¿¿¿ ¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿ ¿¿ ¿¿ ¿¿¿¿¿ ¿¿¿¿¿ ¿¿¿
0
 
babak62Author Commented:
I put it in the code are maybe it shows up there
¿¿¿ ¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿ ¿¿ ¿¿ ¿¿¿¿¿ ¿¿¿¿¿ ¿¿¿

Open in new window

0
 
SharathData EngineerCommented:
I could not able to see your data.
0
 
babak62Author Commented:
I know that is in persian and can not be visable here but even without date my question is that is there anyother way but distinct funciton to eliminate the duplicate records since the DISTINCT does not eliminate the duplicate ones.
0
 
Ray PaseurCommented:
You might try GROUP BY.  Sorry that is only a guess.  I have not used Persian characters in a data base.

PHP functions like soundex(), metaphone() are likely to be useless with non-Western languages.  However the levenshtein function might be able to help you.  It is computationally intensive, but with only 14,000 rows it would probably be OK.
http://us.php.net/manual/en/function.levenshtein.php
0
 
babak62Author Commented:
thanks for the tip and it looks interesting there are 2 downsides to this approach
1) this table will grow to 100K + titles
2) I looked at the examples but did not see a case relavant to my case also compare title field to itself.

would you please tell me how to incorporate this funtion in my case?
0
 
Ray PaseurCommented:
I am a little confused by this: "... eliminate the duplicate records since the DISTINCT does not eliminate the duplicate ones."

The intent of SELECT DISTINCT is to acquire one row for each distinct column value.  It is not designed to eliminate any rows - just to keep one in the results set.
0
 
babak62Author Commented:
Sorry for the wording by elimination I meant to not show the result in the result page I could better use the exclude word in this case.
0
 
Ray PaseurCommented:
SELECT DISTINCT should work correctly.  Since we cannot see the data I cannot be sure that two data elements are the same, but if they are binary equivalents they should work correctly with SELECT DISTINCT.
0
 
babak62Author Commented:
Is there anyway to compare to see if they are binary equivalents ?
0
 
Ray PaseurCommented:
Yes, I believe that you can do something like what is described here:
http://forums.mysql.com/read.php?103,156527,198794#msg-198794
0

Featured Post

Nothing ever in the clear!

This technical paper will help you implement VMware’s VM encryption as well as implement Veeam encryption which together will achieve the nothing ever in the clear goal. If a bad guy steals VMs, backups or traffic they get nothing.

  • 7
  • 4
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now