Solved

Distinct Records

Posted on 2011-02-19
14
430 Views
Last Modified: 2012-05-11
Hi;
This is the query I use to search my table and return distinct titles but I still have titles which look exact the same or are 99.9% the same.  Is there anyway to eliminate the 100% duplicate records and address records which are 99.9% close?

SELECT DISTINCT a.title AS title, s.type AS type, a.id AS id
FROM rss_articles a LEFT JOIN rss_sources s ON (s.id = source_id)
WHERE a.type!='trashed' AND ((a.`timestamp` > UNIX_TIMESTAMP(NOW())) - 86400) AND (MATCH (title,description) AGAINST ('keyword' IN BOOLEAN MODE))
ORDER BY a.id DESC
SELECT DISTINCT a.title AS title, s.type AS type, a.id AS id
FROM rss_articles a LEFT JOIN rss_sources s ON (s.id = source_id)
WHERE a.type!='trashed' AND ((a.`timestamp` > UNIX_TIMESTAMP(NOW())) - 86400) AND (MATCH (title,description) AGAINST ('keyword' IN BOOLEAN MODE)) 
ORDER BY a.id DESC

Open in new window

0
Comment
Question by:babak62
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 4
  • 2
  • +1
14 Comments
 
LVL 26

Expert Comment

by:tigin44
ID: 34934832


you should use the SOUNDEX function to eliminate the closely matching duplicates.
0
 

Author Comment

by:babak62
ID: 34934863
Can you please tell me how to incorporate that in this query?
0
 
LVL 41

Expert Comment

by:Sharath
ID: 34934918
Can you post such duplicate records from your query.
0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 

Author Comment

by:babak62
ID: 34935079
yes I can but in my table there are more than 14000 rescords and among them maybe 20% of different combinations then do I have to comeup with all the combinations for all the queries?  One more thing this would eliminate the like type sentences but how about the exact duplicates which I can not eliminate with DISTINCT?

These sentences are in persian and I do not know if they comeup on your browser with DISTINCT I get 2 of these

¿¿¿ ¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿ ¿¿ ¿¿ ¿¿¿¿¿ ¿¿¿¿¿ ¿¿¿
0
 

Author Comment

by:babak62
ID: 34935080
I put it in the code are maybe it shows up there
¿¿¿ ¿¿¿¿¿¿¿ ¿¿¿¿¿¿¿ ¿¿ ¿¿ ¿¿¿¿¿ ¿¿¿¿¿ ¿¿¿

Open in new window

0
 
LVL 41

Expert Comment

by:Sharath
ID: 34935759
I could not able to see your data.
0
 

Author Comment

by:babak62
ID: 34935953
I know that is in persian and can not be visable here but even without date my question is that is there anyother way but distinct funciton to eliminate the duplicate records since the DISTINCT does not eliminate the duplicate ones.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34937795
You might try GROUP BY.  Sorry that is only a guess.  I have not used Persian characters in a data base.

PHP functions like soundex(), metaphone() are likely to be useless with non-Western languages.  However the levenshtein function might be able to help you.  It is computationally intensive, but with only 14,000 rows it would probably be OK.
http://us.php.net/manual/en/function.levenshtein.php
0
 

Author Comment

by:babak62
ID: 34938633
thanks for the tip and it looks interesting there are 2 downsides to this approach
1) this table will grow to 100K + titles
2) I looked at the examples but did not see a case relavant to my case also compare title field to itself.

would you please tell me how to incorporate this funtion in my case?
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34938790
I am a little confused by this: "... eliminate the duplicate records since the DISTINCT does not eliminate the duplicate ones."

The intent of SELECT DISTINCT is to acquire one row for each distinct column value.  It is not designed to eliminate any rows - just to keep one in the results set.
0
 

Author Comment

by:babak62
ID: 34938875
Sorry for the wording by elimination I meant to not show the result in the result page I could better use the exclude word in this case.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34938929
SELECT DISTINCT should work correctly.  Since we cannot see the data I cannot be sure that two data elements are the same, but if they are binary equivalents they should work correctly with SELECT DISTINCT.
0
 

Author Comment

by:babak62
ID: 34938948
Is there anyway to compare to see if they are binary equivalents ?
0
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 34942708
Yes, I believe that you can do something like what is described here:
http://forums.mysql.com/read.php?103,156527,198794#msg-198794
0

Featured Post

Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Does the idea of dealing with bits scare or confuse you? Does it seem like a waste of time in an age where we all have terabytes of storage? If so, you're missing out on one of the core tools in every professional programmer's toolbox. Learn how to …
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question