Link to home
Start Free TrialLog in
Avatar of aguawebdesign
aguawebdesignFlag for United States of America

asked on

SQL - Selecting a distinct combination of columns

The following SQL query is pulling duplicate records for each tid column.  The problem is that each tid has multiple (non-distinct) entries in the cert_date field.  How can I filter out only the oldest instance (based on cert_date) for each tid?
SELECT DISTINCT EXAM_Test.tid, course.course_number, course.title, course.contract_hours, user_account.fname, user_account.lname, user_account.aarc_number, user_account.state, user_course.cert_date FROM course
INNER JOIN EXAM_Question ON course.course_id = EXAM_Question.course_id
INNER JOIN EXAM_TestQuestions ON EXAM_Question.qid = EXAM_TestQuestions.qid
INNER JOIN EXAM_Test ON EXAM_TestQuestions.tid = EXAM_Test.tid
INNER JOIN user_account ON EXAM_Test.uid = user_account.uid
INNER JOIN user_course ON user_account.uid = user_course.uid
WHERE aarc = 1 AND pass_fail = 1 AND cert_date >= 2008-01-01 AND cert_date >= 2008-12-31
ORDER BY course_number ASC, cert_date DESC

Open in new window

Avatar of ragoran
ragoran
Flag of Canada image

Add this inner join:

Inner join (select UID, min(Cert_Date) as OldestCertDate from User_Course group by UID) Oldest on oldest.uid = user_couse.uid and odlest.oldestcertdate = user_course.cert_date
Avatar of aguawebdesign

ASKER

I had to modify your code since it had a couple of typos...  but even so it isn't working.  It's now filtering out too many results.  Some records that should be appearing are missing.


Avatar of Sharath S

Post some sample set from your tables and expected result. It will be done with proper Grouping and JOINs.
Avatar of folderol
folderol

since you appear to need only the oldest cert_date, you don't need to join the table AND a subquery, just the subquery will do.

Your WHERE clause appears to me to be malformed, both equalities were >=, shouldn't the second be <= ?  I made the change by using the between keyword.
The dates as strings must be enclosed within single quotes, else sql will not implicitly convert them to datetimes.

Tom.
SELECT DISTINCT 
EXAM_Test.tid, course.course_number, course.title, course.contract_hours, 
user_account.fname, user_account.lname, user_account.aarc_number, user_account.state, 
user_course.cert_date 
FROM course
INNER JOIN EXAM_Question ON course.course_id = EXAM_Question.course_id
INNER JOIN EXAM_TestQuestions ON EXAM_Question.qid = EXAM_TestQuestions.qid
INNER JOIN EXAM_Test ON EXAM_TestQuestions.tid = EXAM_Test.tid
INNER JOIN user_account ON EXAM_Test.uid = user_account.uid
INNER JOIN 
(Select min(cert_date) as first_cert, uid from user_course group by uid)
as oldest
ON user_account.uid = oldest.uid
WHERE aarc = 1 AND pass_fail = 1 AND first_cert between '2008-01-01' AND '2008-12-31'
ORDER BY course_number ASC, first_cert DESC

Open in new window

folderol:

I realized my WHERE clause had some issues after I posted it... thanks for correcting.

Your suggested query returned two errors:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "user_course.cert_date" could not be bound.
Msg 145, Level 15, State 1, Line 1
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

In reference to above, we can remove the first_cert from the ORDER BY clause - it's not really needed once the duplicates are removed.

I don't know how to fix the first error:  The multi-part identifier "user_course.cert_date" could not be bound.
ASKER CERTIFIED SOLUTION
Avatar of folderol
folderol

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
The correction is on line 4.
folderol,

Unfortunately your query is filtering out too many records.  I filtered both queries (my query with the duplicates and yours) for a specific course_number to minimize the results set to something I could eyeball.  After de-duping the results from my original recordset, I get 55 records.  But your query only returns 38 of those 55 records.

Any ideas?
Is it possible for you to provide some sample set as i asked you before. This is not difficult one if you use JOINs and GROUP properly.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I was afraid of that - I ran into another design issue with this database as well.  It was designed years ago by another developer and is not optimized for data mining.

I have come up with a work around that involves modifying one of the tables that will work for future - I may just go this route and manually de-dup the results using Excel for now.

Thanks for your help.
Sharath_123,

specifically what do you want to see?  the data is being pulled from 5 different tables.  do you want to see how the tables are setup?  because i don't think posting sample data from the query will be enough without knowing how the tables are configured.  if you want to take a look, let me know specifically what you would like to see.
I don't want sample data from the query. Provide sample data from all your table and expected result if possible. It would be good if you can provide how the tables are configured.
I've talked with my client and we've decided on another workaround (adding a field to a table) that is much quicker and will solve the issue moving forward.  They've decided to de-dup the older data manually...