Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 910
  • Last Modified:

SQL - Selecting a distinct combination of columns

The following SQL query is pulling duplicate records for each tid column.  The problem is that each tid has multiple (non-distinct) entries in the cert_date field.  How can I filter out only the oldest instance (based on cert_date) for each tid?
SELECT DISTINCT EXAM_Test.tid, course.course_number, course.title, course.contract_hours, user_account.fname, user_account.lname, user_account.aarc_number, user_account.state, user_course.cert_date FROM course
INNER JOIN EXAM_Question ON course.course_id = EXAM_Question.course_id
INNER JOIN EXAM_TestQuestions ON EXAM_Question.qid = EXAM_TestQuestions.qid
INNER JOIN EXAM_Test ON EXAM_TestQuestions.tid = EXAM_Test.tid
INNER JOIN user_account ON EXAM_Test.uid = user_account.uid
INNER JOIN user_course ON user_account.uid = user_course.uid
WHERE aarc = 1 AND pass_fail = 1 AND cert_date >= 2008-01-01 AND cert_date >= 2008-12-31
ORDER BY course_number ASC, cert_date DESC

Open in new window

0
aguawebdesign
Asked:
aguawebdesign
  • 7
  • 4
  • 3
  • +1
2 Solutions
 
ragoranCommented:
Add this inner join:

Inner join (select UID, min(Cert_Date) as OldestCertDate from User_Course group by UID) Oldest on oldest.uid = user_couse.uid and odlest.oldestcertdate = user_course.cert_date
0
 
aguawebdesignAuthor Commented:
I had to modify your code since it had a couple of typos...  but even so it isn't working.  It's now filtering out too many results.  Some records that should be appearing are missing.


0
 
SharathData EngineerCommented:

Post some sample set from your tables and expected result. It will be done with proper Grouping and JOINs.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
folderolCommented:
since you appear to need only the oldest cert_date, you don't need to join the table AND a subquery, just the subquery will do.

Your WHERE clause appears to me to be malformed, both equalities were >=, shouldn't the second be <= ?  I made the change by using the between keyword.
The dates as strings must be enclosed within single quotes, else sql will not implicitly convert them to datetimes.

Tom.
SELECT DISTINCT 
EXAM_Test.tid, course.course_number, course.title, course.contract_hours, 
user_account.fname, user_account.lname, user_account.aarc_number, user_account.state, 
user_course.cert_date 
FROM course
INNER JOIN EXAM_Question ON course.course_id = EXAM_Question.course_id
INNER JOIN EXAM_TestQuestions ON EXAM_Question.qid = EXAM_TestQuestions.qid
INNER JOIN EXAM_Test ON EXAM_TestQuestions.tid = EXAM_Test.tid
INNER JOIN user_account ON EXAM_Test.uid = user_account.uid
INNER JOIN 
(Select min(cert_date) as first_cert, uid from user_course group by uid)
as oldest
ON user_account.uid = oldest.uid
WHERE aarc = 1 AND pass_fail = 1 AND first_cert between '2008-01-01' AND '2008-12-31'
ORDER BY course_number ASC, first_cert DESC

Open in new window

0
 
aguawebdesignAuthor Commented:
folderol:

I realized my WHERE clause had some issues after I posted it... thanks for correcting.

Your suggested query returned two errors:
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "user_course.cert_date" could not be bound.
Msg 145, Level 15, State 1, Line 1
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

0
 
aguawebdesignAuthor Commented:
In reference to above, we can remove the first_cert from the ORDER BY clause - it's not really needed once the duplicates are removed.

I don't know how to fix the first error:  The multi-part identifier "user_course.cert_date" could not be bound.
0
 
folderolCommented:
Sorry, my bad.  I saw that error in the code before I hit submit, I don't know how I managed not to edit it.

Tom.
SELECT DISTINCT 
EXAM_Test.tid, course.course_number, course.title, course.contract_hours, 
user_account.fname, user_account.lname, user_account.aarc_number, user_account.state, 
oldest.first_cert
FROM course
INNER JOIN EXAM_Question ON course.course_id = EXAM_Question.course_id
INNER JOIN EXAM_TestQuestions ON EXAM_Question.qid = EXAM_TestQuestions.qid
INNER JOIN EXAM_Test ON EXAM_TestQuestions.tid = EXAM_Test.tid
INNER JOIN user_account ON EXAM_Test.uid = user_account.uid
INNER JOIN 
(Select min(cert_date) as first_cert, uid from user_course group by uid)
as oldest
ON user_account.uid = oldest.uid
WHERE aarc = 1 AND pass_fail = 1 AND first_cert between '2008-01-01' AND '2008-12-31'
ORDER BY course_number ASC

Open in new window

0
 
folderolCommented:
The correction is on line 4.
0
 
aguawebdesignAuthor Commented:
folderol,

Unfortunately your query is filtering out too many records.  I filtered both queries (my query with the duplicates and yours) for a specific course_number to minimize the results set to something I could eyeball.  After de-duping the results from my original recordset, I get 55 records.  But your query only returns 38 of those 55 records.

Any ideas?
0
 
SharathData EngineerCommented:
Is it possible for you to provide some sample set as i asked you before. This is not difficult one if you use JOINs and GROUP properly.
0
 
folderolCommented:
I don't think we can help you there, that's not a syntax or structure issue, that's a design issue.  You are in the best position to figure out what it is about the 17 excluded rows that needs to be accounted for in the query to make it work as planned.
0
 
aguawebdesignAuthor Commented:
I was afraid of that - I ran into another design issue with this database as well.  It was designed years ago by another developer and is not optimized for data mining.

I have come up with a work around that involves modifying one of the tables that will work for future - I may just go this route and manually de-dup the results using Excel for now.

Thanks for your help.
0
 
aguawebdesignAuthor Commented:
Sharath_123,

specifically what do you want to see?  the data is being pulled from 5 different tables.  do you want to see how the tables are setup?  because i don't think posting sample data from the query will be enough without knowing how the tables are configured.  if you want to take a look, let me know specifically what you would like to see.
0
 
SharathData EngineerCommented:
I don't want sample data from the query. Provide sample data from all your table and expected result if possible. It would be good if you can provide how the tables are configured.
0
 
aguawebdesignAuthor Commented:
I've talked with my client and we've decided on another workaround (adding a field to a table) that is much quicker and will solve the issue moving forward.  They've decided to de-dup the older data manually...  
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 7
  • 4
  • 3
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now