asked on

Select distinct records from Join statement

On the following query I am looking for distinct records. Many of the records are duplicates. How can I select only the unique, distinct records. The whole row needs to be compared to other rows. I am having a brain fart. Would it be by adding group by H.Agency, H.ActivityDate, H.Classification,H.BasedOn, H.HourTimeFrom, H.HourTimeTo, H.Hours, H.Duration, .Frequency, H.ActivityType, H.ActivityOther, H.Objectives, H.Strategy, H.Need, H.NeedOther, H.Outcome, A.ActivityName, H.Narrative, H.EntryTime, H.Fiscal, H.Unregistered
at the end?

SELECT H.Agency, H.ActivityDate, H.Classification,H.BasedOn, H.HourTimeFrom,
 H.HourTimeTo, H.Hours, H.Duration, H.Frequency, H.ActivityType, H.ActivityOther,
  H.Objectives, H.Strategy, H.Need, H.NeedOther,
 H.Outcome, A.ActivityName, H.Narrative, H.EntryTime, H.Fiscal, H.Unregistered
FROM tblOrgHours H 
	INNER JOIN tblOrgActivities A 
		ON H.ActivityID = A.ActivityID 
WHERE H.Unregistered > 0

Open in new window

Raja Jegan R

you can use DISTINCT to get unique records. If you want to get unique records at selected column level, then provide us some sample data and expected data to help you out..

SELECT DISTINCT H.Agency, H.ActivityDate, H.Classification,H.BasedOn, H.HourTimeFrom,
 H.HourTimeTo, H.Hours, H.Duration, H.Frequency, H.ActivityType, H.ActivityOther,
  H.Objectives, H.Strategy, H.Need, H.NeedOther,
 H.Outcome, A.ActivityName, H.Narrative, H.EntryTime, H.Fiscal, H.Unregistered
FROM tblOrgHours H 
	INNER JOIN tblOrgActivities A 
		ON H.ActivityID = A.ActivityID 
WHERE H.Unregistered > 0

Open in new window

al4629740

ASKER

can I use group by?

ASKER CERTIFIED SOLUTION

Raja Jegan R

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

al4629740

ASKER

And that will get the same results?

What does without aggregation mean?

Mark Wills

select distinct <data>

is equivalent to

select <data>
group by <data>

it mostly depends on the where clause and joins as to how / allowing SQL optimizer to exclude rows from the comparison.

Essentially, DISTINCT will gather all information, do any cast or converts (etc) and then decide DISTINCT
Group By can (not always) filter out some data before having to decide the result set.

If you look at the logical processing order of execution :
1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10 ORDER BY
11. TOP

You see DISTINCT happening late in the cycle - so - if you cant eliminate unwanted information first, it has a potential truckload of work to decide distinct.

In your example, there is not much difference and there are no aggregations - so DISTINCT is probably fine

if ever in doubt, eliminate the unwanted stuff first with a CTE
with cte as
(select <data> where conditions and manipulations occur and exclusions )
select distinct <refined data>
from CTE

aggregation means you want to use SUM() or COUNT() an element not contained in your group by, and distinct wont allow aggregations per se - that is when you need to use grouping (and grouping can include window functions using the OVER() clause)

but you have already selected your answer :) I just wish I remembered to hit submit when I started answering....

Raja Jegan R

>> What does without aggregation mean?

As Mark mentioned above, Aggregation operations refers to Min(), MAX(), SUM(), COUNT(), etc., Since your query doesn't have any of the aggregations, you can use either DISTINCT or GROUP BY. If you have aggregations, then you might need to use GROUP BY only..