Link to home
Start Free TrialLog in
Avatar of Elisys
Elisys

asked on

MS SQL - Remove Duplicate Values in a Column & Display All Columns (Not DISTINCT)

Dear Experts,

In my query result I need to show results from a SQL alarms history table for 5 columns where Col1 = Date&Time, Col2 = Label, Col3 = Priority, Col4 = Status, Col5 = Cycle. I am using the following syntax which shows everything.

SELECT TOP 50 App_DateTime, Label, Priority, Enum_Status, Cycle FROM History

However because an alarm is recorded every time it cycles (ON/OFF) it is recorded as a new entry into the database with a unique Date&Time. This produces duplicate column values for Label, Priority and Status. I want to display only the maximum Cycle count for each alarm entry and not all the Cycle values below that for the same alarm. Therefore I want to eliminate duplicate entries in the Label column.

The problem I have in using 'GROUP BY Label' is that when I apply aggregates to all the other SELECT fields to allow the GROUP BY to function to work then I only get a single row result. I am expecting to see a list of different alarms together with their date, time, label, priority and status and maximum cycle value.

I am new to SQL so may well be approaching this the wrong way. I have searched around and tried using SELECT DISTINCT but that will eliminate a row only if all its fields are the same. Clearly this wont work for me since I am trying to eliminate duplicate values from a column and not duplicate rows.

Thanks, Gary
ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Elisys
Elisys

ASKER

Hi Scott, thanks for your answer. You wouldn't believe how long I have been trying to put this together. A brief explanation of how the query works would be very much appreciated for us SQL novices. G
I actually do believe it :-).  It was a real pain to do certain things like this before ROW_NUMBER() came along.

Basically ROW_NUMBER() provides an inline ordering that resets to 1 on each PARTITION break.

For example, say you had a list of states and cities.  You could PARTITION BY state ORDER BY city, and you'd get a result like:
1 CA Sacramento
2 CA San Diego
3 CA San Francisco
1 TN Memphis --note that the state change causes the number to reset
2 TN Nashville --
1 WY ...


Now, let's go back to:
PARTITION BY Label ORDER BY Cycle DESC
We can see that ROW_NUMBER/row_num 1 will always be the highest cycle number for each label, which is what we want to list, so we restrict the output to:
row_num = 1

This is wonderfully useful/flexible.  For example, to get the top 3, you could use:
row_num BETWEEN 1 AND 3

How sweet is that!