asked on

SQL Statement - Using Distinct

I have a SQL statement using DISTINCT - see below - that is not working as we thought. The Distinct doesn't need to be on the "A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET" How do I write this statement that using DISTINCE on all other fields except the amount fields. Thank you!

SELECT DISTINCT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.TRXDATE) AS PERIODID, B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 AS DEPARTMENT, B.ACTNUMBR_3 AS PROVIDER
, B.ACTNUMST, A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET

Jim Horn

DISTINCT works on all columns in the SELECT clause.

>How do I write this statement that using DISTINCT on all other fields except the amount fields.
The easiest way would be to GROUP BY all columns except for the ones you want. Keep in mind that this will result in duplicate rows for all columns except for amount

SELECT Column1, Column2, Column3, Sum(Amount) as the_amount
FROM YourTable
GROUP BY Column1, Column2, Column3

Open in new window

Btw, two other things...

Most people aren't going to remember what a and b stands for, so I recommend using a table alias that is more intuitive to what the table is, like s for sales, o for orders, c for customers, etc.

Recommend avoiding the function in the SELECT clause, as it can fire once for each row, which is going to slow down query processing. Can expand on that later if you want.

Beartlaoi

What you are asking for is an aggregate and since you have calculated columns we should use a subquery.

SELECT OPENYEAR, TRXDATE, PERIODID, DIVISION, DEPARTMENT, PROVIDER, ACTNUMST,
   SUM(dbtamt) as DEBITAMT, SUM(crdamt) as CRDTAMNT, SUM(thenet) as NET
   FROM (
        SELECT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.TRXDATE) AS PERIODID, B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 AS DEPARTMENT, B.ACTNUMBR_3 AS PROVIDER
      , B.ACTNUMST, A.DEBITAMT AS dbtamt, A.CRDTAMNT as crdamt, A.DEBITAMT - A.CRDTAMNT AS thenet
        FROM ....
) subq
GROUP BY OPENYEAR, TRXDATE, PERIODID, DIVISION, DEPARTMENT, PROVIDER, ACTNUMST

Open in new window

EugeneZ

what is the PK - unique column in your A. table

you can use Distinct , group by ,etc
as per by: jimhorn post Group by looks like a better tool
try:

SELECT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.TRXDATE) AS PERIODID, B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 AS DEPARTMENT, B.ACTNUMBR_3 AS PROVIDER
, B.ACTNUMST,
Sum(A.DEBITAMT), Sum(A.CRDTAMNT), Sum(A.DEBITAMT - A.CRDTAMNT) AS NET
from your table A
Group by A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.TRXDATE) , B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 , B.ACTNUMBR_3
, B.ACTNUMST

PortletPaul

>>Distinct doesn't need to be on the "A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET"

Ask yourself: How does the query know this?

No Points please
--------------
Following is part of an article I'm writing, not yet published, hope you like it
--------------
...
The first thing to recognize about SELECT DISTINCT is that overall it’s a bit boring. It’s isn’t a function so if you are using a query like this:

SELECT DISTINCT (column1), column2, ... column40

You will be disappointed to learn that those parentheses around column1 do NOT work to “make column1 unique, then by magic decide how to simplify the other columns”.

And: With or without the parentheses, SELECT DISTINCT does not know that you really want it to concentrate on column1 or (column1 and column2) ignoring the other columns.

SELECT DISTINCT is a “row operator”, not a function, and not magic. It will evaluate ALL of the columns that you have listed and remove rows that have exactly the same values in all columns so that each row returned is different to every other row that is returned. That might mean for example that a value containing an exclamation in column27 and a value that does not have an exclamation in column27 causes 2 rows to be returned even if every other column has the same values.

col1 col2 col3 col4 col5 col6 col7 col8 col9 cl10 cl11 cl12 ... col27
aaaa bbbb cccc dddd eeee ffff gggg hhhh iiii jjjj kkkk llll ... blah! <<I'm unique!
aaaa bbbb cccc dddd eeee ffff gggg hhhh iiii jjjj kkkk llll ... blah  << me too!

Open in new window

Small note: I personally would not describe this as “having duplicates”, I prefer to describe this as “unwanted repetition”.

So, what to do?

First step: Concentrate on the columns that MUST be unique, e.g.

select
column1 --<< MUST be unique columns here
from atable
group by
column1 --<< MUST be unique columns here

Second step: Decide what can be done with the other columns,
e.g. accept the maximum values of each

select
column1
, MAX(column2), MAX(column3), ... MAX(column27) ...
from atable
group by
column1 --<< MUST be unique columns here

Note: A query using GROUP BY is not nearly as boring as poor old “SELECT DISTINCT” and GROUP BY does offer quite a number of interesting options such as MAX(), MIN(), AVERAGE() etc.

...

pstre

ASKER

To Eugene2: I don't have a PK. Not sure what you are referring to..

PortletPaul

>>Not sure what you are referring to..
Believe the question is "do you have a Primary Key (PK) an table A"

& Mmmm, just had another look and noticed you are using a scalar function. I'd suggest minimizing the number of calls made to that function by running it after the group by, like this:

SELECT
        OPENYEAR
      , TRXDATE
      , dbo.pdc_GetFiscalPeriod(derived.TRXDATE) AS PERIODID
      , DIVISION
      , DEPARTMENT
      , PROVIDER
      , ACTNUMST
      , DEBITAMT
      , CRDTAMNT
      , NET
FROM (
        SELECT
                A.OPENYEAR
              , A.TRXDATE
              , B.ACTNUMBR_1                 AS DIVISION
              , B.ACTNUMBR_2                 AS DEPARTMENT
              , B.ACTNUMBR_3                 AS PROVIDER
              , B.ACTNUMST
              , SUM(A.DEBITAMT)              AS DEBITAMT
              , SUM(A.CRDTAMNT)              AS CRDTAMNT
              , SUM(A.DEBITAMT - A.CRDTAMNT) AS NET
        FROM your_table AS A
        JOIN other_table AS B ON A.x = B.y
        GROUP BY A.OPENYEAR
               , A.TRXDATE
               , B.ACTNUMBR_1
               , B.ACTNUMBR_2
               , B.ACTNUMBR_3
               , B.ACTNUMST
     ) AS derived

Open in new window

EugeneZ

pstre:

did you try my code?

regarding "Not sure what you are referring to.. "
check my post
it is about your table records uniqueness

let say: like identity column - it is always unique ( normally)

ASKER CERTIFIED SOLUTION

pstre

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Jim Horn

If dex_row_id is some kind of key, then I believe PortletPaul's comment here would have led you to that conclusion.

PortletPaul

pstre

Seeking advice from a colleague who knows your data model was a smart move.

You might however ponder how we could know that a field called [Dex_Row_ID] existed (we just didn't know that) so it would not be possible for us to suggest it.

Still, it was my hope you would reconsider a reliance on "select distinct" - over time maybe you will learn the wisdom of only using "select distinct" when it is appropriate. In particular, for this query, you might want to re-consider using it when also calling user defined functions.

Regards, Paul.

pstre

ASKER

None of the solutions provided worked. A Sr. IT Analyst in my area know our code well and he was able to add Dex_Row_ID in the SQL statement to resolve the issue. Thank you!