pstre
asked on
SQL Statement - Using Distinct
I have a SQL statement using DISTINCT - see below - that is not working as we thought. The Distinct doesn't need to be on the "A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET" How do I write this statement that using DISTINCE on all other fields except the amount fields. Thank you!
SELECT DISTINCT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A. TRXDATE) AS PERIODID, B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 AS DEPARTMENT, B.ACTNUMBR_3 AS PROVIDER
, B.ACTNUMST, A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET
SELECT DISTINCT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.
, B.ACTNUMST, A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET
What you are asking for is an aggregate and since you have calculated columns we should use a subquery.
SELECT OPENYEAR, TRXDATE, PERIODID, DIVISION, DEPARTMENT, PROVIDER, ACTNUMST,
SUM(dbtamt) as DEBITAMT, SUM(crdamt) as CRDTAMNT, SUM(thenet) as NET
FROM (
SELECT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.TRXDATE) AS PERIODID, B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 AS DEPARTMENT, B.ACTNUMBR_3 AS PROVIDER
, B.ACTNUMST, A.DEBITAMT AS dbtamt, A.CRDTAMNT as crdamt, A.DEBITAMT - A.CRDTAMNT AS thenet
FROM ....
) subq
GROUP BY OPENYEAR, TRXDATE, PERIODID, DIVISION, DEPARTMENT, PROVIDER, ACTNUMST
what is the PK - unique column in your A. table
you can use Distinct , group by ,etc
as per by: jimhorn post Group by looks like a better tool
try:
SELECT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A. TRXDATE) AS PERIODID, B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 AS DEPARTMENT, B.ACTNUMBR_3 AS PROVIDER
, B.ACTNUMST,
Sum(A.DEBITAMT), Sum(A.CRDTAMNT), Sum(A.DEBITAMT - A.CRDTAMNT) AS NET
from your table A
Group by A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A. TRXDATE) , B.ACTNUMBR_1 AS DIVISION, B.ACTNUMBR_2 , B.ACTNUMBR_3
, B.ACTNUMST
you can use Distinct , group by ,etc
as per by: jimhorn post Group by looks like a better tool
try:
SELECT A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.
, B.ACTNUMST,
Sum(A.DEBITAMT), Sum(A.CRDTAMNT), Sum(A.DEBITAMT - A.CRDTAMNT) AS NET
from your table A
Group by A.OPENYEAR, A.TRXDATE, dbo.pdc_GetFiscalPeriod(A.
, B.ACTNUMST
>>Distinct doesn't need to be on the "A.DEBITAMT, A.CRDTAMNT, A.DEBITAMT - A.CRDTAMNT AS NET"
Ask yourself: How does the query know this?
No Points please
--------------
Following is part of an article I'm writing, not yet published, hope you like it
--------------
...
The first thing to recognize about SELECT DISTINCT is that overall it’s a bit boring. It’s isn’t a function so if you are using a query like this:
And: With or without the parentheses, SELECT DISTINCT does not know that you really want it to concentrate on column1 or (column1 and column2) ignoring the other columns.
SELECT DISTINCT is a “row operator”, not a function, and not magic. It will evaluate ALL of the columns that you have listed and remove rows that have exactly the same values in all columns so that each row returned is different to every other row that is returned. That might mean for example that a value containing an exclamation in column27 and a value that does not have an exclamation in column27 causes 2 rows to be returned even if every other column has the same values.
e.g. accept the maximum values of each
Ask yourself: How does the query know this?
No Points please
--------------
Following is part of an article I'm writing, not yet published, hope you like it
--------------
...
The first thing to recognize about SELECT DISTINCT is that overall it’s a bit boring. It’s isn’t a function so if you are using a query like this:
SELECT DISTINCT (column1), column2, ... column40
You will be disappointed to learn that those parentheses around column1 do NOT work to “make column1 unique, then by magic decide how to simplify the other columns”. And: With or without the parentheses, SELECT DISTINCT does not know that you really want it to concentrate on column1 or (column1 and column2) ignoring the other columns.
SELECT DISTINCT is a “row operator”, not a function, and not magic. It will evaluate ALL of the columns that you have listed and remove rows that have exactly the same values in all columns so that each row returned is different to every other row that is returned. That might mean for example that a value containing an exclamation in column27 and a value that does not have an exclamation in column27 causes 2 rows to be returned even if every other column has the same values.
col1 col2 col3 col4 col5 col6 col7 col8 col9 cl10 cl11 cl12 ... col27
aaaa bbbb cccc dddd eeee ffff gggg hhhh iiii jjjj kkkk llll ... blah! <<I'm unique!
aaaa bbbb cccc dddd eeee ffff gggg hhhh iiii jjjj kkkk llll ... blah << me too!
Small note: I personally would not describe this as “having duplicates”, I prefer to describe this as “unwanted repetition”.
So, what to do?
First step: Concentrate on the columns that MUST be unique, e.g.
select
column1 --<< MUST be unique columns here
from atable
group by
column1 --<< MUST be unique columns here
Second step: Decide what can be done with the other columns, column1 --<< MUST be unique columns here
from atable
group by
column1 --<< MUST be unique columns here
e.g. accept the maximum values of each
select
column1
, MAX(column2), MAX(column3), ... MAX(column27) ...
from atable
group by
column1 --<< MUST be unique columns here
Note: A query using GROUP BY is not nearly as boring as poor old “SELECT DISTINCT” and GROUP BY does offer quite a number of interesting options such as MAX(), MIN(), AVERAGE() etc.
...
column1
, MAX(column2), MAX(column3), ... MAX(column27) ...
from atable
group by
column1 --<< MUST be unique columns here
Note: A query using GROUP BY is not nearly as boring as poor old “SELECT DISTINCT” and GROUP BY does offer quite a number of interesting options such as MAX(), MIN(), AVERAGE() etc.
ASKER
To Eugene2: I don't have a PK. Not sure what you are referring to..
>>Not sure what you are referring to..
Believe the question is "do you have a Primary Key (PK) an table A"
& Mmmm, just had another look and noticed you are using a scalar function. I'd suggest minimizing the number of calls made to that function by running it after the group by, like this:
Believe the question is "do you have a Primary Key (PK) an table A"
& Mmmm, just had another look and noticed you are using a scalar function. I'd suggest minimizing the number of calls made to that function by running it after the group by, like this:
SELECT
OPENYEAR
, TRXDATE
, dbo.pdc_GetFiscalPeriod(derived.TRXDATE) AS PERIODID
, DIVISION
, DEPARTMENT
, PROVIDER
, ACTNUMST
, DEBITAMT
, CRDTAMNT
, NET
FROM (
SELECT
A.OPENYEAR
, A.TRXDATE
, B.ACTNUMBR_1 AS DIVISION
, B.ACTNUMBR_2 AS DEPARTMENT
, B.ACTNUMBR_3 AS PROVIDER
, B.ACTNUMST
, SUM(A.DEBITAMT) AS DEBITAMT
, SUM(A.CRDTAMNT) AS CRDTAMNT
, SUM(A.DEBITAMT - A.CRDTAMNT) AS NET
FROM your_table AS A
JOIN other_table AS B ON A.x = B.y
GROUP BY A.OPENYEAR
, A.TRXDATE
, B.ACTNUMBR_1
, B.ACTNUMBR_2
, B.ACTNUMBR_3
, B.ACTNUMST
) AS derived
pstre:
did you try my code?
regarding "Not sure what you are referring to.. "
check my post
it is about your table records uniqueness
let say: like identity column - it is always unique ( normally)
did you try my code?
regarding "Not sure what you are referring to.. "
check my post
it is about your table records uniqueness
let say: like identity column - it is always unique ( normally)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If dex_row_id is some kind of key, then I believe PortletPaul's comment here would have led you to that conclusion.
pstre
Seeking advice from a colleague who knows your data model was a smart move.
You might however ponder how we could know that a field called [Dex_Row_ID] existed (we just didn't know that) so it would not be possible for us to suggest it.
Still, it was my hope you would reconsider a reliance on "select distinct" - over time maybe you will learn the wisdom of only using "select distinct" when it is appropriate. In particular, for this query, you might want to re-consider using it when also calling user defined functions.
Regards, Paul.
Seeking advice from a colleague who knows your data model was a smart move.
You might however ponder how we could know that a field called [Dex_Row_ID] existed (we just didn't know that) so it would not be possible for us to suggest it.
Still, it was my hope you would reconsider a reliance on "select distinct" - over time maybe you will learn the wisdom of only using "select distinct" when it is appropriate. In particular, for this query, you might want to re-consider using it when also calling user defined functions.
Regards, Paul.
ASKER
None of the solutions provided worked. A Sr. IT Analyst in my area know our code well and he was able to add Dex_Row_ID in the SQL statement to resolve the issue. Thank you!
>How do I write this statement that using DISTINCT on all other fields except the amount fields.
The easiest way would be to GROUP BY all columns except for the ones you want. Keep in mind that this will result in duplicate rows for all columns except for amount
Open in new window
Btw, two other things...