Solved

SQL: Union

Posted on 2014-10-21
16
351 Views
Last Modified: 2014-10-21
Is there a better way of doing the below? The first SELECT gets all payments, so all values are positive. The next SELECT statement gets all adjustments, so all values are negative. Together, they are my derived table (JOIN) to get total customer payments (payments - adjustments).
SELECT T.PartyId, SUM(T.Amount) as 'Amount'
FROM [Transaction] T
WHERE T.PaymentTypeId = 1 
GROUP BY T.PartyId

UNION

SELECT Tr.PartyId, SUM(Tr.Amount) as 'Amount'
FROM [Transaction] Tr JOIN [Transaction] T ON Tr.AdjustsTransactionId = T.TransactionId
WHERE Tr.PaymentTypeId = 9 AND 
T.PaymentTypeId = 1  
GROUP BY Tr.PartyId

Open in new window

Results might look like this:
PartyId         Amount
130528         4063.26
130528         -200.00

So I would probably need to drop the above code down a level into a derived table -- call it A. Then, SELECT a SUM of Amount FROM table A, and then group by PartyId. This will be derived table B.

And then from the main select clause, select the amount from table B.

Is there a better strategy?
0
Comment
Question by:pzozulka
  • 7
  • 5
  • 2
  • +1
16 Comments
 
LVL 65

Assisted Solution

by:Jim Horn
Jim Horn earned 50 total points
ID: 40394775
Not sure about 'better', although I'd recommend using UNION ALL instead of UNION, as UNION eliminates duplicates, and just to make sure that if there's not a duplicate row between the two ( negative payment / two adjustments of the same amount?) that the duplicate isn't eliminated.

>Together, they are my derived table (JOIN) to get total customer payments (payments - adjustments).
Show me what you mean  by this, as I don't see a JOIN here.   A subquery would work though, and depending on whate you're trying to do you can remove the SUM / GROUP BY in the main query above, and add it in the parent query below...

SELECT a.PartyID, a.Amount
FROM (the above query goes here) a

Open in new window

0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 450 total points
ID: 40394777
SELECT T.PartyId, SUM(T.Amount + ISNULL(T9.Amount, 0)) as 'Final_Amount'
FROM [Transaction] T
LEFT OUTER JOIN [Transaction] T9 ON
    T9.PaymentTypeId = 9 AND
    T9.AdjustsTransactionId = T.TransactionId
WHERE
    T.PaymentTypeId = 1
GROUP BY T.PartyId
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40394798
I didn't include the join in the above, but just wanted to make it known that the above code was part of my derived table. But here it is below, using your suggestion of UNION ALL. And just to make sure we're not removing dups, I added the transactionID for uniqueness.

Is there anything wrong with UNION as far as performance goes? My discomfort with it mostly because I don't use it often.

SELECT a.PartyId, SUM(a.amount)
FROM 
(
	SELECT T.TransactionId, T.PartyId, T.Amount
	FROM [Transaction] T
	WHERE T.PaymentTypeId = 1 

	UNION ALL

	SELECT Tr.TransactionId, Tr.PartyId, Tr.Amount
	FROM [Transaction] Tr JOIN [Transaction] T ON Tr.AdjustsTransactionId = T.TransactionId
	WHERE Tr.PaymentTypeId = 9 AND 
	T.PaymentTypeId = 1
) a
GROUP BY a.PartyId

Open in new window

0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 450 total points
ID: 40394804
>> Is there anything wrong with UNION as far as performance goes? <<

You're scanning the [Transaction] table at least one extra time, then you have additional I/O to combine the results from the separate scans.  If possible, it's best to do it in a single query, as I posted above.
0
 
LVL 65

Assisted Solution

by:Jim Horn
Jim Horn earned 50 total points
ID: 40394817
>Is there anything wrong with UNION as far as performance goes?
UNION eliminates duplicates, which takes some extra time.  UNION ALL does not eliminate duplicates, so no extra time.

Having said that, if you can get to a JOIN solution such as Scott's above, as his statement is correct in that it would eliminate the second SEEK/SCAN on Transaction.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40394852
Interesting...when I ran both queries with "Include Actual Execution Plan", Scott's solution took 59% of the batch, while the UNION took 41%.

If you view Estimated Execution plan, then they say both 50%.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40394857
What's more interesting is that I have to do this twice (two derived tables, in other words, two left joins) because the report requires two different columns, that have slightly different criteria.

This code, take 61% of batch as compared to same criteria using UNION ALL:
SELECT T.PartyId, SUM(T.Amount + ISNULL(T9.Amount, 0)) as 'Final_Amount'
FROM [Transaction] T
LEFT OUTER JOIN [Transaction] T9 ON
    T9.PaymentTypeId = 9 AND
    T9.AdjustsTransactionId = T.TransactionId
WHERE 
    T.PaymentTypeId = 1
GROUP BY T.PartyId

Open in new window


While this code, takes 50% of batch:
SELECT T.PartyId, SUM(T.Amount + ISNULL(T9.Amount, 0)) as 'Final_Amount'
FROM [Transaction] T
LEFT OUTER JOIN [Transaction] T9 ON
    T9.PaymentTypeId = 9 AND
    T9.AdjustsTransactionId = T.TransactionId
WHERE 
    T.PaymentTypeId = 3 AND T.PaymentMethodId = 1
GROUP BY T.PartyId

Open in new window

0
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 450 total points
ID: 40394893
The "% of batch" value is useless.  Instead, issue these statements:

SET STATISTICS IO ON
SET STATISTICS TIME ON

and then run both query.  Look at the logical I/Os only, not the physical; in fact, for a more accurate comparison, confirming that there are no physical I/Os in either query, only logical.  That will provide the best comparison.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40394926
Thanks Scott, do I have to SET these values to OFF after I run the comparison or do they automatically turn off for future queries?
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40394940
OK, so as suggested the stats are below:

UNION query: CPU time = 156 ms, elapsed time = 239 ms, logical reads = 13651, physical reads = 0
JOIN query:     CPU time = 187 ms, elapsed time = 269 ms, logical reads = 10938, physical reads = 0
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40394959
So based on what you said, "look at logical I/Os only", it would seem that the JOIN query has better performance, even though the elapsed time is longer?

These stats don't make sense to me. You would think that the less logical reads, the smaller the elapsed time?
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40395021
You would indeed think that.  But remember, we're dealing with 30ms or less, very likely across hyper-threaded CPUs, so that is negligible.

I'm surprised the I/O is that close, though, only 25% more.  Be sure you're not comparing a "double lookup" in the JOIN:
WHERE
    T.PaymentTypeId = 3 AND T.PaymentMethodId = 1
to a single lookup in the UNION ALL.
0
 
LVL 32

Expert Comment

by:awking00
ID: 40395078
select PartyId,
sum(case when PaymentTypeId = 1 then Amount else 0 end) -
sum(case when PaymentTypeId = 9 then Amount else 0 end) as Amount
from Transaction
group by PartyId;
0
 
LVL 32

Expert Comment

by:awking00
ID: 40395097
>>Results might look like this:
PartyId         Amount
130528         4063.26
130528         -200.00<<
In my above query, the results would look like this:
PartyId         Amount
130528         3863.26
To get your results -
select PartyId,
sum(case when PaymentTypeId = 1 then Amount else 0 end) Amount
from Transaction
group by PartyId
union all
select PartyId,
sum(case when PaymentTypeId = 9 then Amount else 0 end) * -1 as Amount
from Transaction
group by PartyId;
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40395136
awking00: I don't think this would work for me. Notice the below line of code from Scott's query:

T9.AdjustsTransactionId = T.TransactionId

This ensures that all PaymentTypeId that equal 9 (Adjustments) are related to the original transactionId.
I can't just say ALL transactions of PaymentTypeId = 9 because then it will find unrelated transactions as well.

Hope that makes sense.
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40395275
Also, awking is subtracting a negative number, which will add it instead; i.e.:
If the base data is this:
 PartyId         Amount
 130528         4063.26
 130528         -200.00<<
The query above will yield this, which is incorrect:
 PartyId         Amount
 130528         4263.26
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
VBScript Write Column Headers 3 37
SQL Querying data from 3 tables, all with 1 common column 4 35
SQL Insert parts by customer 12 33
Inserting oldest record into new table. 5 23
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question