[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 292
  • Last Modified:

SQL Query: Store calculation to be used later in query

I'm building a report for a client which has millions of records so performance is key.

Column1 (AvgDailyBalance - line 83): The requirements requested include a column which is a calculation of amounts ($$$).
Column2 (InterestExpense): The next column asks to multiply the first column by a constant number. For the sake of performance, I don't want to run my calculations for 1st column again. I was thinking of somehow storing the results of the orginal calculations to get the amount for column1, and then using the stored results for Column2.

Not sure how to accomplish this.

DECLARE @startDate datetime = '4/1/2012'        
DECLARE @endDate datetime = '12/31/2012' 
DECLARE @reportType VARCHAR(50) = 'Agent'
DECLARE @monthlyCostPerAccount MONEY = 25.00
DECLARE @costOfFunds MONEY = 5.00

SELECT DISTINCT 
	act.GlAccountTypeId, act.Name AS AccountType, at.GlAbstractTypeId, at.Name AS AbstractType, amt.GlAccountingMapTypeId, 
	amt.Name AS AccountingMapType, a.GlAccountId, a.AccountName, ammt.GlAccountingMapModifierTypeId, ammt.Name AS GlAccountingMapModifierType
INTO 
	#tmpMap
FROM  
	(
		SELECT DISTINCT 
			GlAccountingMapTypeId, GlAbstractTypeId, GlAccountId, GlAccountingMapModifierTypeId
		FROM 
			GlAccountingMap
	) am
	JOIN GlAbstractType at ON am.GlAbstractTypeId = at.GlAbstractTypeId
	JOIN GlAccount a ON am.GlAccountId = a.GlAccountId
	JOIN GlAccountType act ON a.GlAccountTypeId = act.GlAccountTypeId
	JOIN GlAccountingMapType amt ON am.GlAccountingMapTypeId = amt.GlAccountingMapTypeId
	JOIN GlAccountingMapModifierType ammt ON am.GlAccountingMapModifierTypeId = ammt.GlAccountingMapModifierTypeId

ORDER BY amt.GlAccountingMapTypeId, at.GlAbstractTypeId, act.GlAccountTypeId, a.GlAccountId, ammt.GlAccountingMapModifierTypeId



;WITH
cteTally10 AS (
		SELECT 0 AS tally UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
		SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally100 AS (
		SELECT ROW_NUMBER() OVER(ORDER BY c1.tally) - 1 AS tally
		FROM cteTally10 c1
		CROSS JOIN cteTally10 c2
),
cteEntities AS (
		SELECT DISTINCT PartyId
		FROM Entity
)

SELECT 
	RT.*, 
	CONVERT(VARCHAR,DATEPART(MONTH,RT.PeriodDate))+'/'+CONVERT(VARCHAR,DATEPART(YEAR,RT.PeriodDate)) as Period,
	DT.*,
	RF.ReferralAmount,
	dtGL.InterestIncome, dtGL.FeeIncome, dtGLWO.NETWriteOffs ,
--	dtAL.NumberAccounts, dtAl.acctnum
	(
	 (
		(SELECT
			SUM(ent.Amount) as SumBeforeStartPeriod
		 FROM Entity en
		 left join CustomerAccount CuAc on en.PartyId = (CASE WHEN @reportType = 'Agent' THEN CuAc.AgentId
															  WHEN @reportType = 'Submitted By' THEN CuAc.SubmittedById
														 END)
		 left join GlTransaction t on t.PartyId = CuAc.PartyId
		 left join GlEntry ent on ent.GlTransactionId = t.GlTransactionId
		 left join GlAccount a on a.GlAccountId = ent.GlAccountId
		 WHERE 
			en.PartyId = RT.PartyId
			AND t.WhenBooked < RT.PeriodDate 
			AND a.AccountName = 'Accounts Receivable - Premiums'
		) +
		
		(SELECT
			SUM(ent.Amount) as SumBeforeEndPeriod
		 FROM Entity en
		 left join CustomerAccount CuAc on en.PartyId = (CASE WHEN @reportType = 'Agent' THEN CuAc.AgentId
															  WHEN @reportType = 'Submitted By' THEN CuAc.SubmittedById
														 END)
		 left join GlTransaction t on t.PartyId = CuAc.PartyId
		 left join GlEntry ent on ent.GlTransactionId = t.GlTransactionId
		 left join GlAccount a on a.GlAccountId = ent.GlAccountId
		 WHERE 
			en.PartyId = RT.PartyId
			AND t.WhenBooked < DATEADD(MONTH, DATEDIFF(MONTH, 0, RT.PeriodDate) + 1, 0)
			AND a.AccountName = 'Accounts Receivable - Premiums'
		)
	 ) / 2 
	) * -1 as AvgDailyBalance,
	
	@costOfFunds / 100 as CostofFunds,
	
	
	
	
	
	
FROM
(
	SELECT 
		c.PartyId, DATEADD(MONTH, DATEDIFF(MONTH, 0, @startDate) + t.tally, 0) as PeriodDate
	FROM 
		cteTally100 t, cteEntities c
	WHERE 
		t.tally BETWEEN 0 AND DATEDIFF(MONTH, @startDate, @endDate)
) RT

--LEFT JOIN CustomerAccount CustAccount ON RT.PartyId = CustAccount.PartyId

LEFT JOIN 
( ..
...
) DT ON DT.PartyId = RT.PartyId AND 
  DT.createMonth >=  RT.PeriodDate AND
  DT.createMonth < DATEADD(MONTH, DATEDIFF(MONTH, 0, RT.PeriodDate) + 1, 0)
...
...
DROP TABLE #tmpMap 

Open in new window

0
pzozulka
Asked:
pzozulka
  • 5
  • 5
1 Solution
 
nemws1Commented:
In SQL Server you have separate out queries that assign values to variables from queries that request values from variables.  If you try to do so, you'll get:
A SELECT statement that assigns a value to a variable must not be combined with data-retrieval operations.

Open in new window

However, MSSQL is pretty smart.  I would think it does *NOT* do the calculation twice, even though you have that in your query.  It'll only calculate it once and recognize that it's used later on in the same row and use the cached value.  I know this is true when you have something like:
SELECT CONVERT(somefield INTO VARCHAR(40))
FROM sometable
ORDER BY CONVERT(somefield INTO VARCHAR(40))

Open in new window

It only does the convert *once*, even though you're using it in the ORDER BY clause.
0
 
PortletPaulCommented:
One method for reuse of a calculation is to use APPLY (either CROSS APPLY or OUTER APPLY depending on need) e.g.
select


      -- we can reference by alias calculations in the apply below

      substring(astring, pos1, len1)


from (
              select 'http://twitter.com/' as astring
      ) as data

-- this cross apply gives us 2 values in: pos1 & len1

cross apply (
              select charindex(':',astring), length(astring)
             ) as ca1 (pos1, len1)

Open in new window

BUT:
You are using 2 correlated queries in the select clause for your calculations, this is likely to be quite (very?) inefficient particularly as there are many joins involved in each one.

You are also using left joins in both of those correlated queries, but these are useless because of your where clause. For example if a.AccountName = 'Accounts Receivable - Premiums' is necessary then a.AccountName cannot be NULL, and if a.AccountName cannot be NULL you have the same condition as an INNER JOIN. This is true of t.WhenBooked also, it cannot be NULL either. Don't use less efficient LEFT JOINS if you don't have to.
SELECT
			SUM(ent.Amount) as SumBeforeEndPeriod
		 FROM Entity en
		 left join CustomerAccount CuAc on en.PartyId = (CASE WHEN @reportType = 'Agent' THEN CuAc.AgentId
															  WHEN @reportType = 'Submitted By' THEN CuAc.SubmittedById
														 END)
		 left join GlTransaction t on t.PartyId = CuAc.PartyId                 --<< change join
		 left join GlEntry ent on ent.GlTransactionId = t.GlTransactionId      --<< change join
		 left join GlAccount a on a.GlAccountId = ent.GlAccountId              --<< change join
		 WHERE 
			en.PartyId = RT.PartyId

			AND t.WhenBooked < DATEADD(MONTH, DATEDIFF(MONTH, 0, RT.PeriodDate) + 1, 0)
-- cannot be NULL, so inner join t

			AND a.AccountName = 'Accounts Receivable - Premiums'
-- cannot be NULL, so inner join a

Open in new window

But, I would be avoiding these correlated subqueries
0
 
pzozulkaAuthor Commented:
nemws1: I see what you're saying, but don't know if this applies to our case. Either way, I need to write an optimized query.

PortletPaul:
Thanks. I changed them to inner joins. I really don't see a way of avoiding the correlated subqueries. Perhaps I can combine them into a single subquery instead of adding two of them, but don't see a way of avoiding using them altogether.

Not seeing how the cross apply would work in my situation. Are you suggesting to basically put the subquery into the cross apply, and then select things inside it from the outside? If so, will RT.PartyId be reachable from within the Cross Apply? I ask because if you look at the query posted in my post above, line 104 includes a LEFT JOIN. RT.PartyId is not visible from inside that LEFT JOIN. I'm not sure why, but for the same reason have a feeling that it won't be visible from inside the CROSS APPLY.
0
Microsoft Certification Exam 74-409

VeeamĀ® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
PortletPaulCommented:
This question is about how to reuse a calculation, there are several ways of course, Here an APPLY would seem the most likely (simple) candidate. You may need an OUTER APPLY which will allow NULLs (like an outer join does).

But, I cannot be certain it will work for you, as I cannot test it.

However, I'm afraid I just cannot agree that there is no alternative to the correlated subqueries. Here is a suggestion:
LEFT JOIN (
            SELECT
                  en.PartyId
                , SUM(CASE
                        WHEN t.WhenBooked < RT.PeriodDate THEN ent.Amount END) AS SumBeforeStartPeriod
                , SUM(ent.Amount)                                              AS SumBeforeEndPeriod
            FROM Entity en
                  LEFT JOIN CustomerAccount CuAc
                              ON en.PartyId = (CASE
                                          WHEN @reportType = 'Agent' THEN CuAc.AgentId
                                          WHEN @reportType = 'Submitted By' THEN CuAc.SubmittedById END)
                  INNER JOIN GlTransaction t
                              ON t.PartyId = CuAc.PartyId
                  INNER JOIN GlEntry ent
                              ON ent.GlTransactionId = t.GlTransactionId
                  INNER JOIN GlAccount a
                              ON a.GlAccountId = ent.GlAccountId
            
            WHERE a.AccountName = 'Accounts Receivable - Premiums'
                  AND t.WhenBooked < DATEADD(MONTH, DATEDIFF(MONTH, 0, RT.PeriodDate) + 1, 0)
            
            GROUP BY
                  en.PartyId

         ) as data ON RT.PartyId = data.PartyId

Open in new window

This approach is going to be much more efficient and will probably also solve the reuse of calculation issue as well.

{+edit} I see I have overlooked the RT.PeriodDate issue in the above - sorry. However I woud still be looking for ways to avoid those correlated subqueries.
0
 
pzozulkaAuthor Commented:
Can you tell me or provide links to explain why correlated subqueries are not good, and why a LEFT JOIN would be better?

I just ran my query with the correlated subquery included -- took 6:22 seconds.
Commented out the correlated subquery -- took 2 seconds.

I'm eager to learn why correlated subqueries are so bad.
0
 
pzozulkaAuthor Commented:
Should I open a new question about replacing the correlated subqueries. This query crashed my server. :)

I need another way.
0
 
pzozulkaAuthor Commented:
I have already tried using your code suggestion in post ID: 40275122 and it will not work because it does not see RT.___ fields within the LEFT JOIN.
0
 
PortletPaulCommented:
yes I did mention that in an edit, but moving from correlated subqueries can be achieved I'm sure
0
 
PortletPaulCommented:
>>"should I open a new question about replacing the correlated subqueries. "
yes, I think that would be a good idea
0
 
PortletPaulCommented:
>>"I'm eager to learn why correlated subqueries are so bad."

I don't know of a good reference, or if I can explain it well. I am particularly concerned by correlated subqueries IN THE SELECT CLAUSE

The Select clause is one of the last to be evaluated, that order is:
1: FROM
2: the JOINS
3: WHERE clause
4: GROUP BY clause
5: HAVING clause
6: SELECT clause
7: ORDER BY clause

So correlations in step 6 forces evaluation of factors that should have been considered in steps 1, 2 & 3
- it goes backwards!

Not only that, it goes backwards, PER RECORD! Those queries are evaluated literally, row by row in the resultset, it's sort of similar to looping through a cursor - and it just isn't efficient.

So, try to move the gathering of information into steps 1&2

-------
There is one exception to make for correlated subqueries:
If using EXISTS or NOT EXISTS in the where clause, these require correlated subqueries, but as a general rule these correlated subqueries work  well.

i.e. While most correlated subqueries are things to avoid its different for EXISTS/NOT EXISTS
0
 
pzozulkaAuthor Commented:
Very well explained. Thank you.
0

Featured Post

Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

  • 5
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now