Solved

SQL query optimization

Posted on 2014-04-24
17
327 Views
Last Modified: 2014-04-24
We have an existing query, and I need to add a new field to it called EffectiveDate.

SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
 ( SELECT TOP 1 ip.PolicyNumber
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId 
        ) AS PolicyNumber
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId ) 

Open in new window

The code to add my new field is:
( SELECT TOP 1 ip.EffectiveDate
                    FROM [InsurancePolicy] ip
                    JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                    where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                    order by ip.InsurancePolicyId 
        ) AS EffectiveDate

Open in new window

However, it would not be efficient to run these two subqueries separately, especially since PolicyNumber and EffectiveDate are pulled from the same exact tables. Also, using TOP 1 is not efficient either.

How would you re-write this query for maximum efficiency/performance?

I'm thinking of JOINing to a subquery that selects both fields, and instead of using TOP 1, use MIN to find the smallest InsurancePolicyId, and use that in WHERE clause of subquery.
0
Comment
Question by:pzozulka
  • 8
  • 5
  • 2
  • +1
17 Comments
 
LVL 6

Accepted Solution

by:
Dulton earned 167 total points
ID: 40020386
use row_number to order them in a cte, then join to the cte on the same customer account id field and where the order = 1
;With PolicyOrder AS
(
SELECT ip.PolicyNumber
, ip.EffectiveDate
, pfl.CustomerAccountId
, ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
FROM InsurancePolicy AS ip
INNER JOIN PremiumFinanceLoad AS pfl
ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId

)
SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount
,po.PolicyNumber
,po.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )
LEFT JOIN PolicyOrder AS po 
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 167 total points
ID: 40020451
no points pl.
I too would suggest row_number()

is there any particular reason for choosing the first policy based on the lowest InsurancePolicyId?

with row_number() you have the ability to order by - for example - that date field  
(see line 6 above, there is an order by in that line)

also note it isn't necessary to use a CTE, a derived table could be used instead
LEFT JOIN (
        SELECT ip.PolicyNumber
        , ip.EffectiveDate
        , pfl.CustomerAccountId
        , ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
        FROM InsurancePolicy AS ip
        INNER JOIN PremiumFinanceLoad AS pfl
        ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId
) AS po 
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 8

Author Comment

by:pzozulka
ID: 40020477
is there any particular reason for choosing the first policy based on the lowest InsurancePolicyId?

Because InsurancePolicyId is a primary key, it would mean that the Policy Effective Date would also be the smallest if we used MIN(InsurancePolicyId).

The specs for this report say the new field should display the Policy Effective Date of the first Policy on the account. Since there can be multiple policies, which start on different dates, we would need to get the first one.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40020516
>>"The specs for this report say the new field should display the Policy Effective Date of the first Policy on the account. "

Dulton's query should do exactly that
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40020987
I tested both suggestions against my original approach to simply add a field by using the below subquery in the SELECT clause:
( SELECT TOP 1 ip.EffectiveDate
                    FROM [InsurancePolicy] ip
                    JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                    where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                    order by ip.InsurancePolicyId 
        ) AS EffectiveDate

Open in new window

I tested by running both queries at the same time and enabling the output of "Actual Execution Plan". It shows that my query cost took 35% of the batch, while your suggestion cost 65%.

I'm a bit surprised at these results as it wouldn't make sense to run a subquery twice.
0
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 166 total points
ID: 40021004
1) I suggest using OUTER APPLY, as below.
2) You can't rely on the % estimates.  I've seen wildly wrong estimates there.
3) TOP (1) is not inherently inefficient.


SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
    ip_data.PolicyNumber, ip_data.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1)
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )  
OUTER APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId
        ) AS ip_data
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021037
ScottPletcher: Using your query the result set went from 132 rows to 637 rows.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 40021051
Try CROSS APPLY instead of OUTER APPLY.  (I guess.  I don't see why another OUTER join would add rows; are the WHERE conditions the same on both queries?)


SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
...
CROSS APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
    ...
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 8

Author Comment

by:pzozulka
ID: 40021092
Same effect.
0
 
LVL 6

Expert Comment

by:Dulton
ID: 40021107
-Hold the press.

you have a bunch of left/outer joins.... but none of the outer tables are being returned in the result set.....

that makes all of those joins un-necessary.

see if this performs any better, and gives the right results.

;With  PolicyOrder AS
(
SELECT ip.PolicyNumber
, ip.EffectiveDate
, pfl.CustomerAccountId
, ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
FROM InsurancePolicy AS ip
INNER JOIN PremiumFinanceLoad AS pfl
ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId
)
SELECT CA.AccountNumber
, CA.Name
, Agent.Name
, CA.InstallmentsMadeCount
,po.PolicyNumber
,po.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT JOIN PolicyOrder AS po
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021124
The outer tables are being returned in the result set, but I did not paste the entire query, just the essentials needed to answer my question.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021139
Can someone also explain why ca.[PartyId] in the query below returns the error below, while the same thing in the CROSS APPLY subquery seems to be just fine?
The multi-part identifier "ca.PartyId" could not be bound.

LEFT JOIN (
      SELECT MIN(ip.InsurancePolicyId) as InsurancePolicyId, [PremiumFinanceLoan].CustomerAccountId
                  FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
            WHERE [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                  GROUP BY [PremiumFinanceLoan].CustomerAccountId
              ) AS InsPolTb2 ON whatever = whatever

CROSS APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId
        ) AS InsPolTb
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 40021210
>> ScottPletcher: Using your query the result set went from 132 rows to 637 rows. <<

Still seems very odd.  Just adding an OUTER APPLY should not increase the number of rows for this query.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021233
Here's the entire FROM clause below -- not leaving anything out:
   ...     , ip_data.EffectiveDate
        , ip_data.PolicyNumber
        
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )
CROSS APPLY 
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId 
        ) AS ip_data

Open in new window

0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 40021284
So running the query w/o the CROSS APPLY only returns 132 rows, but with the CA it returns 637 rows?  

So can the same PartyID appear multiple times in the outer query?
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021348
I'm sorry, I accidentally switched to a different database. CROSS APPLY works fine, and according to execution plan costs 49% while original query costs 51%. You said not to trust this.

In that case, what do you recommend I use to check performance metrics? The database I'm using in dev environment does not allow me to check how long the execution time is as both execute in 0 seconds.

The production database is HUGE.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 40021406
Look at the Estimated Query Plan.  SQL can generate that in a second or so, no matter how large the tables are.

Presumably the CROSS APPLY plan should have a single scan (at most) of the underlying table, with a lazy spool.

The one with two different TOP (1) subqueries should show multiple lookups/scans of the underlying table: SQL could implement that logic different ways, would have to see it to be sure.
0

Featured Post

Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Problem with SqlConnection 5 115
Need Counts 11 42
separate column 24 20
Haw to apply join on 2 tables with this scenario 4 5
JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now