[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

SQL query optimization

Posted on 2014-04-24
17
Medium Priority
?
347 Views
Last Modified: 2014-04-24
We have an existing query, and I need to add a new field to it called EffectiveDate.

SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
 ( SELECT TOP 1 ip.PolicyNumber
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId 
        ) AS PolicyNumber
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId ) 

Open in new window

The code to add my new field is:
( SELECT TOP 1 ip.EffectiveDate
                    FROM [InsurancePolicy] ip
                    JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                    where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                    order by ip.InsurancePolicyId 
        ) AS EffectiveDate

Open in new window

However, it would not be efficient to run these two subqueries separately, especially since PolicyNumber and EffectiveDate are pulled from the same exact tables. Also, using TOP 1 is not efficient either.

How would you re-write this query for maximum efficiency/performance?

I'm thinking of JOINing to a subquery that selects both fields, and instead of using TOP 1, use MIN to find the smallest InsurancePolicyId, and use that in WHERE clause of subquery.
0
Comment
Question by:pzozulka
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 5
  • 2
  • +1
17 Comments
 
LVL 6

Accepted Solution

by:
Dulton earned 668 total points
ID: 40020386
use row_number to order them in a cte, then join to the cte on the same customer account id field and where the order = 1
;With PolicyOrder AS
(
SELECT ip.PolicyNumber
, ip.EffectiveDate
, pfl.CustomerAccountId
, ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
FROM InsurancePolicy AS ip
INNER JOIN PremiumFinanceLoad AS pfl
ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId

)
SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount
,po.PolicyNumber
,po.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )
LEFT JOIN PolicyOrder AS po 
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 49

Assisted Solution

by:PortletPaul
PortletPaul earned 668 total points
ID: 40020451
no points pl.
I too would suggest row_number()

is there any particular reason for choosing the first policy based on the lowest InsurancePolicyId?

with row_number() you have the ability to order by - for example - that date field  
(see line 6 above, there is an order by in that line)

also note it isn't necessary to use a CTE, a derived table could be used instead
LEFT JOIN (
        SELECT ip.PolicyNumber
        , ip.EffectiveDate
        , pfl.CustomerAccountId
        , ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
        FROM InsurancePolicy AS ip
        INNER JOIN PremiumFinanceLoad AS pfl
        ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId
) AS po 
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 8

Author Comment

by:pzozulka
ID: 40020477
is there any particular reason for choosing the first policy based on the lowest InsurancePolicyId?

Because InsurancePolicyId is a primary key, it would mean that the Policy Effective Date would also be the smallest if we used MIN(InsurancePolicyId).

The specs for this report say the new field should display the Policy Effective Date of the first Policy on the account. Since there can be multiple policies, which start on different dates, we would need to get the first one.
0
Moving data to the cloud? Find out if you’re ready

Before moving to the cloud, it is important to carefully define your db needs, plan for the migration & understand prod. environment. This wp explains how to define what you need from a cloud provider, plan for the migration & what putting a cloud solution into practice entails.

 
LVL 49

Expert Comment

by:PortletPaul
ID: 40020516
>>"The specs for this report say the new field should display the Policy Effective Date of the first Policy on the account. "

Dulton's query should do exactly that
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40020987
I tested both suggestions against my original approach to simply add a field by using the below subquery in the SELECT clause:
( SELECT TOP 1 ip.EffectiveDate
                    FROM [InsurancePolicy] ip
                    JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                    where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                    order by ip.InsurancePolicyId 
        ) AS EffectiveDate

Open in new window

I tested by running both queries at the same time and enabling the output of "Actual Execution Plan". It shows that my query cost took 35% of the batch, while your suggestion cost 65%.

I'm a bit surprised at these results as it wouldn't make sense to run a subquery twice.
0
 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 664 total points
ID: 40021004
1) I suggest using OUTER APPLY, as below.
2) You can't rely on the % estimates.  I've seen wildly wrong estimates there.
3) TOP (1) is not inherently inefficient.


SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
    ip_data.PolicyNumber, ip_data.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1)
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )  
OUTER APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId
        ) AS ip_data
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021037
ScottPletcher: Using your query the result set went from 132 rows to 637 rows.
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 40021051
Try CROSS APPLY instead of OUTER APPLY.  (I guess.  I don't see why another OUTER join would add rows; are the WHERE conditions the same on both queries?)


SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
...
CROSS APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
    ...
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021092
Same effect.
0
 
LVL 6

Expert Comment

by:Dulton
ID: 40021107
-Hold the press.

you have a bunch of left/outer joins.... but none of the outer tables are being returned in the result set.....

that makes all of those joins un-necessary.

see if this performs any better, and gives the right results.

;With  PolicyOrder AS
(
SELECT ip.PolicyNumber
, ip.EffectiveDate
, pfl.CustomerAccountId
, ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
FROM InsurancePolicy AS ip
INNER JOIN PremiumFinanceLoad AS pfl
ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId
)
SELECT CA.AccountNumber
, CA.Name
, Agent.Name
, CA.InstallmentsMadeCount
,po.PolicyNumber
,po.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT JOIN PolicyOrder AS po
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021124
The outer tables are being returned in the result set, but I did not paste the entire query, just the essentials needed to answer my question.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021139
Can someone also explain why ca.[PartyId] in the query below returns the error below, while the same thing in the CROSS APPLY subquery seems to be just fine?
The multi-part identifier "ca.PartyId" could not be bound.

LEFT JOIN (
      SELECT MIN(ip.InsurancePolicyId) as InsurancePolicyId, [PremiumFinanceLoan].CustomerAccountId
                  FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
            WHERE [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                  GROUP BY [PremiumFinanceLoan].CustomerAccountId
              ) AS InsPolTb2 ON whatever = whatever

CROSS APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId
        ) AS InsPolTb
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 40021210
>> ScottPletcher: Using your query the result set went from 132 rows to 637 rows. <<

Still seems very odd.  Just adding an OUTER APPLY should not increase the number of rows for this query.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021233
Here's the entire FROM clause below -- not leaving anything out:
   ...     , ip_data.EffectiveDate
        , ip_data.PolicyNumber
        
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )
CROSS APPLY 
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId 
        ) AS ip_data

Open in new window

0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 40021284
So running the query w/o the CROSS APPLY only returns 132 rows, but with the CA it returns 637 rows?  

So can the same PartyID appear multiple times in the outer query?
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021348
I'm sorry, I accidentally switched to a different database. CROSS APPLY works fine, and according to execution plan costs 49% while original query costs 51%. You said not to trust this.

In that case, what do you recommend I use to check performance metrics? The database I'm using in dev environment does not allow me to check how long the execution time is as both execute in 0 seconds.

The production database is HUGE.
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 40021406
Look at the Estimated Query Plan.  SQL can generate that in a second or so, no matter how large the tables are.

Presumably the CROSS APPLY plan should have a single scan (at most) of the underlying table, with a lazy spool.

The one with two different TOP (1) subqueries should show multiple lookups/scans of the underlying table: SQL could implement that logic different ways, would have to see it to be sure.
0

Featured Post

Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
Windocks is an independent port of Docker's open source to Windows.   This article introduces the use of SQL Server in containers, with integrated support of SQL Server database cloning.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties

656 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question