Solved

SQL query optimization

Posted on 2014-04-24
17
342 Views
Last Modified: 2014-04-24
We have an existing query, and I need to add a new field to it called EffectiveDate.

SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
 ( SELECT TOP 1 ip.PolicyNumber
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId 
        ) AS PolicyNumber
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId ) 

Open in new window

The code to add my new field is:
( SELECT TOP 1 ip.EffectiveDate
                    FROM [InsurancePolicy] ip
                    JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                    where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                    order by ip.InsurancePolicyId 
        ) AS EffectiveDate

Open in new window

However, it would not be efficient to run these two subqueries separately, especially since PolicyNumber and EffectiveDate are pulled from the same exact tables. Also, using TOP 1 is not efficient either.

How would you re-write this query for maximum efficiency/performance?

I'm thinking of JOINing to a subquery that selects both fields, and instead of using TOP 1, use MIN to find the smallest InsurancePolicyId, and use that in WHERE clause of subquery.
0
Comment
Question by:pzozulka
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 5
  • 2
  • +1
17 Comments
 
LVL 6

Accepted Solution

by:
Dulton earned 167 total points
ID: 40020386
use row_number to order them in a cte, then join to the cte on the same customer account id field and where the order = 1
;With PolicyOrder AS
(
SELECT ip.PolicyNumber
, ip.EffectiveDate
, pfl.CustomerAccountId
, ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
FROM InsurancePolicy AS ip
INNER JOIN PremiumFinanceLoad AS pfl
ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId

)
SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount
,po.PolicyNumber
,po.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )
LEFT JOIN PolicyOrder AS po 
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 49

Assisted Solution

by:PortletPaul
PortletPaul earned 167 total points
ID: 40020451
no points pl.
I too would suggest row_number()

is there any particular reason for choosing the first policy based on the lowest InsurancePolicyId?

with row_number() you have the ability to order by - for example - that date field  
(see line 6 above, there is an order by in that line)

also note it isn't necessary to use a CTE, a derived table could be used instead
LEFT JOIN (
        SELECT ip.PolicyNumber
        , ip.EffectiveDate
        , pfl.CustomerAccountId
        , ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
        FROM InsurancePolicy AS ip
        INNER JOIN PremiumFinanceLoad AS pfl
        ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId
) AS po 
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 8

Author Comment

by:pzozulka
ID: 40020477
is there any particular reason for choosing the first policy based on the lowest InsurancePolicyId?

Because InsurancePolicyId is a primary key, it would mean that the Policy Effective Date would also be the smallest if we used MIN(InsurancePolicyId).

The specs for this report say the new field should display the Policy Effective Date of the first Policy on the account. Since there can be multiple policies, which start on different dates, we would need to get the first one.
0
Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

 
LVL 49

Expert Comment

by:PortletPaul
ID: 40020516
>>"The specs for this report say the new field should display the Policy Effective Date of the first Policy on the account. "

Dulton's query should do exactly that
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40020987
I tested both suggestions against my original approach to simply add a field by using the below subquery in the SELECT clause:
( SELECT TOP 1 ip.EffectiveDate
                    FROM [InsurancePolicy] ip
                    JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                    where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                    order by ip.InsurancePolicyId 
        ) AS EffectiveDate

Open in new window

I tested by running both queries at the same time and enabling the output of "Actual Execution Plan". It shows that my query cost took 35% of the batch, while your suggestion cost 65%.

I'm a bit surprised at these results as it wouldn't make sense to run a subquery twice.
0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 166 total points
ID: 40021004
1) I suggest using OUTER APPLY, as below.
2) You can't rely on the % estimates.  I've seen wildly wrong estimates there.
3) TOP (1) is not inherently inefficient.


SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
    ip_data.PolicyNumber, ip_data.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1)
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )  
OUTER APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId
        ) AS ip_data
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021037
ScottPletcher: Using your query the result set went from 132 rows to 637 rows.
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40021051
Try CROSS APPLY instead of OUTER APPLY.  (I guess.  I don't see why another OUTER join would add rows; are the WHERE conditions the same on both queries?)


SELECT CA.AccountNumber, CA.Name, Agent.Name, CA.InstallmentsMadeCount,
...
CROSS APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
    ...
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021092
Same effect.
0
 
LVL 6

Expert Comment

by:Dulton
ID: 40021107
-Hold the press.

you have a bunch of left/outer joins.... but none of the outer tables are being returned in the result set.....

that makes all of those joins un-necessary.

see if this performs any better, and gives the right results.

;With  PolicyOrder AS
(
SELECT ip.PolicyNumber
, ip.EffectiveDate
, pfl.CustomerAccountId
, ROW_NUMBER() OVER(PARTITION BY pfl.CustomerAccountId ORDER BY ip.InsurancePolicyId) AS [PolicyRowOrder]
FROM InsurancePolicy AS ip
INNER JOIN PremiumFinanceLoad AS pfl
ON ip.PremiumFinanceLoanId = pfl.PremiumFinanceLoanId
)
SELECT CA.AccountNumber
, CA.Name
, Agent.Name
, CA.InstallmentsMadeCount
,po.PolicyNumber
,po.EffectiveDate
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT JOIN PolicyOrder AS po
ON  ca.[PartyId] = po.[CustomerAccountId]
AND po.PolicytRowOrder = 1

Open in new window

0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021124
The outer tables are being returned in the result set, but I did not paste the entire query, just the essentials needed to answer my question.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021139
Can someone also explain why ca.[PartyId] in the query below returns the error below, while the same thing in the CROSS APPLY subquery seems to be just fine?
The multi-part identifier "ca.PartyId" could not be bound.

LEFT JOIN (
      SELECT MIN(ip.InsurancePolicyId) as InsurancePolicyId, [PremiumFinanceLoan].CustomerAccountId
                  FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
            WHERE [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                  GROUP BY [PremiumFinanceLoan].CustomerAccountId
              ) AS InsPolTb2 ON whatever = whatever

CROSS APPLY
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId
        ) AS InsPolTb
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40021210
>> ScottPletcher: Using your query the result set went from 132 rows to 637 rows. <<

Still seems very odd.  Just adding an OUTER APPLY should not increase the number of rows for this query.
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021233
Here's the entire FROM clause below -- not leaving anything out:
   ...     , ip_data.EffectiveDate
        , ip_data.PolicyNumber
        
FROM CustomerAccount as CA
INNER JOIN Entity AS Agent ON Agent.PartyId = CA.AgentId
LEFT OUTER JOIN [Entity] s ON CA.[SubmittedById] = s.[PartyId]
LEFT JOIN [Address] ON ([Address].PartyId = CA.PartyId AND [Address].IsPrimary = 1) 
LEFT JOIN Region ON ([Address].RegionId = Region.RegionId )
CROSS APPLY 
 ( SELECT TOP (1) ip.PolicyNumber, ip.EffectiveDate
                FROM [InsurancePolicy] ip
                JOIN [PremiumFinanceLoan] ON ip.[PremiumFinanceLoanId] = [PremiumFinanceLoan].[PremiumFinanceLoanId]
                where [PremiumFinanceLoan].[CustomerAccountId] =ca.[PartyId]
                order by ip.InsurancePolicyId 
        ) AS ip_data

Open in new window

0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40021284
So running the query w/o the CROSS APPLY only returns 132 rows, but with the CA it returns 637 rows?  

So can the same PartyID appear multiple times in the outer query?
0
 
LVL 8

Author Comment

by:pzozulka
ID: 40021348
I'm sorry, I accidentally switched to a different database. CROSS APPLY works fine, and according to execution plan costs 49% while original query costs 51%. You said not to trust this.

In that case, what do you recommend I use to check performance metrics? The database I'm using in dev environment does not allow me to check how long the execution time is as both execute in 0 seconds.

The production database is HUGE.
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 40021406
Look at the Estimated Query Plan.  SQL can generate that in a second or so, no matter how large the tables are.

Presumably the CROSS APPLY plan should have a single scan (at most) of the underlying table, with a lazy spool.

The one with two different TOP (1) subqueries should show multiple lookups/scans of the underlying table: SQL could implement that logic different ways, would have to see it to be sure.
0

Featured Post

U.S. Department of Agriculture and Acronis Access

With the new era of mobile computing, smartphones and tablets, wireless communications and cloud services, the USDA sought to take advantage of a mobilized workforce and the blurring lines between personal and corporate computing resources.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question