sql server 2012: Query needs improvement

Gurus,

This query is spanning over 750 million rows (Integration.LoyaltyAccountActivity) , The 1st subquery has 71 rows, the second has 2.8 . million rows. The hard coded date integers are variables in the stored proc. It takes over an hour to return.  

SELECT 20150615				AS SnapshotDateKey
     , AccountInternalKey
	 , MemberDK
	 , MatrixMemberId
	 , BuyingUnitDK
	 , SUM(EarnValue)					AS EarnValue
	 , SUM(RedeemValue)					AS RedeemValue
	 , SUM(EarnValue)-SUM(RedeemValue)	AS Balance
  FROM Integration.LoyaltyAccountActivity
 WHERE DateKey <= 20150615
    and memberdk NOT IN
      (select memberdk
	     from reprocess.ProblemRowsMultipleMemDKBuyDK
	   )
   and BuyingUnitDK IN
      ( select  max(buyingunitdk) as BuyingUnitDK 
           from reprocess.MultipleMemDKBuyDK
            group by memberdk 	 
	  )
   GROUP BY MemberDK
        , BuyingUnitDK
		, MatrixMemberId
		, AccountInternalKey

Open in new window

Paula DiTalloIntegration developerAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Paula DiTalloIntegration developerAuthor Commented:
arnoldCommented:
How often do you run this or variation of this query?
Decoupling your query to improve is rather difficult.
If there are multiple sequential queries and the server is not overloaded creating a view with the data against which you can then run your queries should improve the overall.


Can you post a few rows from each table and what is

So everytime you are running the query on data that has not changed and will not change ever.
Or is the number of rows is for the recent period?
PadawanDBAOperational DBACommented:
Completely shooting from the hip on this...  It would be helpful to have the relationships between the tables that are being queried, table schemas and also an overview of what it is you're looking to accomplish (I am finding some of the logic a bit tough to understand without the relationships as I am not sure why the second subquery isn't correlated, for instance).  Anyhoot, I would try something like this:

;with tempCTE as
(
	select
		max(buyingunitdk) as BuyingUnitDK 
	from 
		reprocess.MultipleMemDKBuyDK
	group by 
		memberdk
) 

SELECT 
	20150615 AS SnapshotDateKey
	LAA.AccountInternalKey,
	LAA.MemberDK,
	LAA.MatrixMemberId,
	LAA.BuyingUnitDK,
	SUM( LAA.EarnValue ) AS EarnValue,
	SUM( LAA.RedeemValue ) AS RedeemValue,
	SUM( LAA.EarnValue ) - SUM( LAA.RedeemValue ) AS Balance,
FROM 
	Integration.LoyaltyAccountActivity as LAA
		inner join tempCTE as RMM on LAA.buyingUnitDK = RMM.buyingUnitDK
		left join reprocess.ProblemRowsMultipleMemDKBuyDK as RPR on LAA.MemberDK = RPR.MemberDK
WHERE
	LAA.DateKey <= 20150615 and
	RPR.MemberDK is null
GROUP BY 
	LAA.MemberDK,
	LAA.BuyingUnitDK,
	LAA.MatrixMemberId,
	LAA.AccountInternalKey;

Open in new window

Acronis True Image 2019 just released!

Create a reliable backup. Make sure you always have dependable copies of your data so you can restore your entire system or individual files.

Mike EghtebasDatabase and Application DeveloperCommented:
what version of SQL Server you are working with?
Scott PletcherSenior DBACommented:
For accuracy, and speed, for the current query you must add "where memberdk is not null" to the NOT IN subquery:
...
 FROM Integration.LoyaltyAccountActivity
 WHERE DateKey <= 20150615
    and memberdk NOT IN
      (select memberdk
           from reprocess.ProblemRowsMultipleMemDKBuyDK
           where memberdk is not null
         )
...

But for this specific situation, I'd seriously consider denormalizing the data in this case by adding a bit flag, "is_ProblemRowsMultipleMemDKBuyDK", to the LoyaltyAccountActivity table.  You can use a trigger on the Problem* table to maintain the flag(s) on the LoyaltyAccountActivity table, although that processing would require an index on memberdk in LoyaltyAccountActivity.  Then you'd just check that flag in the table instead of having to do 750M (or 750M*2 lookups without the IS NULL check) on the Problem* table.
Vitor MontalvãoMSSQL Senior EngineerCommented:
Try this query:
SELECT 20150615 AS SnapshotDateKey
     , AccountInternalKey
	 , MemberDK
	 , MatrixMemberId
	 , BuyingUnitDK
	 , SUM(EarnValue)					AS EarnValue
	 , SUM(RedeemValue)					AS RedeemValue
	 , SUM(EarnValue)-SUM(RedeemValue)	AS Balance
FROM Integration.LoyaltyAccountActivity laa
	INNER JOIN (select max(buyingunitdk) as BuyingUnitDK 
				from reprocess.MultipleMemDKBuyDK) budk ON laa.BuyingUnitDK = budk.BuyingUnitDK
WHERE DateKey <= 20150615
	AND NOT EXISTS (select 1  
					from reprocess.ProblemRowsMultipleMemDKBuyDK prob
					where prob.memberdk = laa.memberdk)
GROUP BY MemberDK
        , BuyingUnitDK
		, MatrixMemberId
		, AccountInternalKey

Open in new window

But you'll need to create indexes on LoyaltyAccountActivity.BuyingUnitDK, LoyaltyAccountActivity.memberdk and ProblemRowsMultipleMemDKBuyDK.memberdk  since there are table scans running on it.
Olaf DoschkeSoftware DeveloperCommented:
Why this part?

   ...Integration.LoyaltyAccountActivity.BuyingUnitDK IN
      ( select  max(buyingunitdk) as BuyingUnitDK
           from reprocess.MultipleMemDKBuyDK
            group by memberdk       
        )

I prefixed the table Name to make clear you search the 750 Million rows of Integration.LoyaltyAccountActivity against the 2.8 Million in this subquery. What's the value range of the BuyingUnitDK field? Is it a currency? It seems this would almost all times result in "yes" and is quite costly. Are you sure you want to look up Integration.LoyaltyAccountActivity.BuyingUnitDK in all the max values of reprocess.MultipleMemDKBuyDK.buyingunitdk per memberdk or just the one max value of the reprocess.MultipleMemDKBuyDK.memberdk equal to Integration.LoyaltyAccountActivity.memberdk?

If so, this has to be an inner join instead of an IN subquery, like this:

SELECT 20150615				AS SnapshotDateKey
     , AccountInternalKey
     , MemberDK
     , MatrixMemberId
     , BuyingUnitDK
     , SUM(EarnValue)			AS EarnValue
     , SUM(RedeemValue)			AS RedeemValue
     , SUM(EarnValue)-SUM(RedeemValue)	AS Balance
  FROM Integration.LoyaltyAccountActivity
 WHERE DateKey <= 20150615
    and memberdk NOT IN
      ( Select memberdk
	From Reprocess.ProblemRowsMultipleMemDKBuyDK
        Where NOT MemberDK is NULL
      )
   INNER JOIN
      ( Select memberdk, max(buyingunitdk) as MaxBuyingUnitDK 
        From reprocess.MultipleMemDKBuyDK
        Group by MemberDK 	 
      ) MaxBuyingUnitDKperMemberDK
   ON MaxBuyingUnitDKperMemberDK.MemberDK = Integration.LoyaltyAccountActivity.MemberDK
   AND MaxBuyingUnitDKperMemberDK.MaxBuyingUnitDK = Integration.LoyaltyAccountActivity.BuyingUnitDK
   GROUP BY MemberDK
          , BuyingUnitDK
          , MatrixMemberId
	  , AccountInternalKey

Open in new window


This will surely run faster, as it limits the joins and the result set, but as the others say: Without knowing data and its meaning I can only make guesses at what you want.

Bye, Olaf.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.