Link to home
Start Free TrialLog in
Avatar of Juan Velasquez
Juan VelasquezFlag for United States of America

asked on

Retrieving filtered records for pivot

I'm trying to create a stored procedures that will allow me to input two parameters consisting of two different ids.  All the records in each dataversion dataset has have records with the same metric names but with different values in those field.  All the records are in a single table called metrics.  I am trying to retrieve the difference between metric values for each dataversion Id.  I can do this simply enough via CTE as shown below.  However I also need to pivot this data.  The provlem is that I  need to use Dynamic sql and Dynamic SQL can't use CTE.  Others hiave proposed using a vew and then using that view in the pivot process.  however, you can't pass parameters to a view.  Of course, I could place the dataversion criteria in the dynamic sql but that would mean first retrieving all the records via the ctes and then apply the criteria in the pivoting stage.  We're talking serveral hundred thousand records here.  
-- Retrieves data
Create PROCEDURE up_PrimaryBenefitsTraditionalMetricsReport 
@DVID1 as NVARCHAR(MAX), 
@DVID2 as NVARCHAR(MAX) 
         
AS 
Declare @strSQL varchar(max)           
DECLARE @cols varchar(2000)     
 
BEGIN 
        -- SET NOCOUNT ON added to prevent extra result sets from 
        -- interfering with SELECT statements. 
        SET NOCOUNT ON; 
 
        WITH C1 as 
        ( 
                SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionRowID, M1.DataVersionID FROM Metrics M1 
                INNER JOIN DataVersion DV 
                ON M1.DataVersionRowID = DV.DataVersionRowID 
                WHERE DV.DataVersionID = @DVID1
         ), 
          
         C2 AS 
         ( 
                SELECT DV.ForecastYear, MetricName , MetricValue, M2.DataVersionRowID, M2.DataVersionID FROM Metrics M2 
                INNER JOIN DataVersion DV 
                ON M2.DataVersionRowID = DV.DataVersionRowID 
                WHERE DV.DataVersionID = @DVID2
        ), 
         
        C3 AS 
        ( 
                SELECT  C1.Forecastyear, C1.MetricName,  C1.MetricValue,  C1.MetricValue-C2.METRICVALUE DELTA FROM C1 
                INNER JOIN C2  
                ON C1.MetricName = C2.MetricName  
        ) 
 SELECT * FROM C3

Open in new window

Avatar of Brendt Hess
Brendt Hess
Flag of United States of America image

Actually, if you check the internal query flow, you will find that Views DO use filters if you apply them correctly.  I would rewrite this for the view as:

        WITH C1 as
        (
                SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionID
                FROM Metrics M1
                INNER JOIN DataVersion DV
                ON M1.DataVersionRowID = DV.DataVersionRowID
                WHERE DV.DataVersionID = @DVID1
         ),
         
         C2 AS
         (
                SELECT DV.ForecastYear, MetricValue, M2.DataVersionID
                FROM Metrics M2
                INNER JOIN DataVersion DV
                ON M2.DataVersionRowID = DV.DataVersionRowID
--                WHERE DV.DataVersionID = @DVID2
        ),
         
        C3 AS
        (
                SELECT  C1.Forecastyear, C1.MetricName,  C1.MetricValue,  C1.MetricValue-C2.METRICVALUE DELTA, C1.DataVersionID as DVID1, C2.DataVersionID as DVID2
                FROM C1
                INNER JOIN C2  
                ON C1.MetricName = C2.MetricName  
        )
 SELECT * FROM C3

----------------------------------

When you reference the view, be sure to include the two DVIDs as criteria, e.g.:

SELECT *
FROM MyView
WHERE DVID1 = @DVID1
    AND DVID2 = @DVID2

Check the execution plan, and you will see that filtering is going on in the view query.  Whether it is filtering enough is a different question, but this could be an answer.

Another method would be to forego CTEs entirely.  I don't think you gain anything by using them here when compared to this code:

SELECT C1.Forecastyear, C1.MetricName,  C1.MetricValue,  C1.MetricValue-C2.METRICVALUE DELTA
FROM Metrics C1
INNER JOIN DataVersion DV1
    ON C1.DataVersionRowID = DV1.DataVersionRowID
INNER JOIN Metrics C2
    ON C1.MetricName = c2.MetricName
INNER JOIN DataVersion DV2
    ON C2.DataVersionRowID = DV2.DataVersionRowID
WHERE DV1.DataVersionID = @dvid1
    AND DV2.DataVersionID = @dvid2

Embedding this block as a derived table should allow your dynamic SQL to function fine.
Avatar of Juan Velasquez

ASKER

I came up with the following code and it seems to work
ALTER PROCEDURE up_PrimaryBenefitsTraditionalMetricsReport 
@DVID1 as NVARCHAR(MAX), 
@DVID2 as NVARCHAR(MAX) 
         
AS 
   
 
BEGIN 
        -- SET NOCOUNT ON added to prevent extra result sets from 
        -- interfering with SELECT statements. 
        SET NOCOUNT ON; 
        
TRUNCATE table tblReportMetrics;
 
        WITH C1 as 
        ( 
                SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionRowID, M1.DataVersionID FROM Metrics M1 
                INNER JOIN DataVersion DV 
                ON M1.DataVersionRowID = DV.DataVersionRowID 
                WHERE DV.DataVersionID = @DVID1
         ), 
          
         C2 AS 
         ( 
                SELECT DV.ForecastYear, MetricName , MetricValue, M2.DataVersionRowID, M2.DataVersionID FROM Metrics M2 
                INNER JOIN DataVersion DV 
                ON M2.DataVersionRowID = DV.DataVersionRowID 
                WHERE DV.DataVersionID = @DVID2
        ), 
         
        C3 AS 
        ( 
                SELECT  C1.Forecastyear, C1.MetricName,  C1.MetricValue,  C1.MetricValue-C2.METRICVALUE DELTA FROM C1 
                INNER JOIN C2  
                ON C1.MetricName = C2.MetricName  
        ) 




-- SELECT ForecastYear, MetricName, DELTA  INTO REPORTMETRICS  FROM C3
INSERT INTO tblReportMetrics(ForecastYear, MetricName, DELTA)
SELECT ForecastYear, MetricName, DELTA   FROM C3
 
GO;
   
Declare @strSQL varchar(max)           
DECLARE @cols varchar(2000)      
         
        SELECT  @cols = STUFF(( SELECT DISTINCT             
                                '],[' + cast(ForecastYear as varchar)            
                        From tblReportMetrics   
                        ORDER BY 1            
                        FOR XML PATH('')            
                      ), 1, 2, '') + ']'            
       
         
set @strSQL = 'SELECT MetricName, ' + @cols +    
                ' from (   
                        SELECT  MetricName, ForecastYear, Delta                                      
                        FROM dbo.tblReportMetrics  
                        ) o   
                pivot(sum(Delta) for ForecastYear in (' + @cols + '))p'   
     select @strSQL 
   exec(@strSQL)    
         
END 

Open in new window

This has one limitation - the query can not be run simultaneously on two stations without data conflicts in the working table - and one possible limitation - it may be slower than the simple JOIN syntax - that is, the CTEs may be less efficient than the standard SQL.

One workaround for the first problem would be to include an additional field in the work table for the current SPID, and pass that into the dynamic SQL string.  Below is a sample of that fix:

ALTER PROCEDURE up_PrimaryBenefitsTraditionalMetricsReport 
@DVID1 as NVARCHAR(MAX), 
@DVID2 as NVARCHAR(MAX) 
         
AS 
 
BEGIN 
        -- SET NOCOUNT ON added to prevent extra result sets from 
        -- interfering with SELECT statements. 
        SET NOCOUNT ON; 

DECLARE @spid int
SET @spid = @@SPID
        
DELETE FROM tblReportMetrics
WHERE spid = @spid;
 
        WITH C1 as 
        ( 
                SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionRowID, M1.DataVersionID FROM Metrics M1 
                INNER JOIN DataVersion DV 
                ON M1.DataVersionRowID = DV.DataVersionRowID 
                WHERE DV.DataVersionID = @DVID1
         ), 
          
         C2 AS 
         ( 
                SELECT DV.ForecastYear, MetricName , MetricValue, M2.DataVersionRowID, M2.DataVersionID FROM Metrics M2 
                INNER JOIN DataVersion DV 
                ON M2.DataVersionRowID = DV.DataVersionRowID 
                WHERE DV.DataVersionID = @DVID2
        ), 
         
        C3 AS 
        ( 
                SELECT  C1.Forecastyear, C1.MetricName,  C1.MetricValue,  C1.MetricValue-C2.METRICVALUE DELTA FROM C1 
                INNER JOIN C2  
                ON C1.MetricName = C2.MetricName  
        ) 




-- SELECT ForecastYear, MetricName, DELTA  INTO REPORTMETRICS  FROM C3
INSERT INTO tblReportMetrics(spid, ForecastYear, MetricName, DELTA)
SELECT @spid, ForecastYear, MetricName, DELTA   FROM C3
 
GO;
   
Declare @strSQL varchar(max)           
DECLARE @cols varchar(2000)      
         
        SELECT  @cols = STUFF(( SELECT DISTINCT             
                                '],[' + cast(ForecastYear as varchar)            
                        From tblReportMetrics   
                        ORDER BY 1            
                        FOR XML PATH('')            
                      ), 1, 2, '') + ']'            
       
         
set @strSQL = 'SELECT MetricName, ' + @cols +    
                ' from (   
                        SELECT  MetricName, ForecastYear, Delta                                      
                        FROM dbo.tblReportMetrics 
                        WHERE spid = ' + Cast(@spid as varchar(10)) + '
                        ) o   
                pivot(sum(Delta) for ForecastYear in (' + @cols + '))p'   
     select @strSQL 
   exec(@strSQL)    
         
END 

Open in new window

Hello bhess1,

Thank you for bringing up those two points.  I just want to be sure I understand the issues you raised rather than just implementing the suggestions you make without understanding them.  
@@SPID returns the session ID of the current user process. By storing that value in the working table each user will only be retrieving or working with data corresponding to the dataversion Id's he or she selected.  You know I hadn't even thought of that.  I've been developing in a dev environment an at present I only have one user working in a QA environment.  As a result this issue didn't come up.  I'm glad you brought it up.  As for the point that "CTEs may be less efficient than the standard SQL.".  I'll try both solutions and see which one is faster.  I'll let you know what I find out today.
From what I understand the benefits of using table expressions are related to the logical aspects of the code and not to performance.  Basically I can use a more modular approach using CTEs thus making it easier to understand.  Am I correct.  I just want to be sure that I actually understand the nature of CTE correctly
ASKER CERTIFIED SOLUTION
Avatar of Brendt Hess
Brendt Hess
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'd like to run test and compare both approaches - CTE and non-CTE.  I'v created another version of the code by embedding the block as a derived table as you suggested.  It seems to compile properly but I keep getting the following error when I try to execute it.  I've isolated the error to the first block, where I Select @col  = STUFF...................  The only difference between this portion of the code and the corresponding portion of the CTE version is that I've replaced tblReportMetrics with the derived table.  I need a second pair of eyes to see what I am doing wrong.  I'm pretty sure it's something simple

Msg 8155, Level 16, State 2, Line 17
No column name was specified for column 1 of 'x'.
Msg 207, Level 16, State 1, Line 8
Invalid column name 'ForecastYear'.

DECLARE @dvid1 as nvarchar(MAX)= '2011-EEPort-NA-BAU-B-1 -NEMS-None-Mar 30 2010  9:09AM'
DECLARE @dvid2 as nvarchar(MAX) = '2011-EEPort-NA-BAU-T-1 -NEMS-None-Mar 30 2010  8:53AM'


Declare @strSQL varchar(max)           
DECLARE @cols varchar(2000)     

		SELECT  @cols = STUFF((SELECT DISTINCT '],[' + cast(x.ForecastYear as varchar )            
		From (
				SELECT 
				dv1.ForecastYear 
				FROM Metrics C1
				INNER JOIN DataVersion DV1 ON C1.DataVersionRowID = DV1.DataVersionRowID
				INNER JOIN Metrics C2 ON C1.MetricName = c2.MetricName
				INNER JOIN DataVersion DV2 ON C2.DataVersionRowID = DV2.DataVersionRowID
				WHERE DV1.DataVersionID = @dvid1 AND DV2.DataVersionID = @dvid2 								
				ORDER BY ForecastYear  FOR XML PATH('')	 	 )as x), 1, 2, '') + ']'

        select @cols  
       
         
set @strSQL = 'SELECT MetricName, ' + @cols +    
                ' from (   
                        SELECT  MetricName, ForecastYear, Delta                                      
                        FROM SELECT DV1.Forecastyear, C1.MetricName,  C1.MetricValue,  C1.MetricValue-C2.METRICVALUE DELTA
						FROM Metrics C1
						INNER JOIN DataVersion DV1
						ON C1.DataVersionRowID = DV1.DataVersionRowID
						INNER JOIN Metrics C2
						ON C1.MetricName = c2.MetricName
						INNER JOIN DataVersion DV2
						ON C2.DataVersionRowID = DV2.DataVersionRowID
						WHERE DV1.DataVersionID = ' + @dvid1 + '
						AND DV2.DataVersionID = ' + @dvid2 + ')o 
                        
                pivot(sum(Delta) for ForecastYear in (' + @cols + '))p'   
     --select @strSQL 
   exec(@strSQL)  

Open in new window

Correction ORDER BY ForecastYear  should be Order By 1.  I still get the same error
I'm still working on it and still haven't found the problem
Hmmm.... what is the need for the  FOR XML PATH('') in the x query?  I believe that it is what is causing the issue with the no fieldname.... what do you see when you run just that part of the query solo?
I'm am usind it to create a xml string that contains all the forecast years.  This "single" value is necessary since I am using a subquery.
I found your explanation of CTE very useful.  I wasn't able to get the code working using subqueries so I was forced to use CTEs'  I'll revisit that code at a later date.