Juan Velasquez
asked on
Retrieving filtered records for pivot
I'm trying to create a stored procedures that will allow me to input two parameters consisting of two different ids. All the records in each dataversion dataset has have records with the same metric names but with different values in those field. All the records are in a single table called metrics. I am trying to retrieve the difference between metric values for each dataversion Id. I can do this simply enough via CTE as shown below. However I also need to pivot this data. The provlem is that I need to use Dynamic sql and Dynamic SQL can't use CTE. Others hiave proposed using a vew and then using that view in the pivot process. however, you can't pass parameters to a view. Of course, I could place the dataversion criteria in the dynamic sql but that would mean first retrieving all the records via the ctes and then apply the criteria in the pivoting stage. We're talking serveral hundred thousand records here.
-- Retrieves data
Create PROCEDURE up_PrimaryBenefitsTraditionalMetricsReport
@DVID1 as NVARCHAR(MAX),
@DVID2 as NVARCHAR(MAX)
AS
Declare @strSQL varchar(max)
DECLARE @cols varchar(2000)
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
WITH C1 as
(
SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionRowID, M1.DataVersionID FROM Metrics M1
INNER JOIN DataVersion DV
ON M1.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID1
),
C2 AS
(
SELECT DV.ForecastYear, MetricName , MetricValue, M2.DataVersionRowID, M2.DataVersionID FROM Metrics M2
INNER JOIN DataVersion DV
ON M2.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID2
),
C3 AS
(
SELECT C1.Forecastyear, C1.MetricName, C1.MetricValue, C1.MetricValue-C2.METRICVALUE DELTA FROM C1
INNER JOIN C2
ON C1.MetricName = C2.MetricName
)
SELECT * FROM C3
ASKER
I came up with the following code and it seems to work
ALTER PROCEDURE up_PrimaryBenefitsTraditionalMetricsReport
@DVID1 as NVARCHAR(MAX),
@DVID2 as NVARCHAR(MAX)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
TRUNCATE table tblReportMetrics;
WITH C1 as
(
SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionRowID, M1.DataVersionID FROM Metrics M1
INNER JOIN DataVersion DV
ON M1.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID1
),
C2 AS
(
SELECT DV.ForecastYear, MetricName , MetricValue, M2.DataVersionRowID, M2.DataVersionID FROM Metrics M2
INNER JOIN DataVersion DV
ON M2.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID2
),
C3 AS
(
SELECT C1.Forecastyear, C1.MetricName, C1.MetricValue, C1.MetricValue-C2.METRICVALUE DELTA FROM C1
INNER JOIN C2
ON C1.MetricName = C2.MetricName
)
-- SELECT ForecastYear, MetricName, DELTA INTO REPORTMETRICS FROM C3
INSERT INTO tblReportMetrics(ForecastYear, MetricName, DELTA)
SELECT ForecastYear, MetricName, DELTA FROM C3
GO;
Declare @strSQL varchar(max)
DECLARE @cols varchar(2000)
SELECT @cols = STUFF(( SELECT DISTINCT
'],[' + cast(ForecastYear as varchar)
From tblReportMetrics
ORDER BY 1
FOR XML PATH('')
), 1, 2, '') + ']'
set @strSQL = 'SELECT MetricName, ' + @cols +
' from (
SELECT MetricName, ForecastYear, Delta
FROM dbo.tblReportMetrics
) o
pivot(sum(Delta) for ForecastYear in (' + @cols + '))p'
select @strSQL
exec(@strSQL)
END
This has one limitation - the query can not be run simultaneously on two stations without data conflicts in the working table - and one possible limitation - it may be slower than the simple JOIN syntax - that is, the CTEs may be less efficient than the standard SQL.
One workaround for the first problem would be to include an additional field in the work table for the current SPID, and pass that into the dynamic SQL string. Below is a sample of that fix:
One workaround for the first problem would be to include an additional field in the work table for the current SPID, and pass that into the dynamic SQL string. Below is a sample of that fix:
ALTER PROCEDURE up_PrimaryBenefitsTraditionalMetricsReport
@DVID1 as NVARCHAR(MAX),
@DVID2 as NVARCHAR(MAX)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE @spid int
SET @spid = @@SPID
DELETE FROM tblReportMetrics
WHERE spid = @spid;
WITH C1 as
(
SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionRowID, M1.DataVersionID FROM Metrics M1
INNER JOIN DataVersion DV
ON M1.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID1
),
C2 AS
(
SELECT DV.ForecastYear, MetricName , MetricValue, M2.DataVersionRowID, M2.DataVersionID FROM Metrics M2
INNER JOIN DataVersion DV
ON M2.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID2
),
C3 AS
(
SELECT C1.Forecastyear, C1.MetricName, C1.MetricValue, C1.MetricValue-C2.METRICVALUE DELTA FROM C1
INNER JOIN C2
ON C1.MetricName = C2.MetricName
)
-- SELECT ForecastYear, MetricName, DELTA INTO REPORTMETRICS FROM C3
INSERT INTO tblReportMetrics(spid, ForecastYear, MetricName, DELTA)
SELECT @spid, ForecastYear, MetricName, DELTA FROM C3
GO;
Declare @strSQL varchar(max)
DECLARE @cols varchar(2000)
SELECT @cols = STUFF(( SELECT DISTINCT
'],[' + cast(ForecastYear as varchar)
From tblReportMetrics
ORDER BY 1
FOR XML PATH('')
), 1, 2, '') + ']'
set @strSQL = 'SELECT MetricName, ' + @cols +
' from (
SELECT MetricName, ForecastYear, Delta
FROM dbo.tblReportMetrics
WHERE spid = ' + Cast(@spid as varchar(10)) + '
) o
pivot(sum(Delta) for ForecastYear in (' + @cols + '))p'
select @strSQL
exec(@strSQL)
END
ASKER
Hello bhess1,
Thank you for bringing up those two points. I just want to be sure I understand the issues you raised rather than just implementing the suggestions you make without understanding them.
@@SPID returns the session ID of the current user process. By storing that value in the working table each user will only be retrieving or working with data corresponding to the dataversion Id's he or she selected. You know I hadn't even thought of that. I've been developing in a dev environment an at present I only have one user working in a QA environment. As a result this issue didn't come up. I'm glad you brought it up. As for the point that "CTEs may be less efficient than the standard SQL.". I'll try both solutions and see which one is faster. I'll let you know what I find out today.
Thank you for bringing up those two points. I just want to be sure I understand the issues you raised rather than just implementing the suggestions you make without understanding them.
@@SPID returns the session ID of the current user process. By storing that value in the working table each user will only be retrieving or working with data corresponding to the dataversion Id's he or she selected. You know I hadn't even thought of that. I've been developing in a dev environment an at present I only have one user working in a QA environment. As a result this issue didn't come up. I'm glad you brought it up. As for the point that "CTEs may be less efficient than the standard SQL.". I'll try both solutions and see which one is faster. I'll let you know what I find out today.
ASKER
From what I understand the benefits of using table expressions are related to the logical aspects of the code and not to performance. Basically I can use a more modular approach using CTEs thus making it easier to understand. Am I correct. I just want to be sure that I actually understand the nature of CTE correctly
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I'd like to run test and compare both approaches - CTE and non-CTE. I'v created another version of the code by embedding the block as a derived table as you suggested. It seems to compile properly but I keep getting the following error when I try to execute it. I've isolated the error to the first block, where I Select @col = STUFF................... The only difference between this portion of the code and the corresponding portion of the CTE version is that I've replaced tblReportMetrics with the derived table. I need a second pair of eyes to see what I am doing wrong. I'm pretty sure it's something simple
Msg 8155, Level 16, State 2, Line 17
No column name was specified for column 1 of 'x'.
Msg 207, Level 16, State 1, Line 8
Invalid column name 'ForecastYear'.
Msg 8155, Level 16, State 2, Line 17
No column name was specified for column 1 of 'x'.
Msg 207, Level 16, State 1, Line 8
Invalid column name 'ForecastYear'.
DECLARE @dvid1 as nvarchar(MAX)= '2011-EEPort-NA-BAU-B-1 -NEMS-None-Mar 30 2010 9:09AM'
DECLARE @dvid2 as nvarchar(MAX) = '2011-EEPort-NA-BAU-T-1 -NEMS-None-Mar 30 2010 8:53AM'
Declare @strSQL varchar(max)
DECLARE @cols varchar(2000)
SELECT @cols = STUFF((SELECT DISTINCT '],[' + cast(x.ForecastYear as varchar )
From (
SELECT
dv1.ForecastYear
FROM Metrics C1
INNER JOIN DataVersion DV1 ON C1.DataVersionRowID = DV1.DataVersionRowID
INNER JOIN Metrics C2 ON C1.MetricName = c2.MetricName
INNER JOIN DataVersion DV2 ON C2.DataVersionRowID = DV2.DataVersionRowID
WHERE DV1.DataVersionID = @dvid1 AND DV2.DataVersionID = @dvid2
ORDER BY ForecastYear FOR XML PATH('') )as x), 1, 2, '') + ']'
select @cols
set @strSQL = 'SELECT MetricName, ' + @cols +
' from (
SELECT MetricName, ForecastYear, Delta
FROM SELECT DV1.Forecastyear, C1.MetricName, C1.MetricValue, C1.MetricValue-C2.METRICVALUE DELTA
FROM Metrics C1
INNER JOIN DataVersion DV1
ON C1.DataVersionRowID = DV1.DataVersionRowID
INNER JOIN Metrics C2
ON C1.MetricName = c2.MetricName
INNER JOIN DataVersion DV2
ON C2.DataVersionRowID = DV2.DataVersionRowID
WHERE DV1.DataVersionID = ' + @dvid1 + '
AND DV2.DataVersionID = ' + @dvid2 + ')o
pivot(sum(Delta) for ForecastYear in (' + @cols + '))p'
--select @strSQL
exec(@strSQL)
ASKER
Correction ORDER BY ForecastYear should be Order By 1. I still get the same error
ASKER
I'm still working on it and still haven't found the problem
Hmmm.... what is the need for the FOR XML PATH('') in the x query? I believe that it is what is causing the issue with the no fieldname.... what do you see when you run just that part of the query solo?
ASKER
I'm am usind it to create a xml string that contains all the forecast years. This "single" value is necessary since I am using a subquery.
ASKER
I found your explanation of CTE very useful. I wasn't able to get the code working using subqueries so I was forced to use CTEs' I'll revisit that code at a later date.
WITH C1 as
(
SELECT DV.ForecastYear, MetricName, MetricValue, M1.DataVersionID
FROM Metrics M1
INNER JOIN DataVersion DV
ON M1.DataVersionRowID = DV.DataVersionRowID
WHERE DV.DataVersionID = @DVID1
),
C2 AS
(
SELECT DV.ForecastYear, MetricValue, M2.DataVersionID
FROM Metrics M2
INNER JOIN DataVersion DV
ON M2.DataVersionRowID = DV.DataVersionRowID
-- WHERE DV.DataVersionID = @DVID2
),
C3 AS
(
SELECT C1.Forecastyear, C1.MetricName, C1.MetricValue, C1.MetricValue-C2.METRICVA
FROM C1
INNER JOIN C2
ON C1.MetricName = C2.MetricName
)
SELECT * FROM C3
--------------------------
When you reference the view, be sure to include the two DVIDs as criteria, e.g.:
SELECT *
FROM MyView
WHERE DVID1 = @DVID1
AND DVID2 = @DVID2
Check the execution plan, and you will see that filtering is going on in the view query. Whether it is filtering enough is a different question, but this could be an answer.
Another method would be to forego CTEs entirely. I don't think you gain anything by using them here when compared to this code:
SELECT C1.Forecastyear, C1.MetricName, C1.MetricValue, C1.MetricValue-C2.METRICVA
FROM Metrics C1
INNER JOIN DataVersion DV1
ON C1.DataVersionRowID = DV1.DataVersionRowID
INNER JOIN Metrics C2
ON C1.MetricName = c2.MetricName
INNER JOIN DataVersion DV2
ON C2.DataVersionRowID = DV2.DataVersionRowID
WHERE DV1.DataVersionID = @dvid1
AND DV2.DataVersionID = @dvid2
Embedding this block as a derived table should allow your dynamic SQL to function fine.