asked on
Help transforming text string data in order to split into six parts, calculate average of each section and put averages in new columns. Sample data provided
I'm having trouble figuring out how to create a view in SQL SSMS v18.11.1 that does what I need.
Here is my sample data:
DB Fiddle - SQL Database Playground (db-fiddle.com)
What I'm trying to accomplish:
-Get SampleData out of text string
-Divide each number in the SampleData by 10,000
-Split the SampleData into six sections
-Get average of each section and put these averages in columns labeled Section1AVG,
Section2AVG, etc.
-Only grab the data with a Source = 'Thickness[0,0]
Any help is appreciated. Thanks.
"Split the SampleData into six sections"
What rules should be used for the grouping?
CREATE TABLE dbo.Thickness (
PKId int NOT NULL,
Source varchar(50) NOT NULL,
DateCreated datetime NOT NULL,
SampleSize int NOT NULL,
SampleData varchar(max) NOT NULL
)
INSERT INTO dbo.Thickness (PKId, Source, DateCreated, SampleSize, SampleData) VALUES
(1, 'Thickness[0,0]', '2023-09-01 06:08:40.050', '11', '121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(2, 'Thickness[0,1]', '2023-09-01 06:08:40.050', '11', '121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(3, 'Thickness[0,0]', '2023-09-01 06:07:36.050', '12', '125555, 121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(4, 'Thickness[0,1]', '2023-09-01 06:07:36.050', '12', '125555, 121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(5, 'Thickness[0,0]', '2023-09-01 06:06:25.050', '11', '121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(6, 'Thickness[0,1]', '2023-09-01 06:06:25.050', '11', '121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(7, 'Thickness[0,0]', '2023-09-01 06:05:55.050', '15', '123456, 123213, 122111, 123458, 121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654'),
(8, 'Thickness[0,1]', '2023-09-01 06:05:55.050', '15', '123456, 123213, 122111, 123458, 121558, 121624, 121587, 121678, 122576, 123454, 123476, 121456, 128742, 129889, 123654');
SELECT PKId, [1] Section1AVG, [2] Section2AVG, [3] Section3AVG, [4] Section4AVG, [5] Section5AVG, [6] Section6AVG FROM (
SELECT PKId, CAST(value AS INTEGER)/10000. newVal, CAST(NTILE(6) OVER (PARTITION BY PKId ORDER BY value) AS CHAR(1)) grp
FROM dbo.Thickness t
CROSS APPLY STRING_SPLIT(t.SampleData, ',')
WHERE Source = 'Thickness[0,0]'
) s
PIVOT
(
AVG(newVal)
FOR grp IN ([1],[2],[3],[4],[5],[6])
) AS pt
Only if running SQL Server 2016 or above.
That is why I asked for Product and Version.
ASKER
Whoops! Here's a link to one that is using SQL Server.
Here is an example of my expected results:
My version of SQL is:
Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 but I am creating the view in SSMS v18.11.1
For the grouping, I just need them to be as equal as possible. There are normally around 500 data points in the SampleData so it won't make too big of a difference if some groups have a couple more data points than others.
Assuming a typo in there somewhere, take a look at the SQL Pavel posted. It should be close.
More important - The grouping is still unclear... NTILE needs some order and it seems your sample groups are simply taken as they appear in the string. Unfortunately SQL 2017 STRING_SPLIT does not support such sorting and it would need SQL 2022...
ASKER
I need them to be grouped as they appear in the string.
So if my SampleData has 120 data points in it:
Section1AVG needs the first 20 data points.
Section2Average needs points 21-40
And so on..
I would need SQL 2022 to do this?
;WITH CteThickness AS (
SELECT *, NTILE(6) OVER(PARTITION BY PKId ORDER BY ItemNumber) AS Ntile
FROM dbo.Thickness
CROSS APPLY (
SELECT CAST(SampleData AS varchar(8000)) AS SampleDataChar
) AS ca1
CROSS APPLY dbo.DelimitedSplit8K(SampleDataChar, ',') AS ds
WHERE Source = 'Thickness[0,0]' AND
LEN(CAST(SampleData AS varchar(max))) <= 8000
)
SELECT PkId,
AVG(CASE WHEN Ntile = 1 THEN CAST(CAST(Item AS decimal(9, 2)) / 10000.0 AS decimal(9, 2)) ELSE NULL END) AS Avg1,
AVG(CASE WHEN Ntile = 2 THEN CAST(CAST(Item AS decimal(9, 2)) / 10000.0 AS decimal(9, 2)) ELSE NULL END) AS Avg2,
AVG(CASE WHEN Ntile = 3 THEN CAST(CAST(Item AS decimal(9, 2)) / 10000.0 AS decimal(9, 2)) ELSE NULL END) AS Avg3,
AVG(CASE WHEN Ntile = 4 THEN CAST(CAST(Item AS decimal(9, 2)) / 10000.0 AS decimal(9, 2)) ELSE NULL END) AS Avg4,
AVG(CASE WHEN Ntile = 5 THEN CAST(CAST(Item AS decimal(9, 2)) / 10000.0 AS decimal(9, 2)) ELSE NULL END) AS Avg5,
AVG(CASE WHEN Ntile = 6 THEN CAST(CAST(Item AS decimal(9, 2)) / 10000.0 AS decimal(9, 2)) ELSE NULL END) AS Avg6
FROM CteThickness
GROUP BY PKId
I think only if you use NULL, which can cause warning messages to be issued, I believe.
An avg is nothing more than the SUM / COUNT, as people tend to forget since they are used to using avg.
@Bri: If you need to deal with data lens > 8000, let me know. If it's only up to 16000, you could just split it into 2 8ks and still use delimitedsplit8k. If it goes beyond that, it gets trickier.
The following query does not order the groups correctly (data are the same as in my first post):
SELECT PKId, CAST(value AS INTEGER)/10000. newVal, NTILE(6) OVER (PARTITION BY PKId ORDER BY value) grp
FROM dbo.Thickness t
CROSS APPLY STRING_SPLIT(t.SampleData, ',')
WHERE Source = 'Thickness[0,0]'
I would say the group 6 should always contain the highest value or two from the string split but the results are showing something else:Corrected query:
SELECT PKId, CAST(value AS INTEGER)/10000. newVal, NTILE(6) OVER (PARTITION BY PKId ORDER BY LTRIM(value)) grp
FROM dbo.Thickness t
CROSS APPLY STRING_SPLIT(t.SampleData, ',')
WHERE Source = 'Thickness[0,0]'
ASKER
Sorry for the late response. I have been sick and am just getting back to working on this.
Thanks for all the help. The solution is working as expected.
The Fiddle is MySQL.
I'd also like to see your expected results. For the first one "Get SampleData out of text string" I'm guessing you want the CSV turned into individual rows.
How to do that varies greatly between database products and versions.
What database product and version does this need to work on?