Flatten data with Select

ReconIT asked
Last Modified: 2012-06-22
MS SQL 2005 - I have a table that contains data that I need to flatten. This will be an ongoing process. The table contains millions of records. I'm wanting to flatten it using a cursor but I'm sure there is a better/faster way. I've included a code snippet with notes to set up an example of what I'm trying to do. Thank you in advance for any help offered.
CREATE TABLE #TEMP_OR (
PID VARCHAR(5), CLNO VARCHAR(20), HN VARCHAR(20), 
DSS SMALLDATETIME, DSE SMALLDATETIME, PT VARCHAR(5), 
D1 VARCHAR(6), D2 VARCHAR(6), D3 VARCHAR(6), D4 VARCHAR(6),
DELETE_D1 VARCHAR(1), DELETE_D2 VARCHAR(1), DELETE_D3 VARCHAR(1), 
DELETE_D4 VARCHAR(1))
 
--I NEED THE DATA IN THIS TABLE TO END UP LOOKING LIKE THE DATA IN #TEMP_FLAT
INSERT #TEMP_OR
SELECT 'A1234', '1234567890', '987654321', '1/1/2007', '1/31/2007', 'AAAA', 'WERBT', 'YYUN', 'HHYH', 'HHYA', '', '', 'Y', ''
UNION
SELECT 'A1234', '1234567891', '444466770', '5/1/2007', '5/31/2007', '40', 'WERBT', 'YYUN', '', '', '', '', '', ''
 
CREATE TABLE #TEMP_FLAT (
PID VARCHAR(5), CLNO VARCHAR(20), HN VARCHAR(20), 
DSS SMALLDATETIME, DSE SMALLDATETIME, PT VARCHAR(5), [DELETE] VARCHAR(1),
D1 VARCHAR(6))
 
--THIS SHOULD BE REPLACED WITH SQL TO SELECT FROM #TEMP_OR IN A WAY THAT PRODUCES THE SAME END RESULT IN #TEMP_FLAT
INSERT #TEMP_FLAT
SELECT 'A1234', '1234567890', '987654321', '1/1/2007', '1/31/2007', '01', '', 'WERBT'
UNION
SELECT 'A1234', '1234567890', '987654321', '1/1/2007', '1/31/2007', '02', '', 'YYUN'
UNION
SELECT 'A1234', '1234567890', '987654321', '1/1/2007', '1/31/2007', '02', 'Y', 'HHYH'
UNION
SELECT 'A1234', '1234567890', '987654321', '1/1/2007', '1/31/2007', '02', '', 'HHYA'
UNION
SELECT 'A1234', '1234567891', '444466770', '5/1/2007', '5/31/2007', '40', '', 'WERBT'
UNION
SELECT 'A1234', '1234567891', '444466770', '5/1/2007', '5/31/2007', '40', '', 'YYUN'
 
--YOU WILL NOTICE THAT WE HAVE A FEW SPECIAL CASES IN THE END RESULT. 
--THE FIRST 5 COLUMNS FROM #TEMP_OR STAY THE SAME FOR EACH ROW IN #TEMP_FLAT
--THE SIXTH COLUMN (PT) HAS SOME RULES AROUND IT. WHEN IT IS AAAA IT IS CONVERTED TO 01 FOR D1 ONLY AND
--02 FOR ANY REMAINING Dx BEYOND D1. WHEN IT IS NUMERIC IT STAYS WHAT IT IS FOR ALL Dx. 
--EACH OF THE DELETE_Dx COLUMNS MAP TO THE SAME Dx COLUMN. IF THE DELETE_Dx COLUMN IS MARKED WITH A Y THE
--DELETE COLUMN IN #TEMP_FLAT IS MARKED WITH A Y
SELECT * FROM #TEMP_FLAT
 
 
DROP TABLE #TEMP_FLAT
DROP TABLE #TEMP_OR

I've just seen the "millions of records" part. You might want to remove the sorting and the HAVING clause from this query and apply them as a second operation once the first query has run (putting the output into a temp table, of course).

But really you don't want the above - you can just use UNPIVOT in MSSQL 2005! :)

Author

Commented:
Thanks PaultheBroker. I should have mentioned that we are running 2005 under 80 compatibility. PIVOT and UNPIVOT are only available under 90 compatibility so it is not an option for me at this time. I'll try working with your first example to see where I can get with it. Thanks again.
This is the UNPIVOT example. I don't think you can unpivot two sets of columns with this, so you would have to run it twice and then join the results. Maybe another expert can confirm that for us?


SELECT PID,CLNO,HN,DSS,DSE,D
,PT = case D_Cnt
			when 'D1' then case when PT = 'AAAA' then '01' else PT end
			when 'D2' then case when PT = 'AAAA' then '02' else PT end
			when 'D3' then case when PT = 'AAAA' then '02' else PT end
			when 'D4' then case when PT = 'AAAA' then '02' else PT end
		else ''
		end
FROM
	(SELECT PID,CLNO,HN,DSS,DSE,PT, D1,D2,D3,D4
	FROM #TEMP_OR) p
UNPIVOT
	(D for D_Cnt IN (D1,D2,D3,D4) ) AS unpvt
WHERE D <> ''
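
On the open question of unpivoting two sets of columns: a pattern that is often used is to chain two UNPIVOT operators and keep only the rows where the column-name suffixes match. The following is a sketch only. It assumes compatibility level 90 is available (so it does not help under 80), and the aliases u1/u2 and the name Del_Cnt are made up for illustration.

```sql
-- Sketch: unpivot D1..D4 and DELETE_D1..DELETE_D4 in one statement.
-- Requires compatibility level 90; u1, u2 and Del_Cnt are illustrative names.
SELECT PID, CLNO, HN, DSS, DSE,
       PT = CASE
                WHEN PT = 'AAAA' AND D_Cnt = 'D1' THEN '01'  -- AAAA maps to 01 for D1
                WHEN PT = 'AAAA' THEN '02'                   -- and to 02 for D2..D4
                ELSE PT                                      -- other values pass through
            END,
       [DELETE],
       D
FROM
    (SELECT PID, CLNO, HN, DSS, DSE, PT,
            D1, D2, D3, D4,
            DELETE_D1, DELETE_D2, DELETE_D3, DELETE_D4
     FROM #TEMP_OR) p
UNPIVOT
    (D FOR D_Cnt IN (D1, D2, D3, D4)) AS u1
UNPIVOT
    ([DELETE] FOR Del_Cnt IN (DELETE_D1, DELETE_D2, DELETE_D3, DELETE_D4)) AS u2
WHERE RIGHT(Del_Cnt, 1) = RIGHT(D_Cnt, 1)  -- pair D1 with DELETE_D1, and so on
  AND D <> ''
```

The second UNPIVOT temporarily produces four rows for every row of the first, and the suffix comparison in the WHERE clause discards the three mismatched pairs. UNPIVOT also drops NULL values, so this relies on empty flags being stored as '' rather than NULL, as in the sample data.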


CERTIFIED EXPERT

Commented:
Try this:
SELECT 
	main.PID,
	main.CLNO,
	main.HN,
	main.DSS,
	main.DSE,
	flat.PT,
	flat.[DELETE],
	flat.D1
FROM 
	#TEMP_OR main
	INNER JOIN 
	(
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				WHEN PT='AAAA' THEN '01' 
				ELSE '02'
			END AS PT,
			DELETE_D1 AS [DELETE], D1
		FROM #TEMP_OR
		UNION ALL  
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				ELSE '02'
			END AS PT,
			DELETE_D2, D2 
		FROM #TEMP_OR
		UNION ALL  
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				ELSE '02'
			END AS PT,
			DELETE_D3, D3 
		FROM #TEMP_OR
		UNION ALL  
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				ELSE '02'
			END AS PT,
			DELETE_D4, D4 
		FROM #TEMP_OR
	) flat 
		ON 	flat.PID=main.PID
		AND	flat.CLNO=main.CLNO
		AND	flat.HN=main.HN
		AND	flat.DSS=main.DSS
		AND	flat.DSE=main.DSE
WHERE 
	isnull(ltrim(rtrim(flat.D1)),'')<>''
ORDER BY
	flat.PT,
	flat.D1


CERTIFIED EXPERT
Commented:
@Zbertoec - yes that might work, but I would just be a little bit worried about scanning the table four times...it would be interesting to see what the performance difference was with the live recordset....
CERTIFIED EXPERT

Commented:
You can't avoid that no matter what other SELECT construction you might come up with, but it is way faster than any looping method, whether it uses a cursor or not.
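
For comparison on the live data, one construction that is sometimes used to keep the number of reads of the source table down is a CROSS JOIN against a four-row derived table, with CASE expressions picking the matching Dx and DELETE_Dx columns. This is a sketch, not a measured recommendation: the aliases t, n and x are illustrative, it passes any PT other than 'AAAA' through unchanged (per the rules in the question), and whether it actually outperforms the UNION ALL version would have to be tested.

```sql
-- Sketch: flatten via CROSS JOIN to a 4-row numbers set (works without UNPIVOT).
-- Aliases t, n, x are illustrative.
SELECT x.PID, x.CLNO, x.HN, x.DSS, x.DSE, x.PT, x.[DELETE], x.D1
FROM
    (
        SELECT t.PID, t.CLNO, t.HN, t.DSS, t.DSE,
               CASE
                   WHEN t.PT = 'AAAA' AND n.N = 1 THEN '01'
                   WHEN t.PT = 'AAAA' THEN '02'
                   ELSE t.PT
               END AS PT,
               CASE n.N
                   WHEN 1 THEN t.DELETE_D1
                   WHEN 2 THEN t.DELETE_D2
                   WHEN 3 THEN t.DELETE_D3
                   ELSE t.DELETE_D4
               END AS [DELETE],
               CASE n.N
                   WHEN 1 THEN t.D1
                   WHEN 2 THEN t.D2
                   WHEN 3 THEN t.D3
                   ELSE t.D4
               END AS D1
        FROM #TEMP_OR t
        CROSS JOIN (SELECT 1 AS N UNION ALL SELECT 2
                    UNION ALL SELECT 3 UNION ALL SELECT 4) n
    ) x
WHERE ISNULL(LTRIM(RTRIM(x.D1)), '') <> ''  -- skip empty Dx slots
```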

Author

Commented:
Thanks to both of you. I'm working with both to see if one seems any better in the end with the live data. I left one important piece out that is giving me trouble. I won't grade anyone based on it since I didn't ask up front but one other requirement I have is that I also have to assign a sequence to the records as I flatten them. Not a sequence in terms of each row but the total number of rows as I flatten them. The order doesn't matter but the end needs to look like this:


SELECT 'A1234', '0000001' as SEQ, '1234567890', '987654321', '1/1/2007', '1/31/2007', '01', '', 'WERBT'
UNION
SELECT 'A1234', '0000002', '1234567890', '987654321', '1/1/2007', '1/31/2007', '02', '', 'YYUN'
UNION
SELECT 'A1234', '0000003', '1234567890', '987654321', '1/1/2007', '1/31/2007', '02', 'Y', 'HHYH'
UNION
SELECT 'A1234', '0000004', '1234567890', '987654321', '1/1/2007', '1/31/2007', '02', '', 'HHYA'
UNION
SELECT 'A1234', '0000005', '1234567891', '444466770', '5/1/2007', '5/31/2007', '40', '', 'WERBT'
UNION
SELECT 'A1234', '0000006', '1234567891', '444466770', '5/1/2007', '5/31/2007', '40', '', 'YYUN'


Senior DBA
CERTIFIED EXPERT
Most Valuable Expert 2018
Distinguished Expert 2019
Commented:
>Not a sequence in terms of each row but the total number of rows as I flatten them
I don't understand the difference?

Author

Commented:
I need to output a SEQ with each row in the flat table, as in: this is record 000001, 000002, etc. Paul made a suggestion of adding an identity to the flat table and I think that will help. I'll still need to format the SEQ like above but I can do that with the identity when I select it out.
CERTIFIED EXPERT

Commented:
You can use this:

1. insert the result into a table, say called #TBL_FLAT (it doesn't necessarily have to be a temp table)

2. Add an identity column to that table:
-- 1. Insert into a new table
SELECT 
	* 
INTO #TBL_FLAT
FROM 
	(
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				WHEN PT='AAAA' THEN '01' 
				ELSE '02'
			END AS PT,
			DELETE_D1 AS [DELETE], D1
		FROM #TEMP_OR
		UNION ALL  
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				ELSE '02'
			END AS PT,
			DELETE_D2, D2 
		FROM #TEMP_OR
		UNION ALL  
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				ELSE '02'
			END AS PT,
			DELETE_D3, D3 
		FROM #TEMP_OR
		UNION ALL  
		SELECT PID,CLNO,HN,DSS,DSE,
			CASE 
				WHEN isnumeric(PT)=1 THEN PT 
				ELSE '02'
			END AS PT,
			DELETE_D4, D4 
		FROM #TEMP_OR
	) flat 
WHERE 
	isnull(ltrim(rtrim(flat.D1)),'')<>''
ORDER BY
	flat.PT,
	flat.D1
GO
 
-- 2. add an identity column
ALTER TABLE #TBL_FLAT ADD idn int IDENTITY
GO
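
To get from the identity value to the zero-padded SEQ shown in the earlier sample, one common approach is to left-pad the number when selecting it out. A minimal sketch, assuming the #TBL_FLAT table and idn column created above and the seven-character width used in the sample rows:

```sql
-- Sketch: format the identity as a 7-character, zero-padded SEQ.
SELECT PID,
       RIGHT('0000000' + CAST(idn AS VARCHAR(10)), 7) AS SEQ,
       CLNO, HN, DSS, DSE, PT, [DELETE], D1
FROM #TBL_FLAT
ORDER BY idn
```

RIGHT(..., 7) keeps only the last seven digits, so values above 9,999,999 would be silently truncated; with millions of source rows the width may need to be larger.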


Oh - I'm an idiot - in my first example of course you don't need the GROUP BY clause and the MAX() functions - Scott is of course right. (I was getting confused with the opposite operation - the PIVOT.)  :S

Sorry about that....
