Link to home
Start Free TrialLog in
Avatar of Brian Pringle
Brian PringleFlag for United States of America

asked on

SQL Denormalize Column into One Row with Many Columns

I need a SQL query that will convert output from a normalized database table into a denormalized table with one row per record with many columns.

What I have:

Invoice_Num     Emp_ID
A1234                 1
A1234                 2
A1234                 3
B2345                 3
B2345                 6
C4567                 4
C4567                 2
D9876                 1
D9876                 6
E1234                 1
F6543                 6
F6543                 2
F6543                 3
F6543                 4

What I need:

Invoice_Num     Emp1         Emp2         Emp3         Emp4   ---  and so on dynamically
A1234                 1                 2                 3
B2345                 3                 6
C4567                 4                 2
D9876                 1                 6
E1234                 1
F6543                 6                 2                 3                 4
Avatar of Russell Fox
Russell Fox
Flag of United States of America image

You can use Mitchell Sellers' dynamic pivot technique. It uses a two-step process to get the dynamic new column names and then pivots it using dynamic SQL.
Avatar of Brian Pringle

ASKER

Can you provide a working solution for the table that I provided?  I was able to get his working, but could not get a list to work.  I am wondering if it is because I am not using aggregated data?
SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I really would like to get this working dynamically instead of having to hard-code all of the case statements.
The hard-coding is just the max number of emp_ids to have in the result.

If you really want to limit it to the existing data, you'd get the max # of emp ids from this query:

DECLARE @Emp_Count int

SELECT @Emp_Count =  MAX(Emp_Count)
FROM (
    SELECT COUNT(*) AS Emp_Count
    FROM table_name
    GROUP BY Invoice_Num
) AS derived

Then use that counter to generate the appropriate number of:
    MAX(CASE WHEN row_num = nn THEN Emp_ID END) AS Empnn,
columns.
I know.  The problem is that it is possible for 40-50 people to touch a particular invoice before it is being released.  I don't want to have to have that many columns if the particular invoice requested only has 2 people.  To get this information currently, they use Excel to perform a pivot on the data and it only takes a couple of seconds.  I want to be able to do that dynamically in SQL.
Yeah, for a small list of particular invoice numbers, it's probably not too much overhead to get the max count first and then use that info to build the dynamic query/result.

For all or a lot of invoice numbers, having to get the max count first is likely almost as costly as the full retrieval query would be.

Bottom line, in SQL Server, there's not a super-easy way to do what you want to do .
That is why I am hoping to build it ahead of time so that I can script it for when they start needing that information.  I can use the PIVOT function, but have not figured out a good way to make it work without aggregated data.
Not sure if it's helpful but I use this code to dynamically build tables:

CREATE TABLE #test (Invoice_Num VARCHAR(50))

DECLARE @num INT
DECLARE @end INT
SET @num = 1
SET @end = 100
DECLARE @sql VARCHAR(max)

WHILE @num <= @end
BEGIN
	SET @sql = 'ALTER TABLE #test ADD Emp' + CAST(@num AS VARCHAR(50)) + ' INT'
	EXEC(@sql)
	SET @num = @num+1
END

SELECT * FROM #test
DROP TABLE #test

Open in new window

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@Russell - that's a lot less code than I wrote! :)

But, it seems as if it doesn't left justify like in the first post and instead puts each empID under a column labeled after itself.  I think, if you had 1000 different emp_ids that are used each once you'd have 1001 columns.
@Dustin: yes, but I think that's what Brian wants.
For testing I wrote a data generator to match the data Brian has in his sample:

--Generate Utility Table
CREATE TABLE #Alphabet (letter VARCHAR(50))
INSERT INTO #Alphabet VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I');
GO

CREATE TABLE #Length (lgth INT)
INSERT INTO #Length VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)
GO

--Populate Fake Data
DECLARE @a VARCHAR(50)
DECLARE @b INT
DECLARE @start INT
DECLARE @end INT
DECLARE @employeesTotal INT
DECLARE @maxEmployeePerInvoice INT
DECLARE @thisInvoice VARCHAR(50)
DECLARE @thisEmployee INT
DECLARE @iteration INT
DECLARE @maxIteration INT

SET @start = 1
SET @end = 4000

SET @employeesTotal = 2000 
SET @maxEmployeePerInvoice = 40
SET @iteration = 2
SET @maxIteration = 1

WHILE @start <= @end
BEGIN
	IF @iteration > @maxIteration
	BEGIN
		SET @a = (SELECT TOP 1 * FROM #Alphabet ORDER BY NEWID())
		SET @b = (SELECT TOP 1 * FROM #Length ORDER BY NEWID()) * 1000 + (SELECT TOP 1 * FROM #Length ORDER BY NEWID()) * 100 + (SELECT TOP 1 * FROM #Length ORDER BY NEWID()) * 10 + (SELECT TOP 1 * FROM #Length ORDER BY NEWID())
		SET @thisInvoice = (SELECT @a + CAST(@b AS VARCHAR(50)))
		SET @iteration = 1
		SET @maxIteration = (SELECT FLOOR(RAND()*(@maxEmployeePerInvoice-1)+1))
	END

	SET @thisEmployee = (SELECT FLOOR(RAND()*(@employeesTotal-1)+1))

	INSERT INTO invoiceemp (Invoice_Num, Emp_ID) VALUES (@thisInvoice, @thisEmployee)

	SET @iteration = @iteration + 1
	SET @start = @start + 1

END

--Clean Up
DROP TABLE #Length
DROP TABLE #Alphabet

Open in new window


Just change this line to the table you want to put test data in:
INSERT INTO invoiceemp (Invoice_Num, Emp_ID) VALUES (@thisInvoice, @thisEmployee)

Open in new window


and
SET @start = 1
SET @end = 4000

SET @employeesTotal = 2000 
SET @maxEmployeePerInvoice = 40

Open in new window


Where @end is the total number of records to create, @employees total is the total pool of empIDs to pull from, @maxEmployeePerInvoice is the max number of empids that can be on one invoice.

There were 2 issues with not sliding the results to the leftI found, the first being that if there are 1000 unique empIDs used then there are 1000 columns most of which are nulls.  The second problem was that if there are more than 1023 unique ids, it hit the column limit.

My results are as such (here's a sample attached but the results here are for 4000 records with up to 40 empIDs per invoice).  I think it's exactly what you wanted.  But I think an  expert might be able to condense my code- I think I just naturally approach these things the same way I would in C# vs an elegant query.
resultsQuery.png
@Russell Fox - This is what I needed, but I couldn't use it for the project that I was working on yesterday.  I am going to save that for in the future, though.  

@ScottPletcher - I wanted something dynamic, but ended up using yours for the project that I was working on yesterday.  

@Dustin Saunders - I am going to save that for my bag of tricks for later!

Thanks, guys!  I really appreciate it!