Link to home
Start Free TrialLog in
Avatar of friend_s
friend_s

asked on

Sql Server table column data concat

Hi, can i please get help with the below. I have a sql server table with close to 200,000 records.  The data is similar to the top pattern in the attached file. I need to change it into the bottom pattern(into a new table) in the attached file. Appreciate your response.
ModifyTableData.doc
Avatar of Member_2_861731
Member_2_861731
Flag of Canada image

How will you know which values belong to a specific KeyNumber, when it's NULL for all the values that should be grouped by?
Friend_s: Firstly, you need to replace the nulls in KeyNumber with the correct KeyNumber values (I'm assuming that records with IDs 2, 3, & 4 in your example have the same KeyNumber as record 1).   (I have no SQL Server here, so please forgive typos!)

update ot
    set ot.KeyNumber = (  select KeyNumber    --get keynumber val
                                         from OriginalTable     --for that lower ID
                                         where ID = ( select max(ID)   --max. lower ID than current one
                                                               from OriginalTable    --that has a KeyNumber
                                                               where ID < ot.ID
                                                               and KeyNumber is not null )
                                        )
    where ot.KeyNumber is null
    from OriginalTable ot

Open in new window


(work intervened here - I'll try to do the rest later!)
Avatar of friend_s
friend_s

ASKER

DcpKing:   This data format is from a public website over which we have no control. We are told to use their data as is. So we also assumed that records with IDs 2, 3, & 4 in my example have the same KeyNumber as record 1. That is the reason i showed the bottom expected results table in that manner. I will look forward to your reply.
Hi Friend_s,

sorry it took so long.

After you've run the first part, on OriginalTable, run this. You'll need to do a few things first -
1. the code below will run perfectly and contains the data within it
2. If you want to run it twice, un-comment the two drop table lines at the top!
3. Replace the references to #ta with your OriginalTable name, and #tb with your destination table, and take out lines 1-21 and 23 to rid yourself of the demo part.

Have fun!!

hth

Mike



--	-----------------------------Load demo source table
--drop table #ta
--drop table #tb
create table #ta(ID int, KeyNumber varchar(16), ColumnA varchar(16), ColumnB varchar(16), ColumnC varchar(16))
insert into #ta values 
(1,	'1111111111',	'ValueA1',	'ValueB1',	'ValueC1'),
(2,	'1111111111',	'ValueA2',	'ValueB2',	'ValueC2'),
(3,	'1111111111',	'ValueA2',	'ValueB2',	'ValueC3'),
(4,	'1111111111',	'ValueA3',	'ValueB3',	'ValueC4'),
(5,	'2222222222',	'ValA1',	'ValB1',	'ValC1'),
(6,	'2222222222',	'ValA2',	'ValB2',	'ValC2'),
(7,	'2222222222',	'ValA3',	'ValB3',	'ValC3'),
(8,	'2222222222',	'ValA4',	'ValB4',	'ValC4'),
(9,	'2222222222',	'ValA4',	'ValB5',	'ValC4')
select * from #ta
--	-----------------------------
create table #tb(	ID int identity(1, 1), 
					KeyNumber varchar(16), 
					ColumnA varchar(64), 
					ColumnB varchar(64), 
					ColumnC varchar(64))
insert into #tb select distinct KeyNumber, '', '', '' from #ta
select * from #tb 
declare @strKeyNumber varchar(16)
declare @strCmd varchar(3000) = ''

while 'True' = 'True'
begin
	select @strKeyNumber = min(KeyNumber) from #tb where ColumnA = ''
	select @strKeyNumber as '@strKeyNumber'
	if @strKeyNumber is null break
	--	---------------------
	set @strCmd = ' '
	set @strCmd = @strCmd + 'declare @taCol table(id int identity(1, 1), Col varchar(16)); '
	set @strCmd = @strCmd + 'declare @strTemp varchar(144) = '''', '
	set @strCmd = @strCmd + '		@intLine int, '
	set @strCmd = @strCmd + '		@intNrLines int; '
	set @strCmd = @strCmd + 'delete from @taCol; '
	set @strCmd = @strCmd + 'insert into @taCol '
	set @strCmd = @strCmd + '	select distinct ColumnA from #ta where KeyNumber = ''' + @strKeyNumber + '''; '
	set @strCmd = @strCmd + 'set @intLine = 1; '
	set @strCmd = @strCmd + 'set @intNrLines = (select count(*) from @taCol); '
	set @strCmd = @strCmd + 'while ''True'' = ''True'' '
	set @strCmd = @strCmd + 'begin '
	set @strCmd = @strCmd + '	set @strTemp = @strTemp + (select Col from @taCol where id = @intLine); '
	set @strCmd = @strCmd + '	if @intLine = @intNrLines break; '
	set @strCmd = @strCmd + '	set @strTemp = @strTemp + '', '';  '
	set @strCmd = @strCmd + '	set @intLine = @intLine + 1; '
	set @strCmd = @strCmd + 'end; '
	set @strCmd = @strCmd + 'update #tb '
	set @strCmd = @strCmd + '	set ColumnA = @strTemp '
	set @strCmd = @strCmd + '	where KeyNumber = ''' + @strKeyNumber + '''; '
	exec(@strCmd)
end
select * from #tb 

Open in new window

Mike, thank you very much. Your code gave the result for one column. Can you please tell about the other columns? In addition, the real data has 3 more columns like KeyNumber but they are not null. I am attaching the file here. If possible, can you please take a look?
ModifyTableData.doc
ok, although you might not want to see this :)

What I did was wrap the dynamic SQL inside another layer, running the whole thing once through for each column, just like I was doing for each KeyNumber.

It looks horrendous, but you'll find that all you do is double every "   '  " to a "   ''  " and wrap, so that
"declare @str varchar(2) = ''  "
becomes
 "@strCmd = @strCmd + 'declare @str varchar(2) = ''''    ' "

One thing: I used a shortcut assigning data into @taColNames - if you're not using SQL Server 2012 you'll need to expand it out (sorry!).


--	-----------------------------Load demo source table
drop table #ta
drop table #tb
create table #ta(ID int, KeyNumber varchar(16), Action varchar(3), Type varchar(8), Code varchar(8), ColumnA varchar(16), ColumnB varchar(16), ColumnC varchar(16))
insert into #ta values 
(1,	'1111111111',	'AC1', 'Type1', 'Code1', 'ValueA1',	'ValueB1',	'ValueC1'),
(2,	'1111111111',	'AC1', 'Type1', 'Code1', 'ValueA2',	'ValueB2',	'ValueC2'),
(3,	'1111111111',	'AC1', 'Type1', 'Code1', 'ValueA2',	'ValueB2',	'ValueC3'),
(4,	'1111111111',	'AC1', 'Type1', 'Code1', 'ValueA3',	'ValueB3',	'ValueC4'),
(5,	'2222222222',	'AC2', 'Type2', 'Code2', 'ValA1',	'ValB1',	'ValC1'),
(6,	'2222222222',	'AC2', 'Type2', 'Code2', 'ValA2',	'ValB2',	'ValC2'),
(7,	'2222222222',	'AC2', 'Type2', 'Code2', 'ValA3',	'ValB3',	'ValC3'),
(8,	'2222222222',	'AC2', 'Type2', 'Code2', 'ValA4',	'ValB4',	'ValC4'),
(9,	'2222222222',	'AC2', 'Type2', 'Code2', 'ValA4',	'ValB5',	'ValC4')
select * from #ta
--	-----------------------------
create table #tb(	ID int identity(1, 1), 
					KeyNumber varchar(16),
					[Action] varchar(3), 
					[Type] varchar(8), 
					Code varchar(8),  
					ColumnA varchar(64), 
					ColumnB varchar(64), 
					ColumnC varchar(64))
insert into #tb select distinct KeyNumber, [Action], [Type], Code, '', '', '' from #ta
select * from #tb 
--	-----------------------------
declare @taColNames table (id int identity(1, 1), ColName varchar(16), Done varchar(3))
insert into @taColNames values ('ColumnA', 'No'), ('ColumnB', 'No'), ('ColumnC', 'No')
declare @strOuterCmd varchar(3000),
		@strThisColumnName varchar(16),
		@intID int
While 'True' = 'True'
begin
	set @intID = (select top 1 id from @taColNames where Done = 'No' order by id)
	set @strThisColumnName = (select ColName from @taColNames where id = @intID)
	if @strThisColumnName is null   break   --this happens when there aren't any matching records
	--	---------------------------------------------
	set @strOuterCmd = ''
	set @strOuterCmd = @strOuterCmd + '	declare @strKeyNumber varchar(16)  '
	set @strOuterCmd = @strOuterCmd + '	declare @strCmd varchar(2000) = ''''   '
--	-------------------
	set @strOuterCmd = @strOuterCmd + '	while ''True'' = ''True''   '
	set @strOuterCmd = @strOuterCmd + '	begin   '
	set @strOuterCmd = @strOuterCmd + '	select @strKeyNumber = min(KeyNumber) from #tb where ' + @strThisColumnName + ' = ''''   '
	set @strOuterCmd = @strOuterCmd + '	select @strKeyNumber as  ''@strKeyNumber''   ' 
	set @strOuterCmd = @strOuterCmd + '		if @strKeyNumber is null break   '
	--	---------------------
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = '' ''  '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''declare @taCol table(id int identity(1, 1), Col varchar(16)); ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''declare @strTemp varchar(144) = '''''''', ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''		@intLine int, ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''		@intNrLines int; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''delete from @taCol; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''insert into @taCol ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	select distinct ' + @strThisColumnName + ' from #ta where KeyNumber = '''''' + @strKeyNumber + ''''''; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''set @intLine = 1; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''set @intNrLines = (select count(*) from @taCol); ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''while ''''True'''' = ''''True'''' ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''begin ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	set @strTemp = @strTemp + (select Col from @taCol where id = @intLine); ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	if @intLine = @intNrLines break; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	set @strTemp = @strTemp + '''', '''';  ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	set @intLine = @intLine + 1; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''end; ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''update #tb ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	set ' + @strThisColumnName + ' = @strTemp ''      '
	set @strOuterCmd = @strOuterCmd + '		set @strCmd = @strCmd + ''	where KeyNumber = '''''' + @strKeyNumber + ''''''; ''      '
	set @strOuterCmd = @strOuterCmd + '		select @strCmd as ''InnerCommand'' '
	set @strOuterCmd = @strOuterCmd + '		exec(@strCmd) '
	set @strOuterCmd = @strOuterCmd + '	end '
	set @strOuterCmd = @strOuterCmd + '	select * from #tb '
--	---------------------------------------------
	update @taColNames set Done = 'Yes' where id = @intID
	select @strOuterCmd  as 'OuterCommand'                             
	exec(@strOuterCmd)
end
select * from #tb

Open in new window



Have fun - hope it helps

Mike
Mike, thanks a lot. But i noticed it does the concat of those last columns only for KeyNumber. But we need to do the concat of those last columns for this combination : KeyNumber, [Action], [Type], Code. Which means we need something like  :   where  KeyNumber=@strKeyNumber  and  Action=@strAction  and  Type=@strType  and  Code=@strCode.  ------------these four are the distinct combination. Can you please help with this?
Mike, i tried your code using just the keynumber on the real table of 200,000 plus rows, and it was too slow. I myself wrote different code for this using the sql STUFF function and that was too slow. So i need something that is faster.
To speed thing up some try this:

Change the structure of both #ta and #tb to make KeyNumber a primary key:
In each case change "KeyNumber varchar(16)" to  "KeyNumber varchar(16) primary key"

The various queries and inserts are being done depending on this field, so keying it should help.

hth

Mike
Mike, thank you. Can you please see my previous post where i said we need something like  :   where  KeyNumber=@strKeyNumber  and  Action=@strAction  and  Type=@strType  and  Code=@strCode.  ------------these four are the distinct combination.
I've been looking at it - it won't be too difficult (I think!). We're on a long holiday weekend here in the USA, so I'm a bit occupied by family! I should get to it in the next day, tho'!
Thank you Mike. When you get to it, will it be possible to make your demo table have 200,000 rows (may be insert data into it in a loop) and see if it takes less time with your solution?
Well I could, but you should know that I'm working on a VM on a laptop :)
Friend_s: Make the two tables permanent for testing (just take off the "#"). Then add a non-clustered non-unique index to both tables, on field KeyNumber in each case. That took me from 280 records per minute into tb to 5800 !  I'll admit that it isn't the fastest of processes, as it may take 3 hours to do what you want for all three fields. As I said earlier, though, those numbers are on a laptop in a Windows VM with other things going on too, so you should get better numbers on a server.
Mike, thank you for your time.  I posted this question on another website and got results from that solution in 30 seconds. So far, this is the fastest among the different approaches i tried. So going ahead with that. Once again, appreciate your time and effort.
Well, I think I demonstrated that T-SQL is a bit of a dog for concentrated string manipulation! A million rows of input took 9 hours! How did you end up doing it?
ASKER CERTIFIED SOLUTION
Avatar of friend_s
friend_s

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Interesting. Thanks. Mike
The database table was huge, and the code i got from here, as well as the different ways i did, were taking too long on this huge table. Finally this link gave me the faster results.