databoks
asked on
Best way to compare Data in tables
Hi Experts.
I need to compare Data in a table with new information downloaded from a server.
This means the follwoing:
1. I download data from the webserver.
2. Read the data.
3. Before inserting, compare first. If the data is equal, then skip, if there is a change, then update and store the old value in a different table.
I have about 90 Columns in each table, this means that i dont want to make 90 if else statements.
I am using Linq To SQL
I need a way to do this dynamically.
Best Regards,
John Johnson.
I need to compare Data in a table with new information downloaded from a server.
This means the follwoing:
1. I download data from the webserver.
2. Read the data.
3. Before inserting, compare first. If the data is equal, then skip, if there is a change, then update and store the old value in a different table.
I have about 90 Columns in each table, this means that i dont want to make 90 if else statements.
I am using Linq To SQL
I need a way to do this dynamically.
Best Regards,
John Johnson.
Step 3 is the biggie here.
3. Before inserting, compare first. If the data is equal, then skip, if there is a change, then update and store the old value in a different table.
OK So I am assuming that you have the ability to identify your incoming data, i.e. you have some kind of a key. If that is the case then you can simply use the CHECKSUM function.
drop table t
go
drop table t2
go
create table t
(i int, v varchar(10), v2 varchar(10))
go
insert t values (1, 'abc', 'xyz')
insert t values (2, 'abcd', 'ghi')
go
create table t2 (i int, v varchar(10), v2 varchar(10), )
go
insert t2 values (1, 'abc', 'xyz')
insert t2 values (2, 'abd', 'ghi')
insert t2 values (3, 'edcd', 'xx')
go
ALTER TABLE t ADD CHK AS CHECKSUM(v,v2)
go
ALTER TABLE t2 ADD CHK AS CHECKSUM(v,v2)
go
select *
from t join t2 on t.i = t2.i
where t.chk <> t2.chk
go
However see [http://msdn.microsoft.com/en-us/library/ms189788.aspx]
If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change
The other way to do it is to create the comparison statement using dynamic sql. First of all you need to build you query by getting meta information about your columns. For nullable columns, you can't just check with not equals, thus it is a little bit complicated:
select
'(' +
CASE
WHEN IS_NULLABLE = 'YES'
THEN '(t1.' + COLUMN_NAME + ' IS NULL AND t2.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR ' +
'(t2.' + COLUMN_NAME + ' IS NULL AND t1.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR '
ELSE ''
END + '(t1.' + COLUMN_NAME + ' <> ' + 't2.' + COLUMN_NAME + ')'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 't'
This will produce a list like this which you have to build together into a string. Write a loop and build up your query that way.
((t1.i IS NULL AND t2.i IS NOT NULL) OR (t2.i IS NULL AND t1.i IS NOT NULL) OR (t1.i <> t2.i)
((t1.v IS NULL AND t2.v IS NOT NULL) OR (t2.v IS NULL AND t1.v IS NOT NULL) OR (t1.v <> t2.v)
((t1.v2 IS NULL AND t2.v2 IS NOT NULL) OR (t2.v2 IS NULL AND t1.v2 IS NOT NULL) OR (t1.v2 <> t2.v2)
3. Before inserting, compare first. If the data is equal, then skip, if there is a change, then update and store the old value in a different table.
OK So I am assuming that you have the ability to identify your incoming data, i.e. you have some kind of a key. If that is the case then you can simply use the CHECKSUM function.
drop table t
go
drop table t2
go
create table t
(i int, v varchar(10), v2 varchar(10))
go
insert t values (1, 'abc', 'xyz')
insert t values (2, 'abcd', 'ghi')
go
create table t2 (i int, v varchar(10), v2 varchar(10), )
go
insert t2 values (1, 'abc', 'xyz')
insert t2 values (2, 'abd', 'ghi')
insert t2 values (3, 'edcd', 'xx')
go
ALTER TABLE t ADD CHK AS CHECKSUM(v,v2)
go
ALTER TABLE t2 ADD CHK AS CHECKSUM(v,v2)
go
select *
from t join t2 on t.i = t2.i
where t.chk <> t2.chk
go
However see [http://msdn.microsoft.com/en-us/library/ms189788.aspx]
If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change
The other way to do it is to create the comparison statement using dynamic sql. First of all you need to build you query by getting meta information about your columns. For nullable columns, you can't just check with not equals, thus it is a little bit complicated:
select
'(' +
CASE
WHEN IS_NULLABLE = 'YES'
THEN '(t1.' + COLUMN_NAME + ' IS NULL AND t2.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR ' +
'(t2.' + COLUMN_NAME + ' IS NULL AND t1.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR '
ELSE ''
END + '(t1.' + COLUMN_NAME + ' <> ' + 't2.' + COLUMN_NAME + ')'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 't'
This will produce a list like this which you have to build together into a string. Write a loop and build up your query that way.
((t1.i IS NULL AND t2.i IS NOT NULL) OR (t2.i IS NULL AND t1.i IS NOT NULL) OR (t1.i <> t2.i)
((t1.v IS NULL AND t2.v IS NOT NULL) OR (t2.v IS NULL AND t1.v IS NOT NULL) OR (t1.v <> t2.v)
((t1.v2 IS NULL AND t2.v2 IS NOT NULL) OR (t2.v2 IS NULL AND t1.v2 IS NOT NULL) OR (t1.v2 <> t2.v2)
Um I forgot to add:
of course, if your table structure is static, just use the dynamic SQL above and
edit it all together as:
--> run this
select
CASE
WHEN IS_NULLABLE = 'YES'
THEN '(t1.' + COLUMN_NAME + ' IS NULL AND t2.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR ' +
'(t2.' + COLUMN_NAME + ' IS NULL AND t1.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR '
ELSE ''
END + '(t1.' + COLUMN_NAME + ' <> ' + 't2.' + COLUMN_NAME + ') OR'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 't'
Copy the results put together something like this:
select t2.*
from t as t1 join t2 as t2 on t1.i = t2.i
where
(t1.v IS NULL AND t2.v IS NOT NULL) OR (t2.v IS NULL AND t1.v IS NOT NULL) OR (t1.v <> t2.v) OR
(t1.v2 IS NULL AND t2.v2 IS NOT NULL) OR (t2.v2 IS NULL AND t1.v2 IS NOT NULL) OR (t1.v2 <> t2.v2)
And that is your final statement to run. Of course you can always do this on the fly using the exec function but you need to program a lot more around it to get everything working.
of course, if your table structure is static, just use the dynamic SQL above and
edit it all together as:
--> run this
select
CASE
WHEN IS_NULLABLE = 'YES'
THEN '(t1.' + COLUMN_NAME + ' IS NULL AND t2.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR ' +
'(t2.' + COLUMN_NAME + ' IS NULL AND t1.' + COLUMN_NAME + ' IS NOT NULL)'
+ ' OR '
ELSE ''
END + '(t1.' + COLUMN_NAME + ' <> ' + 't2.' + COLUMN_NAME + ') OR'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 't'
Copy the results put together something like this:
select t2.*
from t as t1 join t2 as t2 on t1.i = t2.i
where
(t1.v IS NULL AND t2.v IS NOT NULL) OR (t2.v IS NULL AND t1.v IS NOT NULL) OR (t1.v <> t2.v) OR
(t1.v2 IS NULL AND t2.v2 IS NOT NULL) OR (t2.v2 IS NULL AND t1.v2 IS NOT NULL) OR (t1.v2 <> t2.v2)
And that is your final statement to run. Of course you can always do this on the fly using the exec function but you need to program a lot more around it to get everything working.
ASKER
Thanks guy.
I forgot to mention that i download a XML file containig the data. I search through the XML for changes..
I use c#. I need c# code if possible.
I forgot to mention that i download a XML file containig the data. I search through the XML for changes..
I use c#. I need c# code if possible.
OK
CHECK
compare 2 xml files with csharp
http://www.daniweb.com/software-development/csharp/threads/46345
CHECK
compare 2 xml files with csharp
http://www.daniweb.com/software-development/csharp/threads/46345
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
i fixed this by using if else statements.
try to use EXISTS
http://msdn.microsoft.com/en-us/library/ms188336.aspx
if you are in sqlserver 2005 and up
try EXCEPT
http://msdn.microsoft.com/en-us/library/ms188055.aspx