Link to home
Start Free TrialLog in
Avatar of jmanGJHS97
jmanGJHS97

asked on

Column-by-column comparison of the data in two SQL Server 2000 databases

Can anyone give me a SQL script that takes the tables in two databases and compares the data column-by-column to see where the differences are?  I have an application that I want to verify that production database contains exactly the same data as the development database.

I'd be willing to run such a script once per table and input the table name manually; that's no big deal.  The comparison of the values in every row and column is the thing I'm most interested in.

Thanks.
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

on the other hand: if you want to have the 2 "identical", why not just do a backup+restore? :)
Avatar of jmanGJHS97
jmanGJHS97

ASKER

I've got a utility to use, but the issue is that the production database has a bunch of clustered primary keys, and the development database doesn't, so the utility won't run because the schemas aren't the same.

Also, it's not that I want to make an identical copy of the database.  What I want is to verify whether they already are identical.

We had an attack on our site, and we need to see what changed on the production database, if anything, so we want to compare it to dev.
You can run following statement for each table on dev and prod db's and see if tables contain same data or not:

select CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM [Type Your Table Name Here]

If you want to see row differences, and have SQL Server 2005 on your development platform, you can compare tables and generate script for equalizing data using the utility called tablediff.exe located at

\Program Files\Microsoft SQL Server\90\COM

directory. It can be used compare both SQL Server 2000 and 2005 databases. The syntax is simple and like this:

tablediff -sourceserver "InstanceName" -sourcedatabase "DBName" -sourcetable "TableName" -destinationserver "InstanceName" -destinationdatabase "DBName" -destinationtable "DN_TableName"
-o "out.txt"

-o switch creates a text file from which you can see differences.

Following article gives details on the utility.

http://msdn.microsoft.com/en-us/library/ms162843.aspx
Ok, for example, I did this:

select CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM [prod_db].[dbo].[Customer]
select CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM [development_db].[dbo].[Customer]

And the results were:

1749344427
300041798

What does this tell me?
It means the tables are different. If they contain exactly same data, binary checksum gives same results. In order to check row equality by just using SQL, you can remove CHECKSUM_AGG, then the query returns a checksum for each row that can be compared againts other table.

My question is that, do every table of your db has a primary key (even if it is not defined). If not, how do we know if a row corresponds to other?
Ok, so there's several things going on here.  First, the table schemas are not the same.  For example, one table in the prod db would be this:

CREATE TABLE [tblAffiliate] (
      [AffiliateID] [int] IDENTITY (1, 1) NOT NULL ,
      [strEmail] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [dtmDate] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      CONSTRAINT [PK_tblAffiliate] PRIMARY KEY  CLUSTERED
      (
            [AffiliateID]
      )  ON [PRIMARY]
) ON [PRIMARY]
GO

The dev equivalent is:

CREATE TABLE [tblAffiliate] (
      [AffiliateID] [int] NOT NULL ,
      [strEmail] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
      [dtmDate] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

So, the fields are the same, but the prod db has that constraint that establishes a clustered primary key, and the dev db doesn't have a primary key at all (unless I'm missing something).  Would this alone be enough for the SQL statement you gave me to return that they are not the same?

Given that the tables are known to not have the exact same schema, I need to know if the data contained in each table is the same in both databases.
ASKER CERTIFIED SOLUTION
Avatar of inter
inter
Flag of Türkiye image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If the schema are different, then, the checksum route is not for you - they are different structures and so will be different...

can you "see" the prod and dev databases from the same server ?

There is a manual way that could be scripted fairly easily, of recursing the information_schema.column_names on one table and checking if that column name exists on the other table and then checking content. Does that sound like the type of thing you are looking for ?

inter,

I'll give that a try.  Thanks.

mark_wills,

Yes, the prod and dev databases can be seen from the same server.  Yes, that's what I'd be interested in.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Great, I'll give it a whirl.
I ended up adding the contstraints to the tables in order to use the 3rd party utility we purchased previously.  I did try both of your solutions, and they seem to work fine as well, so I'm going to credit you both for your help.

Thanks a lot.