Solved

How to prepare a fake database via SQL of a working database for testing purposes for development

Posted on 2011-09-20
39
410 Views
Last Modified: 2013-12-13
We would like to test our VB.NET developments on database with real data but not affect the production database.

We've been tweaking with the idea of making a copy of the production database, rearrange all sensitive data as: last names, addresses and invoices, payments amounts.

The problem we're having is the algorithm or logic of relating masters tables and their transactions tables.

What we're looking for is to have the same structure from the production database so we can developed on it.  In addition, for validating the apps, the values of agings, historicals, and also to have it square off with their related tables.

We would liek to go about using SQL script since our DB are in MS SQL.

0
Comment
Question by:rayluvs
  • 21
  • 17
39 Comments
 

Author Comment

by:rayluvs
ID: 36571649
Here's is an example of what we we're trying to do:

ACTUAL REAL DATABASE CONTENTS:

     Customer                          Address                      Invoice Amt    payment Amt
     ===================  =============== ========== ===========
     Johnny Walker Red Comp. Corner Drive 124, LA       15,000.00         12,520.00
     Big E LTD.                        Alba 9087, NY                    250.00             250.00
     Summers Inc                    Raton Hall, FL                  4,569.00              100.00

FAKE DATABASE CONTENTS FOR DEVELOPING & TESTING:

     Customer                          Address                      Invoice Amt    payment Amt
     ===================  =============== ========== ===========
     Johnny Walker Red Comp. Corner Drive 124, LA       15,000.00         12,520.00
     Big E LTD.                        Alba 9087, NY                    250.00             250.00
     Summers Inc                    Raton Hall, FL                  4,569.00              100.00
0
 
LVL 22

Accepted Solution

by:
pivar earned 500 total points
ID: 36572129
Hi,

I'm not sure I understand where the problem lies.

Use a copy of your production db, that is, backup and restore to another db (in Management Studio rightclick the db choose Tasks/Backup and Restore). You would then test on the same data and model/structure as the production db. You need to be able to change connectionstring/db in your application.

In your example I can't see any differences in the data. But it would be possible, via script, to "de-personalize" the data. But you have to show the tables and in which way you want them changed.

/peter
0
 
LVL 15

Expert Comment

by:Eyal
ID: 36572141
backup production database
move it to development server
restore it
and use it
0
 

Author Comment

by:rayluvs
ID: 36572212
What we want is to "de-personalize" the data.  That is, we have in-house programmer and outside consultant.  We don't want to have he programmer or the consulting to see where the owner lives, his phone, his age, etc.  

You are correct, I didn't change how we want to have it; this is how should be:

ACTUAL REAL DATABASE CONTENTS:

     Customer                          Address                      Invoice Amt    payment Amt
     ===================  =============== ========== ===========
     Johnny Walker Red Comp. Corner Drive 124, LA       15,000.00         12,520.00
     Big E LTD.                        Alba 9087, NY                    250.00              250.00
     Summers Inc                    Raton Hall, FL                  4,569.00              100.00

FAKE DATABASE CONTENTS FOR DEVELOPING & TESTING:

     Customer                          Address                      Invoice Amt    payment Amt
     ===================  =============== ========== ===========
     Johnny Walker Red Comp. xxxxxxxxx                            25.00               20.00
     Big E LTD.                        xxxxxxxxx                            25.00               25.00
     Summers Inc                    xxxxxxxxx                            25.00               20.00

Hope this helps identify what we are looking for.
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36572412
OK, so after you backuped and restored to your TESTdb, you could run an update statement on your TESTdb.

UPDATE thetable SET Address='xxxxxxxxx', [Invoice Amt]=25,  [payment Amt]=CASE WHEN [Invoice Amt]=[payment Amt] THEN 25 ELSE 20 END

0
 

Author Comment

by:rayluvs
ID: 36573795
We'll try.

How about the CustID/VendorID/EmployeeID?  We use actual naming for the ID codes and when looked at a person can determine who is who.
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36574118
I may misunderstand you know.

The CustID/VendorID/EmployeeID are int IDENTITY fields? In that case, as you make a copy of the db, they will stay the same.
0
 

Author Comment

by:rayluvs
ID: 36574138
No they are not.  The Identity fields are sepertaly.
0
 
LVL 22

Expert Comment

by:pivar
ID: 36574150
What kind of data is CustID/VendorID/EmployeeID? But as I said earlier, they will be the same as the production db. Do you wan't to de-personalize them as well?
0
 

Author Comment

by:rayluvs
ID: 36574178
Yes because when read, it can be identified.  the CustID/VendorID/EmployeeID are located both on the master, invoice file and payment file.
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36574239
OK, so how they look today? And how do you wan't them to look like (if you have any preferences)? And do you have an IDENTITY column corresponding to the CustID/VendorID/EmployeeID (or some other unique key)?
0
 

Author Comment

by:rayluvs
ID: 36574365
I just want to change the id so it wont be recognize.  there is no preference for Identity column for the I'd

Maybe the cust/vend, etc I'd replace each characters?
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36574427
Is there a relation between 2 or more tables for each ID? Then we have to change the ID for every table, and I was looking for some common ground to change in multiple tables. So we don't have to use a temporary table.
0
 

Author Comment

by:rayluvs
ID: 36578448
There is no tied from table to table, nor one dependent from the other.  But the transaction table hold the ID of the master table.
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36578733
So you have the same ID in two tables. Then we need to find a way to change them so they continue to have the same data but de-personalized. Is there an unique ID in master table that we can use? Or better can you show both master and transaction tables with all columns?
0
 

Author Comment

by:rayluvs
ID: 36580715
Ofcourse.  The tables have a lot of columns but here is the most important:

Master Table:
Id     EmployeeID             EmpName            EmpAddr
1       Johnsmity                John Smity           123 Road
2       Lilygold                    Lily Gold               700 Calif

Invoice Table
Id      Date                         InvNo                    Amt       EmpId
1        2/2/2011                  333333           1000.00             2

Payment Table
Id      Date                      CheckNo          Amt    EmpId  InvId
1        3/2/2011                 333333       500.00           2       1

This is the basic structure where transaction table points to master tables (in this case invoice & payment table)
0
 
LVL 22

Expert Comment

by:pivar
ID: 36580790
So EmployeeID exists only in master table?

So you just want to de-personalize this column.

UPDATE MasterTable SET EmployeeID='xxxx', EmpName='xxxxx xxxxx', EmpAddr='xxxxxxxxx'
0
 

Author Comment

by:rayluvs
ID: 36581122
Sorry, my mistake, the Employe ID exist also in the transactions, in addition to relate it to the ID of Masters; here is the correct format:

Master Table:
Id     EmployeeID             EmpName            EmpAddr
------ ----------------------------- ------------------------- ----------------------
1     Johnsmity                John Smity           123 Road
2     Lilygold                    Lily Gold               700 Calif

Invoice Table
Id     EmployeeID             Date                         InvNo                    Amt       EmpId
------ ------------------------------ ----------------------------- -------------------- ----------------- -------------
1      Lilygold                   2/2/2011                  333333           1000.00             2

Payment Table
Id     EmployeeID             Date                      CheckNo          Amt        EmpId     InvId
------ ------------------------------ ----------------------------- -------------------- -------------  ------------  --------
1      Lilygold                        3/2/2011                 333333       500.00           2       1
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36583441
OK, you could use a hash function to de-personalize ids, like

UPDATE MasterTable SET EmployeeID=CHECKSUM(EmployeeID), EmpName='xxxxx xxxxx', EmpAddr='xxxxxxxxx'
UPDATE InvoiceTable SET EmployeeID=CHECKSUM(EmployeeID)
UPDATE PaymentTable SET EmployeeID=CHECKSUM(EmployeeID)
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Author Comment

by:rayluvs
ID: 36583748
What checksum for?

Also I see that the master is updated first, is this ok to if the transactions are related to the I'd of the checksum
0
 

Author Comment

by:rayluvs
ID: 36583767
Just googled checksum and it looks that it create a unique value but also say of requirements for the column.  It kind of confusing.  Can you elaborate on checksum as to our need?
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36583780
CHECKSUM computes a hashvalue for the string, it will always be the same for the same stringvalues.
There is a (very) small risk that different stringvalues can get the same hashvalue. But in this case I think the risk is worth taking.
I assumed you didn't have any foreign keys between the tables. But if you have, you have to disable them before updating. Then it doesn't matter which one is first.
0
 

Author Comment

by:rayluvs
ID: 36586594
Ok to finalized:

  - first create a table with original Master ID codes so we can use when changing all
    transaction tables ID code
  - then disable any foreign keys if any
  - Proceed with the actual ID change

    UPDATE InvoiceTable SET EmployeeID=CHECKSUM(EmployeeID)
    UPDATE PaymentTable SET EmployeeID=CHECKSUM(EmployeeID)UPDATE MasterTable SET
    EmployeeID=CHECKSUM(EmployeeID), EmpName='xxxxx xxxxx', EmpAddr='xxxxxxxxx'

  - enable foreign keys

Would this be correct?  Please advice

(i placed a foreign key question since its separate from this, if you can answered, it be great http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_27324057.html)
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36589672
Yes, except for the mastertable creation. That table already existed?
If you're thinking of making a temporary table to map ID to de-personalized ID, there is no need since you're going to use checksum to de-personalize the IDs.
Also, you only need to disable EmployeeID foreign keys.
0
 

Author Comment

by:rayluvs
ID: 36591442
I understood that we needed the Temp table to hold the real Employee ID and the hashed value.  Then use that table to join each additional table to change the ID to the hashed code.

If it's not like this, please advice on how to change various tables when the ID code is referenced.
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36593588
No, there is no need for a temp table. Just remove the foreign key on EmployeeID and exec the updates and re-add the foreign key.
0
 

Author Comment

by:rayluvs
ID: 36593628
Don't understand and I think I need more specifics.  If all tables has an EmployID column and all transaction tables refers to the EmployID of the master table in order to print reports, if I change the master ID by assigning the checksum value, how do the trans tablea know which checksum to place?1
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36596243
If you use checksum it will always return the same output for the same input. For instance, if you have LilyGold as EmployeeId, executing CHECKSUM('LilyGold') will always return 628981680.
So if you update EmployeeId for all three tables with CHECKSUM(EmployeeId) the new value for EmployeeId will be the same on all three tables for the same EmployeeId.

-- Disable FK
ALTER TABLE InvoiceTable DROP CONSTRAINT FK_Invoice_Master;
ALTER TABLE PaymentTable DROP CONSTRAINT FK_Payment_Master;
GO

-- De-personalize employee info
UPDATE InvoiceTable SET EmployeeID=CHECKSUM(EmployeeID);   -- CHECKSUM('LilyGold') => 628981680
UPDATE PaymentTable SET EmployeeID=CHECKSUM(EmployeeID);  -- CHECKSUM('LilyGold') => 628981680
UPDATE MasterTable SET EmployeeID=CHECKSUM(EmployeeID), EmpName='xxxxx xxxxx', EmpAddr='xxxxxxxxx';  -- CHECKSUM('LilyGold') => 628981680
GO

-- Enable FK
ALTER TABLE InvoiceTable ADD CONSTRAINT FK_Invoice_Master FOREIGN KEY (EmployeeID) REFERENCES MasterTable (EmployeeID);
ALTER TABLE PaymentTable ADD CONSTRAINT FK_Payment_Master FOREIGN KEY (EmployeeID) REFERENCES MasterTable (EmployeeID);
GO

Open in new window

0
 

Author Comment

by:rayluvs
ID: 36596353
OK!!!  Now I understand!  I thought that we needed to hold the old value, but that is not the case with CHECKSUM.  

In conclusion CHECKSUM will change the EmployeeID value to the same value no matter what table is in.

Now that we fully understand your recommendation (we just finished testing partially here & works excellently), and prior closing the question, we have one last related question:

If a user that knows a bit of SQL, and deducts that CHECKSUM has been used, is there a way to decipher or revert the value of the CHECKSUM back to its original value? ; that is if 'LilyGold' = 628981680, can 628981680 be converted back to 'LilyGold'?

Please advice on this?
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36597522
No that should not be possible. As I mentioned earlier, it is possible to get the same checksum for different values. For a safer Hashalgorithm, you could use HASHBYTES instead. But then you need a 32/40 bytes column for EmployeeId and a functioon to convert binary to hex. If you're on SQL Server 2008, CONVERT can do that, otherwise you need a BinaryToHex function.
0
 

Author Comment

by:rayluvs
ID: 36597563
Thanx for the info.

We're almost done, but found 2 error messages:

   Msg 4406, Level 16, State 1, Line 1
   Update or insert of view or function 'TableX' failed because it contains a derived or constant field.

   (519 row(s) affected)
   Msg 2601, Level 14, State 1, Line 1
   Cannot insert duplicate key row in object 'dbo.xx_acctcode' with unique index 'XAKxx_acctcode'.
   The statement has been terminated.

Please advice
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36597590
1. What is TableX? You are trying to update a column that is constant/derived. Can you remove that column from your update statement?

2. What is dbo.xx_acctcode and index 'XAKxx_acctcode? If it's EmployeeId of MasterTable, you should try to use HashBytes instead.
It says that you're trying to insert the same value twice (different rows) in a column that is constrained to unique values (with an index).

0
 

Author Comment

by:rayluvs
ID: 36597653
Question...

What is 'constant/derived'?

Regarding index 'XAKxx_acctcode' HashBytes, what would be the format?

We found the following, but don't quite understand it:

   http://msdn.microsoft.com/en-us/library/ms174415.aspx

   DECLARE @HashThis nvarchar(4000);
   SELECT @HashThis = CONVERT(nvarchar(4000),'dslfdkjLK85kldhnv$n000#knf');
   SELECT HashBytes('SHA1', @HashThis);

  Where would EmployeeID be placed and would be the same as Checksum as to that it will
  the same output in all tables with EmployeeID?
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36597927
1. In your view you have a column that is defined like

SELECT 1 AS constant, CONVERT(varchar(40), HASHBYTES('SHA1', EmployeeId), 2) AS derived

2. Instead of CHECKSUM, use (if SQL 2008)
EmployeId = CONVERT(varchar(40), HASHBYTES('SHA1', EmployeeId), 2)
0
 

Author Comment

by:rayluvs
ID: 36598060
We are using SQL 2000 and SQL 2005.

What does it mean when the error says "...failed because it contains a derived or constant field..."?
0
 
LVL 22

Assisted Solution

by:pivar
pivar earned 500 total points
ID: 36598095
Then try (fn_varbintohexstr is an undocumented function that should wiork, haven't tried it on 2005 though).

EmployeId = master.dbo.fn_varbintohexstr(HASHBYTES('SHA1', EmployeeId))


You're trying to update a column in the view that isn't updatable. Can you show the view and how it's created? Also the update statement?

0
 

Author Comment

by:rayluvs
ID: 36598128
Hi...even though the 2 error messages are related to this question, I placed 3 separate question so it can be awarded separately:

  http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_27327397.html
  http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_27327402.html
  http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/Q_27327412.html

Please refer to these questions
0
 

Author Comment

by:rayluvs
ID: 36718595
Thank you very much my friend, all has been successful!!
0
 

Author Closing Comment

by:rayluvs
ID: 36718619
Excellent!
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now