scramble PK and name

anushahanna
anushahanna used Ask the Experts™
on
emp table has

PK First_Name Last_Name among other data

there 5000 rows there, and this is a test server.

there is a need to hide the identity of the employee, and yet allow it to look normal.

the logic needed is
the first and last name should be ordered randomly and assigned randomly to the PKs.

PK 1 might have been John Adams.

this John may go to PK 386 and the Adams part may go to PK 4098 and in return,
PK 109's Mary may get to PK 1's first name, and PK 2034's Last name Cooper may come to PK1.

And thus it has to be updated through code-

reverting back is not needful.

how would you do this in tsql?

thanks
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Kent DyerIT Security Analyst Senior
Commented:
What you could do is to set it to be ROT-13 or an MD5 hash..  However, it would be helpful to have a seed to undo when needed..

Here is a good article to get you started..
http://www.sqlservercentral.com/articles/Security/freeencryption/1980/

HTH,

Kent
lcohanDatabase Analyst
Commented:
Wouldn't be native "Data Encryption and Decryption" enough for this?

http://msdn.microsoft.com/en-us/library/cc278098(SQL.100).aspx

Commented:
I don't understand, First Name and last Name are in the same table but you want to assign different PK?


You have one table emp:

PK FirstName LastName
1   John          adam
2   Mary          Cooper

and you want to transform it to looks like what?

PK      FirstName LastName
386    John          NULL        
4098  NULL         Adam
1        Mary         NULL
2034  NULL        Cooper

???




Become a Certified Penetration Testing Engineer

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

Expert of the Quarter 2010
Expert of the Year 2010
Commented:
Hope this is SQL 2005+
This will do nicely
/* test table
create table employee (PK int identity primary key, First_Name varchar(100), Last_Name varchar(100))
insert employee (first_name, last_name) select
'John', 'Parker' union all select
'Jim', 'Parker' union all select
'Mary', 'Holmes' union all select
'Poe', 'White' union all select
'Johnny', 'Brown' union all select
'Tim', 'Hopkins' union all select
'Tom', 'Hawkings' union all select
'Santa', 'Cruz' union all select
'Bob', 'Carlos'
*/

-- your query follows
;with randomized as (
select
	pk, first_name, last_name,
	pk_seq=row_number() over (order by pk),
	firstname_from=row_number() over (order by newid()),
	lastname_from=row_number() over (order by newid()) from employee
)
update r
set first_name = f.first_name, last_name = l.last_name
from randomized r
inner join randomized f on f.firstname_from=r.pk_seq
inner join randomized l on l.lastname_from=r.pk_seq

/*
-- quick test using sample table. won't match the test table anymore
select * from employee
*/

Open in new window

Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist
Commented:
<<the logic needed is
the first and last name should be ordered randomly and assigned randomly to the PKs.

PK 1 might have been John Adams.

this John may go to PK 386 and the Adams part may go to PK 4098 and in return,
PK 109's Mary may get to PK 1's first name, and PK 2034's Last name Cooper may come to PK1.>>
Let me get this straight.  Are you asking for primary keys to be permanently permutating according to the data entered ?  Sorry, but what you are describing can not be considered a primary key, in relational perspective.  

A primary key *must* be *stable*(subject to as little updates as possible) and *unique* to be called a primary key in the first place.  The scheme you are describing neither guarantee stability nor it can guarantee an easy way to insure uniqueness.

I recommend you keep things simple by creating an encrypted IDENTITY value that maps to each employee to hide it from audiences.  

Also keep in mind that, even though small, the probability for having a homonym (same first and last name) in a 5000 people span is *far* from being neglectable.  First and Last Name are usually insufficient to guarantee uniqueness.  I suggest First Name, Last Name and eventually DOB.

Hope this helps...

Author

Commented:
kdyer/lcohan

I'd look at that, but for this case, the management wants this for a quick need...

Author

Commented:
Cboudroz, yes, just shuffling the first and last names randomly.. but not touching the PK

Author

Commented:
cyberkiwi, PK is Identity but there are missing numbers.. for example 3005 and next number is 3009 etc

in the code, pk_sql thus does not equal to the pk.

thus row_number() over (order by newid() may not join properly on the PK to bring over the results, i think. how could we align the row_number results to inner join with the actual pk.

thanks

Expert of the Quarter 2010
Expert of the Year 2010

Commented:
anushahanna,

I specifically created a sequence over the PK rather than using the PK itself so that it will in fact align correctly.
Give it a try.

    pk_seq=row_number() over (order by pk)  -- this one "packs" the gaps in PK
    firstname_from=row_number() over (order by newid())  -- this and the next have no gaps, but are randomly ordered
    lastname_from=row_number() over (order by newid())
-- the next two lines marry up the random sequence with the sequential list of PK Ids
inner join randomized f on f.firstname_from=r.pk_seq
inner join randomized l on l.lastname_from=r.pk_seq

Regards

Author

Commented:
Racimo, this does not touch the PK- only reassign the other values - it is also an identity column....yes, I take your idea and will recommend including the DOB also.
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
> will recommend including the DOB also.

I hope I have shown you in general a method that can be easily expanded for any number of columns.

Author

Commented:
cyberkiwi, for some reason i am getting one value populated in all rows by the code- and that is what made me suspect about the skipping identity values being a problem.. do you think that could cause that problem?...
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
Can you check your code again?
This is nearly 20,000 rows and they are pretty much a mess after going through the query.

Run everything in the uncommented section in tempdb.
/* test table
use tempdb

create table employee (PK int identity primary key, First_Name varchar(100), Last_Name varchar(100))
insert employee (first_name, last_name) select
'John', 'Parker' union all select
'Jim', 'Parker' union all select
'Mary', 'Holmes' union all select
'Poe', 'White' union all select
'Johnny', 'Brown' union all select
'Tim', 'Hopkins' union all select
'Tom', 'Hawkings' union all select
'Santa', 'Cruz' union all select
'Bob', 'Carlos'

insert into employee
select first_name + '_' + convert(varchar(10), number), first_name+'.'+last_name + '.' + convert(varchar(10), number)
from employee, master..spt_values
where type='p'

select * from employee -- inspect that the first-names are aligned to the last-names
*/

-- your query follows
;with randomized as (
select
	pk, first_name, last_name,
	pk_seq=row_number() over (order by pk),
	firstname_from=row_number() over (order by newid()),
	lastname_from=row_number() over (order by newid()) from employee
)
update r
set first_name = f.first_name, last_name = l.last_name
from randomized r
inner join randomized f on f.firstname_from=r.pk_seq
inner join randomized l on l.lastname_from=r.pk_seq

/*
-- quick test using sample table. won't match the test table anymore
select * from employee
*/

Open in new window

Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
Sorry.  I ignored tour management does Database design.

Author

Commented:
Thank you Richard - not sure what happened the first time around..

this was very excellent logic.
thanks again.

Author

Commented:
>>I hope I have shown you in general a method that can be easily expanded for any number of columns.

yes, very much so.

Author

Commented:
>>Sorry.  I ignored tour management does Database design.

were you posting for another thread, Racimo?
Racim BOUDJAKDJIDatabase Architect - Dba - Data Scientist

Commented:
<<were you posting for another thread, Racimo?>>
Never mind...

Good luck with the rest of the design...:)

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial