Avatar of bgarib
bgarib
 asked on

Need to scramble data in a column SQL Server

I'm looking for a function to scramble or randomize a column in my existing database. What I mean by this is that I have a few thousand records of patients. I want to randomize their last names so that the last names do not match the rest of the record so they can't be directly identified. I'll do a similar process with other columns (first name, date of birth, etc.) but what I need is a way to swap out the names so the last name is written to a random record.

Thanks!
Microsoft SQL Server

Avatar of undefined
Last Comment
Timetrips

8/22/2022 - Mon
nmcdermaid

Is this so that you can get a sample data set which doesn't match the real world?

Post the table definition and we can come up with something. Does the table have a primary key (IDENTITY)? That will make things a bit easier to scramble.
bgarib

ASKER
Yes, we want to not give out protected health information that can uniquely identify a patient. It's for a demo database where we need to show some pseudo-real data. For all sakes and purposes the table structure simplified is:

PatientID                  numeric        (primary key)
PatientLastName     varchar

Thanks
nmcdermaid

Is patientID a numerically ascending number with no holes in it?

Does it have to be guaranteed that every single person has to have their surname swapped? Is it sufficient to just shift all the surnames down by say four records?

I just want to get the whole story before I write any code.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
bgarib

ASKER
The PatientID is numerically ascending but I can't be guaranteed that there are no holes in it.

Yes, every person has to have the name swapped.

I also entertained shifting all the names down then just manually fixing the ones at the top. This sounds like not as good a way but I would use it if I had to.

Thanks!
ASKER CERTIFIED SOLUTION
Scott Pletcher

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
bgarib

ASKER
Sorry for the long delay. I couldn't get that query to work. I was able to sucessfully scramble data using an MS Access tool I found online if it helps anyone.

http://alexdyb.blogspot.com/2007/03/how-to-scramble-data.html

Timetrips

If I don't have to worry about "duplicate" data (i.e. key fields) then I found the following has worked well for me .  It has the added bonus of retaining the data formatting (i.e. Spaces, Capitals, Punctuation, etc) and it is realatively fast even against large data sets:

--Example Usage:
update myTable_
      set social_security_number_ = dbo.p_ScrambleText_(social_security_number_)
         ,name_last_ = dbo.p_ScrambleText_(name_last_)



if exists (select 1
            from  sysobjects
           where  id = object_id('v_RandomNumber_')
            and   type = 'V')
   drop view v_RandomNumber_
go
 
CREATE VIEW v_RandomNumber_
AS
SELECT RAND() as RandomNumber_
GO
 
if exists (select 1
          from sysobjects
          where  id = object_id('p_ScrambleText_')
          and type in ('IF', 'FN', 'TF'))
   drop function p_ScrambleText_
go
 
CREATE FUNCTION p_ScrambleText_(@Value_ varchar(255))
RETURNS varchar(255)
AS
BEGIN
 
	DECLARE @Length_ int
	DECLARE @RetVal_ varchar(255)
	DECLARE @Counter_ smallint
	DECLARE @RandomNumber_ float
	DECLARE @CurChar_ as char(1)
    DECLARE @ASCIIValue_ as smallint
	SET @Length_ = LEN(@Value_)
	SET @RetVal_ = ''
    SET @ASCIIValue_ = 0
 
	SET @Counter_ = 1
	WHILE @Counter_ < (@Length_ + 1)
	BEGIN
        --Process next available character 
		SET @CurChar_ = SUBSTRING(@Value_, @Counter_, 1)
        --Get the ASCII value of the character we are processing
        SET @ASCIIValue_ = ASCII(@CurChar_)
        --Generate a new random number (remeber this generates a number between 0 and 1)
        SET @RandomNumber_ = (select RandomNumber_ from v_RandomNumber_)
       --Using the ASCII value see if we are working with a number 
		IF @ASCIIValue_ between 48 and 57 
		BEGIN
            --Use the random number to get a value between 0 and 9 to be used as the random replacement number
			SET @CurChar_ = Convert(Char,Convert(tinyint, 9 * @RandomNumber_)) 
        END
        --if not a number check for an uppercase letter
        ELSE IF @ASCIIValue_ between 65 and 90
        BEGIN
            --Now use the random number to get a value between 1 and 26 which we will use to generate a random replacement letter of the same case
			SET @CurChar_ = CHAR(Convert(tinyint, (25 * @RandomNumber_ + 1)) + 63) 
        END
        --If not an uppercase letter check for a lowercase letter
		ELSE IF @ASCIIValue_ between 97 and 122 
        BEGIN
            --Now use the random number to get a value between 1 and 26 which we will use to generate a random replacement letter of the same case
			SET @CurChar_ = CHAR(Convert(tinyint, (25 * @RandomNumber_ + 1)) + 96) 
		END 
        --NOTE: If the character being processed is not a number value or a letter (A-Z or a-Z) then we keep that character
        --      This helps preserve formatting found with data like Phone Number, SSN's, emails etc.
        SET @RetVal_ = @RetVal_ + @CurChar_
		SET @Counter_ = @Counter_ + 1 
	END 
 
	RETURN @RetVal_ 
END
GO 
if exists (select 1
          from sysobjects
          where  id = object_id('p_ScrambleDate_')
          and type in ('IF', 'FN', 'TF'))
   drop function p_ScrambleDate_
go
 
CREATE FUNCTION p_ScrambleDate_(@Date_ datetime)
RETURNS datetime
AS
BEGIN
 
	DECLARE @RandomNumber_ tinyint
    --Generate a random number between 1 and 25 (no special reason for 25 but it is a reasonable number of years to decrement date.
    SET @RandomNumber_ = Convert(tinyint, ((24) * (SELECT RandomNumber_ FROM v_RandomNumber_) + 1)) 
    SELECT @date_ = DateAdd(d,@RandomNumber_,@date_) --Increment date by RandomNumber_ days
    SELECT @date_ = DateAdd(m,@RandomNumber_,@date_) --Increment date by RandomNumber_ months
    SELECT @date_ = DateAdd(yy,(@RandomNumber_ * -1),@date_) --Decrement date by RandomNumber_ years.
 
	RETURN @Date_ 
END
GO 

Open in new window

⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.