Link to home
Start Free TrialLog in
Avatar of JGH5
JGH5Flag for United States of America

asked on

Replace Alpha and Numeric patterns with X and 9 dynamically based on a look up table for the entire database

I am trying to replace columns in a database with either a numeric when a number pattern is found and an alpha when an alpha pattern is found.  I have several columns in the [TestDB] such as [LastName], [FirstName] and [TelephoneNum].

I need to add Something like Replace((Replace('X','[A-Z]')), '9','[0-9]'))) where [LastName] and [FirstName] would result with  XXXXX and 99999 for [TelephoneNum].

Furthermore, I have a list of columns in a table I need to find and then replace with an X or 9.

Is there a way to use the FindText function and query a table with targeted columns and replace with X or 9.


I use this Stored Procedure to find columns one at a time but need to come up with a better solution to find multiple columns based on a look up table and then replace the values based on the alpha or numeric pattern found in that particular column:

Example Input Parameter:    exec sp_FindText '%LastName%', '[TEST_DB]'

IF EXISTS (SELECT name FROM sysobjects WHERE name = N'sp_FindText' AND type = 'P')
	DROP PROCEDURE sp_FindText
go
CREATE PROCEDURE [dbo].sp_FindText
	  @text varchar(255)
	, @dbname varchar(64) = NULL
AS 
SET NOCOUNT ON 
  
	IF @dbname IS NULL  
		BEGIN  
			--enumerate all databases 
			DECLARE #db CURSOR FOR 
			SELECT [name] FROM master..sysdatabases

			declare @c_dbname varchar(64)

			OPEN #db FETCH #db INTO @c_dbname  
 
			WHILE @@FETCH_STATUS <> -1 --and @MyCount < 500   
				BEGIN  
					execute sp_FindText @text, @c_dbname   
					FETCH #db INTO @c_dbname   
				END
			CLOSE #db DEALLOCATE #db
		END	--if @dbname is null   
	ELSE  
		BEGIN --@dbname is not null
			declare @sql varchar(250)

			--create the find like command
			SELECT @sql = 'SELECT ''' + @dbname + ''' as db, o.name, m.definition '  
			SELECT @sql = @sql + ' FROM ' + @dbname + '.sys.sql_modules m '  
			SELECT @sql = @sql + ' INNER JOIN ' + @dbname + '..sysobjects o on m.object_id=o.id'  
			SELECT @sql = @sql + ' WHERE [definition] like ''%' + @text + '%'''

			print @sql
			EXEC (@sql)
		END --@dbname is not null  

go

Open in new window

Avatar of Koen Van Wielink
Koen Van Wielink
Flag of Netherlands image

Hi JGHS,

I wrote below stored procedure a while ago to check for specific values throughout our database. I think it's pretty close to what you are looking for, although perhaps not a 100% fit. But it should probably get you started.
The procedure takes a datatype and a data value as input parameters. It will first check which tables contain columns that use that particular datatype and stores that in a temp table. After that a cursor is used to check for each of the columns with the correct datatype whether they contain the data value provided. This is stored in a second temp table which is the final result set.
I'm pretty sure you could rewrite this slightly to look up the more generic values you're looking for. You can use the final result set in the #DataExists table to run your replace statements.

/****** Object:  StoredProcedure [dbo].[uSP_Check_data_used]    Script Date: 10/29/2013 09:06:01 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

Create Procedure [dbo].[uSP_Check_data_used]

	@Data_type		nvarchar(100)	--The datatype of the column you're checking.
,	@Data_name		nvarchar(100)	--The value which you want to check

--For example, if you're checking if the packaging code 'PA1 50S TH' is used in the system, @Data_type would be 'Material_abbreviated'
--and @data_name would be 'PA1 50S TH'. The output is a list of all the tables and corresponding columns where the code is used.

AS

Set nocount on

--Declare additional variables
Declare		@SQL			nvarchar(max)
		,	@Table_name		nvarchar(100)
		,	@column_name	nvarchar(100)
		,	@database_name	nvarchar(100)

--Declare table variable to hold database names
Declare	@V_databases	table
(Database_name	nvarchar(100))

--Insert database names into table variable
Insert into @V_databases
Select	name
from	sys.databases

--Create table to hold all possible table and column names for selected data_type and data_name
Create table	#Datacheck
(	Database_name	nvarchar(100)
,	Table_name		nvarchar(100)
,	column_name		nvarchar(100)
)

--Create table to hold all table and column names where the selected value exists.
Create table	#DataExists
(	Database_name	nvarchar(100)
,	Table_name		nvarchar(100)
,	column_name		nvarchar(100)
,	Value_checked	nvarchar(100)
)

--Insert all possible table and column names for selected data_type and data_name
--Declare cursor to check each database
	Declare	T_databases cursor for

	Select	Database_name
	from	@V_databases
	Order by Database_name
	
	Open	T_databases
	Fetch	Next from T_databases
	into
			@database_name
			
	While	(@@FETCH_STATUS <> -1)
	Begin
	
--Create dynamic SQL query to loop through each of the databases
	Set	@SQL = 
		'Insert into	#Datacheck

		Select	'	+ '''' + @database_name + '''	,
						o.name as ''Table''			,
						c.name as ''Column_name''
		from	'	+ '' + @Database_name + '.sys.types t
				inner join ' + '' + @Database_name + '.sys.columns c
					on t.user_type_id = c.user_type_id
				inner join ' + '' + @Database_name + '.sys.objects o
					on c.object_id = o.object_id
					and o.type <> ''V''
		where	t.name = ' + '''' + @Data_type + ''''

	Exec	(@SQL)
	
	Set		@SQL = ''

	Fetch	Next from T_databases
	into
			@database_name
			
	End
	
	Close	T_databases
	Deallocate T_databases

--Clear @database_name variable
Set @database_name = ''

--Create cursor which loops through all selected tables and checks the columns for the selected value
	Declare	T_Datacheck	cursor for

	Select		Database_name
			,	Table_name
			,	Column_name
	From	#Datacheck
	Order by	Table_name
	
	Open	T_Datacheck
	Fetch	next from T_Datacheck
	Into
				@Database_name
			,	@Table_name
			,	@column_name
	
	While (@@FETCH_STATUS) <> -1
	Begin
	
	--Dynamic SQL statement to check each table
	Set		@SQL =	'IF Exists	(
								Select	1
								From	' + @database_name + '.dbo.' + @table_name	+ ' 
								Where	' + @column_name + ' = ''' + @Data_name + ''''
								+ ' )
					Begin
					Insert	Into	#DataExists
					Select	'''			+ @database_name	+ ''''
							+ ', '''	+ @Table_name		+ ''''
							+ ', '''	+ @column_name		+ ''''
							+ ', '''	+ @Data_name		+ ''''
					+ ' End'
			
--	Print	@SQL
--Execute the dynamic SQL to insert possible values into the final dataset
	Exec	(@SQL)
	
--Clear the dynamic SQL variable for next loop
	Set @SQL = ''
	
	Fetch	next from T_Datacheck
	Into
				@Database_name
			,	@Table_name
			,	@column_name
	End
	
	Close	T_Datacheck
	Deallocate T_Datacheck

--Select final dataset
Select	*
from	#DataExists	
	
--Drop temp tables
Drop table #Datacheck
Drop table #DataExists
	
Set nocount off
GO

Open in new window


Let me know if this is clear or whether more explanation is required. As an example, the final data set would look something like the attached screenshot.
ExampleData.png
Oh, one more comment to the code. I'm actually looking up user_type_ID datatypes as this is a better fit for our data. If you want a more generic lookup on the system datatypes, modify the code in the "create dynamic SQL Query to loop through each of the databases" section to link the system_type_id's rather than the user_type_id's in the sys.types and sys.columns tables.
Avatar of Scott Pletcher
I've read the original q three times and I'm still not sure what you want to do, but here's my best guess.


For a given list of columns in a table (or tables?? or entire db's tables??), you want to replace the current values in those columns as follows:

1) if the current value in a given byte location is A-Z (and a-z!?), replace it with X.
2) if the current value in a given byte location is 0-9, replace it with 9.
(so result will look kinda like a COBOL PIC clause :-)).

Thus, 'ABC1' would become 'XXX9', and '3/2B/4DE1' would become '9/9X/9XX9'.

Is that right?
Avatar of JGH5

ASKER

Hi Scott,

Yes, this is needed to de-identify production data to use for testing.
Avatar of JGH5

ASKER

On a simpilar note something like this where i hard code the column and table:

UPDATE
[Customer]
SET
CustID =
(
Case when CustID like '%[A-Z]%' then 'X'
Case when CustID like '%[a-z]%' then 'x'    
Case when CustID Like '%[0-1]%' then '2'  
Case when CustID Like '%[2-3]%' then '4'  
Case when CustID Like '%[4-5]%' then '6'  
Case when CustID Like '%[5-6]%' then '7'  
Case when CustID Like '%[7-8]%' then '9'
 
ELSE
CustID
END
)
Hi JGHS,

Do you need to replace ALL characters, or is it simply enough to replace any string with a single X, and any number with a single 9?
If so, this will do the trick:

USE [ExpertExchange]
GO

/****** Object:  StoredProcedure [dbo].[uSP_anomymize_data]    Script Date: 11/05/2013 09:50:41 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


Alter Procedure [dbo].[uSP_anomymize_data]

/*
--Replaces string fields with X and number fields with 9
--Check is done by datatype:

name		system_type_id		Replaced_with
text		35					X
tinyint		48					9
smallint	52					9
int			56					9
float		62					9
ntext		99					X
decimal		106					9
numeric		108					9
bigint		127					9
varchar		167					X
char		175					X
nvarchar	231					X
nchar		239					X
sysname		231					X
*/

AS

Set nocount on

--Declare variables
Declare		@SQL			nvarchar(max)
		,	@Table_name		nvarchar(100)
		,	@column_name	nvarchar(100)
		,	@database_name	nvarchar(100)
		,	@data_type		int

--Declare table variable to hold database names
Declare	@V_databases	table
(Database_name	nvarchar(100))

--Insert database names into table variable
Insert into @V_databases
Select	name
from	sys.databases
where	Database_ID > 4		--Excludes system tables

--Create table to hold all table and column names for included data_type
Create table	#DataToUpdate
(	Database_name	nvarchar(100)
,	Table_name		nvarchar(100)
,	column_name		nvarchar(100)
,	data_type		int
)

--Insert all table and column names for included data_type
--Declare cursor to check each database
	Declare	T_databases cursor for

	Select	Database_name
	from	@V_databases
	Order by Database_name
	
	Open	T_databases
	Fetch	Next from T_databases
	into
			@database_name
			
	While	(@@FETCH_STATUS <> -1)
	Begin
	
-- Create dynamic SQL query to loop through each of the databases and insert relevant records in temp table
	Set	@SQL = 
		'Insert into	#DataToUpdate

		Select	'	+ '''' + @database_name + '''	,
						o.name as ''Table''			,
						c.name as ''Column_name''	,	
						t.system_type_id as ''Data_type''
		from	'	+ '' + @Database_name + '.sys.types t
				inner join ' + '' + @Database_name + '.sys.columns c
					on t.system_type_id = c.system_type_id
				inner join ' + '' + @Database_name + '.sys.objects o
					on c.object_id = o.object_id
					and o.type = ''U''
		Where	t.system_type_ID in (35,99,167,231,239,175,48,52,56,62,106,108,127)'
	
-- For testing, print @SQL
-- print	@SQL

-- Execute the SQL query
	Exec	(@SQL)
	
-- Clear @SQL variable for next query
	Set		@SQL = ''

	Fetch	Next from T_databases
	into
			@database_name
			
	End
	
	Close	T_databases
	Deallocate T_databases

--Clear @database_name variable
Set @database_name = ''

--Create cursor which loops through all selected tables creates update statements for the columns
	Declare	T_Datacheck	cursor for

	Select		Database_name
			,	Table_name
			,	Column_name
			,	Data_type
	From	#DataToUpdate
	Order by	Table_name
	
	Open	T_Datacheck
	Fetch	next from T_Datacheck
	Into
				@Database_name
			,	@Table_name
			,	@column_name
			,	@Data_type
	
	While (@@FETCH_STATUS) <> -1
	Begin
	
	--Dynamic SQL statement to create the update statements
	Set		@SQL =	'Update	' + @database_name + '.dbo.' + @table_name	+ ' 
					Set		' + @column_name + ' = Case
														When	' + ltrim(rtrim(str(@data_type,4,0))) + ' in (48,52,56,62,106,108,127)
														Then	9
														When	' + ltrim(rtrim(str(@data_type,4,0))) + ' in (35,99,167,231,239,175)
														Then	''X''
														Else	' + @column_name + '
													End'

-- For testing purposes, print @SQL	
 Print	@SQL

--Execute the dynamic SQL to insert possible values into the final dataset
--	Exec	(@SQL)
	
--Clear the dynamic SQL variable for next loop
	Set @SQL = ''
	
	Fetch	next from T_Datacheck
	Into
				@Database_name
			,	@Table_name
			,	@column_name
			,	@Data_type
	End
	
	Close	T_Datacheck
	Deallocate T_Datacheck

--Drop temp table
Drop table #DataToUpdate
	
Set nocount off

GO

Open in new window


I've modified the earlier procedure I put up. It now lists all the tables and columns from non-system databases that contain either number or string fields, based on their system_type_id's. The list at the top shows which ID's are included and what they are replaced with. You can modify the second dynamic SQL statement to include more or exclude some of these types by adding/removing them from the "IN" statement.
The procedure will loop through all your databases and list the table names, column names and data ID's for the selected data types. It will then generate update statements for each of those columns and replace number values with 9, and text/string values with X.
Make sure to replace the database name at the very top of the statement to your database name. I've also disabled the Exec (@SQL) command for the update statement to prevent accidental update. Instead it will print the generated command for review.
Let me know if this is what you need, or if further modifications are required.
Oh, and of course the "ALTER Procedure" statement should be changed tp "CREATE Procedure"....
Avatar of JGH5

ASKER

I need to replace all of the characters or numbers in some of the tables and columns in the database.
When you say "some tables and columns", does that mean you have a fixed list? Does it mean you're not looking to replace ALL text and numbers in your database, but only some of it? If there are no obvious selection criteria like datatypes, how do you provide your query with that information? Some sample data would be useful.
Avatar of JGH5

ASKER

Need to change all data in these columns (small example set) and they have different data types and all of the information in the Column/Row has to be replaced with repeating X or 9 to be able to view the obfascated data with the same size after the databases changes are made.

Table       Schema      Column
Agent      dbo      LastName
Agent      dbo      FirstName
Agent      dbo      CorpName
Agent      dbo      DateOfBirth
Agent      dbo      SSN
Agent      dbo      TIN
AgentAddress      dbo      AddrLine1
AgentAddress      dbo      AddrLine2
AgentAddress      dbo      City
First let's try the simplest method: a string of embedded REPLACEs.  If that performs OK, we're good.  If not, we'll have to try to do something else that can be made to run faster.

The basic statement would be:

SELECT
column_data,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    column_data,
'a', 'X'),'b', 'X'),'c', 'X'),'d', 'X'),'e', 'X'),'f', 'X'),'g', 'X'),'h', 'X'),'i', 'X'),'j', 'X'),
'k', 'X'),'l', 'X'),'m', 'X'),'n', 'X'),'o', 'X'),'p', 'X'),'q', 'X'),'r', 'X'),'s', 'X'),'t', 'X'),
'u', 'X'),'v', 'X'),'w', 'X'),'x', 'X'),'y', 'X'),'z', 'X'),
'0', '9'),'1', '9'),'2', '9'),'3', '9'),'4', '9'),'5', '9'),'6', '9'),'7', '9'),'8', '9'),'9', '9')
FROM (
    SELECT 'afl;kj0[-2/;3bnlkrjq _nrt98a[-w0jaZsv' AS column_data
) AS test_data


Then we just have to put that into a proc and have it UPDATE *every* applicable column on a table in *one* UPDATE statement.  While that will generate more log activity at once, it will be vastly more efficient than multiple passes thru the same table, which will be a performance nightmare on large tables.

If you'd like sample code on doing that within a proc, just let me know.
Avatar of JGH5

ASKER

i tried running the replace script on one table and it did not make the changes.  could you show me a proc with this instead?
ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of JGH5

ASKER

Thank you for your help, the Update script works.