Robust Field Cleaning Function

dsackerContract ERP Admin/Consultant
CERTIFIED EXPERT
Published:
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within REPLACE functions). Over time this can be quite unreadable. It can also become a maintenance chore where this is not a single one-off effort.

This happened in a recent question here on EE. I'd like to share the solution that came out of that thread. I have written a function called udf_CleanDataFn. It can be invoke within any query as follows:
 
SELECT dbo.udf_CleanDataFn(YourColumn, @Options)

@Options is a varchar(255) field, which can accept any of the following "options":
 
  • ltrim - This will left-trim the value in YourColumn.
  • rtrim - This will right-trim the value in YourColumn.
  • trim - This will both left- and right-trim the value.
  • nocomma - This will remove any and all comma's.
  • nospace - This will remove any and all spaces.
  • nopunc - This will remove all standard punctuations (!@#$% etc)
  • nonum - This will remove all numbers (0-9).
  • noalpha - This will remove all alpha characters (A-Z and a-z).
  • alphaonly - This will remove everything except alpha characters.
  • numonly - This will remove everything except numbers.

Following are a few working examples that will demonstrate the different results . This first example takes out only the commas:
 
SELECT dbo.udf_CleanDataFn('This !  is *  a  13,21  @  test', 'nocomma')
Result: "This ! is * a 1321 @ test"

This example removes all punctuations and numbers:
 
SELECT dbo.udf_CleanDataFn('This !  is *  a  13,21  @  test', 'nopunc, nonum')
Result: "This  is  a   test"

This example removes the alphas and spaces:
 
SELECT dbo.udf_CleanDataFn('    Spaces 123 all 456 around    ', 'noalpha, nospace')
Result: 123456

Where there is very large data, this function can be inefficient, but where small amounts of data need to be cleaned, this can be leveraged on many alpha-numeric fields. My bench tests showed that this could update about 200,000 records a minute. It is not fast compared to an elaborate SQL statement, but you get the benefits of leverage and ease of maintenance where the processing time is acceptable.

Here is the code:
 
CREATE FUNCTION udf_CleanDataFn (
                          @Value      varchar(255),
                          @Options    varchar(255) )
                      RETURNS varchar(255)
                      AS
                      BEGIN
                          IF ISNULL(@Value, '') = ''
                              RETURN ''
                      
                          DECLARE @DelimOptions varchar(257)
                      
                          SET @Options = ISNULL(@Options, 'trim')
                          SET @DelimOptions = ',' + @Options + ','
                      
                          WHILE CHARINDEX(' ', @DelimOptions) > 0
                              SET @DelimOptions = REPLACE(@DelimOptions, ' ', '')
                      
                          WHILE CHARINDEX(CHAR(9), @DelimOptions) > 0
                              SET @DelimOptions = REPLACE(@DelimOptions, CHAR(9), '')
                      
                          IF CHARINDEX(',ltrim,', @DelimOptions) > 0
                              SET @Value = LTRIM(@Value)
                      
                          IF CHARINDEX(',rtrim,', @DelimOptions) > 0
                              SET @Value = RTRIM(@Value)
                      
                          IF CHARINDEX(',trim,', @DelimOptions) > 0
                              SET @Value = RTRIM(LTRIM(@Value))
                      
                          IF CHARINDEX(',nocomma,', @DelimOptions) > 0
                              SET @Value = REPLACE(@Value, ',', '')
                      
                          SET @Value = REPLACE(@Value, CHAR(9), ' ')
                          WHILE CHARINDEX('  ', @Value) > 0
                              SET @Value = REPLACE(@Value, '  ', ' ')
                      
                          IF CHARINDEX(',nospace,', @DelimOptions) > 0
                              WHILE CHARINDEX(' ', @Value) > 0
                                  SET @Value = REPLACE(@Value, ' ', '')
                      
                          DECLARE @asc        tinyint,
                                  @chr        char(1),
                                  @len        tinyint,
                                  @ndx        smallint,
                                  @NewValue   varchar(255),
                                  @use        tinyint,
                                  @WkField    varchar(255)
                      
                          SET @len = LEN(@Value)
                          SET @ndx = -1
                          SET @NewValue = ''
                          SET @WkField  = '"' + @Value + '"'
                      
                          WHILE @ndx <= @len 
                          BEGIN
                              SET @ndx = @ndx + 1
                              SET @chr = SUBSTRING(@Value, @ndx, 1)
                              SET @asc = ASCII(@chr)
                              SET @use = 0
                      
                              IF @asc BETWEEN 32 AND 126
                                  SET @use = 1
                                             
                              IF CHARINDEX(',nopunc,', @DelimOptions) > 0
                                  IF @asc < 48 OR @asc BETWEEN 58 AND 64 OR @asc BETWEEN 91 AND 96 OR @asc > 122
                                      IF @asc <> 32
                                          SET @use = 0
                      
                              IF CHARINDEX(',nonum,', @DelimOptions) > 0
                                  IF @chr LIKE '[0-9]'
                                      SET @use = 0
                      
                              IF CHARINDEX(',noalpha,', @DelimOptions) > 0
                                  IF @chr LIKE '[A-Za-z]'
                                      SET @use = 0
                      
                              IF CHARINDEX(',alphaonly,', @DelimOptions) > 0
                                  IF @chr NOT LIKE '[A-Za-z]'
                                      SET @use = 0
                      
                              IF CHARINDEX(',numonly,', @DelimOptions) > 0 OR CHARINDEX(',numericonly,', @DelimOptions) > 0 OR CHARINDEX(',numberonly,', @DelimOptions) > 0
                                  IF @chr NOT LIKE '[0-9]'
                                      SET @use = 0
                      
                              IF @use = 1
                                  SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
                          END
                      
                          RETURN @NewValue
                      END

Open in new window

Thank you for reading my article. Your feedback is welcome. If you liked this article or are interested in more like these, please click the Yes button for: Was this article helpful?
1
3,021 Views
dsackerContract ERP Admin/Consultant
CERTIFIED EXPERT

Comments (11)

dsackerContract ERP Admin/Consultant
CERTIFIED EXPERT

Author

Commented:
It is a little bit limited, but works fine. You may notice I'm using a little bit in the WHILE loop via the LIKE comparison.
Hi, can this remove carriage return char(13) and line feed char(10)?
dsackerContract ERP Admin/Consultant
CERTIFIED EXPERT

Author

Commented:
Yes. It removes those by default.
ok thanks, i will give it a go. thanks for sharing this by the way !
dsackerContract ERP Admin/Consultant
CERTIFIED EXPERT

Author

Commented:
You're welcome. If it proves helpful, please click the "Helpful" button. :)

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.