Robust Field Cleaning Function

dsackerContract ERP Admin/Consultant
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within REPLACE functions). Over time this can be quite unreadable. It can also become a maintenance chore where this is not a single one-off effort.

This happened in a recent question here on EE. I'd like to share the solution that came out of that thread. I have written a function called udf_CleanDataFn. It can be invoke within any query as follows:
SELECT dbo.udf_CleanDataFn(YourColumn, @Options)

@Options is a varchar(255) field, which can accept any of the following "options":
  • ltrim - This will left-trim the value in YourColumn.
  • rtrim - This will right-trim the value in YourColumn.
  • trim - This will both left- and right-trim the value.
  • nocomma - This will remove any and all comma's.
  • nospace - This will remove any and all spaces.
  • nopunc - This will remove all standard punctuations (!@#$% etc)
  • nonum - This will remove all numbers (0-9).
  • noalpha - This will remove all alpha characters (A-Z and a-z).
  • alphaonly - This will remove everything except alpha characters.
  • numonly - This will remove everything except numbers.

Following are a few working examples that will demonstrate the different results . This first example takes out only the commas:
SELECT dbo.udf_CleanDataFn('This !  is *  a  13,21  @  test', 'nocomma')
Result: "This ! is * a 1321 @ test"

This example removes all punctuations and numbers:
SELECT dbo.udf_CleanDataFn('This !  is *  a  13,21  @  test', 'nopunc, nonum')
Result: "This  is  a   test"

This example removes the alphas and spaces:
SELECT dbo.udf_CleanDataFn('    Spaces 123 all 456 around    ', 'noalpha, nospace')
Result: 123456

Where there is very large data, this function can be inefficient, but where small amounts of data need to be cleaned, this can be leveraged on many alpha-numeric fields. My bench tests showed that this could update about 200,000 records a minute. It is not fast compared to an elaborate SQL statement, but you get the benefits of leverage and ease of maintenance where the processing time is acceptable.

Here is the code:
    @Value      varchar(255),
    @Options    varchar(255) )
RETURNS varchar(255)
    IF ISNULL(@Value, '') = ''
        RETURN ''

    DECLARE @DelimOptions varchar(257)

    SET @Options = ISNULL(@Options, 'trim')
    SET @DelimOptions = ',' + @Options + ','

    WHILE CHARINDEX(' ', @DelimOptions) > 0
        SET @DelimOptions = REPLACE(@DelimOptions, ' ', '')

    WHILE CHARINDEX(CHAR(9), @DelimOptions) > 0
        SET @DelimOptions = REPLACE(@DelimOptions, CHAR(9), '')

    IF CHARINDEX(',ltrim,', @DelimOptions) > 0
        SET @Value = LTRIM(@Value)

    IF CHARINDEX(',rtrim,', @DelimOptions) > 0
        SET @Value = RTRIM(@Value)

    IF CHARINDEX(',trim,', @DelimOptions) > 0
        SET @Value = RTRIM(LTRIM(@Value))

    IF CHARINDEX(',nocomma,', @DelimOptions) > 0
        SET @Value = REPLACE(@Value, ',', '')

    SET @Value = REPLACE(@Value, CHAR(9), ' ')
    WHILE CHARINDEX('  ', @Value) > 0
        SET @Value = REPLACE(@Value, '  ', ' ')

    IF CHARINDEX(',nospace,', @DelimOptions) > 0
        WHILE CHARINDEX(' ', @Value) > 0
            SET @Value = REPLACE(@Value, ' ', '')

    DECLARE @asc        tinyint,
            @chr        char(1),
            @len        tinyint,
            @ndx        smallint,
            @NewValue   varchar(255),
            @use        tinyint,
            @WkField    varchar(255)

    SET @len = LEN(@Value)
    SET @ndx = -1
    SET @NewValue = ''
    SET @WkField  = '"' + @Value + '"'

    WHILE @ndx <= @len 
        SET @ndx = @ndx + 1
        SET @chr = SUBSTRING(@Value, @ndx, 1)
        SET @asc = ASCII(@chr)
        SET @use = 0

        IF @asc BETWEEN 32 AND 126
            SET @use = 1
        IF CHARINDEX(',nopunc,', @DelimOptions) > 0
            IF @asc < 48 OR @asc BETWEEN 58 AND 64 OR @asc BETWEEN 91 AND 96 OR @asc > 122
                IF @asc <> 32
                    SET @use = 0

        IF CHARINDEX(',nonum,', @DelimOptions) > 0
            IF @chr LIKE '[0-9]'
                SET @use = 0

        IF CHARINDEX(',noalpha,', @DelimOptions) > 0
            IF @chr LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX(',alphaonly,', @DelimOptions) > 0
            IF @chr NOT LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX(',numonly,', @DelimOptions) > 0 OR CHARINDEX(',numericonly,', @DelimOptions) > 0 OR CHARINDEX(',numberonly,', @DelimOptions) > 0
            IF @chr NOT LIKE '[0-9]'
                SET @use = 0

        IF @use = 1
            SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)

    RETURN @NewValue

Open in new window

Thank you for reading my article. Your feedback is welcome. If you liked this article or are interested in more like these, please click the Yes button for: Was this article helpful?
dsackerContract ERP Admin/Consultant

Comments (11)

dsackerContract ERP Admin/Consultant


It is a little bit limited, but works fine. You may notice I'm using a little bit in the WHILE loop via the LIKE comparison.
Hi, can this remove carriage return char(13) and line feed char(10)?
dsackerContract ERP Admin/Consultant


Yes. It removes those by default.
ok thanks, i will give it a go. thanks for sharing this by the way !
dsackerContract ERP Admin/Consultant


You're welcome. If it proves helpful, please click the "Helpful" button. :)

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.

Get access with a 7-day free trial.
You Belong in the World's Smartest IT Community