Create a function in Sql 2012 to do the following cleaning of a field

can I create a common function in Sql that does the following

remove commas from a field
returns ltrim on the field
replaces hidden characters, Tabs REPLACE(your_column, char(9), '')  replace(replace([your_column],char(10),' '),char(13),' ')
leaves only 1 space instead of >1 between words

I am having to do this all piecemeal so w#hould love to do this with just one command ie

update table
set field_name = newfunction(field_name)

this could save so much time
Chris MichalczukConsultantAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

dsackerContract ERP Admin/ConsultantCommented:
Perhaps something like this might work. You would use it as follow:

SELECT dbo.udf_CleanDataFn(YourField) AS FieldName
If you want to also RTRIM, simply uncomment line 10.

CREATE FUNCTION udf_CleanDataFn (@Value varchar(255))
RETURNS varchar(255)
AS
BEGIN
    IF ISNULL(@Value, '') = ''
        RETURN ''

    SET @Value = REPLACE(@Value, ',', '')
    SET @Value = LTRIM(@Value)
--  SET @Value = RTRIM(@Value)

    SET @Value = REPLACE(@Value, CHAR(9), ' ')
    WHILE CHARINDEX('  ', @Value) > 0
        SET @Value = REPLACE(@Value, '  ', ' ')

    DECLARE @ndx        smallint,
            @NewValue   varchar(255)

    SET @ndx = 0
    SET @NewValue = ''

    WHILE @ndx < LEN(@Value)
    BEGIN
        SET @ndx = @ndx + 1
        IF ASCII(SUBSTRING(@Value, @ndx, 1)) BETWEEN 32 AND 126
            SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
    END

    RETURN @NewValue
END

Open in new window

Chris MichalczukConsultantAuthor Commented:
Hi thanks haven't tried it but will in a moment

how would I encorporate this into it too
this is one of the commands I do as sql script

update [SLXAddresses] set [M_Address4] =replace(replace(replace([M_Address4],char(10),' '),char(13),' '),char(9), ' ')
arnoldCommented:
Presumably, you've already taken steps to make sure new entries will not include these entries.

You could create an insert/update trigger that will do what you are looking for including making sure future entries will not include these or other characters.
The update tablename set fieldname=fielddname
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

Chris MichalczukConsultantAuthor Commented:
sadly I dont have control over that this is around a datawarehouse I working with the data and doing some cleaning !
dsackerContract ERP Admin/ConsultantCommented:
A trigger solution would only help one table, as this function can work against any table, whether INSERTs, UPDATEs or SELECTs.

In a similar manner as the SELECT example in my post, the UPDATE would look like:

UPDATE [SLXAddresses]
SET [M_Address4] = dbo.udf_CleanDataFn([M_Address4])
Chris MichalczukConsultantAuthor Commented:
Hi dsacker just to confirm the function would also  remove char(13),' '), ,char(10),' ') too
I'm not a programmer but have a basic sql knowledge

update [SLXAddresses] set [M_Address4] =replace(replace(replace([M_Address4],char(10),' '),char(13),' '),char(9), ' ')
dsackerContract ERP Admin/ConsultantCommented:
Yes it will. The following section does that:

WHILE @ndx < LEN(@Value)
BEGIN
    SET @ndx = @ndx + 1
    IF ASCII(SUBSTRING(@Value, @ndx, 1)) BETWEEN 32 AND 126
        SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
END

If goes character by character, making sure they are between ascii values 32 and 126 (visible, printable characters).

One thought: This function is quite restricted to your specifications. In the future should you want to clean a field and keep the commas, this function would need to remove that from the code. Or you might prefer a second parameter where you could specify certain options.

But I assume you need this for a specific set of tasks.
dsackerContract ERP Admin/ConsultantCommented:
Before you UPDATE a table, you could verify that the values will be what you expect by running the following:

SELECT  '"' + M_Address4 + '"' AS OldAddress,
        "'" + dbo.udf_CleanDataFn(M_Address4) + '"' AS NewAddress
FROM    SLXAddresses

Open in new window

I enclosed both values in double-quotes. That way you can take a cursory glance at the expected results before doing an UPDATE, making sure there are no spaces on the ends.

I recommend you uncomment the RTRIM in line 10. Since you are updating addresses, you surely wouldn't want spaces before (LTRIM) or after (RTRIM) the value.
arnoldCommented:
@dsacker, true, IMHO, it is preferred to make sure the data is being added is validated to avoid this type of issue

Your function can be used within the trigger as well.

Though the function to be universal, needs to be modular, defining which characters need to be replaced.
If this is a view, there might be other data fields that might need to be cleaned up of other .....
dsackerContract ERP Admin/ConsultantCommented:
@arnold, agreed. That was one of my points in an earlier post.

I recommend adding a second parameter called @Options varchar(255). I do this in many of my own functions and procs, where I can give varying options, such as nocommas, rtrim, debug, etc., all of which I take certain types of actions when I see them.

Only issue with triggers is the author doesn't have access to change tables, only to modify the data. And I'm assuming he is dealing with legacy data, which is a one-off exercise. A trigger would not be needed longer than the data cleanup, which is a bit overkill, wouldn't you think?

By the way, Chris, regarding my SELECT statement above, here is a correction to that code to first review the data before updating it:

SELECT  '"' + M_Address4 + '"' AS OldAddress,
        '"' + dbo.udf_CleanDataFn(M_Address4) + '"' AS NewAddress
FROM    SLXAddresses

Open in new window

Chris MichalczukConsultantAuthor Commented:
thanks got a lot of cleaning on legacy - trying this now so will feed back what happened. seems to work at mo but haven't validated as yet but will. I'll leave the thread open and report back when I've finished.
Scott PletcherSenior DBACommented:
You could put it in a function, but since it's just a string of replaces, I'd just use them in-line:

SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(LTRIM(your_column),
    ',', ''),
    CHAR(9), ''),
    CHAR(10), ' '),
    CHAR(13), ' '),
    SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=bell; must be any char that is NOT in the data
    CHAR(7) + SPACE(1), ''),
    CHAR(7), SPACE(1))
FROM (
    SELECT '   a        9   a    [   h  8   a  Z' as your_column
) AS test_data
dsackerContract ERP Admin/ConsultantCommented:
@arnold, taking your thought about a more robust function, I enhanced the udf_CleanDataFn. It now accepts a 2nd parameter, which allows for various options. You might like it.

@chris michalczuk, for your purposes, you'd call the function below as follows:

SELECT dbo.udf_CleanDataFn(YourField, 'nocomma,trim') AS FieldName
CREATE FUNCTION udf_CleanDataFn (
    @Value      varchar(255),
    @Options    varchar(255) )
RETURNS varchar(255)
AS
BEGIN
    IF ISNULL(@Value, '') = ''
        RETURN ''

    SET @Options = ISNULL(@Options, 'trim')

    IF CHARINDEX('ltrim', @Options) > 0
        SET @Value = LTRIM(@Value)

    IF CHARINDEX('rtrim', @Options) > 0
        SET @Value = RTRIM(@Value)

    IF CHARINDEX('trim', @Options) > 0 AND CHARINDEX('ltrim', @Options) = 0 AND CHARINDEX('rtrim', @Options) = 0
        SET @Value = RTRIM(LTRIM(@Value))

    IF CHARINDEX('nocomma', @Options) > 0
        SET @Value = REPLACE(@Value, ',', '')

    SET @Value = REPLACE(@Value, CHAR(9), ' ')
    WHILE CHARINDEX('  ', @Value) > 0
        SET @Value = REPLACE(@Value, '  ', ' ')

    IF CHARINDEX('nospace', @Options) > 0
        WHILE CHARINDEX(' ', @Value) > 0
            SET @Value = REPLACE(@Value, ' ', '')

    DECLARE @asc        tinyint,
            @chr        char(1),
            @len        tinyint,
            @ndx        smallint,
            @NewValue   varchar(255),
            @use        tinyint,
            @WkField    varchar(255)

    SET @len = LEN(@Value)
    SET @ndx = -1
    SET @NewValue = ''
    SET @WkField  = '"' + @Value + '"'

    WHILE @ndx <= @len 
    BEGIN
        SET @ndx = @ndx + 1
        SET @chr = SUBSTRING(@Value, @ndx, 1)
        SET @asc = ASCII(@chr)
        SET @use = 0

        IF @asc BETWEEN 32 AND 126
            SET @use = 1
                       
        IF CHARINDEX('nopunc', @Options) > 0
            IF @asc < 48 OR @asc BETWEEN 58 AND 64 OR @asc BETWEEN 91 AND 96 OR @asc > 122
                IF @asc <> 32
                    SET @use = 0

        IF CHARINDEX('nonum', @Options) > 0
            IF @chr LIKE '[0-9]'
                SET @use = 0

        IF CHARINDEX('noalpha', @Options) > 0
            IF @chr LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX('alphaonly', @Options) > 0
            IF @chr NOT LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX('numonly', @Options) > 0 OR CHARINDEX('numericonly', @Options) > 0 OR CHARINDEX('numberonly', @Options) > 0
            IF @chr NOT LIKE '[0-9]'
                SET @use = 0

        IF @use = 1
            SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
    END

    RETURN @NewValue
END

Open in new window

Scott PletcherSenior DBACommented:
That function code is extremely inefficient.  You may not want to use it against large data sets, or particularly very large data sets.  If you have large data, be sure to do large volume tests before deploying.
dsackerContract ERP Admin/ConsultantCommented:
Agreed, the function is not as efficient as straight SQL, but the author's question is titled, "Create a function ...", which is pretty much what this is. His opening post stressed the desire to use a user-defined function rather than code a lengthy SELECT statement. I've taken it as a given that the author cognitively recognizes and prefers the ease over the processing time and can appreciate that preference.

I bench tested this function against a material transaction table which has millions of rows, and it came it at about 200,000 records per minute.

Below is a further expansion of the function, per some positive and constructive feedback here on EE:

CREATE FUNCTION udf_CleanDataFn (
    @Value      varchar(255),
    @Options    varchar(255) )
RETURNS varchar(255)
AS
BEGIN
    IF ISNULL(@Value, '') = ''
        RETURN ''

    DECLARE @DelimOptions varchar(257)

    SET @Options = ISNULL(@Options, 'trim')
    SET @DelimOptions = ',' + @Options + ','

    WHILE CHARINDEX(' ', @DelimOptions) > 0
        SET @DelimOptions = REPLACE(@DelimOptions, ' ', '')

    WHILE CHARINDEX(CHAR(9), @DelimOptions) > 0
        SET @DelimOptions = REPLACE(@DelimOptions, CHAR(9), '')

    IF CHARINDEX(',ltrim,', @DelimOptions) > 0
        SET @Value = LTRIM(@Value)

    IF CHARINDEX(',rtrim,', @DelimOptions) > 0
        SET @Value = RTRIM(@Value)

    IF CHARINDEX(',trim,', @DelimOptions) > 0
        SET @Value = RTRIM(LTRIM(@Value))

    IF CHARINDEX(',nocomma,', @DelimOptions) > 0
        SET @Value = REPLACE(@Value, ',', '')

    SET @Value = REPLACE(@Value, CHAR(9), ' ')
    WHILE CHARINDEX('  ', @Value) > 0
        SET @Value = REPLACE(@Value, '  ', ' ')

    IF CHARINDEX(',nospace,', @DelimOptions) > 0
        WHILE CHARINDEX(' ', @Value) > 0
            SET @Value = REPLACE(@Value, ' ', '')

    DECLARE @asc        tinyint,
            @chr        char(1),
            @len        tinyint,
            @ndx        smallint,
            @NewValue   varchar(255),
            @use        tinyint,
            @WkField    varchar(255)

    SET @len = LEN(@Value)
    SET @ndx = -1
    SET @NewValue = ''
    SET @WkField  = '"' + @Value + '"'

    WHILE @ndx <= @len 
    BEGIN
        SET @ndx = @ndx + 1
        SET @chr = SUBSTRING(@Value, @ndx, 1)
        SET @asc = ASCII(@chr)
        SET @use = 0

        IF @asc BETWEEN 32 AND 126
            SET @use = 1
                       
        IF CHARINDEX(',nopunc,', @DelimOptions) > 0
            IF @asc < 48 OR @asc BETWEEN 58 AND 64 OR @asc BETWEEN 91 AND 96 OR @asc > 122
                IF @asc <> 32
                    SET @use = 0

        IF CHARINDEX(',nonum,', @DelimOptions) > 0
            IF @chr LIKE '[0-9]'
                SET @use = 0

        IF CHARINDEX(',noalpha,', @DelimOptions) > 0
            IF @chr LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX(',alphaonly,', @DelimOptions) > 0
            IF @chr NOT LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX(',numonly,', @DelimOptions) > 0 OR CHARINDEX(',numericonly,', @DelimOptions) > 0 OR CHARINDEX(',numberonly,', @DelimOptions) > 0
            IF @chr NOT LIKE '[0-9]'
                SET @use = 0

        IF @use = 1
            SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
    END

    RETURN @NewValue
END

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Scott PletcherSenior DBACommented:
Functions should be as efficient as possible, including getting rid of variables unless they are absolutely required.  Best is just a single RETURN statement:

CREATE FUNCTION dbo.ScrubDataField (
    @data varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
RETURN (
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(LTRIM(@data),
         ',', ''),
         CHAR(9), ''),
         CHAR(10), ' '),
         CHAR(13), ' '),
         SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=must be char that is NOT in the input data
         CHAR(7) + SPACE(1), ''),
         CHAR(7), SPACE(1))
)
END --FUNCTION
Chris MichalczukConsultantAuthor Commented:
this has seriously saved me time
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.