Link to home
Start Free TrialLog in
Avatar of Chris Michalczuk
Chris MichalczukFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Create a function in Sql 2012 to do the following cleaning of a field

can I create a common function in Sql that does the following

remove commas from a field
returns ltrim on the field
replaces hidden characters, Tabs REPLACE(your_column, char(9), '')  replace(replace([your_column],char(10),' '),char(13),' ')
leaves only 1 space instead of >1 between words

I am having to do this all piecemeal so w#hould love to do this with just one command ie

update table
set field_name = newfunction(field_name)

this could save so much time
Avatar of dsacker
dsacker
Flag of United States of America image

Perhaps something like this might work. You would use it as follow:

SELECT dbo.udf_CleanDataFn(YourField) AS FieldName
If you want to also RTRIM, simply uncomment line 10.

CREATE FUNCTION udf_CleanDataFn (@Value varchar(255))
RETURNS varchar(255)
AS
BEGIN
    IF ISNULL(@Value, '') = ''
        RETURN ''

    SET @Value = REPLACE(@Value, ',', '')
    SET @Value = LTRIM(@Value)
--  SET @Value = RTRIM(@Value)

    SET @Value = REPLACE(@Value, CHAR(9), ' ')
    WHILE CHARINDEX('  ', @Value) > 0
        SET @Value = REPLACE(@Value, '  ', ' ')

    DECLARE @ndx        smallint,
            @NewValue   varchar(255)

    SET @ndx = 0
    SET @NewValue = ''

    WHILE @ndx < LEN(@Value)
    BEGIN
        SET @ndx = @ndx + 1
        IF ASCII(SUBSTRING(@Value, @ndx, 1)) BETWEEN 32 AND 126
            SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
    END

    RETURN @NewValue
END

Open in new window

Avatar of Chris Michalczuk

ASKER

Hi thanks haven't tried it but will in a moment

how would I encorporate this into it too
this is one of the commands I do as sql script

update [SLXAddresses] set [M_Address4] =replace(replace(replace([M_Address4],char(10),' '),char(13),' '),char(9), ' ')
Presumably, you've already taken steps to make sure new entries will not include these entries.

You could create an insert/update trigger that will do what you are looking for including making sure future entries will not include these or other characters.
The update tablename set fieldname=fielddname
sadly I dont have control over that this is around a datawarehouse I working with the data and doing some cleaning !
A trigger solution would only help one table, as this function can work against any table, whether INSERTs, UPDATEs or SELECTs.

In a similar manner as the SELECT example in my post, the UPDATE would look like:

UPDATE [SLXAddresses]
SET [M_Address4] = dbo.udf_CleanDataFn([M_Address4])
Hi dsacker just to confirm the function would also  remove char(13),' '), ,char(10),' ') too
I'm not a programmer but have a basic sql knowledge

update [SLXAddresses] set [M_Address4] =replace(replace(replace([M_Address4],char(10),' '),char(13),' '),char(9), ' ')
Yes it will. The following section does that:

WHILE @ndx < LEN(@Value)
BEGIN
    SET @ndx = @ndx + 1
    IF ASCII(SUBSTRING(@Value, @ndx, 1)) BETWEEN 32 AND 126
        SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
END

If goes character by character, making sure they are between ascii values 32 and 126 (visible, printable characters).

One thought: This function is quite restricted to your specifications. In the future should you want to clean a field and keep the commas, this function would need to remove that from the code. Or you might prefer a second parameter where you could specify certain options.

But I assume you need this for a specific set of tasks.
Before you UPDATE a table, you could verify that the values will be what you expect by running the following:

SELECT  '"' + M_Address4 + '"' AS OldAddress,
        "'" + dbo.udf_CleanDataFn(M_Address4) + '"' AS NewAddress
FROM    SLXAddresses

Open in new window

I enclosed both values in double-quotes. That way you can take a cursory glance at the expected results before doing an UPDATE, making sure there are no spaces on the ends.

I recommend you uncomment the RTRIM in line 10. Since you are updating addresses, you surely wouldn't want spaces before (LTRIM) or after (RTRIM) the value.
@dsacker, true, IMHO, it is preferred to make sure the data is being added is validated to avoid this type of issue

Your function can be used within the trigger as well.

Though the function to be universal, needs to be modular, defining which characters need to be replaced.
If this is a view, there might be other data fields that might need to be cleaned up of other .....
@arnold, agreed. That was one of my points in an earlier post.

I recommend adding a second parameter called @Options varchar(255). I do this in many of my own functions and procs, where I can give varying options, such as nocommas, rtrim, debug, etc., all of which I take certain types of actions when I see them.

Only issue with triggers is the author doesn't have access to change tables, only to modify the data. And I'm assuming he is dealing with legacy data, which is a one-off exercise. A trigger would not be needed longer than the data cleanup, which is a bit overkill, wouldn't you think?

By the way, Chris, regarding my SELECT statement above, here is a correction to that code to first review the data before updating it:

SELECT  '"' + M_Address4 + '"' AS OldAddress,
        '"' + dbo.udf_CleanDataFn(M_Address4) + '"' AS NewAddress
FROM    SLXAddresses

Open in new window

thanks got a lot of cleaning on legacy - trying this now so will feed back what happened. seems to work at mo but haven't validated as yet but will. I'll leave the thread open and report back when I've finished.
You could put it in a function, but since it's just a string of replaces, I'd just use them in-line:

SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(LTRIM(your_column),
    ',', ''),
    CHAR(9), ''),
    CHAR(10), ' '),
    CHAR(13), ' '),
    SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=bell; must be any char that is NOT in the data
    CHAR(7) + SPACE(1), ''),
    CHAR(7), SPACE(1))
FROM (
    SELECT '   a        9   a    [   h  8   a  Z' as your_column
) AS test_data
@arnold, taking your thought about a more robust function, I enhanced the udf_CleanDataFn. It now accepts a 2nd parameter, which allows for various options. You might like it.

@chris michalczuk, for your purposes, you'd call the function below as follows:

SELECT dbo.udf_CleanDataFn(YourField, 'nocomma,trim') AS FieldName
CREATE FUNCTION udf_CleanDataFn (
    @Value      varchar(255),
    @Options    varchar(255) )
RETURNS varchar(255)
AS
BEGIN
    IF ISNULL(@Value, '') = ''
        RETURN ''

    SET @Options = ISNULL(@Options, 'trim')

    IF CHARINDEX('ltrim', @Options) > 0
        SET @Value = LTRIM(@Value)

    IF CHARINDEX('rtrim', @Options) > 0
        SET @Value = RTRIM(@Value)

    IF CHARINDEX('trim', @Options) > 0 AND CHARINDEX('ltrim', @Options) = 0 AND CHARINDEX('rtrim', @Options) = 0
        SET @Value = RTRIM(LTRIM(@Value))

    IF CHARINDEX('nocomma', @Options) > 0
        SET @Value = REPLACE(@Value, ',', '')

    SET @Value = REPLACE(@Value, CHAR(9), ' ')
    WHILE CHARINDEX('  ', @Value) > 0
        SET @Value = REPLACE(@Value, '  ', ' ')

    IF CHARINDEX('nospace', @Options) > 0
        WHILE CHARINDEX(' ', @Value) > 0
            SET @Value = REPLACE(@Value, ' ', '')

    DECLARE @asc        tinyint,
            @chr        char(1),
            @len        tinyint,
            @ndx        smallint,
            @NewValue   varchar(255),
            @use        tinyint,
            @WkField    varchar(255)

    SET @len = LEN(@Value)
    SET @ndx = -1
    SET @NewValue = ''
    SET @WkField  = '"' + @Value + '"'

    WHILE @ndx <= @len 
    BEGIN
        SET @ndx = @ndx + 1
        SET @chr = SUBSTRING(@Value, @ndx, 1)
        SET @asc = ASCII(@chr)
        SET @use = 0

        IF @asc BETWEEN 32 AND 126
            SET @use = 1
                       
        IF CHARINDEX('nopunc', @Options) > 0
            IF @asc < 48 OR @asc BETWEEN 58 AND 64 OR @asc BETWEEN 91 AND 96 OR @asc > 122
                IF @asc <> 32
                    SET @use = 0

        IF CHARINDEX('nonum', @Options) > 0
            IF @chr LIKE '[0-9]'
                SET @use = 0

        IF CHARINDEX('noalpha', @Options) > 0
            IF @chr LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX('alphaonly', @Options) > 0
            IF @chr NOT LIKE '[A-Za-z]'
                SET @use = 0

        IF CHARINDEX('numonly', @Options) > 0 OR CHARINDEX('numericonly', @Options) > 0 OR CHARINDEX('numberonly', @Options) > 0
            IF @chr NOT LIKE '[0-9]'
                SET @use = 0

        IF @use = 1
            SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
    END

    RETURN @NewValue
END

Open in new window

That function code is extremely inefficient.  You may not want to use it against large data sets, or particularly very large data sets.  If you have large data, be sure to do large volume tests before deploying.
ASKER CERTIFIED SOLUTION
Avatar of dsacker
dsacker
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Functions should be as efficient as possible, including getting rid of variables unless they are absolutely required.  Best is just a single RETURN statement:

CREATE FUNCTION dbo.ScrubDataField (
    @data varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
RETURN (
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(LTRIM(@data),
         ',', ''),
         CHAR(9), ''),
         CHAR(10), ' '),
         CHAR(13), ' '),
         SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=must be char that is NOT in the input data
         CHAR(7) + SPACE(1), ''),
         CHAR(7), SPACE(1))
)
END --FUNCTION
this has seriously saved me time