Chris Michalczuk
asked on
Create a function in Sql 2012 to do the following cleaning of a field
can I create a common function in Sql that does the following
remove commas from a field
returns ltrim on the field
replaces hidden characters, Tabs REPLACE(your_column, char(9), '') replace(replace([your_colu mn],char(1 0),' '),char(13),' ')
leaves only 1 space instead of >1 between words
I am having to do this all piecemeal so w#hould love to do this with just one command ie
update table
set field_name = newfunction(field_name)
this could save so much time
remove commas from a field
returns ltrim on the field
replaces hidden characters, Tabs REPLACE(your_column, char(9), '') replace(replace([your_colu
leaves only 1 space instead of >1 between words
I am having to do this all piecemeal so w#hould love to do this with just one command ie
update table
set field_name = newfunction(field_name)
this could save so much time
ASKER
Hi thanks haven't tried it but will in a moment
how would I encorporate this into it too
this is one of the commands I do as sql script
update [SLXAddresses] set [M_Address4] =replace(replace(replace([ M_Address4 ],char(10) ,' '),char(13),' '),char(9), ' ')
how would I encorporate this into it too
this is one of the commands I do as sql script
update [SLXAddresses] set [M_Address4] =replace(replace(replace([
Presumably, you've already taken steps to make sure new entries will not include these entries.
You could create an insert/update trigger that will do what you are looking for including making sure future entries will not include these or other characters.
The update tablename set fieldname=fielddname
You could create an insert/update trigger that will do what you are looking for including making sure future entries will not include these or other characters.
The update tablename set fieldname=fielddname
ASKER
sadly I dont have control over that this is around a datawarehouse I working with the data and doing some cleaning !
A trigger solution would only help one table, as this function can work against any table, whether INSERTs, UPDATEs or SELECTs.
In a similar manner as the SELECT example in my post, the UPDATE would look like:
UPDATE [SLXAddresses]
SET [M_Address4] = dbo.udf_CleanDataFn([M_Add ress4])
In a similar manner as the SELECT example in my post, the UPDATE would look like:
UPDATE [SLXAddresses]
SET [M_Address4] = dbo.udf_CleanDataFn([M_Add
ASKER
Hi dsacker just to confirm the function would also remove char(13),' '), ,char(10),' ') too
I'm not a programmer but have a basic sql knowledge
update [SLXAddresses] set [M_Address4] =replace(replace(replace([ M_Address4 ],char(10) ,' '),char(13),' '),char(9), ' ')
I'm not a programmer but have a basic sql knowledge
update [SLXAddresses] set [M_Address4] =replace(replace(replace([
Yes it will. The following section does that:
WHILE @ndx < LEN(@Value)
BEGIN
SET @ndx = @ndx + 1
IF ASCII(SUBSTRING(@Value, @ndx, 1)) BETWEEN 32 AND 126
SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
END
If goes character by character, making sure they are between ascii values 32 and 126 (visible, printable characters).
One thought: This function is quite restricted to your specifications. In the future should you want to clean a field and keep the commas, this function would need to remove that from the code. Or you might prefer a second parameter where you could specify certain options.
But I assume you need this for a specific set of tasks.
WHILE @ndx < LEN(@Value)
BEGIN
SET @ndx = @ndx + 1
IF ASCII(SUBSTRING(@Value, @ndx, 1)) BETWEEN 32 AND 126
SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
END
If goes character by character, making sure they are between ascii values 32 and 126 (visible, printable characters).
One thought: This function is quite restricted to your specifications. In the future should you want to clean a field and keep the commas, this function would need to remove that from the code. Or you might prefer a second parameter where you could specify certain options.
But I assume you need this for a specific set of tasks.
Before you UPDATE a table, you could verify that the values will be what you expect by running the following:
I recommend you uncomment the RTRIM in line 10. Since you are updating addresses, you surely wouldn't want spaces before (LTRIM) or after (RTRIM) the value.
SELECT '"' + M_Address4 + '"' AS OldAddress,
"'" + dbo.udf_CleanDataFn(M_Address4) + '"' AS NewAddress
FROM SLXAddresses
I enclosed both values in double-quotes. That way you can take a cursory glance at the expected results before doing an UPDATE, making sure there are no spaces on the ends.I recommend you uncomment the RTRIM in line 10. Since you are updating addresses, you surely wouldn't want spaces before (LTRIM) or after (RTRIM) the value.
@dsacker, true, IMHO, it is preferred to make sure the data is being added is validated to avoid this type of issue
Your function can be used within the trigger as well.
Though the function to be universal, needs to be modular, defining which characters need to be replaced.
If this is a view, there might be other data fields that might need to be cleaned up of other .....
Your function can be used within the trigger as well.
Though the function to be universal, needs to be modular, defining which characters need to be replaced.
If this is a view, there might be other data fields that might need to be cleaned up of other .....
@arnold, agreed. That was one of my points in an earlier post.
I recommend adding a second parameter called @Options varchar(255). I do this in many of my own functions and procs, where I can give varying options, such as nocommas, rtrim, debug, etc., all of which I take certain types of actions when I see them.
Only issue with triggers is the author doesn't have access to change tables, only to modify the data. And I'm assuming he is dealing with legacy data, which is a one-off exercise. A trigger would not be needed longer than the data cleanup, which is a bit overkill, wouldn't you think?
By the way, Chris, regarding my SELECT statement above, here is a correction to that code to first review the data before updating it:
I recommend adding a second parameter called @Options varchar(255). I do this in many of my own functions and procs, where I can give varying options, such as nocommas, rtrim, debug, etc., all of which I take certain types of actions when I see them.
Only issue with triggers is the author doesn't have access to change tables, only to modify the data. And I'm assuming he is dealing with legacy data, which is a one-off exercise. A trigger would not be needed longer than the data cleanup, which is a bit overkill, wouldn't you think?
By the way, Chris, regarding my SELECT statement above, here is a correction to that code to first review the data before updating it:
SELECT '"' + M_Address4 + '"' AS OldAddress,
'"' + dbo.udf_CleanDataFn(M_Address4) + '"' AS NewAddress
FROM SLXAddresses
ASKER
thanks got a lot of cleaning on legacy - trying this now so will feed back what happened. seems to work at mo but haven't validated as yet but will. I'll leave the thread open and report back when I've finished.
You could put it in a function, but since it's just a string of replaces, I'd just use them in-line:
SELECT REPLACE(REPLACE(REPLACE(RE PLACE(REPL ACE(REPLAC E(REPLACE( LTRIM(your _column),
',', ''),
CHAR(9), ''),
CHAR(10), ' '),
CHAR(13), ' '),
SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=bell; must be any char that is NOT in the data
CHAR(7) + SPACE(1), ''),
CHAR(7), SPACE(1))
FROM (
SELECT ' a 9 a [ h 8 a Z' as your_column
) AS test_data
SELECT REPLACE(REPLACE(REPLACE(RE
',', ''),
CHAR(9), ''),
CHAR(10), ' '),
CHAR(13), ' '),
SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=bell; must be any char that is NOT in the data
CHAR(7) + SPACE(1), ''),
CHAR(7), SPACE(1))
FROM (
SELECT ' a 9 a [ h 8 a Z' as your_column
) AS test_data
@arnold, taking your thought about a more robust function, I enhanced the udf_CleanDataFn. It now accepts a 2nd parameter, which allows for various options. You might like it.
@chris michalczuk, for your purposes, you'd call the function below as follows:
@chris michalczuk, for your purposes, you'd call the function below as follows:
SELECT dbo.udf_CleanDataFn(YourFi eld, 'nocomma,trim') AS FieldName
CREATE FUNCTION udf_CleanDataFn (
@Value varchar(255),
@Options varchar(255) )
RETURNS varchar(255)
AS
BEGIN
IF ISNULL(@Value, '') = ''
RETURN ''
SET @Options = ISNULL(@Options, 'trim')
IF CHARINDEX('ltrim', @Options) > 0
SET @Value = LTRIM(@Value)
IF CHARINDEX('rtrim', @Options) > 0
SET @Value = RTRIM(@Value)
IF CHARINDEX('trim', @Options) > 0 AND CHARINDEX('ltrim', @Options) = 0 AND CHARINDEX('rtrim', @Options) = 0
SET @Value = RTRIM(LTRIM(@Value))
IF CHARINDEX('nocomma', @Options) > 0
SET @Value = REPLACE(@Value, ',', '')
SET @Value = REPLACE(@Value, CHAR(9), ' ')
WHILE CHARINDEX(' ', @Value) > 0
SET @Value = REPLACE(@Value, ' ', ' ')
IF CHARINDEX('nospace', @Options) > 0
WHILE CHARINDEX(' ', @Value) > 0
SET @Value = REPLACE(@Value, ' ', '')
DECLARE @asc tinyint,
@chr char(1),
@len tinyint,
@ndx smallint,
@NewValue varchar(255),
@use tinyint,
@WkField varchar(255)
SET @len = LEN(@Value)
SET @ndx = -1
SET @NewValue = ''
SET @WkField = '"' + @Value + '"'
WHILE @ndx <= @len
BEGIN
SET @ndx = @ndx + 1
SET @chr = SUBSTRING(@Value, @ndx, 1)
SET @asc = ASCII(@chr)
SET @use = 0
IF @asc BETWEEN 32 AND 126
SET @use = 1
IF CHARINDEX('nopunc', @Options) > 0
IF @asc < 48 OR @asc BETWEEN 58 AND 64 OR @asc BETWEEN 91 AND 96 OR @asc > 122
IF @asc <> 32
SET @use = 0
IF CHARINDEX('nonum', @Options) > 0
IF @chr LIKE '[0-9]'
SET @use = 0
IF CHARINDEX('noalpha', @Options) > 0
IF @chr LIKE '[A-Za-z]'
SET @use = 0
IF CHARINDEX('alphaonly', @Options) > 0
IF @chr NOT LIKE '[A-Za-z]'
SET @use = 0
IF CHARINDEX('numonly', @Options) > 0 OR CHARINDEX('numericonly', @Options) > 0 OR CHARINDEX('numberonly', @Options) > 0
IF @chr NOT LIKE '[0-9]'
SET @use = 0
IF @use = 1
SET @NewValue = @NewValue + SUBSTRING(@Value, @ndx, 1)
END
RETURN @NewValue
END
That function code is extremely inefficient. You may not want to use it against large data sets, or particularly very large data sets. If you have large data, be sure to do large volume tests before deploying.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Functions should be as efficient as possible, including getting rid of variables unless they are absolutely required. Best is just a single RETURN statement:
CREATE FUNCTION dbo.ScrubDataField (
@data varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
RETURN (
SELECT REPLACE(REPLACE(REPLACE(RE PLACE(REPL ACE(REPLAC E(REPLACE( LTRIM(@dat a),
',', ''),
CHAR(9), ''),
CHAR(10), ' '),
CHAR(13), ' '),
SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=must be char that is NOT in the input data
CHAR(7) + SPACE(1), ''),
CHAR(7), SPACE(1))
)
END --FUNCTION
CREATE FUNCTION dbo.ScrubDataField (
@data varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
RETURN (
SELECT REPLACE(REPLACE(REPLACE(RE
',', ''),
CHAR(9), ''),
CHAR(10), ' '),
CHAR(13), ' '),
SPACE(2), SPACE(1) + CHAR(7)), --CHAR(7)=must be char that is NOT in the input data
CHAR(7) + SPACE(1), ''),
CHAR(7), SPACE(1))
)
END --FUNCTION
ASKER
this has seriously saved me time
Open in new window