Link to home
Create AccountLog in
Avatar of andybrooke

asked on

Data cleansing dates MSSQL 200833 R2


I have about 4 million records in a table. I have noticed that dates are a mess, I have 2 columns that contain datetimes. Unfortunately the datatype is varchar! which doesn't help.

I have dates like (yyyy-mm-dd hh:mm:ss), (dd/mm/yyyy hh:mm:ss) and (mm/dd/yyyy/ hh:mm:ss).

I need to somehow clean this up, whats the best way to do it?

Avatar of rshq
Flag of Iran, Islamic Republic of image

 I think you can use CAST command to convert varchar to datetime.
Avatar of andybrooke


Did you read my issue?
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
how do I do what you suggest?
>> mm/dd/yyyy/ hh:mm:ss <<

I'm guessing the final "/" is a typo and that it's just mm/dd/yyyy.

For dates where the values themselves make only one format valid, it's pretty straightforward.

The problem is with a date like: "07/04/2012" -- is that April 7 or July 4?  
Either one could be valid.  Perhaps you could use another column in the table to help determine which is correct.

I'll assume in my code that if one date column is stored as mm/dd or dd/mm, that the other date in the same row will be also.

With any luck, ISDATE() = 1 will be true for most rows, and you can do a straight CAST w/o any other specifications.
Are you planning on adding a new column to the table to hold the converted datetime?

Or for now should I plan on storing it in another table with the key column(s) as a link/join?
yes a new column.
SET DATEFORMAT mdy --or dmy if that is more applicable to your location

UPDATE dbo.tablename
    new_column1 = CASE WHEN ISDATE(old_column1) = 1 THEN old_column1 ELSE NULL END,
    new_column2 = CASE WHEN ISDATE(old_column2) = 1 THEN old_column2 ELSE NULL END
    ISDATE(old_column1) = 1 OR ISDATE(old_column2) = 1

SET DATEFORMAT dmy --check secondary format for date, if didn't match primary format

UPDATE dbo.tablename
    new_column1 = CASE WHEN new_column1 IS NULL AND ISDATE(old_column1) = 1 THEN old_column1 ELSE new_column1 END,
    new_column2 = CASE WHEN new_column2 IS NULL AND ISDATE(old_column2) = 1 THEN old_column2 ELSE new_column2 END
    (new_column1 IS NULL AND ISDATE(old_column1) = 1) OR
    (new_column2 IS NULL AND ISDATE(old_column2) = 1)

-- values below need reviewed by a person, since SQL can't treat them as a datetime
FROM dbo.tablename
    new_column1 IS NULL OR new_column2 IS NULL