Link to home
Create AccountLog in
Avatar of andybrooke
andybrooke

asked on

Data cleansing dates MSSQL 200833 R2

Hi,

I have about 4 million records in a table. I have noticed that dates are a mess, I have 2 columns that contain datetimes. Unfortunately the datatype is varchar! which doesn't help.

I have dates like (yyyy-mm-dd hh:mm:ss), (dd/mm/yyyy hh:mm:ss) and (mm/dd/yyyy/ hh:mm:ss).

I need to somehow clean this up, whats the best way to do it?

Thanks
Avatar of rshq
rshq
Flag of Iran, Islamic Republic of image

Hi
 I think you can use CAST command to convert varchar to datetime.
Avatar of andybrooke
andybrooke

ASKER

Did you read my issue?
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
how do I do what you suggest?
thanks
>> mm/dd/yyyy/ hh:mm:ss <<

I'm guessing the final "/" is a typo and that it's just mm/dd/yyyy.


For dates where the values themselves make only one format valid, it's pretty straightforward.

The problem is with a date like: "07/04/2012" -- is that April 7 or July 4?  
Either one could be valid.  Perhaps you could use another column in the table to help determine which is correct.

I'll assume in my code that if one date column is stored as mm/dd or dd/mm, that the other date in the same row will be also.

With any luck, ISDATE() = 1 will be true for most rows, and you can do a straight CAST w/o any other specifications.
Are you planning on adding a new column to the table to hold the converted datetime?

Or for now should I plan on storing it in another table with the key column(s) as a link/join?
yes a new column.
SET DATEFORMAT mdy --or dmy if that is more applicable to your location

UPDATE dbo.tablename
SET
    new_column1 = CASE WHEN ISDATE(old_column1) = 1 THEN old_column1 ELSE NULL END,
    new_column2 = CASE WHEN ISDATE(old_column2) = 1 THEN old_column2 ELSE NULL END
WHERE
    ISDATE(old_column1) = 1 OR ISDATE(old_column2) = 1


SET DATEFORMAT dmy --check secondary format for date, if didn't match primary format

UPDATE dbo.tablename
SET
    new_column1 = CASE WHEN new_column1 IS NULL AND ISDATE(old_column1) = 1 THEN old_column1 ELSE new_column1 END,
    new_column2 = CASE WHEN new_column2 IS NULL AND ISDATE(old_column2) = 1 THEN old_column2 ELSE new_column2 END
WHERE
    (new_column1 IS NULL AND ISDATE(old_column1) = 1) OR
    (new_column2 IS NULL AND ISDATE(old_column2) = 1)


-- values below need reviewed by a person, since SQL can't treat them as a datetime
SELECT *
FROM dbo.tablename
WHERE
    new_column1 IS NULL OR new_column2 IS NULL