Find DataTypes of a staging table

Posted on 2014-03-31
Medium Priority
Last Modified: 2014-05-14
I have a staging table that has been imported from excel, every column is varchar(255).  I need to move the data into a new table that has the correct datatypes for each column.  I need a best guess at least on the column datatype and we never know how many columns the table is going to be, we only know it will be called staging_[something].  The datatypes I need to check for are int, datetime, float, decimal, varchar, bit.  What is the best way to do this in T-SQL.  Thanks
Question by:nutnut
  • 2
  • 2
LVL 70

Expert Comment

by:Scott Pletcher
ID: 39966999
Profiling the columns for int, float, decimal and bit is fairly straightforward.

Checking datetime vs general varchar could be very difficult.  Do you have some type of restrictions on how the date/datetime could be entered?

Author Comment

ID: 39967316
Yes we do we have maybe 5 different common ways. I was thinking as long as the bulk of the column have a 13,14,15 or 16 in its a date because all dates will be from 2013,14,15 and 16?
LVL 35

Expert Comment

by:David Todd
ID: 39967778

Can you look at Excel and from there guess at the datatype?

LVL 70

Accepted Solution

Scott Pletcher earned 2000 total points
ID: 39967954
I've got very limited time now, but I think the code below will provide you a decent start.  You seemed to have a better handle on testing for date than I did -- I would want specific patterns -- so I left that for you.

My code presumes you want to look at *every* row before deciding on the type.  If there are too many rows, you'll probably want to add a "TOP (nnnn)" expression.

I've only coded column1, but column2 would follow exactly the same pattern, and so on for column3, column4, etc..

I'd SELECT these results into a temp table, either one row for all columns or one row per column, and analyze the types there.

The basic analysis would be something like:
  -- if total_count = bit_ok_count and sign_location_max = 0 and float_location_final_max = 0  then  type = bit
  -- if total_count = int_ok_count and sign_location_max IN (0, 1) and float_location_final_max = 0  then  type = int
  --  if total_count = decimal_ok_count and sign_location_max IN (0, 1) and float_location_final_max = 0  then  type = decimal
  --  if total_count = datetime_ok_count  then  type = datetime
  -- else  type = varchar.
    SUM(1) AS Total_Count,
    MIN(sign_location) AS sign_location_min,
    MAX(sign_location) AS sign_location_max,
    MIN(float_location_final) AS float_location_final_min,
    MAX(float_location_final) AS float_location_final_max,
    SUM(CASE WHEN column1_mod_final NOT LIKE '%[^0123456789]%' THEN 1 ELSE 0 END) AS int_ok_count,
    SUM(CASE WHEN column1_mod_final NOT LIKE '%[^0-9.]%' AND column1_mod_final NOT LIKE '%.%.%' THEN 1 ELSE 0 END)
        AS decimal_ok_count,
    SUM(CASE WHEN column1_mod_final IN ('0', '1', 'False', 'True') THEN 1 ELSE 0 END) AS bit_ok_count
    --add your date testing logic here AS datetime_ok_count
FROM ( --dbo.staging_table1
    SELECT '12e+5' AS column1
) AS staging_table1
    SELECT PATINDEX('%[+-]%', column1) AS sign_location,
        PATINDEX('%[DE]%', column1) AS float_location_prelim
) AS ca10
    SELECT SUBSTRING(column1, sign_location, 1) AS [sign]
) AS ca20
    SELECT CASE WHEN sign_location > 0 THEN STUFF(column1, sign_location, 1, '') ELSE column1 END AS column1_mod_1
) AS ca30
    SELECT CASE WHEN float_location_prelim > 0 AND
        SUBSTRING(column1_mod_1, float_location_prelim + 1, 99) NOT LIKE '%[^0123456789]%' AND
        (sign_location = 0 OR sign_location > float_location_prelim) THEN float_location_prelim ELSE 0 END
        AS float_location_final
) AS ca4
    SELECT CASE WHEN float_location_final > 0 THEN STUFF(column1_mod_1, float_location_final, 1, '')
        ELSE column1_mod_1 END AS column1_mod_final
) AS ca6

Author Closing Comment

ID: 40065392
Great answer thanks!!

Featured Post

A proven path to a career in data science

At Springboard, we know how to get you a job in data science. With Springboard’s Data Science Career Track, you’ll master data science  with a curriculum built by industry experts. You’ll work on real projects, and get 1-on-1 mentorship from a data scientist.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Windocks is an independent port of Docker's open source to Windows.   This article introduces the use of SQL Server in containers, with integrated support of SQL Server database cloning.
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
SQL Database Recovery Software repairs the MDF & NDF Files, corrupted due to hardware related issues or software related errors. Provides preview of recovered database objects and allows saving in either MSSQL, CSV, HTML or XLS format. Ensures recov…

607 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question