I seem to spend a lot of time wrestling with SSIS data flows that have Excel sources. The problem comes where a column contains mixed data types. Specifically a recurring scenario is where an Excel worksheet that I get asked to import into the database contains a column with mostly numeric values but in fact the field in question is a string data type, with a smaller number of alpha numeric codes in later rows.
I understand that Excel helpfully(!?) determines the data type based on a sampling of the first few values (the first 8 by default I think). I also know about adding the IMEX=1 option to the data connection string to inhibit this behaviour. However, I think the IMEX option is of limited value because my observation is that Excel still decides on double float unless it finds at least one non numeric value in its sampling phase. The trouble is that the Excel files I'm asked to import contain thousands of rows and you never know where non numeric data may lurk.
By way of example, consider a very basic Excel worksheet with a single column heading and 9 values, the last of which is non numeric:
I can run a SSIS data flow that happily puts all 9 rows into a SQL table with the following schema definition:
CREATE TABLE TestTable (
id nvarchar(50) null)
There is no error in the SSIS package but the SQL table looks like this after the load:
I'm sure there are some experienced SSIS developers out there that are well used to dealing with this kind of ETL issue. Can anyone please offer me some pearls of wisdom to prevent me wasting so much time trying to get data loaded into the database correctly? I need a solution that is hands free and reliable because in some cases the data loads will be automated and continue after I've left the company.
Thanks in advance.