Link to home
Create AccountLog in
Avatar of irb56
irb56

asked on

Importing data with mixed types

Hi,

I seem to spend a lot of time wrestling with SSIS data flows that have Excel sources. The problem comes where a column contains mixed data types. Specifically a recurring scenario is where an Excel worksheet that I get asked to import into the database contains a column with mostly numeric values but in fact the field in question is a string data type, with a smaller number of alpha numeric codes in later rows.

I understand that Excel helpfully(!?) determines the data type based on a sampling of the first few values (the first 8 by default I think). I also know about adding the IMEX=1 option to the data connection string to inhibit this behaviour. However, I think the IMEX option is of limited value because my observation is that Excel still decides on double float unless it finds at least one non numeric value in its sampling phase. The trouble is that the Excel files I'm asked to import contain thousands of rows and you never know where non numeric data may lurk.

By way of example, consider a very basic Excel worksheet with a single column heading and 9 values, the last of which is non numeric:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
ABC123

I can run a SSIS data flow that happily puts all 9 rows into a SQL table with the following schema definition:
CREATE TABLE TestTable (
id nvarchar(50) null)

There is no error in the SSIS package but the SQL table looks like this after the load:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
NULL

I'm sure there are some experienced SSIS developers out there that are well used to dealing with this kind of ETL issue. Can anyone please offer me some pearls of wisdom to prevent me wasting so much time trying to get data loaded into the database correctly? I need a solution that is hands free and reliable because in some cases the data loads will be automated and continue after I've left the company.

Thanks in advance.
Avatar of Member_2_3698648
Member_2_3698648
Flag of Australia image

You can put in a Data Conversion step in your process (probably right after the datasource would be good.
Avatar of Reza Rad
right click on excel connection manager,
go to properties window
add IMEX=1 to end of ConnectionString property, like this:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\testfolder\myFile.xls;Extended Properties="Excel 8.0;HDR=YES;IMEX=1";



note that you can set data type of this column as DT_STR or DT_WSTR in the output columns of excel source, this can be non-numeric
Hi,

If datatype of columns is not matched then use conversion task and convert appropriate to save to table.
Avatar of irb56
irb56

ASKER

Thanks all for the suggestions. Has anyone tried creating an xls/xlsx file from the sample given and tried importing it via SSIS? I'm not sure if it's just me but here's my observations.

My Excel source connection is:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\SQL_Stuff\TestImport.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

The Excel Source output column (id) defaults to data type double precision float.

I've added a Data Conversion transformation task in between the Excel source and the OLE DB destination, which creates a copy of id as a Unicode string.

I've put a data viewer on the pipeline between the Excel source and the Data Conversion transform. This reveals that the cell containing ABC123 comes out of the Excel source as NULL without generating any failure of the Excel source. I have tried changing the Excel Source output column from double precision float to DT_WSTR and repeated the SSIS package run but the data viewer reveals no change.

The upshot is that the OLE DB destination gets 9 rows inserted, with all numeric values preserved and the ABC123 value lost and replaced by NULL.

I've tried explicitly formatting the column in Excel to text, but this makes no difference either. The only thing that I can find to make it work properly is to save the file to CSV and import from the CSV. Is this the best solution, to always ensure such an Excel file is converted to CSV, perhaps via some tool that can automate the task if needed?
could you upload your package and sample excel source file here? I can take a look and try to find out problem
Avatar of irb56

ASKER

Hi,

As requested, please find attached Excel file and SSIS package. I think I may have found some other solutions but haven't got any time left tonight to check them out properly. For reference, take a look at the following URL http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62697/.

Thanks for your help.

P.S. Please note that dtsx files can't be uploaded, so you'll need to manually rename the SSIS package to correct the extension.
Package1.doc
TestImport.xls
I checked your data flow task, I put a data viewer between excel source and data conversion transformation, and everything was correct. look at attachment, the last row fetched correctly.

1.jpg
I removed excel source and data conversion transform,
and create new excel source pointing to the excel connection manager, and this time all thing was correct, look at attachment:

2.jpg
Avatar of irb56

ASKER

Hi,

Thanks for looking at this. Something is different between our environments because the MS Jet driver always turns the 9th row value to NULL in my package, rendering the Data Conversion task useless. This may be because I use OpenOffice rather than proper Excel. When I create a second column (named new_id), which is derived from the id column using the IF function combined with ISNUMBER and TEXT functions, you can see from the screenshot below that this fixes the issue.

 User generated image
I know it is also possible to bulk load data into a SQL DB from an Excel file using the script task, which may be the answer for ETL tasks involving Excel data sources that need to be automated. If someone could walk me through creating a script task to achieve this, that'll be great. I would prefer Visual Basic as the language.

Thanks.
ASKER CERTIFIED SOLUTION
Avatar of irb56
irb56

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
See answer
Avatar of irb56

ASKER

Closing this one now as I've worked out my own solution. No-one seemed to see the problem I was having which was probably due to bad communication on my part. Thanks for all who offered advice and apologies for any time wasted.