Solved

Importing data with mixed types

Posted on 2010-11-09
12
1,323 Views
Last Modified: 2013-11-10
Hi,

I seem to spend a lot of time wrestling with SSIS data flows that have Excel sources. The problem comes where a column contains mixed data types. Specifically a recurring scenario is where an Excel worksheet that I get asked to import into the database contains a column with mostly numeric values but in fact the field in question is a string data type, with a smaller number of alpha numeric codes in later rows.

I understand that Excel helpfully(!?) determines the data type based on a sampling of the first few values (the first 8 by default I think). I also know about adding the IMEX=1 option to the data connection string to inhibit this behaviour. However, I think the IMEX option is of limited value because my observation is that Excel still decides on double float unless it finds at least one non numeric value in its sampling phase. The trouble is that the Excel files I'm asked to import contain thousands of rows and you never know where non numeric data may lurk.

By way of example, consider a very basic Excel worksheet with a single column heading and 9 values, the last of which is non numeric:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
ABC123

I can run a SSIS data flow that happily puts all 9 rows into a SQL table with the following schema definition:
CREATE TABLE TestTable (
id nvarchar(50) null)

There is no error in the SSIS package but the SQL table looks like this after the load:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
NULL

I'm sure there are some experienced SSIS developers out there that are well used to dealing with this kind of ETL issue. Can anyone please offer me some pearls of wisdom to prevent me wasting so much time trying to get data loaded into the database correctly? I need a solution that is hands free and reliable because in some cases the data loads will be automated and continue after I've left the company.

Thanks in advance.
0
Comment
Question by:irb56
12 Comments
 
LVL 3

Expert Comment

by:adammet04
ID: 34098684
You can put in a Data Conversion step in your process (probably right after the datasource would be good.
0
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34099323
right click on excel connection manager,
go to properties window
add IMEX=1 to end of ConnectionString property, like this:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\testfolder\myFile.xls;Extended Properties="Excel 8.0;HDR=YES;IMEX=1";



0
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34099339
note that you can set data type of this column as DT_STR or DT_WSTR in the output columns of excel source, this can be non-numeric
0
 
LVL 21

Expert Comment

by:Alpesh Patel
ID: 34102418
Hi,

If datatype of columns is not matched then use conversion task and convert appropriate to save to table.
0
 

Author Comment

by:irb56
ID: 34106943
Thanks all for the suggestions. Has anyone tried creating an xls/xlsx file from the sample given and tried importing it via SSIS? I'm not sure if it's just me but here's my observations.

My Excel source connection is:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\SQL_Stuff\TestImport.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

The Excel Source output column (id) defaults to data type double precision float.

I've added a Data Conversion transformation task in between the Excel source and the OLE DB destination, which creates a copy of id as a Unicode string.

I've put a data viewer on the pipeline between the Excel source and the Data Conversion transform. This reveals that the cell containing ABC123 comes out of the Excel source as NULL without generating any failure of the Excel source. I have tried changing the Excel Source output column from double precision float to DT_WSTR and repeated the SSIS package run but the data viewer reveals no change.

The upshot is that the OLE DB destination gets 9 rows inserted, with all numeric values preserved and the ABC123 value lost and replaced by NULL.

I've tried explicitly formatting the column in Excel to text, but this makes no difference either. The only thing that I can find to make it work properly is to save the file to CSV and import from the CSV. Is this the best solution, to always ensure such an Excel file is converted to CSV, perhaps via some tool that can automate the task if needed?
0
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34106967
could you upload your package and sample excel source file here? I can take a look and try to find out problem
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Author Comment

by:irb56
ID: 34107360
Hi,

As requested, please find attached Excel file and SSIS package. I think I may have found some other solutions but haven't got any time left tonight to check them out properly. For reference, take a look at the following URL http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62697/.

Thanks for your help.

P.S. Please note that dtsx files can't be uploaded, so you'll need to manually rename the SSIS package to correct the extension.
Package1.doc
TestImport.xls
0
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34107525
I checked your data flow task, I put a data viewer between excel source and data conversion transformation, and everything was correct. look at attachment, the last row fetched correctly.

1.jpg
0
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34107573
I removed excel source and data conversion transform,
and create new excel source pointing to the excel connection manager, and this time all thing was correct, look at attachment:

2.jpg
0
 

Author Comment

by:irb56
ID: 34131390
Hi,

Thanks for looking at this. Something is different between our environments because the MS Jet driver always turns the 9th row value to NULL in my package, rendering the Data Conversion task useless. This may be because I use OpenOffice rather than proper Excel. When I create a second column (named new_id), which is derived from the id column using the IF function combined with ISNUMBER and TEXT functions, you can see from the screenshot below that this fixes the issue.

 Screenshot
I know it is also possible to bulk load data into a SQL DB from an Excel file using the script task, which may be the answer for ETL tasks involving Excel data sources that need to be automated. If someone could walk me through creating a script task to achieve this, that'll be great. I would prefer Visual Basic as the language.

Thanks.
0
 

Accepted Solution

by:
irb56 earned 0 total points
ID: 34131553
I just cracked it with the following script task (adapted from the solution I found on http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62697/)

The Excel derived column method using the TEXT function seems like the simplest approach when the data load is a one off job but for automated repeatable jobs where you cannot control the Excel source file, this script method looks like it'll do the job.


Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.Data.OleDb

Imports System.Data.SqlClient



<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _

<System.CLSCompliantAttribute(False)> _

Partial Public Class ScriptMain

	Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase



	Enum ScriptResults

		Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success

		Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure

	End Enum

	



	

	Public Sub Main()

		'

        Dim myExcelFile As OleDbConnection

        Dim mySQLDB As SqlConnection

        Dim myOLEDBCmd As OleDbCommand

        Dim myOLEDBDataReader As OleDbDataReader

        Dim myBulkCopy As SqlBulkCopy



        myExcelFile = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Test\TestImport.xls;Extended Properties=""Excel 8.0;HDR=NO;IMEX=1""")

        mySQLDB = New SqlConnection("Data Source=localhost;Initial Catalog=Test;Integrated Security=True")

        myExcelFile.Open()

        mySQLDB.Open()

        myOLEDBCmd = New OleDbCommand(Dts.Variables("QueryString").Value, myExcelFile)

        myOLEDBDataReader = myOLEDBCmd.ExecuteReader

        myBulkCopy = New SqlBulkCopy(mySQLDB)

        myBulkCopy.DestinationTableName = "TestTable"

        myBulkCopy.WriteToServer(myOLEDBDataReader)

		'

		Dts.TaskResult = ScriptResults.Success

	End Sub



End Class

Open in new window

0
 

Author Closing Comment

by:irb56
ID: 34171199
Closing this one now as I've worked out my own solution. No-one seemed to see the problem I was having which was probably due to bad communication on my part. Thanks for all who offered advice and apologies for any time wasted.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Suggested Solutions

JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to shrink a transaction log file down to a reasonable size.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now