Solved

Importing data with mixed types

Posted on 2010-11-09
Medium Priority
1,439 Views
Last Modified: 2013-11-10
Hi,

I seem to spend a lot of time wrestling with SSIS data flows that have Excel sources. The problem comes where a column contains mixed data types. A recurring scenario is an Excel worksheet that I'm asked to import into the database containing a column of mostly numeric values, when in fact the field in question is a string data type with a smaller number of alphanumeric codes in later rows.

I understand that the Excel driver helpfully(!?) determines the data type based on a sampling of the first few values (the first 8 rows by default, I think). I also know about adding the IMEX=1 option to the data connection string to inhibit this behaviour. However, I think the IMEX option is of limited value, because my observation is that the driver still decides on double-precision float unless it finds at least one non-numeric value in its sampling phase. The trouble is that the Excel files I'm asked to import contain thousands of rows, and you never know where non-numeric data may lurk.
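For reference, here is a minimal sketch of the kind of connection involved; the sampling depth is apparently governed by the TypeGuessRows registry value (default 8) under the Jet engine's Excel key, where the ImportMixedTypes setting also lives. The file path is illustrative only and the module and variable names are just placeholders:

Imports System.Data.OleDb

Module JetConnectionSketch
    ' Sketch only: Jet connection string for an .xls source with IMEX=1.
    ' Sampling depth comes from TypeGuessRows (default 8), typically under
    ' HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel, alongside ImportMixedTypes=Text.
    ' IMEX=1 only forces text when the sampled rows actually contain mixed types.
    Sub Main()
        Dim connStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                                "Data Source=C:\SQL_Stuff\TestImport.xls;" & _
                                "Extended Properties=""Excel 8.0;HDR=YES;IMEX=1"""
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            ' Column types are decided by the driver here, before any rows are read.
        End Using
    End Sub
End Module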

By way of example, consider a very basic Excel worksheet with a single column heading and 9 values, the last of which is non numeric:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
ABC123

I can run an SSIS data flow that happily puts all 9 rows into a SQL table with the following schema definition:
CREATE TABLE TestTable (
id nvarchar(50) null)

There is no error in the SSIS package but the SQL table looks like this after the load:
id
1000001
1000002
1000003
1000004
1000005
1000006
1000007
1000008
NULL

I'm sure there are some experienced SSIS developers out there who are well used to dealing with this kind of ETL issue. Can anyone please offer me some pearls of wisdom to stop me wasting so much time trying to get data loaded into the database correctly? I need a solution that is hands-free and reliable, because in some cases the data loads will be automated and continue after I've left the company.

Thanks in advance.
Question by:irb56
12 Comments
 
LVL 3

Expert Comment

by:adammet04
ID: 34098684
You can put a Data Conversion step in your process (probably right after the data source would be good).
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34099323
Right-click on the Excel connection manager,
go to the Properties window,
and add IMEX=1 to the end of the ConnectionString property, like this:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\testfolder\myFile.xls;Extended Properties="Excel 8.0;HDR=YES;IMEX=1";



 
LVL 30

Expert Comment

by:Reza Rad
ID: 34099339
Note that you can set the data type of this column to DT_STR or DT_WSTR in the output columns of the Excel source, so it can be non-numeric.

 
LVL 21

Expert Comment

by:Alpesh Patel
ID: 34102418
Hi,

If the data type of the column doesn't match, use a Data Conversion task to convert it to the appropriate type before saving to the table.
 

Author Comment

by:irb56
ID: 34106943
Thanks all for the suggestions. Has anyone tried creating an xls/xlsx file from the sample given and importing it via SSIS? I'm not sure if it's just me, but here are my observations.

My Excel source connection is:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\SQL_Stuff\TestImport.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

The Excel Source output column (id) defaults to data type double precision float.

I've added a Data Conversion transformation task in between the Excel source and the OLE DB destination, which creates a copy of id as a Unicode string.

I've put a data viewer on the pipeline between the Excel source and the Data Conversion transform. This reveals that the cell containing ABC123 comes out of the Excel source as NULL, without the Excel source reporting any error. I have tried changing the Excel Source output column from double-precision float to DT_WSTR and repeated the SSIS package run, but the data viewer reveals no change.

The upshot is that the OLE DB destination gets 9 rows inserted, with all numeric values preserved and the ABC123 value lost and replaced by NULL.

I've tried explicitly formatting the column in Excel to text, but this makes no difference either. The only thing that I can find to make it work properly is to save the file to CSV and import from the CSV. Is this the best solution, to always ensure such an Excel file is converted to CSV, perhaps via some tool that can automate the task if needed?
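In case it helps, here is a rough sketch of how that xls-to-CSV conversion could be scripted through Excel automation. It assumes Excel itself is installed on the machine running the job (Microsoft generally advises against unattended Office automation on servers, so treat it as a sketch rather than a recommendation), and the file paths and names are hypothetical:

Imports Microsoft.Office.Interop.Excel

Module XlsToCsvSketch
    ' Rough sketch: open an .xls file and save it back out as CSV via Excel automation.
    Sub Main()
        Dim xlApp As New Microsoft.Office.Interop.Excel.Application()
        xlApp.DisplayAlerts = False   ' suppress the overwrite/format prompts
        Dim wb As Workbook = xlApp.Workbooks.Open("C:\SQL_Stuff\TestImport.xls")
        wb.SaveAs("C:\SQL_Stuff\TestImport.csv", XlFileFormat.xlCSV)
        wb.Close(SaveChanges:=False)
        xlApp.Quit()
    End Sub
End Module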
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34106967
Could you upload your package and sample Excel source file here? I'll take a look and try to find the problem.
 

Author Comment

by:irb56
ID: 34107360
Hi,

As requested, please find the Excel file and SSIS package attached. I think I may have found some other solutions but haven't got any time left tonight to check them out properly. For reference, take a look at the following URL: http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62697/.

Thanks for your help.

P.S. Please note that dtsx files can't be uploaded, so you'll need to manually rename the SSIS package to correct the extension.
Package1.doc
TestImport.xls
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34107525
I checked your data flow task. I put a data viewer between the Excel source and the Data Conversion transformation, and everything was correct. Look at the attachment; the last row is fetched correctly.

1.jpg
 
LVL 30

Expert Comment

by:Reza Rad
ID: 34107573
I removed the Excel source and the Data Conversion transform, and created a new Excel source pointing to the same Excel connection manager. This time everything was correct; look at the attachment:

2.jpg
 

Author Comment

by:irb56
ID: 34131390
Hi,

Thanks for looking at this. Something is different between our environments, because the MS Jet driver always turns the 9th row value to NULL in my package, rendering the Data Conversion task useless. This may be because I use OpenOffice rather than proper Excel. When I create a second column (named new_id), derived from the id column using the IF function combined with the ISNUMBER and TEXT functions, the issue is fixed, as you can see from the screenshot below.

 Screenshot
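For reference, the formula for the new_id column was something along these lines (the cell reference is hypothetical, assuming the first id value sits in A2):

=IF(ISNUMBER(A2), TEXT(A2, "0"), A2)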
I know it is also possible to bulk load data into a SQL database from an Excel file using the Script Task, which may be the answer for ETL tasks involving Excel data sources that need to be automated. If someone could walk me through creating a Script Task to achieve this, that would be great. I would prefer Visual Basic as the language.

Thanks.
 

Accepted Solution

by:
irb56 earned 0 total points
ID: 34131553
I just cracked it with the following script task (adapted from the solution I found at http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/62697/).

The Excel derived-column method using the TEXT function seems like the simplest approach when the data load is a one-off job, but for automated, repeatable jobs where you cannot control the Excel source file, this script method looks like it'll do the job.


Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Data.OleDb
Imports System.Data.SqlClient

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
	Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

	Enum ScriptResults
		Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
		Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
	End Enum
	

	
    Public Sub Main()
        Dim myExcelFile As OleDbConnection
        Dim mySQLDB As SqlConnection
        Dim myOLEDBCmd As OleDbCommand
        Dim myOLEDBDataReader As OleDbDataReader
        Dim myBulkCopy As SqlBulkCopy

        ' Open the Excel file through the Jet provider. HDR=NO means the text header
        ' row is read as ordinary data, so the sampled rows contain mixed types and
        ' IMEX=1 brings the whole column through as text.
        myExcelFile = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Test\TestImport.xls;Extended Properties=""Excel 8.0;HDR=NO;IMEX=1""")
        mySQLDB = New SqlConnection("Data Source=localhost;Initial Catalog=Test;Integrated Security=True")
        myExcelFile.Open()
        mySQLDB.Open()

        ' QueryString is an SSIS string variable holding the sheet query,
        ' e.g. something like SELECT * FROM [Sheet1$].
        myOLEDBCmd = New OleDbCommand(CStr(Dts.Variables("QueryString").Value), myExcelFile)
        myOLEDBDataReader = myOLEDBCmd.ExecuteReader()

        ' Stream the rows straight into the destination table.
        myBulkCopy = New SqlBulkCopy(mySQLDB)
        myBulkCopy.DestinationTableName = "TestTable"
        myBulkCopy.WriteToServer(myOLEDBDataReader)

        ' Tidy up.
        myBulkCopy.Close()
        myOLEDBDataReader.Close()
        mySQLDB.Close()
        myExcelFile.Close()

        Dts.TaskResult = ScriptResults.Success
    End Sub

End Class

 

Author Closing Comment

by:irb56
ID: 34171199
Closing this one now as I've worked out my own solution. No one seemed to see the problem I was having, which was probably due to poor communication on my part. Thanks to all who offered advice, and apologies for any time wasted.
