In a previous article I've shown you how to import data from an Excel sheet using the OPENROWSET() function
. And I concluded by stating that it's not the best option when automating your data import.
Today I'll repeat the Excel data import process by using SQL Server Integration Services, also known as SSIS.
I'll be using SQL Server 2008 R2, but I'm quite sure that the process is very similar to the first release of 2008, and even to 2005. The Excel file that I will be importing is the one used in my previous article, and I'll also refer to some parts of that article, so you may want to have a read over that one when something here isn't clear.
Furthermore I'm using a Windows 7 64-bit machine, with the ACE 14 driver
(beta) installed. To avoid any discussion about versions and for my own (future) reference, here's the result of a SELECT @@VERSION:
Microsoft SQL Server 2008 R2 (CTP) - 10.50.1352.12 (X64) Oct 30 2009 18:06:48 Copyright (c) Microsoft Corporation Enterprise Evaluation Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )
Create SSIS Package To Import Excel Data
Usually you will start by creating a new package in an Integration Services project, add an Excel source to a new Data Flow, throw in some Data Flow Transformations and end your flow with an OLE DB Destination connecting to your SQL Server.
But that's not the approach that I'll take in this article. I'll make use of a shortcut (and meanwhile I'm showing you how well integrated some components really are).
First, I'm creating a new database called ExcelImport
, using the Management Studio (aka SSMS). Once the database is created, right-click on it and choose Tasks > Import Data.
This will open up the SQL Server Import And Export Data Wizard
. Like all good wizards, it starts with a Welcome
screen containing an introductory text about its purpose - something about "create simple packages that import and export data between many popular data formats including databases, spreadsheets" - and so on. It also includes a checkbox that says "Do not show this starting page again". That's my favorite object on the page :-) Okay, I agree, the page is useful for people who have never seen the wizard before and who may have opened it up by accident, but that's about as far as its use goes methinks.
So, do whatever you like with the checkbox and click Next. That opens up the Choose a Data Source
In that screen you have several options in the Data Source dropdown. The one we're interested in is called Microsoft Excel. Once that option is selected, the controls further down the screen change into what is shown in the screenshot. Select your file and the right version, in my case I've got an Excel 2007 file. If your sheet contains a header row, activate the First row has column names
Clicking Next will open up the Choose a Destination
In that screen, select Microsoft OLE DB Provider for SQL Server
as Destination. Ensure that your SQL Server instance is the right one in the Server Name
dropdown and the Authentication
is filled out as expected. The correct database should be selected by default because we did a right-click on it to start the wizard.
Another Next click opens up the Specify Table Copy or Query
Here we can choose between either retrieving the full table - all rows, all columns - or writing a query ourselves. I'll go for the first option.
Click Next once more to open the Select Source Tables and Views
screen. That name seems a bit weird in the context of an Excel import but I guess that's not really important here. Just read it as "Select the sheet that you'd like to import".
I'll go for the ProductList$ Source. The sheet in my Excel file is called ProductList. Note that the Destination is editable - I'm changing the destination table to dbo.ProductList (with the dollar sign removed).
If you'd like to view your data right now you can hit the Preview button. It opens up a window such as this one:
A more interesting button is the one called Edit Mappings
. Clicking that button opens a screen that lets you change the destination table's schema. By default, all columns that seem to be numeric are mapped to a type of float and all the others are mapped to nvarchar with a length of 255. Depending on your situation you can either leave it as it is (in case you're just loading a staging table and want to handle data conversions later on) or you should review each column (for instance when you're erasing and loading the full table every night and this is the actual table being used by other processes).
whenever a column contains a blank cell, its type will be nvarchar(255), even when all other values are numeric. Also, if you don't need Unicode support, don't use nvarchar but change it to varchar instead.
Here's the Column Mappings
screen with some of the defaults changed:
Everything that I changed has been indicated using the yellow marker. I've changed some field types and lenghts, made one field non-nullable and adapted the destination name.
Clicking OK followed by Next brings us to the following screen, Review Data Type Mapping
This screen gives another overview of all the columns that we're planning to import. Note that each column of which we've changed the type has gotten a nice yellow warning sign. This happens because a data conversion needs to occur, and there's always something that can go wrong when converting data. The On Error
and On Truncation
columns show the action that should happen when such an event occurs. We'll leave them all at Use Global
. The Global settings are located at the bottom of the screen and are both set to Fail
. That's the best option at the moment (the only other one is Ignore but that means you won't get any notification in case of an error or truncation problem).
After clicking Next we end up at the Save and Run Package
I have activated the Save SSIS Package
checkbox and chose the Do not save sensitive data
option. By default it is saved in SQL Server, which is actually the MSDB. That's good because we're going to examine the contents of the package later on so we want to keep it.
The Run immediately
checkbox was activated by default. This will execute the package in the last step of the wizard.
Another Next click and we're on the Save SSIS Package
Here we can give our package a decent name, such as ExcelImport. You can also see the server on which I'm saving the package.
The final Next click brings us to the Complete the Wizard
We can see a short overview of what we've configured in the previous steps. Click Finish to execute and save the package!
And we've successfully executed the package!
Taking A Closer Look At The SSIS Package
When connecting to the Integration Services server (through SSMS for instance), I now have a new package in the root of the MSDB folder.
It's located there because I chose to save the package to SQL Server.
Adding An Existing Package To An SSIS Project
We're now going to open it in the Business Intelligence Development Studio (aka BIDS). So, open the BIDS and create a new SSIS Project (or use an existing one, doesn't really matter).
Once the project is open, right-click the SSIS Packages folder in the Solution Explorer and select Add Existing Package.
That opens up the Add Copy of Existing Package
do you see that SSIS Import and Export Wizard
option in the previous screenshot? That's right, the wizard that we've used extensively in the earlier part of this article can be launched from here as well.
Select SSIS Package Store
as location of the package and enter the name of your server in the second dropdown. Once that is done you can click the button with the ellipsis and select your package under the MSDB node.
Clicking OK will add the package to your project in the Solution Explorer. Double-click it to open it up.
The Control Flow
In the Control Flow you can see two components: a SQL Task that contains a CREATE TABLE statement and a Data Flow.
Here's what the CREATE TABLE statement looks like:
As you can see, the table is created using the column names and types just like we configured them through the wizard.
Important to note here is that the Control Flow does not take anything else into account. For instance, what happens if we execute the package twice? It will fail because the table already exists!
In case you don't believe me, just try it out!
The Data Flow
Opening up the Data Flow we see that it contains three components: an Excel source component that uses an Excel Connection Manager to connect to our now well-known Excel sheet, a Data Conversion Transformation to take care of the conversions that we requested and an OLE DB Destination that uses an OLE DB Connection Manager to connect to our SQL Server.
Important to note here is that whenever an issue occurs, such as a conversion problem, the flow will fail.
In production environments, certain events need to be taken into account. The purpose of this article was just to show you how you can use a wizard to generate an SSIS package for you. You can now use this package as the basis for a well-developed Excel Import template.
With this article I hope to have shown you how to use Integration Services to import Excel data, and also that the Management Studio knows how to use other SQL Server components, such as SSIS, quite well.
If you're running into some issues while using the wizard, or you just like reading what I write, check out my follow-up article covering some common pitfalls
PS: NO or YES? (sounds better when you speak it out loud ;-)
Originally appeared at my blog: http://blog.hoegaerden.be/2010/04/20/importing-excel-data-using-integration-services
Microsoft Support: How to import data from Excel to SQL Server
How to: Run the SQL Server Import and Export Wizard