I am developing a SSIS package to perform an ETL operation on an external Excel file, loading data from it each day into a SQL table. The Excel file is updated by users and has a few characteristics that my SSIS package needs to handle. Firstly, it has a column named [Description] which contains claim references. The format of the claim number varies. At the top of the file most of the claim references present as 4 digit integers (e.g. 1234) but further down this gives way to a longer format that is composed of three numbers separated by the forward slash character "/" (e.g. 2010/123456/01). Secondly, the users frequently skip rows when inputting new records to create logical record groups. All the data is held in [Sheet1$] and there are a total of 3234 rows in [Sheet1$] including the header row at the top and all the intermittent blank rows within the records. When I make a copy of the Excel file and sort on [Description] I can see that there are 2247 records with a claim reference that is not blank and these are the records I want to pull into the SQL table, leaving the others behind.
I know that Excel tries to guess data types, which is unhelpful to my SSIS data conversion. Therefore, using a tip from one of the experts on this forum from a previous question, I have modified the Excel connection string to use the IMEX=1 option:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\\EDL-PDC\All\Claims Banking.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
Next I wrote the following query for my Excel source to filter out the unwanted records with a NULL claim reference:
[Description] AS claim_no,
[Date] AS recovery_date,
[Amount] AS amount
WHERE [Description] IS NOT NULL;
Next I placed a Data Conversion step in my data flow in between the Excel data source and OLE DB destination. The Data Conversion makes a copy of the Description (claim_no) column as type string with length 30.
Finally the claim_no copy column from the Data Conversion is mapped to my claim_no column in the SQL table, which is of type varchar(30).
When I run the package, it completes without error but loads only 875 rows into SQL. All of these rows have a claim_no in the format XXXX (e.g. 1234). When I remove the WHERE clause from my Excel source SQL command text I get 3233 rows but most of the claim numbers have converted to NULL.
Is anyone able to offer a likely explanation for this and a fix? I should mention that the ETL involves two other columns from the Excel file (recovery_date and amount) both of which are included in the Data Conversion as types date and currency. There are also redundant columns in the Excel file that I ignore in the Excel source SQL command text.
Any helpful pointers to where I may be going wrong will be gratefully received.
Many thanks in advance.
P.S. I've posted this in the MS SQL Server 2008 zone (I'm using SQL Server 2008 R2) because I cannot find a SSIS zone (the search for SSIS or Integration Services in the Zones area yields nothing). Please put me straight if there is a dedicated SSIS zone so I may use this in the future.