Link to home
Start Free TrialLog in
Avatar of canuckconsulting
canuckconsultingFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Using SSIS to process Excel where data not in columns

We have a folder full of excel files in a similar format shown in the image below.  The excel includes both text in Mandarin as well as photos which we need to capture into a DB.  What is the best way to do this?
DailyInspect.png
Avatar of Jim Horn
Jim Horn
Flag of United States of America image

Yuk.  I'd avoid this like the plague, and instead request that the technical contact of this source data send you a well-defined .csv file with this data.

The problem with Excel as a data source is that users can edit it a thousand ways that would cause a well-defined source-to-target mapping, which is what every ETL tool needs, to fail.  Most ETL developers prefer not to babysit source files to scrub them clean whenever an automated import fails.
Avatar of canuckconsulting

ASKER

Thanks Jim.

The bad news is the file is generated from a 3rd party system so we can't get it in another format.  The good news is as it's generated it is consistent.  So we can rely on data being consistent.
Consistency is the key. By default, the Excel source sniffs - I believe - the first 8 rows of data and makes a decision for you about data types. There are several workarounds, but the best solution, if you feel you must use the Excel source, is to make sure that the first rows of the worksheet are representative of the entire worksheet.

There may actually be a way to tell the Excel source which types to use, but I personally quit using that type of connection rather than deal with the complexity. If there is a way, it may be a case of the cure being worse than the disease.

If the Excel source chooses the right types for you, an SSIS task/data flow should be simple to construct.

In a production batch load scenario you may have an alternative to using SSIS to import the data. You can read from Excel directly into SQL Server using OPEN ROWSET. I have done this before, successfully, and precise control over the column data types is possible. You must, however, locate the working directory somewhere where the SQL Server service has read permissions, and it should preferably be on a server, not a workstation. If that is something you want to experiment with, I should be able to locate an example.
By the way, I am not exactly sure how picture data is represented through an ODBC connection to a workbook. The pictures may appear as their own worksheets, or something like that. So there may really be two problems to solve, finding all the data and ensuring that the source data types are correct. If you provide a sample, I could take a peek into what the Excel source does with it.
ASKER CERTIFIED SOLUTION
Avatar of Megan Brooks
Megan Brooks
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial