canuckconsulting
asked on
Using SSIS to process Excel where data not in columns
We have a folder full of excel files in a similar format shown in the image below. The excel includes both text in Mandarin as well as photos which we need to capture into a DB. What is the best way to do this?
DailyInspect.png
DailyInspect.png
ASKER
Thanks Jim.
The bad news is the file is generated from a 3rd party system so we can't get it in another format. The good news is as it's generated it is consistent. So we can rely on data being consistent.
The bad news is the file is generated from a 3rd party system so we can't get it in another format. The good news is as it's generated it is consistent. So we can rely on data being consistent.
Consistency is the key. By default, the Excel source sniffs - I believe - the first 8 rows of data and makes a decision for you about data types. There are several workarounds, but the best solution, if you feel you must use the Excel source, is to make sure that the first rows of the worksheet are representative of the entire worksheet.
There may actually be a way to tell the Excel source which types to use, but I personally quit using that type of connection rather than deal with the complexity. If there is a way, it may be a case of the cure being worse than the disease.
If the Excel source chooses the right types for you, an SSIS task/data flow should be simple to construct.
In a production batch load scenario you may have an alternative to using SSIS to import the data. You can read from Excel directly into SQL Server using OPEN ROWSET. I have done this before, successfully, and precise control over the column data types is possible. You must, however, locate the working directory somewhere where the SQL Server service has read permissions, and it should preferably be on a server, not a workstation. If that is something you want to experiment with, I should be able to locate an example.
There may actually be a way to tell the Excel source which types to use, but I personally quit using that type of connection rather than deal with the complexity. If there is a way, it may be a case of the cure being worse than the disease.
If the Excel source chooses the right types for you, an SSIS task/data flow should be simple to construct.
In a production batch load scenario you may have an alternative to using SSIS to import the data. You can read from Excel directly into SQL Server using OPEN ROWSET. I have done this before, successfully, and precise control over the column data types is possible. You must, however, locate the working directory somewhere where the SQL Server service has read permissions, and it should preferably be on a server, not a workstation. If that is something you want to experiment with, I should be able to locate an example.
By the way, I am not exactly sure how picture data is represented through an ODBC connection to a workbook. The pictures may appear as their own worksheets, or something like that. So there may really be two problems to solve, finding all the data and ensuring that the source data types are correct. If you provide a sample, I could take a peek into what the Excel source does with it.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
The problem with Excel as a data source is that users can edit it a thousand ways that would cause a well-defined source-to-target mapping, which is what every ETL tool needs, to fail. Most ETL developers prefer not to babysit source files to scrub them clean whenever an automated import fails.