Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 944
  • Last Modified:

Columns in delimited file may vary when input to SSIS

Hi everyone.

A delimited file is being sent to us from another company. The file is supposed to have 10 columns in each row. We are going to process the file using SSIS (2005).

Question - how do we handle the file if some of the rows are bad - are missing one or more columns?

If my package reads the file using a flat file source, when I run the package with a file where some of the rows have fewer than the expected 10 columns, my package abends on the flat file source task.

All we can think of doing is writing .net code to process the file as the first step of the package or even outside the package, to remove bad rows from the file before it hits SSIS.  

Thank you.
0
glen_esq
Asked:
glen_esq
  • 2
1 Solution
 
kselviaRetiredCommented:
not a great solution but here goes

Rather than writing your own, download gawk.exe (google for it) to search for rows with 10 commas in it and redirect the output to a good input file, then load that file.

Or bulk insert each row into a single varchar column table.

bulk insert mytable
from 'c:\path\file.csv'

create view v_my_good_table
select * from mytable where len(replace(dta,',')) = len(dta) - 10

exec master..xp_cmdshell "select * from v_my_good_table" queryout "goodfile.csv" -S server -U user -P pwd '

Then load goodfile.csv using your package

Or -- just parse the single column data using charindex and substring and load the columns yourself for rows that are correct



0
 
yellowjetskiCommented:
You can try to use the Exception Row feature in SSIS to handle the bad rows.

Overview: SSIS will read each row assuming it qualifies to the pre-defined columns. When there's a bad row of any reason (incorrect number of columns, invalid data type, etc.) You can have SSIS write the bad row only to a exception file and the package continues to process the next line without failing the whole package.

Let me know if you have any questions regarding my solution.
0
 
glen_esqAuthor Commented:
Thank you both very much for your responses.

yellowjetski - more info please on your solution. I have been testing with a very simple package. A flat file data source which reads my delimited file, and on its failure precedence constraint, I have attached a flat file destination.

If I remove the column delimiter (in our case a tilde '~') and the corresponding data, in other words, my input file has only 9 columns instead of 10, the flat file source bombs, the package proceeds no further - the tlat file source turns red, that's all she wrote.

Can you give me more info on the Exception Row feature? To which type of Control Flow or Data Flow task are you referring?
0
 
glen_esqAuthor Commented:
Should have added more info - If I set both Error and Truncation error conditions to redirect row, my package will not fail (perhaps this is the Exception Row you are referring to), however, SSIS doesn't handle this as well as I'd like.

I have 8 rows of input, the 6th row I removed one column and its associated delimiter leaving it with 6 columns. I have data viewers on the success and failure constraints from my flat file source. Rows 1 to 5 and 8 are processed successfully, and rows 6 and 7 go to the flat file destination specified on the failure constraint of my flat file source. I'm losing the 6th row (which is missing a column) and the 7th row which contains good data, but SSIS grabs anyway). I don't want to lose the 7th row in these situations.
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now