Solved

SQL SSIS import fails due to bad format on rows

Posted on 2007-12-04
Last Modified: 2013-11-30
SQL SSIS imports fail if any of the rows are badly formatted.  Access simply logs the rows it cannot import and skips them.  SQL Server Management Studio fails the whole operation.

Is there any way to set Management Studio up so that it just sets aside the bad rows?  I know this can be done in SSIS package code, but that is a pain.  Anyway, even that fails the whole package for many format issues.

I have been looking high and low for an answer to this issue and cannot find one.

Thanks,
John
Question by:JFMConcepts
15 Comments
 
LVL 8

Expert Comment

by:i2mental
ID: 20402677
Are you talking about SSIS or Management Studio? You mentioned both. Maybe you're talking about the import wizard?

It's only going to be available by editing the SSIS package. You can use the import wizard and save the package if you like. First, click on the control flow step in the package that has the potential to fail and press F4 for its properties. In the Properties window you can set whether an error in that step should fail the package or not.

Second, you should set up error outputs on the package steps to handle the logging of bad data. There's not much to it once you get in the habit. It's just a matter of adding a connection in the package pointing to wherever you want the bad data written, then connecting the red arrow of a data flow step to a destination that uses that connection. When you connect the steps, there are options for which kinds of errors you want sent down that error path.
 

Author Comment

by:JFMConcepts
ID: 20403212
I guess I referred to SSIS as Integration Services, which I took the import wizard to be part of.  At any rate, I was looking for something that could be done right in the wizard.  It's a major time killer and a pain to save the package, open it in Visual Studio, and then set it up to handle and log all of the bad rows.  Also, even when I have done that, I've seen the whole import fail when there is bad data, like a hex value for a line break inside a field.

These are all things that Access handles just fine, but that is only good for small datasets.  I was looking for equivalent functionality in SQL's import wizard.

Thanks,
John
 
LVL 8

Expert Comment

by:i2mental
ID: 20403257
There is no equivalent functionality of just cutting and pasting and creating an errors table. I use Access when I'm dealing with a lot of Excel imports that will only be done once. If it's going to be repeated, I take the time to build a reusable package to handle it.
 

Author Comment

by:JFMConcepts
ID: 20434472
These files are too large for Access to handle.  I managed this by breaking the file into 3 sections and doing each in Access, but that is a ridiculous process to have to go through.  The problem is that there are inconsistent column delimiters in the file.  The goal is to put aside the "bad" rows and import the good ones.

I saved the import wizard package as you recommended and created a new Integration Services solution.  I added the .dtsx file that I saved.  I clicked on the "Data Flow Task" and set "FailPackageOnFailure" to "False".  What else do I need to do?

Here's what I get as an output:

SSIS package "TestEtsBadData.dtsx" starting.
Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.
Information: 0x40043006 at Data Flow Task, DTS.Pipeline: Prepare for Execute phase is beginning.
Information: 0x40043007 at Data Flow Task, DTS.Pipeline: Pre-Execute phase is beginning.
Information: 0x402090DC at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: The processing of file "M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt" has started.
Information: 0x4004300C at Data Flow Task, DTS.Pipeline: Execute phase is beginning.
Error: 0xC0202055 at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: The column delimiter for column "Column 130" was not found.
Error: 0xC0202092 at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: An error occurred while processing file "M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt" on data row 2.
Error: 0xC0047038 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on component "Source - tmpETS_withLineBreaks_20071203_txt" (1) returned error code 0xC0202092.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_THREADFAILED.  Thread "SourceThread0" has exited with error code 0xC0047038.  There may be error messages posted before this with more information on why the thread has exited.
Error: 0xC0047039 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_THREADCANCELLED.  Thread "WorkThread0" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.  There may be error messages posted before this with more information on why the thread was cancelled.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_THREADFAILED.  Thread "WorkThread0" has exited with error code 0xC0047039.  There may be error messages posted before this with more information on why the thread has exited.
Information: 0x40043008 at Data Flow Task, DTS.Pipeline: Post Execute phase is beginning.
Information: 0x402090DD at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: The processing of file "M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt" has ended.
Information: 0x402090DF at Data Flow Task, Destination - tmpETS_withLineBreaks_20071203 [534]: The final commit for the data insertion has started.
Information: 0x402090E0 at Data Flow Task, Destination - tmpETS_withLineBreaks_20071203 [534]: The final commit for the data insertion has ended.
Information: 0x40043009 at Data Flow Task, DTS.Pipeline: Cleanup phase is beginning.
Information: 0x4004300B at Data Flow Task, DTS.Pipeline: "component "Destination - tmpETS_withLineBreaks_20071203" (534)" wrote 0 rows.
Task failed: Data Flow Task
SSIS package "TestEtsBadData.dtsx" finished: Success.
 

Author Comment

by:JFMConcepts
ID: 20443177
If anyone has any news on this it would be SUPER APPRECIATED.

It seems that SSIS has traded one bug for another: DTS accepted the bad rows and silently filled the missing trailing columns with NULLs, while SSIS now evaluates column delimiters ahead of row delimiters.

If the whole row is bad and doesn't fit the schema, is there a way in SSIS to stop it from failing the package and just set the row aside or log it, as Access does?

Thanks,
John
 
LVL 8

Expert Comment

by:i2mental
ID: 20443296
In your Destination object in the SSIS package, go to the Error Output section. Change the error handling for the columns to "Ignore failure". This should bypass anything other than major unrecoverable errors.

You also have the option of using BULK INSERT if this is just data from a text file. Using it with the MAXERRORS option lets it tolerate a number of bad rows as well.
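
For reference, a minimal sketch of that BULK INSERT approach, assuming a tab-delimited file; the table name, file path and thresholds below are placeholders, not values from this thread:

BULK INSERT dbo.StagingTable
FROM 'C:\import\source_file.txt'
WITH
(
    FIELDTERMINATOR = '\t',                      -- column delimiter in the flat file
    ROWTERMINATOR   = '\n',                      -- row delimiter
    FIRSTROW        = 2,                         -- skip a header row, if there is one
    MAXERRORS       = 1000,                      -- keep loading until this many rows have failed
    ERRORFILE       = 'C:\import\bad_rows.txt'   -- rejected rows get written out here
);

Note that rows BULK INSERT cannot parse at all (for example, a missing column delimiter) may still stop the load, so this is only a partial substitute for true bad-row handling.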
 

Author Comment

by:JFMConcepts
ID: 20444006
I tried that, and it still failed.  The missing column delimiter is the killer.  BULK INSERT fails as well, claiming the file does not exist.  I think this is actually a problem with the schema as well.
 

Author Comment

by:JFMConcepts
ID: 21385998
Any takers on this issue???
 

Accepted Solution

by:
JFMConcepts earned 0 total points
ID: 21990305
I have found that there is no resolution for this formatting issue.  It is a nasty bug.  I wrote a program that split the file into smaller sections, used Access to import the data into an Access table, and then imported the data into SQL.  It was a huge pain in the butt and took a ton of time.
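
One workaround that stays inside SQL Server (not what was done here, just a sketch with hypothetical object names, a tab delimiter and a placeholder column count) is to load each raw line into a single-column staging table and then separate good and bad rows by counting delimiters:

CREATE TABLE dbo.RawLines (RawLine VARCHAR(MAX));

-- Use a field terminator that never appears in the data so each whole line
-- lands in the single RawLine column.
BULK INSERT dbo.RawLines
FROM 'C:\import\source_file.txt'
WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '~|~');

DECLARE @ExpectedDelims INT;
SET @ExpectedDelims = 130;   -- expected number of column delimiters (tabs) per row, placeholder

-- Rows with the wrong number of delimiters are set aside...
SELECT RawLine
INTO   dbo.BadRows
FROM   dbo.RawLines
WHERE  LEN(RawLine) - LEN(REPLACE(RawLine, CHAR(9), '')) <> @ExpectedDelims;

-- ...and the rest can be parsed and loaded, which is roughly the behavior Access gives you.
SELECT RawLine
INTO   dbo.GoodRows
FROM   dbo.RawLines
WHERE  LEN(RawLine) - LEN(REPLACE(RawLine, CHAR(9), '')) = @ExpectedDelims;

Rows that were broken by an embedded line break come through as two short lines; both halves fail the delimiter count and land in dbo.BadRows instead of failing the load.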
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22058768
>>It is a nasty bug. <<
Actually, no.  It is like that by design: for performance reasons, all the rows have to match.  This is well documented and unlikely to change.

>>It was a huge pain in the butt and took a ton of time.<<
Yep.  That is the way it is when you have bad data.
 

Author Comment

by:JFMConcepts
ID: 22072010
I'll explain a little more completely.

As with most things, Microsoft calls it a feature, and I am aware it is by design and for performance purposes.  That said, while this does speed things up when the data is well formatted, it fails entire packages over bad rows.  The reality is that data is imperfect and badly formatted rows occur.  The row delimiter is only discernible if you read the line all the way through, regardless of whether some column delimiters are missing or there are too many of them.  That is where the performance trade-off comes in.  DTS in 2000 first read the whole set and found the row delimiters, then reread the data and parsed the columns, which meant two passes over the data to import it.  2005's SSIS starts from the column delimiters and requires the row delimiter to be in the same place, with the same number of column delimiters, for every row.  Any deviation from this and the entire package fails, not just that row.

You can call this a "feature" or you can call it a bug.  I call it a bug because 2005 claims that it can handle bad rows and in fact has robust options for what to do with them.  The problem is that it is unable to distinguish the bad rows because of the choice to make a single pass, and in fact does not account for all the scenarios that make a row "bad".  Even with a single pass, if an end-of-row delimiter is found prematurely or late, failing the package is not necessary.  That row could simply be flagged as "bad" and handled according to the SSIS bad-row options.  This is an oversight on the part of the programming team that designed SSIS.  I appreciate the gain in efficiency, provided that it achieves the task at hand and improves the software.  This change, however, removes capability which previously existed in the name of efficiency.  Efficiency in reading the data set is worthless to me if it actually costs me more time and reduces the value of the tool.

Microsoft already had a proper parsing routine to handle this situation.  DTS 2000 uses it and Access does it even better than that.  The proper process would be to catch the error and isolate the bad data section and move on without failing the entire package.

It was a huge pain in the butt, and frankly didn't need to be.  The answer is not "that's the way it is".  The answer is for developers at Microsoft to make better decisions based on the reality of requirements and existing conditions.  They would also be wise to increase the value of their future tools rather than throwing the baby out with the bathwater, so to speak.
 

Author Comment

by:JFMConcepts
ID: 22072040
By the way, it is also a bug because the error message does not in any way reflect the problem.  Finding out that the rows don't line up with the same number of columns is difficult on large files.
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22075347
>>They also would be wise to increase the value of their future tools rather than throwing the baby out with the bathwater so to speak.<<
You can call them many things, and I am sure you recall clearly when MS "abandoned" VB.  The truth of the matter is that if you have followed MS for more than a couple of years, they are an incredibly astute company and obviously made the decision (wise, IMHO) to get better performance at the expense of not being able to convert bad data.  Frankly, I am OK with that, and I have close to 100 DTS packages that need to be converted to SSIS.

>>By the way, it is also a bug because the error message...<<
Again, you can call it whatever you consider best; trust me, I am sure they have heard worse.  But when you stop and think, you will perhaps realize that MS is unlikely to go back to the lax way they were doing things with DTS, whether you like it or not.  That is when you come to terms with what choices you have and make a pragmatic judgement call.  But I believe I am preaching to the choir, as you appear to have already made that decision and moved on.

There is one other course you could have taken, which in my view is short-sighted, and that is to continue to use DTS in SQL Server 2005.  The reason I say it is short-sighted is because it is a deprecated tool and you should not expect any support for it in SQL Server 2008.

I wish you the best of luck.
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22075372
P.S. As a Microsoft SQL Server MVP, I can contact the product development group and confirm whether they have any plans to change the design in SSIS, but I would not hold your breath.
 

Author Comment

by:JFMConcepts
ID: 22079181
I certainly don't mind them pulling up on the reins when bad data is present.  That is much better than letting it go through silently as it did in DTS 2000.  There is no doubt that SSIS is far better than DTS was.  My request would be twofold:

1. Display a proper error message (e.g. "The number of columns found in each row does not match; the file is improperly formatted").

2. Provide an option to configure the package to continue reading to the next row delimiter regardless of its position, flag the row as bad, and pass it to the bad-row handling options.

I'm not suggesting that they let bad data in, but certainly give options to handle it and throw accurate error messages.

Thanks,
John