Solved

SQL SSIS import fails due to bad format on rows

Posted on 2007-12-04
15
1,944 Views
Last Modified: 2013-11-30
SQL SSIS imports fail if any of the rows are misformatted.  Access simply logs the rows it cannot import and skips them.  SQL Management Studio fails the whole operation.  

Is there any way to set Management Studio up so that it just puts aside bad rows?  I know this can be done in SSIS package code, but that is a pain.  Anyway, even that fails the whole package on many format issues.

I have been looking high and low for an answer to this issue and cannot find one.

Thanks,
John
Question by:JFMConcepts
15 Comments
 
LVL 8

Expert Comment

by:i2mental
ID: 20402677
Are you talking about SSIS or Management Studio? You mentioned both. Maybe you're talking about the import wizard?

It's only going to be available by editing the SSIS package. You can use the import wizard and save the package if you like. First, click on the control flow step in the package that has the potential to fail and hit F4 for properties. In the Properties window you can set whether the package should fail on error or not.

Second, you should set up error outputs on the package steps to handle logging the bad data. There's not much to it once you get in the habit. It's just a matter of setting up a connection in the package to wherever you want the bad data written, then connecting the red arrow of a data flow step to a destination using that connection. When you connect the steps, there are options for what kinds of errors you want sent through that data flow.
0
 

Author Comment

by:JFMConcepts
ID: 20403212
I guess I referred to SSIS as Integration Services, which I counted the import wizard to be a part of.  At any rate, I was looking for something that could be done right in the wizard.  It's a major time killer and a pain to save the package, open it in Visual Studio, and then set it up to handle and log all of the bad rows.  Also, even when I have done that, I've experienced total failure when there is bad data, like a hex value for a line break inside a field.

These are all things that Access handles just fine, but that is only good for small datasets.  I was looking for equivalent functionality in SQL's import wizard.

Thanks,
John
0
 
LVL 8

Expert Comment

by:i2mental
ID: 20403257
There is no equivalent functionality for just cutting and pasting and creating an errors table. I use Access when I'm dealing with a lot of Excel imports that will only be done once. If it's going to be something repeated, I take the time to build a reusable package to handle it.
0
 

Author Comment

by:JFMConcepts
ID: 20434472
These files are too large for Access to handle.  I managed this by breaking the file into 3 sections and doing each in Access, but that is a ridiculous solution to have to go through.  The problem is that there are inconsistent column delimiters in the file.  The goal is to put aside the "bad" rows and import the good ones.

I saved the import wizard package as you recommended and created a new Integration Services solution.  I added the .dtsx file that I saved.  I clicked on the "Data Flow Task" and set "FailPackageOnFailure" to "False".  What else do I need to do?

Here's what I get as an output:

SSIS package "TestEtsBadData.dtsx" starting.
Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.
Information: 0x40043006 at Data Flow Task, DTS.Pipeline: Prepare for Execute phase is beginning.
Information: 0x40043007 at Data Flow Task, DTS.Pipeline: Pre-Execute phase is beginning.
Information: 0x402090DC at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: The processing of file "M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt" has started.
Information: 0x4004300C at Data Flow Task, DTS.Pipeline: Execute phase is beginning.
Error: 0xC0202055 at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: The column delimiter for column "Column 130" was not found.
Error: 0xC0202092 at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: An error occurred while processing file "M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt" on data row 2.
Error: 0xC0047038 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on component "Source - tmpETS_withLineBreaks_20071203_txt" (1) returned error code 0xC0202092.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_THREADFAILED.  Thread "SourceThread0" has exited with error code 0xC0047038.  There may be error messages posted before this with more information on why the thread has exited.
Error: 0xC0047039 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_THREADCANCELLED.  Thread "WorkThread0" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.  There may be error messages posted before this with more information on why the thread was cancelled.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_THREADFAILED.  Thread "WorkThread0" has exited with error code 0xC0047039.  There may be error messages posted before this with more information on why the thread has exited.
Information: 0x40043008 at Data Flow Task, DTS.Pipeline: Post Execute phase is beginning.
Information: 0x402090DD at Data Flow Task, Source - tmpETS_withLineBreaks_20071203_txt [1]: The processing of file "M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt" has ended.
Information: 0x402090DF at Data Flow Task, Destination - tmpETS_withLineBreaks_20071203 [534]: The final commit for the data insertion has started.
Information: 0x402090E0 at Data Flow Task, Destination - tmpETS_withLineBreaks_20071203 [534]: The final commit for the data insertion has ended.
Information: 0x40043009 at Data Flow Task, DTS.Pipeline: Cleanup phase is beginning.
Information: 0x4004300B at Data Flow Task, DTS.Pipeline: "component "Destination - tmpETS_withLineBreaks_20071203" (534)" wrote 0 rows.
Task failed: Data Flow Task
SSIS package "TestEtsBadData.dtsx" finished: Success.
0
 

Author Comment

by:JFMConcepts
ID: 20443177
If anyone has any news on this it would be SUPER APPRECIATED.

It seems that SSIS has shifted dramatically from one bug in DTS, which accepted the data and filled in the missing trailing columns with NULLs, to a new bug that evaluates column delimiters ahead of row delimiters.

If the whole row is bad and doesn't fit the schema, is there a way in SSIS to stop it from failing and just kick the row number aside or log it as Access does?

Thanks,
John
0
 
LVL 8

Expert Comment

by:i2mental
ID: 20443296
In your Destination object in the SSIS package, go to the Error Output section. Change the values in the Error Out columns to "Ignore". This should just bypass anything other than major unrecoverable errors.

You also have the option of using BULK INSERT if this is just data from a text file. Using that with the MAXERRORS parameter can ignore errors as well.
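
For example, something along these lines (the field terminator, FIRSTROW, and error file path here are only assumptions to adjust for your file; the table and file names are taken from your output above):

BULK INSERT dbo.tmpETS_withLineBreaks_20071203
FROM 'M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt'
WITH
(
    FIELDTERMINATOR = '\t',   -- assumption: tab-delimited; use the file's real column delimiter
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,      -- assumption: skip one header row; remove if there is none
    MAXERRORS       = 1000,   -- keep importing until this many rows have been rejected
    ERRORFILE       = 'M:\Clients\ETS\BadRows.log'   -- hypothetical path; rejected rows get written here
);

Keep in mind that BULK INSERT reads the file from the SQL Server machine's point of view, so a drive letter mapped only on your workstation won't be visible to it.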
0
 

Author Comment

by:JFMConcepts
ID: 20444006
I tried that, and it still failed.  The missing column delimiter is the killer.  Bulk insert fails as well, claiming the file does not exist.  I think this is actually a problem with the schema as well.
0
 

Author Comment

by:JFMConcepts
ID: 21385998
Any takers on this issue???
0
 

Accepted Solution

by:
JFMConcepts earned 0 total points
ID: 21990305
I have found that there is no resolution to this formatting issue.  It is a nasty bug.  I wrote a program that split the file into smaller sections, used Access to import the data into an Access table, and then imported the data into SQL.  It was a huge pain in the butt and took a ton of time.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22058768
>>It is a nasty bug. <<
Actually, no.  It is like that by design; for performance reasons, all the rows have to match.  This is well documented and unlikely to change.

>>It was a huge pain in the butt and took a ton of time.<<
Yep.  That is the way it is when you have bad data.
0
 

Author Comment

by:JFMConcepts
ID: 22072010
I'll explain a little more completely.

As with most things, Microsoft calls it a feature, and I am aware it is by design and for performance purposes.  That said, while this does speed performance when data is well formatted, it fails entire packages for bad rows.  The reality is that data is imperfect and poorly formatted rows occur.  The main delimiter for a row is only discernible if you read the line all the way through, regardless of whether the column delimiters are all present or there are too many.  This is where the performance issue comes in.  DTS in 2000 first read the whole set and found the row delimiters.  Then it reread the data and parsed the columns, requiring two reads of the data to import it.  SSIS in 2005 begins with the column delimiters and then requires the row delimiter to be in the same place, with the same number of column delimiters, in every row.  Any deviation from this and the entire package fails, not just that row.
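
To make that concrete, the two-pass idea can still be approximated by hand in T-SQL: bulk load each raw line into a single wide column using only the row delimiter, count the column delimiters in each line, and set aside the lines where the count is wrong before any columns are parsed.  This is only a rough sketch; the pipe delimiter, the expected count of 130 delimiters, and the table names are placeholders for whatever the real file uses:

-- First pass: stage each raw line as a single value.  Using '\0' as the field
-- terminator assumes the null character never appears in the data, so only the
-- row delimiter splits the file into rows.
CREATE TABLE dbo.RawLines (RawLine VARCHAR(MAX));

BULK INSERT dbo.RawLines
FROM 'M:\Clients\ETS\Data\Data Append 20071105\Output_110607\tmpETS_withLineBreaks_20071203.txt'
WITH (FIELDTERMINATOR = '\0', ROWTERMINATOR = '\n');

-- Second pass: count the column delimiters in each line and divert the misformatted ones.
SELECT RawLine
INTO dbo.BadLines
FROM dbo.RawLines
WHERE LEN(RawLine) - LEN(REPLACE(RawLine, '|', '')) <> 130;   -- placeholder delimiter and count

DELETE FROM dbo.RawLines
WHERE LEN(RawLine) - LEN(REPLACE(RawLine, '|', '')) <> 130;

-- Everything left in dbo.RawLines now has a consistent shape and can be parsed into columns.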

You can call this single-read design a "feature" or you can call it a bug.  I call it a bug because 2005 claims that it can handle bad rows and in fact has robust options for what to do with them.  The problem is that it is unable to distinguish the rows because of the choice to make a single read, and in fact does not account for all the scenarios that make a row "bad".  Even with the single read, if an end-of-row delimiter is found prematurely or even late, failing the package is not necessary.  That row could simply be flagged as "bad" and handled according to the SSIS bad-row options.  This is an oversight on the part of the programming team that designed SSIS.  I appreciate the gain in efficiency, provided that it achieves the task at hand and improves the software.  This change, however, removes capability which previously existed in the name of efficiency.  Efficiency in reading the data set is worthless to me if it actually costs me more time and reduces the value of the tool.

Microsoft already had a proper parsing routine to handle this situation.  DTS 2000 uses it and Access does it even better than that.  The proper process would be to catch the error and isolate the bad data section and move on without failing the entire package.

It was a huge pain in the butt, and frankly it didn't need to be.  The answer is not "that's the way it is".  The answer is for developers at Microsoft to make better decisions based on the reality of requirements and existing conditions.  They would also be wise to increase the value of their future tools rather than throwing the baby out with the bathwater, so to speak.
0
 

Author Comment

by:JFMConcepts
ID: 22072040
By the way, it is also a bug because the error message does not in any way reflect the problem.  Finding out that the rows don't line up with the same number of columns is difficult on large files.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22075347
>>They also would be wise to increase the value of their future tools rather than throwing the baby out with the bathwater so to speak.<<
You can call them many things, and I am sure you recall clearly when MS "abandoned" VB.  The truth of the matter is that if you have followed MS for more than a couple of years, you know they are an incredibly astute company, and they obviously made the decision (wise IMHO) to get better performance at the expense of not being able to convert bad data.  Frankly, I am OK with that, and I have close to 100 DTS packages that need to be converted to SSIS.

>>By the way, it is also a bug because the error message...<<
Again, you can call it whatever you consider best; trust me, I am sure they have heard worse.  But when you stop and think, you will perhaps realize that MS is unlikely to go back to the lax way they were doing things with DTS, whether you like it or not.  That is when you come to terms with what choices you have and make a pragmatic judgment call.  But I believe I am preaching to the choir, as you appear to have already made that decision and moved on.

There is one other course you could have taken, which in my view is short-sighted, and that is to continue to use DTS in SQL Server 2005.  The reason I say it is short-sighted is because it is a deprecated tool and you should not expect any support for it in SQL Server 2008.

I wish you the best of luck.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 22075372
P.S. As a Microsoft SQL Server MVP, I can contact the product development group and confirm whether they have any plans to change the design in SSIS, but I would not hold my breath.
0
 

Author Comment

by:JFMConcepts
ID: 22079181
I certainly don't mind them pulling up the reins when there is bad data present.  That is much better than letting it go through silently as DTS 2000 did.  There is no doubt that SSIS is far better than DTS was.  My request would be twofold:

1. Display a proper error message (e.g., "The number of columns found in each row does not match; the file is improperly formatted").

2. Provide an option to configure the package to continue until the next row delimiter regardless of its position, flag the row as bad, and kick it to the bad-row handling options.

I'm not suggesting that they let bad data in, but certainly give options to handle it and throw accurate error messages.

Thanks,
John
0
