Solved

BULK INSERT / FORMAT FILE / COMPUTED COLUMNS

Posted on 2016-07-27
Last Modified: 2016-10-07
Simple explanation: I have a table with several computed columns, and I am attempting to BULK INSERT data into it. For simplicity's sake, assume the table schema and format file look like this:
CREATE TABLE t1(
    col1 varchar(10),
    col2 varchar(10),
    col3 as col1 + '-' + col2,
    col4 varchar(10),
    col5 varchar(10)
)

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="col1" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="2" NAME="col2" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="3" NAME="col4" xsi:type="SQLVARYCHAR"/>
  <COLUMN SOURCE="4" NAME="col5" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>


When data is loaded, it is as if col3 tries to load into the computed column (no errors) and col4 contains data for col5 and col5 is null. For each computed column in the table, data is "shifted" to the left by one column/field, which leaves the columns at the end with null values.
Any idea how to fix this in a BULK INSERT? I know I can use INSERT / SELECT FROM OPENROWSET(...), but I would rather stay with BULK INSERT, as I'd otherwise have to revise code for 25 files/tables.

Thanks!
Question by:dbbishop
14 Comments
 
LVL 40

Assisted Solution

by:Sharath
Sharath earned 83 total points (awarded by participants)
If possible, can you load the data with BULK INSERT into a stage table without col3 (the computed column)? Then load your target table from the stage table.
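
A minimal sketch of the stage-table suggestion above. The table name t1_stage and both file paths are hypothetical placeholders, not names from this thread:

```sql
-- Hypothetical stage table: the same columns as t1, minus the computed col3.
CREATE TABLE t1_stage (
    col1 varchar(10),
    col2 varchar(10),
    col4 varchar(10),
    col5 varchar(10)
);

-- Load the data file into the stage table (placeholder paths).
BULK INSERT t1_stage
FROM 'C:\load\t1.dat'
WITH (FORMATFILE = 'C:\load\t1.xml');

-- Copy into the real table, naming only the non-computed columns;
-- SQL Server fills in col3 itself.
INSERT INTO t1 (col1, col2, col4, col5)
SELECT col1, col2, col4, col5
FROM t1_stage;
```

The trade-off is that every row is written twice, which matters for large loads.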
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 166 total points (awarded by participants)
With XML, it's probably easiest to create a view of the table with only the non-computed columns, then import into the view.  SQL should be able to handle that just fine.

CREATE VIEW t1_import_view
AS
SELECT col1, col2, col4, col5
FROM t1
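
A sketch of how the load into that view might look, assuming the view above exists and using placeholder file paths (the paths are not from this thread). As I read Microsoft's documentation on skipping table columns, an XML format file cannot skip a column when bcp or BULK INSERT loads directly into a table, unless it is the last column, and a view like this is the workaround it describes:

```sql
-- Load through the view, which exposes only col1, col2, col4 and col5,
-- so the four fields in the format file line up with four columns.
-- Both paths are hypothetical placeholders.
BULK INSERT t1_import_view
FROM 'C:\load\t1.dat'
WITH (FORMATFILE = 'C:\load\t1.xml');

-- Quick check that col3 was computed and col5 is populated.
SELECT TOP (10) col1, col2, col3, col4, col5
FROM t1;
```

The existing format file should be usable unchanged, since only the target object name differs.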
 
LVL 65

Assisted Solution

by:Jim Horn
Jim Horn earned 168 total points (awarded by participants)
Similar to the above, how 'bout an Execute SQL statement that drops the computed columns, then do your load, then another Execute SQL statement that re-adds the computed columns?
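
A rough sketch of that drop / load / re-add sequence for the example table, with placeholder file paths. It assumes nothing else (an index, constraint, or view) depends on col3; if something does, it would have to be dropped and recreated as well:

```sql
-- 1. Remove the computed column so the table has only the four loadable columns.
ALTER TABLE t1 DROP COLUMN col3;

-- 2. Load the file (hypothetical placeholder paths).
BULK INSERT t1
FROM 'C:\load\t1.dat'
WITH (FORMATFILE = 'C:\load\t1.xml');

-- 3. Put the computed column back, using the definition from the question.
ALTER TABLE t1 ADD col3 AS col1 + '-' + col2;
```

Note that the re-added column ends up as the last column of the table rather than in its original third position.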

>col3 tries to load into the computed column (no errors)
Sounds like a mapping error that can easily be avoided if the computed column doesn't exist in the first place.

>For each computed column in the table, data is "shifted" to the left by one column/field
This type of error will happen if your file uses text delimiters (qualifiers) around fields, say "this is a text field", and the same character also appears inside a field, say "this "comment" throws an error".
 
LVL 15

Author Comment

by:dbbishop
So basically, are you both saying that there is no way to explicitly import into a table with computed columns?
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 166 total points (awarded by participants)
I don't believe so using XML, unless it's the last column(s) in the table that you want to skip.  I guess, then, another option would be to move all computed columns to the end of the table.
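
A sketch of what moving the computed column to the end would mean for the example table. This is an illustration only: the file paths are placeholders, and I have not verified the behaviour when more than one trailing computed column is skipped:

```sql
-- Same columns as the question's t1, but with the computed column last,
-- so the four data fields map to the first four columns in order.
CREATE TABLE t1 (
    col1 varchar(10),
    col2 varchar(10),
    col4 varchar(10),
    col5 varchar(10),
    col3 AS col1 + '-' + col2
);

-- The existing four-field format file then leaves only the last column unloaded.
BULK INSERT t1
FROM 'C:\load\t1.dat'
WITH (FORMATFILE = 'C:\load\t1.xml');
```

This changes the column order of the table, so any code that relies on ordinal position (for example SELECT * or INSERT without a column list) would need checking.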
 
LVL 65

Accepted Solution

by:Jim Horn
Jim Horn earned 168 total points (awarded by participants)
>So basically, are you both saying that there is no way to explicitly import into a table with computed columns?
I can't speak to the XML aspects of this question, but my response is 'Yes you can, but it's not really a good idea'.

Perhaps a better idea would be to create a 'staging' table with only the columns needed, and import into that table.  Then do all your data quality checks, and when that's done, INSERT from that table into your final destination table.

 
LVL 15

Author Comment

by:dbbishop
I'm bulk loading into a staging table. I don't really want to create a pre-staging table, load it, then transfer data (millions of rows) into the 'staging' table. Views might be doable. Moving computed columns to the end of the table might be doable. Just changing the query to an INSERT INTO/SELECT FROM OPENROWSET(BULK...) might be the best way to go (no additional objects to maintain).

It just sucks that the documentation tells you how to skip columns in the XML, but it apparently doesn't work for computed columns. YEAH M$!!
 
LVL 15

Author Comment

by:dbbishop
St. Jimbo: two things, the columns on the end being null are a result of the 'shifting' that is taking place because of computed columns. If there is 1 computed column in the table, the last column of the table will be NULL. If there are 4 computed columns, the last 4 will be null. It has nothing to do with delimiters in the files.
Second, you say, "I can't speak to the XML aspects of this question, but my response is 'Yes you can, but it's not really a good idea'." Why would it not be a good idea to skip computed columns? That statement doesn't make sense to me.
 
LVL 142

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 83 total points (awarded by participants)
according to the specs, it should work
https://msdn.microsoft.com/en-us/library/ms179250.aspx

as an alternative, I would suggest a view without the computed col, or drop the column and add it back afterwards as a brute-force solution
 
LVL 69

Expert Comment

by:ScottPletcher
Btw, I suggested importing into a view w/o the computed column(s) in the very first post.
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
sorry, I started writing, and on posting, network was lost ( walking on the beach ...)
post submitted when network came back...
 
LVL 15

Assisted Solution

by:dbbishop
dbbishop earned 0 total points (awarded by participants)
Angel,
Yeah, it should but I think computed columns throw something into the mix that is not accounted for. I've done format files in the past, but not with computed columns. I've skipped columns (and fields) before (not on this job), so when it didn't work, that link was the first place I looked.

I opted for changing the code to use INSERT INTO (column list) / SELECT * FROM OPENROWSET(BULK...), and leaving the computed columns out of the insert list. Works like a charm.
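
For readers who want to see the shape of that statement, here is a minimal sketch against the example table from the question. The file paths and the alias src are placeholders, and this is not the author's actual code:

```sql
-- Name only the non-computed target columns; col3 is left to SQL Server.
-- With an XML format file, OPENROWSET(BULK ...) exposes the columns named
-- in the <ROW> element (col1, col2, col4, col5 in the example above).
INSERT INTO t1 (col1, col2, col4, col5)
SELECT src.col1, src.col2, src.col4, src.col5
FROM OPENROWSET(
         BULK 'C:\load\t1.dat',            -- hypothetical data file path
         FORMATFILE = 'C:\load\t1.xml'     -- hypothetical format file path
     ) AS src;
```

Because the mapping is done by the explicit column lists rather than by table position, the computed column's place in the table no longer matters, and no extra views or tables have to be maintained.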

The ETL process (in place before I got here) drops and recreates ALL staging objects tied to the client schema (tables, functions, views, etc.) on each run and revising all that code made this the simpler process. I wish I could have gotten a definitive answer regarding skipping columns when they are computed, because the standard documentation certainly does not apply.

Thanks for all the alternate suggestions. I just don't think they'd have been best in our case, although if I had to go with one, it probably would have been the views.
 
LVL 15

Author Comment

by:dbbishop
Not that I have any objection to the suggested point split, but I believe my solution of using OPENROWSET with INSERT/SELECT was the most productive.

There were some good suggestions, but overall, the idea of loading 25+ million rows of data into a pre-staging staging table and then loading the staging table from that did not seem very productive. I ran several tests (the purpose of this whole 'experiment' was to optimize the current process) and that proved to be the most efficient way to process the data.  

Although it may have been beneficial to just move the computed columns to the end of the table schema, changing the schema would have been difficult, and one of the requirements was to not make schema changes if possible.
