asked on

Insert CSV file into database

I have a CSV file, sample attached, that I want to bulk upload into a SQL 7 database. I know how to do a bulk upload but on this occassion the issue I have is in how to insert the 12th column of data (75 76 15 61 17 7 16 82 31 56 86) seperately, which is data seperated by the spaces, based on the 13th column ID (127952-001).
E.G.
insert into the anal_06 table into the acode column, 75 (from column 12) and into the patno_anal column of the anal_06 table, 127952-001 (from column 13)
then
insert into the anal_06 table into the acode column, 76 and into the patno_anal column of the anal_06 table, 127952-001
and so on and so on until all inserted

Any ideas?
f1011db.txt

nmcdermaid

I don't quite understand your logic. Is that two rows in the source that you have described above? If so, there is no special logic there; column 12 and column 13 are going into the same column in the same table every time.
Perhaps you could explain a little further.
If there is in fact some special logic, you have a couple of options:
1. If the logic is at row level, and all data stays in the same table (just a different field) then you may be able to create a calculated column in the table
2. Create a trigger on the destination table which farms out column 12 as required and bulk load into the table (this is not recommended as a trigger will can kill the bulk load performance and make things complicated)
3. Bulk load into a staging table and then run a stored procedure afterwards which performs your logic.
4. Do it all in SSIS

UOC

ASKER

I want to insert each number as seperated by a space from column 12 into its own row in the anal_06 table -

75, 127952-001
76, 127952-001
...
...
...

What I don't know how to do is differentiate each number on column 12 using the spaces.

I hope that is clearer.

nmcdermaid

In your sample text file, you have data like this at the end:
86 127952-001
86 127954-001
In your example just then you have commas.
So I will work from your example file.
I assume the number 86 is column 12 and 127952-001 is column 13?
What do you mean 'as seperated by a space' if this space is in the source file can you show me where it is by replacing it with an underscore. If you want to convert the source data and add a space to it can you let me know?
Do two files in the source text file result in only two rows in the destination table?

UOC

ASKER

Column 12 isn't just number 86 but 75 76 15 61 17 7 16 82 31 56 86.
From that you can see all the numbers are seperated by a space - 75 space 76 space 15 space etc right through to 86.
As mentioned above I want to insert each number from column 12 into its own row in the anal_06 table of the database with the patient number (column 13) -
acode patno_anal
75 127952-001
76 127952-001
15 127952-001
etc

I hope this is clear.

Sharath S

1) What is your SQL Server version?
2) Column 11 has 11 numbers (75 76 15 61 17 7 16 82 31 56 86). Is it fixed (the number of values)?
My suggestion is first bulk insert the data into a temp table and do the processing (as you want record for each value in your 11 th column). After implementing your logic, you can insert the records into your actual table.
Answer my questions and let me know if you need assisitance in implementing the logic.

UOC

ASKER

1) SQL 7
2) No, it is not fixed, it could be between 1 and 20.

I still nedd help with this as I can't see how to differentiate each number in column 12 and insert them in the table.

nmcdermaid

So one line in your text file can result in many lines being written to table. You will definitely need to bulk load into a staging table and then use a stored procedure to unpivot into rows. Are you happy with this process? You can include the BULK INSERT within the stored procedure so its all done in one place.

UOC

ASKER

I know how to bulk insert into a table but need help getting the individual numbers from the column (column 12) into the final table as seperate entries.

Sharath S

Is it CSV file or a text file?
The attachment is a text file but you mentioned as CSV.

UOC

ASKER

It is csv.

nmcdermaid

The SQL of your current bulk insert table would help but I'll show you how to do it with a generic table. You simply use a cross join to turn one row into 12, then use a case statement to put the different columns into one row. I will go through this bit by bit so open up Query Analyzer and do the following:

Run this:

SELECT RowNumber
FROM
(
SELECT 1 As RowNumber
UNION
SELECT 2 As RowNumber
UNION
SELECT 3 As RowNumber
UNION
SELECT 4 As RowNumber
UNION
SELECT 5 As RowNumber
UNION
SELECT 6 As RowNumber
UNION
SELECT 7 As RowNumber
UNION
SELECT 8 As RowNumber
UNION
SELECT 9 As RowNumber
UNION
SELECT 10 As RowNumber
UNION
SELECT 11 As RowNumber
UNION
SELECT 12 As RowNumber
) UnpivotTable

Note that it just returns a table with twelve rows. Alternatively you could create a permanent table with this data in it (its much neater) Now cross join that to the table that you bulk insert into:

SELECT UnpivotTable.RowNumber, B.*
FROM YourBulkInsertTable B,
(
SELECT 1 As RowNumber
UNION
SELECT 2 As RowNumber
UNION
SELECT 3 As RowNumber
UNION
SELECT 4 As RowNumber
UNION
SELECT 5 As RowNumber
UNION
SELECT 6 As RowNumber
UNION
SELECT 7 As RowNumber
UNION
SELECT 8 As RowNumber
UNION
SELECT 9 As RowNumber
UNION
SELECT 10 As RowNumber
UNION
SELECT 11 As RowNumber
UNION
SELECT 12 As RowNumber
) UnpivotTable

Note that this has basically multipled your bulk insert table by twelve, as we haven't specified a join between the tables, it's done a cross join. This gives us the twelve rows we require (which we can redue with a WHERE statement later on). Now we just have to split Column 12 out into its individual pieces. This can be done in a number of ways. One way is to:

1. Add 12 individual nullable columns to your bulk insert table
2. Bulk insert the data as usual, ensuring that the nullable columns remain NULL
3. Run an update query on your bulk insert table to split column 12 out to its individual columns
4. Now we can combine with the above query to pop the 12 individual columns into the right place.

To find the various pieces of column twelve you can use the CHARINDEX and SUBSTRING functions. It gets very ugly but it works. If the bits of data are guaranteed to always be two character then it is much easier but I won't assume that.
For example:

SELECT
C12,
LEFT(C12,CHARINDEX(' ',C12)-1) As Value1,
SUBSTRING(C12,CHARINDEX(' ',C12)+1,CHARINDEX(' ',C12,CHARINDEX(' ',C12)+1)-CHARINDEX(' ',C12)-1) As Value2
FROM
(SELECT '75 76 15 61 17 7 16 82 31 56 86' As C12) A
Its very ugly and the worse thing is that it gets exponentially worse as you move along the column.

Now, something to think about: Now that you are doing some more complex processing it is best to bulk load into a 'staging' table, perform processing on that table and then load the processed data into the proper table.

So there are all the jigsaw pieces. In summary what I suggest is:
1. Bulk load into a staging table with extra columns which will be populated with those extra columns
2. I write you a stored proc which will load up those extra columns (rather than doing an enormous inline bunch of CHARINDEX's
3. You then use the cross join trick (create a table with a bunch of rows - more than 12 in case you need more later on), some CASE statements and a WHERE clause to transfer the data from the staging table to the real table.
I want to make sure this is OK with you before I do any code.

nmcdermaid

I see you mentioned a 'final table' sp maybe you already have the staging bit worked out.
Can you post the create table statements for your staging table and your final table and I'll fill in the missing bits.

UOC

ASKER

I use a DTS script to get the data from the csv file to the holding table called 'temp_olddata_upload'. All the other fields in the table I can put into their respective tables but the data in column 12 I cannot do so at the moment it is sitting in the above table in a fields called 'acode' in varchar format.

The attached code snippet shows the data being uploaed to the table and the other commands to put the other fields in their final tables. The command missing is that to put the column 12 and 13 into the anal_06 table.

From the code you have put up, will that work if some of the fields are empty because as I mentioned earlier, ther can be as little as one acode or as many as 20 so if that code is run with less will it still work?

CREATE PROCEDURE ps_endo_olddata AS
EXEC master..xp_cmdshell 'COPY /Y \\file10\Shared\EndoLab\CSV\OldHeaderFile.TXT + \\file10\Shared\EndoLab\CSV\old_data.csv \\file10\Shared\EndoLab\CSV\OutputFile_old.CSV'
EXEC master..xp_cmdshell 'dtsrun /Schmedsdb1 /Ut1 /Pt2 /NDTS__Endo_upload_csv_to_temp_olddata' 
EXEC master..xp_cmdshell 'dtsrun /Schmedsdb1 /Ut1 /Pt2 /NDTS__Endo_delete_temp_olddataupload' 
EXEC master..xp_cmdshell 'dtsrun /Schmedsdb1 /Ut1 /Pt2 /NDTS__Endo_temp_olddata_to_patient_06_table' 
EXEC master..xp_cmdshell 'dtsrun /Schmedsdb1 /Ut1 /Pt2 /NDTS__Endo_temp_olddata_to_result_06' 
EXEC master..xp_cmdshell 'dtsrun /Schmedsdb1 /Ut1 /Pt2 /NDTS__Endo_temp_olddata_to_sample_06' 
EXEC master..xp_cmdshell 'dtsrun /Schmedsdb1 /Ut1 /Pt2 /NDTS__Endo_temp_olddata_to_cust_06'

Open in new window

nmcdermaid

Yep it'll still work, once you have the fields split out, you just use a where clause to filter out the rows where its blank.
Once we have split the single field into 12 fields (or less), then we'll have, for example, acode_1, acode_2, acode_3 etc. populated. When we've run out of acodes, the acode field will be NULL. So if we only have 3 acodes then acode_4 to acode_12 will be NULL. (thats what the SP that I will write you will do)
Then we do this:
SELECT A.acode_row, B.Column13
FROM
(
SELECT
CASE C.RowNumber
WHEN 1 THEN B.acode_1
WHEN 2 THEN B.acode_2
WHEN 3 THEN B.acode_3
WHEN 4 THEN B.acode_4
WHEN 5 THEN B.acode_5
WHEN 6 THEN B.acode_6
etc...
END As acode_row
FROM
temp_olddata_upload B,
CrossJoinTable C
) A
WHERE A.acode_row IS NOT NULL
That will return 12 rows for each, but if a particular row is NULL then it doesn't return it (due to the where clause). So if there is only one value, then acode_1 will have a value but the rest will be NULL, and RowNumber 2 - 12 will be NULL, and the WHERE clause will filter them out.

Note that you have to explicitly create the columns in your import table, and you have to explicitly name them in the CASE statement.
So now we need to
1. Know the name of column13 in temp_olddata_upload
2. Create a table called CrossJoinTable, which just has 12 rows and a single column called RowNumber
3. Add varchar nullable columns acode_1 to acode_12 (may as well go up to 20 or 30 if you feel this is likely) to temp_olddata_upload
4. Verify that you can still load data into this table with these new columns
Then I'll write a stored proc or an UPDATE statement to farm acode out to acode_1, acode_2, acode_3 etc. in temp_olddata_upload.

UOC

ASKER

The info as required:
1) column 13 is called 'acode'
2) CrossTableJoin created but not sure what you mean by 12 rows and single column. I have created a column called 'RowNumber' and made it numeric but not sure about what you mean by 12 rows.
3) Done
4) Yes, still working

nmcdermaid

Can you run the below code and verify that it is populating your acode_1, 2 ,3 etc. fields. I have only done four queries. Hopefully you can follow the thread and do the other eight. I've tested that it works when there are less than twelve acode values.
Once you have all your acode_* fields populated, run this:

SELECT A.acode_row, B.patno_anal
FROM
(
SELECT
CASE C.RowNumber
WHEN 1 THEN B.acode_1
WHEN 2 THEN B.acode_2
WHEN 3 THEN B.acode_3
WHEN 4 THEN B.acode_4
WHEN 5 THEN B.acode_5
WHEN 6 THEN B.acode_6
etc...
END As acode_row
FROM
temp_olddata_upload B,
CrossJoinTable C
) A
WHERE A.acode_row IS NOT NULL
and verify that it is returning what you expect.
Then get back to me and we'll put all the pieces together.

update dbo.temp_olddata_upload
set 
acode_1 = RTRIM(LEFT(acode,PATINDEX('% %',acode))), 
acode_2 = LTRIM(RIGHT(acode,LEN(acode)-PATINDEX('% %',acode)))
from dbo.temp_olddata_upload
where PATINDEX('% %',acode) > 0
 
update dbo.temp_olddata_upload
set 
acode_2 = RTRIM(LEFT(acode_2,PATINDEX('% %',acode_2))), 
acode_3 = LTRIM(RIGHT(acode_2,LEN(acode_2)-PATINDEX('% %',acode_2)))
from dbo.temp_olddata_upload
where PATINDEX('% %',acode_2) > 0
 
update dbo.temp_olddata_upload
set 
acode_3 = RTRIM(LEFT(acode_3,PATINDEX('% %',acode_3))), 
acode_4 = LTRIM(RIGHT(acode_3,LEN(acode_3)-PATINDEX('% %',acode_3)))
from dbo.temp_olddata_upload
where PATINDEX('% %',acode_3) > 0
 
update dbo.temp_olddata_upload
set 
acode_4 = RTRIM(LEFT(acode_4,PATINDEX('% %',acode_4))), 
acode_5 = LTRIM(RIGHT(acode_4,LEN(acode_4)-PATINDEX('% %',acode_4)))
from dbo.temp_olddata_upload
where PATINDEX('% %',acode_4) > 0

Open in new window

nmcdermaid

hmm my prior comment wasn't posted.... CrossJoinTable should have twelve rows in it, with the numbers 1 to 12. The 'RowNumber' data type should preferably be tinyint. (allows numbers 1 - 255). Numeric allows decimal places but we really don't need that. The purpose of CrossJoinTable is to replace that big UNION subtable statement which I pasted earlier.

Sharath S

I have created this function in SQL Server 2008 (and tested in 2005 also). If you face any issue while executing this in SQL 7, let me know.

1) First bulk insert the data from CSV file into a staging table.
   I assume that the table name is StagingTable1.
 
2) Now create anothe table with and indentity column as you don't have any column to identify each record uniquely in your StagingTable1.
   I created a table like this.
   
   create table StagingTable2 (RowNum int identity,
                               col1 int,
                               col2 varchar(10),
                               col3 varchar(5),
                               col4 datetime,
                               col5 int,
                               col6 varchar(1),
                               col7 varchar(10),
                               col8 int,
                               col9 decimal(10,4),
                               col10 varchar(10),
                               col11 int,
                               acode varchar(100),
                               patno_anal varchar(20))
 
3) Insert the records from StagingTable1 into StagingTable2.
   
   insert StagingTable2
   select * from StagingTable1
 
   Now you have records in StagingTable2 with identity colum RowNum for each record. 
   I just displayed two records here.
   
   select * from StagingTable2
   
   RowNum	col1	col2	col3	col4	col5	col6	col7	col8	col9	col10	col11	acode	patno_anal
   1	1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	75 76 15 61 17 7 16 82 31 56 86	127952-001
   2	1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	75 76 15 61 17 7 16 82 31 56 86	127954-001
 
4) Create a function like this.
 
   alter FUNCTION [dbo].[udf_PivotParameters1]
       (
         @RowNum int
       )
   RETURNS @ReturnList TABLE
       (
         FieldValue VARCHAR(1000)
       )
   AS BEGIN
       DECLARE @ParamaterList varchar(2000)
       
       DECLARE @ArrayList TABLE
           (
             FieldValue VARCHAR(1000)
           )
       DECLARE @Value VARCHAR(1000)
       DECLARE @CurrentPosition INT
       DECLARE @Delimiter varchar(2)
       set @Delimiter = ','
       select @ParamaterList = acode from StagingTable2 where RowNum = @RowNum
       select @ParamaterList = replace(@ParamaterList,' ',',')
       SET @ParamaterList = LTRIM(RTRIM(@ParamaterList))
           + CASE WHEN RIGHT(@ParamaterList, 1) = @Delimiter THEN ''
                  ELSE @Delimiter
             END
       SET @CurrentPosition = ISNULL(CHARINDEX(@Delimiter, @ParamaterList, 1), 0)  
   
       IF @CurrentPosition = 0
           INSERT  INTO @ArrayList ( FieldValue )
                   SELECT  @ParamaterList
       ELSE
           BEGIN
               WHILE @CurrentPosition > 0
                   BEGIN
                       SET @Value = LTRIM(RTRIM(LEFT(@ParamaterList,
                                                     @CurrentPosition - 1))) --make sure a value exists between the delimiters
                       IF LEN(@ParamaterList) > 0
                           AND @CurrentPosition <= LEN(@ParamaterList)
                           BEGIN
                               INSERT  INTO @ArrayList ( FieldValue )
                                       SELECT  @Value
                           END
                       SET @ParamaterList = SUBSTRING(@ParamaterList,
                                                      @CurrentPosition
                                                      + LEN(@Delimiter),
                                                      LEN(@ParamaterList))
                       SET @CurrentPosition = CHARINDEX(@Delimiter,
                                                        @ParamaterList, 1)
                   END
           END
       INSERT  @ReturnList ( FieldValue )
               SELECT  FieldValue
               FROM    @ArrayList
       RETURN
      END
   
5) Call this function in your query like this.
   
   select col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,patno_anal,
          FieldValue as acode
     from StagingTable2
    cross apply dbo.udf_PivotParameters1(RowNum)
  
  
  col1	col2	col3	col4	col5	col6	col7	col8	col9	col10	col11	patno_anal	acode
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	75
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	76
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	15
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	61
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	17
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	7
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	16
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	82
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	31
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	56
  1011	Arumpo	p	2000-11-28 00:00:00.000	930	h	ACTr	7	49.3200	pmol/L	732	127952-001	86
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	75
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	76
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	15
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	61
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	17
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	7
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	16
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	82
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	31
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	56
  1011	Arumpo	p	2000-05-12 00:00:00.000	930	h	ACTr	7	12.0000	pmol/L	732	127954-001	86

Open in new window

nmcdermaid

Sharath, UDF's do not exist in SQL Server 7.0. I think we have almost arrived at a solution here anyway.

Sharath S

is it? thats strange. I haven't worked in SQL 7 so thought of UDF.
Then my code is of no use for the asker. I hope you are directing the asker in correct way to achive the result. Lets wait for the asker's comments.

nmcdermaid

The poster may wish at some stage to upgrade to SQL 2008 Express (free) and use BULK INSERT to load the text data rather than DTS. Then he can use all the new functionality. But anyway lets wait to hear back.

UOC

ASKER

Hi,

Did as you said and the acode rows populated so that is good.
However I am getting the error - The column prefix 'B' does not match with a table name or alias name used in the query - when running -
SELECT A.acode_row, B.patno_anal
FROM
(
SELECT
CASE C.RowNumber
WHEN 1 THEN B.acode_1
WHEN 2 THEN B.acode_2
WHEN 3 THEN B.acode_3
WHEN 4 THEN B.acode_4
WHEN 5 THEN B.acode_5
WHEN 6 THEN B.acode_6
WHEN 7 THEN B.acode_7
WHEN 8 THEN B.acode_8
WHEN 9 THEN B.acode_9
WHEN 10 THEN B.acode_10
WHEN 11 THEN B.acode_11
WHEN 12 THEN B.acode_12
END As acode_row
FROM
temp_olddataupload B,
CrossJoinTable C
) A
WHERE A.acode_row IS NOT NULL

Also, B.pano_anal should be B.patno as in the temp_uploadolddata, it is called patno; in the destination table, anal_06, it is called patno_anal.

nmcdermaid

Sorry, wrote that wrong. See below.

SELECT patno, acode_row
FROM
(
SELECT B.patno,
CASE C.RowNumber 
WHEN 1 THEN B.acode_1
WHEN 2 THEN B.acode_2
WHEN 3 THEN B.acode_3
WHEN 4 THEN B.acode_4
WHEN 5 THEN B.acode_5
WHEN 6 THEN B.acode_6
WHEN 7 THEN B.acode_7
WHEN 8 THEN B.acode_8
WHEN 9 THEN B.acode_9
WHEN 10 THEN B.acode_10
WHEN 11 THEN B.acode_11
WHEN 12 THEN B.acode_12
END As acode_row
FROM 
temp_olddataupload B,
CrossJoinTable C
) A
WHERE acode_row IS NOT NULL

Open in new window

UOC

ASKER

That code is returning nothing, just saying '0 row(s) affected'.
I've never seen when then statements before so can't help out but getting close. Any further thoughts?

Sharath S

UOC - give me some time. will try to solve this this without using UDF.

ASKER CERTIFIED SOLUTION

nmcdermaid

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

UOC

ASKER

Excellent help and very patient too.

nmcdermaid

Glad to help. Just some final notes:
1. I see that there can be up to twenty data items in that column but I assumed 12... I'm sure you can work out how to extend to twenty
2. To implement the UPDATE statements in your DTS just run them inside an execute SQL task.
You may already know this but just making sure.