joehodge

asked on

sqlldr a .csv with variable length columns into an Oracle (8.1) table

Hi,

I have a .csv file of the following format:

Item, Resource,20-10-2004 10:04,20-10-2004 11:00
BL00002BB      ,47    ,188,203

Here there are only 2 date/time columns, but there can be anywhere between 100 and 1000 date/time columns.

I'm using Oracle 8.1 and would like to know the best way to load this into a table of the form:

Resource_Table
start_time                           End time                   Item               Resource     Qty
20-10-2004 10:04                20-10-2004  11:00     BL00002BB            47        188
20-10-2004 11:00                20-10-2004  12:00     BL00002BB            47        203

etc.

any ideas?
joehodge (Asker):
I've just increased the points as a solution is needed fairly urgently!
would utl_file be a better option?
You need to adapt your model to your input. So, let's think a little. When you have two date/time columns, you label them start_time and end_time. How would you label the columns if you had, say, four (4) date/time columns?
Hi muskware77,

if we have four time/date columns we still only have:

Resource_Table
start_time                           End time                   Item               Resource     Qty
20-10-2004 10:04                20-10-2004  11:00     BL00002BB            47        188
20-10-2004 11:00                20-10-2004  12:00     BL00002BB            47        203
20-10-2004 11:04                20-10-2004  11:00     BL00002BB            47        188
20-10-2004 12:00                20-10-2004  12:00     BL00002BB            47        203

thanks
A SQL*Loader session like this one may move towards a solution:

-- sqlldr userid=joehodge/yourpwd control=c:\load\sol01.ctl log=d:\load\sqlldr01.log
-- your control file (sol01.ctl) may look like this:

LOAD DATA
INFILE 'your_file.csv'                      -- replace with the actual file name/path
INTO TABLE Resource_Table
FIELDS TERMINATED BY ','
(start_time  DATE "DD-MM-YYYY HH24:MI",
 end_time    DATE "DD-MM-YYYY HH24:MI",
 Item, Resource, Qty)


ASKER CERTIFIED SOLUTION
bvanderveen
My time is up. Maybe I'll see you guys tomorrow.
Hi bvanderveen,

Apologies, I did mean a variable number of columns!

My idea was to:

have a staging table with as many columns (col_1, col_2, etc.) as Oracle allows

sqlldr the .csv into this table

write an Oracle procedure to take the first row (e.g. Item, Resource,20-10-2004 10:04,20-10-2004 11:00) and put this into a table called column_header_description:
col_number     col_description
1                        item
2                        Resource
3                        20-10-2004  10:00
4                        20-10-2004  11:00
etc

I could then write another procedure (already sounding a bit long-winded!)
to insert into the Resource_Table:

Resource_Table
start_time                  End time                   Item                     Resource          Qty
3                                   4              col_1.value            col_1.value        col_3.value
5                                   6              col_1.value            col_1.value        col_4.value

3, 4, 5 and 6 would be taken from column_header_description and the col_N values from the staging table.
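Something like this rough, untested sketch is what I have in mind (assuming a staging table called staging_data with columns col_1, col_2, col_3, ... and the column_header_description table above -- all names here are just placeholders, and I've called the resource column resource_id because I believe RESOURCE is an Oracle reserved word). 8i's native dynamic SQL (EXECUTE IMMEDIATE) lets me build the column name on the fly:

DECLARE
  v_last_col NUMBER;
  v_end_hdr  VARCHAR2(30);
  v_sql      VARCHAR2(4000);
BEGIN
  SELECT MAX(col_number) INTO v_last_col FROM column_header_description;

  -- columns 1 and 2 are Item and Resource; every other header is the start
  -- time of an interval whose end time is the next header
  FOR hdr IN (SELECT col_number, col_description
                FROM column_header_description
               WHERE col_number >= 3
                 AND col_number < v_last_col   -- the last column needs its own end-time rule
               ORDER BY col_number)
  LOOP
    SELECT col_description INTO v_end_hdr
      FROM column_header_description
     WHERE col_number = hdr.col_number + 1;

    -- col_N of the staging table holds the quantity for the interval
    -- starting at header N
    v_sql := 'INSERT INTO resource_table (start_time, end_time, item, resource_id, qty) '
          || 'SELECT TO_DATE(:s, ''DD-MM-YYYY HH24:MI''), '
          || '       TO_DATE(:e, ''DD-MM-YYYY HH24:MI''), '
          || '       col_1, col_2, col_' || hdr.col_number
          || '  FROM staging_data';

    EXECUTE IMMEDIATE v_sql USING hdr.col_description, v_end_hdr;
  END LOOP;

  COMMIT;
END;
/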

After writing it all out, there must be an easier way?
I think you will run into the limit on the number of columns Oracle allows. You could have multiple staging tables linked in a 1-to-1 relationship, but I think you would be better served by trying to make the input data more like what your final data will be (I assume you have some control over this; if not, that changes the problem).

Look at your final design. If you have a large number of times, I assume these are not just start/end combinations but some sort of regular intervals. If you normalize this, you will probably have separate tables, and the program sending over the output can also normalize the data to be loaded. You can do some wizardry with sqlldr or utl_file, but I think this approach would be simpler and easier to implement and maintain.

When solving a problem, I try to start with making sure I am solving the "right" problem.

Let me know if my assumptions are wrong.
Hi,

Due to limitations of the source system, we have no control over the input.

The report can be run for any time frame, so this could be a month or a year (which supports your comment on the number of columns Oracle allows).

I didn't quite understand the comments in your second paragraph; could you kindly elaborate?
My second paragraph envisions at least two tables.  Perhaps I do not understand your data.  Is the final result going to be a table with four columns:

Resource_Table
start_time                  End time                   Item                     Resource          Qty
3                                   4              col_1.value            col_1.value        col_3.value
5                                   6              col_1.value            col_1.value        col_4.value

or will there be additional time columns in this table?  If there will be additional date/time cols, what will they mean?  Will they be start_time2, end_time2, start_time3, etc?
 
there will only ever be 5 columns in this table.

Resource_Table
start_time                  End time                   Item                     Resource          Qty
3                                   4              col_1.value            col_1.value        col_3.value
5                                   6              col_1.value            col_1.value        col_4.value
6                                   7              col_1.value            col_1.value        col_5.value
So why will your source output file have more than 2 times?  Even if it is a month's worth of data, I would expect that there would be a separate record for each time/resource/quantity.

What I would probably have is something like 2 tables:

  CREATE TABLE resource_info (
      resource_id  NUMBER NOT NULL PRIMARY KEY,
      description  VARCHAR2(200)
  );

  CREATE TABLE production_data (
      resource_id  NUMBER NOT NULL REFERENCES resource_info (resource_id),
      quantity     NUMBER,
      start_time   DATE,
      end_time     DATE
  );
I think you need to pre-process your data file.  I am guessing that they are sending you (in a one-month example) a record with the resource, and then quantity/start/end data for each data point.  You should use Java, C, or whatever to split this up so that each record roughly matches your final data format.   Are my assumptions about the data file correct?


 
Hi,

The first two lines of the file look like this. A user could also select an extra day's or month's worth of data:

Item, Resource,20-10-2004 10:04,20-10-2004 11:00,20-10-2004 12:00,20-10-2004 13:00,20-10-2004 14:00,20-10-2004 15:00,20-10-2004 16:00,20-10-2004 17:00,20-10-2004 18:00,20-10-2004 19:00,20-10-2004 20:00,20-10-2004 21:00,20-10-2004 22:00,20-10-2004 23:00,21-10-2004 00:00,21-10-2004 01:00,21-10-2004 02:00,21-10-2004 03:00,21-10-2004 04:00,21-10-2004 05:00,21-10-2004 06:00,21-10-2004 07:00,21-10-2004 08:00,21-10-2004 09:00,21-10-2004 10:00,21-10-2004 11:00,21-10-2004 12:00,21-10-2004 13:00,21-10-2004 14:00,21-10-2004 15:00,21-10-2004 16:00,21-10-2004 17:00,21-10-2004 18:00,21-10-2004 19:00,21-10-2004 20:00,21-10-2004 21:00,21-10-2004 22:00,21-10-2004 23:00,22-10-2004 00:00,22-10-2004 01:00,22-10-2004 02:00,22-10-2004 03:00,22-10-2004 04:00,22-10-2004 05:00,22-10-2004 06:00,22-10-2004 07:00,22-10-2004 08:00,22-10-2004 09:00,22-10-2004 10:00,22-10-2004 11:00,22-10-2004 12:00,22-10-2004 13:00,22-10-2004 14:00,22-10-2004 15:00,22-10-2004 16:00,22-10-2004 17:00,22-10-2004 18:00,22-10-2004 19:00,22-10-2004 20:00,22-10-2004 21:00,22-10-2004 22:00,22-10-2004 23:00,23-10-2004 00:00,23-10-2004 01:00,23-10-2004 02:00,23-10-2004 03:00,23-10-2004 04:00,23-10-2004 05:00,23-10-2004 06:00,23-10-2004 07:00,23-10-2004 08:00,23-10-2004 09:00,23-10-2004 10:00,23-10-2004 11:00,23-10-2004 12:00,23-10-2004 13:00,23-10-2004 14:00,23-10-2004 15:00,23-10-2004 16:00,23-10-2004 17:00,23-10-2004 18:00,23-10-2004 19:00,23-10-2004 20:00,23-10-2004 21:00,23-10-2004 22:00,23-10-2004 23:00,24-10-2004 00:00,24-10-2004 01:00,24-10-2004 02:00,24-10-2004 03:00,24-10-2004 04:00,24-10-2004 05:00,24-10-2004 06:00,24-10-2004 07:00,24-10-2004 08:00,24-10-2004 09:00,24-10-2004 10:00,24-10-2004 11:00,24-10-2004 12:00,24-10-2004 13:00,24-10-2004 14:00,24-10-2004 15:00,24-10-2004 16:00,24-10-2004 17:00,24-10-2004 18:00,24-10-2004 19:00,24-10-2004 20:00,24-10-2004 21:00,24-10-2004 22:00,24-10-2004 23:00,25-10-2004 00:00,25-10-2004 01:00,25-10-2004 02:00,25-10-2004 03:00,25-10-2004 04:00,25-10-2004 05:00,25-10-2004 06:00,25-10-2004 07:00,25-10-2004 08:00,25-10-2004 09:00,25-10-2004 10:00,25-10-2004 11:00,25-10-2004 12:00,25-10-2004 13:00,25-10-2004 14:00,25-10-2004 15:00,25-10-2004 16:00,25-10-2004 17:00,25-10-2004 18:00,25-10-2004 19:00,25-10-2004 20:00,25-10-2004 21:00,25-10-2004 22:00,25-10-2004 23:00,26-10-2004 00:00,26-10-2004 01:00,26-10-2004 02:00,26-10-2004 03:00,26-10-2004 04:00,26-10-2004 05:00,26-10-2004 06:00,26-10-2004 07:00,26-10-2004 08:00,26-10-2004 09:00,26-10-2004 10:00,26-10-2004 11:00,26-10-2004 12:00,26-10-2004 13:00,26-10-2004 14:00,26-10-2004 15:00,26-10-2004 16:00,26-10-2004 17:00,26-10-2004 18:00,26-10-2004 19:00,26-10-2004 20:00,26-10-2004 21:00,26-10-2004 22:00,26-10-2004 23:00,27-10-2004 00:00,27-10-2004 01:00,27-10-2004 02:00,27-10-2004 03:00,27-10-2004 04:00,27-10-2004 05:00,27-10-2004 06:00,27-10-2004 07:00,27-10-2004 08:00,27-10-2004 09:00,27-10-2004 10:00,27-10-2004 11:00,27-10-2004 12:00,27-10-2004 13:00,27-10-2004 14:00,27-10-2004 15:00,27-10-2004 16:00,27-10-2004 17:00,27-10-2004 18:00,27-10-2004 19:00,27-10-2004 20:00,27-10-2004 21:00,27-10-2004 22:00,27-10-2004 23:00,28-10-2004 00:00,28-10-2004 01:00,28-10-2004 02:00,28-10-2004 03:00,28-10-2004 04:00,28-10-2004 05:00,28-10-2004 06:00,28-10-2004 07:00,28-10-2004 08:00,28-10-2004 09:00,28-10-2004 10:00,28-10-2004 11:00,28-10-2004 12:00,28-10-2004 13:00,28-10-2004 14:00,28-10-2004 15:00,28-10-2004 16:00,28-10-2004 17:00,28-10-2004 18:00,28-10-2004 19:00,28-10-2004 20:00,28-10-2004 21:00,28-10-2004 22:00,28-10-2004 23:00,29-10-2004 00:00,29-10-2004 
01:00,29-10-2004 02:00,29-10-2004 03:00,29-10-2004 04:00,29-10-2004 05:00,29-10-2004 06:00,29-10-2004 07:00,29-10-2004 08:00,29-10-2004 09:00,29-10-2004 10:00,29-10-2004 11:00,29-10-2004 12:00,29-10-2004 13:00,29-10-2004 14:00,29-10-2004 15:00,29-10-2004 16:00,29-10-2004 17:00,29-10-2004 18:00,29-10-2004 19:00,29-10-2004 20:00,29-10-2004 21:00,29-10-2004 22:00,29-10-2004 23:00,30-10-2004 00:00,30-10-2004 01:00,30-10-2004 02:00,30-10-2004 03:00,30-10-2004 04:00,30-10-2004 05:00,30-10-2004 06:00,30-10-2004 07:00,30-10-2004 08:00,30-10-2004 09:00,30-10-2004 10:00,30-10-2004 11:00,30-10-2004 12:00,30-10-2004 13:00,30-10-2004 14:00,30-10-2004 15:00,30-10-2004 16:00,30-10-2004 17:00,30-10-2004 18:00,30-10-2004 19:00,30-10-2004 20:00,30-10-2004 21:00,30-10-2004 22:00,30-10-2004 23:00,31-10-2004 00:00,31-10-2004 01:00,31-10-2004 01:00,31-10-2004 02:00,31-10-2004 03:00,31-10-2004 04:00,31-10-2004 05:00,31-10-2004 06:00,31-10-2004 07:00,31-10-2004 08:00,31-10-2004 09:00,31-10-2004 10:00,31-10-2004 11:00,31-10-2004 12:00,31-10-2004 13:00,31-10-2004 14:00,31-10-2004 15:00,31-10-2004 16:00,31-10-2004 17:00,31-10-2004 18:00,31-10-2004 19:00,31-10-2004 20:00,31-10-2004 21:00,31-10-2004 22:00,31-10-2004 23:00,01-11-2004 00:00,01-11-2004 01:00,01-11-2004 02:00,01-11-2004 03:00,01-11-2004 04:00,01-11-2004 05:00,01-11-2004 06:00,01-11-2004 07:00,01-11-2004 08:00
BL00002BB      ,35    ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,82,111,111,111,111,111,111,111,111,111,111,111,55,9,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,71,74,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,86,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
That's kind of what I thought.  I think you need to parse this and output it into another file in the format you want to load.
When you say parse it, what approach would you use?

I don't have much Java experience, but I am fairly experienced in PL/SQL.

My latest idea was to have a staging table with two columns:

col_data         blob
col_number    number pk

and then sqlldr every line into the col_data field, putting a sequential number in col_number.

Then do some string manipulation on the blob.

Will this be incredibly slow, or should I use utl_file.get_line?
Hi,

Thanks for your help, bvanderveen.

I've decided to use utl_file.get_line.
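
For what it's worth, the shape of the routine I'm planning is roughly this (an untested sketch: the directory, file name and table/column names are all placeholders, the directory has to be listed in the utl_file_dir init parameter on 8i, the resource_id naming caveat from above still applies, and a full year of hourly columns could exceed the 32767-byte line limit):

DECLARE
  f        UTL_FILE.FILE_TYPE;
  v_header VARCHAR2(32767);
  v_line   VARCHAR2(32767);
  v_item   VARCHAR2(30);
  v_res    NUMBER;
  i        PLS_INTEGER;

  -- returns the n-th comma-separated field of a line (NULL if it doesn't exist)
  FUNCTION fld (p_line VARCHAR2, p_n PLS_INTEGER) RETURN VARCHAR2 IS
    v_start PLS_INTEGER := 1;
    v_end   PLS_INTEGER;
  BEGIN
    IF p_n > 1 THEN
      v_start := INSTR(p_line, ',', 1, p_n - 1) + 1;
    END IF;
    v_end := INSTR(p_line || ',', ',', 1, p_n);
    RETURN LTRIM(RTRIM(SUBSTR(p_line, v_start, v_end - v_start)));
  END;
BEGIN
  f := UTL_FILE.FOPEN('/tmp', 'resource.csv', 'R', 32767);
  UTL_FILE.GET_LINE(f, v_header);            -- header row: Item, Resource, date1, date2, ...

  LOOP
    BEGIN
      UTL_FILE.GET_LINE(f, v_line);          -- one data row per item/resource
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;          -- end of file
    END;

    v_item := fld(v_line, 1);
    v_res  := TO_NUMBER(fld(v_line, 2));

    i := 3;                                  -- quantities start in field 3
    WHILE fld(v_header, i + 1) IS NOT NULL LOOP   -- the last column needs its own end-time rule
      INSERT INTO resource_table (start_time, end_time, item, resource_id, qty)
      VALUES (TO_DATE(fld(v_header, i),     'DD-MM-YYYY HH24:MI'),
              TO_DATE(fld(v_header, i + 1), 'DD-MM-YYYY HH24:MI'),
              v_item, v_res, TO_NUMBER(fld(v_line, i)));
      i := i + 1;
    END LOOP;
  END LOOP;

  UTL_FILE.FCLOSE(f);
  COMMIT;
END;
/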
Hi,

Happy to see you've got a solution. However, you may wish to upgrade to
Oracle9i R2 and use external tables in the future. This will certainly improve
performance and reduce the time you spend on these ETL (Extraction, Transformation and Load) efforts.

Please see: http://www.oracle.com/technology/products/oracle9i/daily/sept19.html
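
A rough sketch of what an external table for a file like this could look like (9i R2 syntax only; the directory path, file name and column list are placeholders, and the column list would still have to be wide enough for the file):

CREATE OR REPLACE DIRECTORY load_dir AS '/export/load';

CREATE TABLE resource_csv_ext (
  item         VARCHAR2(30),
  resource_id  NUMBER,
  qty_1        NUMBER,
  qty_2        NUMBER
  -- ... one column per expected field in the file
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1                               -- skip the header row
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('resource.csv')
)
REJECT LIMIT UNLIMITED;

-- the file can then be queried (and unpivoted) directly:
-- SELECT * FROM resource_csv_ext;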

Good luck.