Solved

sqlldr a .csv with a variable number of columns into an Oracle (8.1) table

Posted on 2004-10-27
743 Views
Last Modified: 2007-12-19
Hi,

I have a .csv file of the following format:

Item, Resource,20-10-2004 10:04,20-10-2004 11:00
BL00002BB      ,47    ,188,203

Here there are only 2 date/time columns; there can be anywhere between 100 and 1,000 date/time columns.

I'm using Oracle 8.1 and would like to know the best way to load this into a table of the form:

Resource_Table
start_time                           end_time                   Item               Resource     Qty
20-10-2004 10:04                20-10-2004  11:00     BL00002BB            47        188
20-10-2004 11:00                20-10-2004  12:00     BL00002BB            47        203

etc.

any ideas?
Question by:joehodge
    18 Comments
     

    Author Comment

    by:joehodge
    I've just increased the points as a solution is needed fairly urgently!
     

    Author Comment

    by:joehodge
Would UTL_FILE be a better option?
     
    LVL 2

    Expert Comment

    by:muskware77
You need to adapt your model to your input. So, let's think a little. When you
have two date/time columns, you label them start_time and end_time. How would you label the columns if you have, say, four date/time columns?
     

    Author Comment

    by:joehodge
    Hi muskware77,

If we have four time/date columns, we still only have:

    Resource_Table
    start_time                           End time                   Item               Resource     Qty
    20-10-2004 10:04                20-10-2004  11:00     BL00002BB            47        188
    20-10-2004 11:00                20-10-2004  12:00     BL00002BB            47        203
    20-10-2004 11:04                20-10-2004  11:00     BL00002BB            47        188
    20-10-2004 12:00                20-10-2004  12:00     BL00002BB            47        203

    thanks
     
    LVL 2

    Expert Comment

    by:muskware77
A SQL*Loader session like this one may move towards the solution:

-- sqlldr userid=joehodge/yourpwd control=c:\load\sol01.ctl log=d:\load\sqlldr01.log
-- your control file (sol01.ctl) may look like this:

LOAD DATA
INFILE 'yourdata.csv'
INTO TABLE Resource_Table
FIELDS TERMINATED BY ','
(start_time DATE "DD-MM-YYYY HH24:MI", end_time DATE "DD-MM-YYYY HH24:MI", Item, Resource, Qty)


     
    LVL 7

    Accepted Solution

by:bvanderveen
    If I understand your question, you are not using variable-length columns, but a variable number of columns:
    >>there can be anywhere between 100-1000 date/time columns

    That makes sqlldr unsuitable for this purpose.  What does the table you are loading into look like?  I can't understand how you would map such an animal.

    But, if you have to,
  1. move the date/time columns to the end and use TRAILING NULLCOLS, which defaults columns not present in the record to NULL, or
      2. in creating the datafile, put in commas as placeholders for all columns that are null

    I would probably use utl_file or a java program to process this, but with the above, you may be able to make it work with sqlldr.
     
    LVL 2

    Expert Comment

    by:muskware77
My time is up. Maybe I'll see you guys tomorrow.
     

    Author Comment

    by:joehodge
    Hi bvanderveen,

Apologies, I did mean a variable number of columns!

My idea was to:

    have a table with as many columns (col_1, col_2 etc) as Oracle allows as a staging table

    sqlldr the .csv into this table

write an Oracle procedure to take the first row (e.g. Item, Resource,20-10-2004 10:04,20-10-2004 11:00) and put it into a table called column_header_description
    col_number     col_description
    1                        item
    2                        Resource
    3                        20-10-2004  10:00
    4                        20-10-2004  11:00
    etc

I could then write another procedure (already sounding a bit long-winded!)
    to insert into the Resource_Table:

    Resource_Table
    start_time                  End time                   Item                     Resource          Qty
    3                                   4              col_1.value            col_1.value        col_3.value
    5                                   6              col_1.value            col_1.value        col_4.value

    3, 4, 5 and 6 would be taken from the column_header_description and col_1.value from the staging table

After writing it all out, surely there must be an easier way?
     
    LVL 7

    Expert Comment

    by:bvanderveen
I think you will run into a limitation on the number of columns Oracle allows.  You could have multiple staging tables linked in a 1-to-1 relationship, but I think you would be better served by making the input data more like what your final data will be (I assume you have some control over this; if not, that changes the problem).

Look at your final design.  If you have a large number of times, I assume that these are more than start/end combinations: some sort of intervals.  If you normalize this, you will probably have separate tables, and the program sending over the output can also normalize the data to be loaded.  You can do some wizardry with sqlldr or utl_file, but I think this approach would be simpler and easier to implement and maintain.

    When solving a problem, I try to start with making sure I am solving the "right" problem.

    Let me know if my assumptions are wrong.
     

    Author Comment

    by:joehodge
    Hi,

Due to limitations in the source system, we have no control over the input.

The report can be run for any time frame, so this could be a month or a year (which supports your comment on the number of columns Oracle allows).

I didn't quite understand the comments in your second paragraph; could you kindly elaborate?
     
    LVL 7

    Expert Comment

    by:bvanderveen
    My second paragraph envisions at least two tables.  Perhaps I do not understand your data.  Is the final result going to be a table with four columns:

    Resource_Table
    start_time                  End time                   Item                     Resource          Qty
    3                                   4              col_1.value            col_1.value        col_3.value
    5                                   6              col_1.value            col_1.value        col_4.value

    or will there be additional time columns in this table?  If there will be additional date/time cols, what will they mean?  Will they be start_time2, end_time2, start_time3, etc?
     
     

    Author Comment

    by:joehodge
    there will only ever be 5 columns in this table.

    Resource_Table
    start_time                  End time                   Item                     Resource          Qty
    3                                   4              col_1.value            col_1.value        col_3.value
    5                                   6              col_1.value            col_1.value        col_4.value
    6                                   7              col_1.value            col_1.value        col_5.value
     
    LVL 7

    Expert Comment

    by:bvanderveen
    So why will your source output file have more than 2 times?  Even if it is a month's worth of data, I would expect that there would be a separate record for each time/resource/quantity.

    What I would probably have is something like 2 tables:

  CREATE TABLE resource_info (
      resource_id  NUMBER PRIMARY KEY,
      description  VARCHAR2(200)
  );

  CREATE TABLE production_data (
      resource_id  NUMBER NOT NULL REFERENCES resource_info (resource_id),
      quantity     NUMBER,
      start_time   DATE,
      end_time     DATE
  );

I think you need to pre-process your data file.  I am guessing that they are sending you (in a one-month example) a record with the resource, and then quantity/start/end data for each data point.  You should use Java, C, or whatever to split this up so that each record roughly matches your final data format.  Are my assumptions about the data file correct?


     
     

    Author Comment

    by:joehodge
    Hi,

    the first two lines of the file look like this. A user could also select an extra day/months worth of data

    Item, Resource,20-10-2004 10:04,20-10-2004 11:00,20-10-2004 12:00,20-10-2004 13:00,20-10-2004 14:00,20-10-2004 15:00,20-10-2004 16:00,20-10-2004 17:00,20-10-2004 18:00,20-10-2004 19:00,20-10-2004 20:00,20-10-2004 21:00,20-10-2004 22:00,20-10-2004 23:00,21-10-2004 00:00,21-10-2004 01:00,21-10-2004 02:00,21-10-2004 03:00,21-10-2004 04:00,21-10-2004 05:00,21-10-2004 06:00,21-10-2004 07:00,21-10-2004 08:00,21-10-2004 09:00,21-10-2004 10:00,21-10-2004 11:00,21-10-2004 12:00,21-10-2004 13:00,21-10-2004 14:00,21-10-2004 15:00,21-10-2004 16:00,21-10-2004 17:00,21-10-2004 18:00,21-10-2004 19:00,21-10-2004 20:00,21-10-2004 21:00,21-10-2004 22:00,21-10-2004 23:00,22-10-2004 00:00,22-10-2004 01:00,22-10-2004 02:00,22-10-2004 03:00,22-10-2004 04:00,22-10-2004 05:00,22-10-2004 06:00,22-10-2004 07:00,22-10-2004 08:00,22-10-2004 09:00,22-10-2004 10:00,22-10-2004 11:00,22-10-2004 12:00,22-10-2004 13:00,22-10-2004 14:00,22-10-2004 15:00,22-10-2004 16:00,22-10-2004 17:00,22-10-2004 18:00,22-10-2004 19:00,22-10-2004 20:00,22-10-2004 21:00,22-10-2004 22:00,22-10-2004 23:00,23-10-2004 00:00,23-10-2004 01:00,23-10-2004 02:00,23-10-2004 03:00,23-10-2004 04:00,23-10-2004 05:00,23-10-2004 06:00,23-10-2004 07:00,23-10-2004 08:00,23-10-2004 09:00,23-10-2004 10:00,23-10-2004 11:00,23-10-2004 12:00,23-10-2004 13:00,23-10-2004 14:00,23-10-2004 15:00,23-10-2004 16:00,23-10-2004 17:00,23-10-2004 18:00,23-10-2004 19:00,23-10-2004 20:00,23-10-2004 21:00,23-10-2004 22:00,23-10-2004 23:00,24-10-2004 00:00,24-10-2004 01:00,24-10-2004 02:00,24-10-2004 03:00,24-10-2004 04:00,24-10-2004 05:00,24-10-2004 06:00,24-10-2004 07:00,24-10-2004 08:00,24-10-2004 09:00,24-10-2004 10:00,24-10-2004 11:00,24-10-2004 12:00,24-10-2004 13:00,24-10-2004 14:00,24-10-2004 15:00,24-10-2004 16:00,24-10-2004 17:00,24-10-2004 18:00,24-10-2004 19:00,24-10-2004 20:00,24-10-2004 21:00,24-10-2004 22:00,24-10-2004 23:00,25-10-2004 00:00,25-10-2004 01:00,25-10-2004 02:00,25-10-2004 03:00,25-10-2004 04:00,25-10-2004 
05:00,25-10-2004 06:00,25-10-2004 07:00,25-10-2004 08:00,25-10-2004 09:00,25-10-2004 10:00,25-10-2004 11:00,25-10-2004 12:00,25-10-2004 13:00,25-10-2004 14:00,25-10-2004 15:00,25-10-2004 16:00,25-10-2004 17:00,25-10-2004 18:00,25-10-2004 19:00,25-10-2004 20:00,25-10-2004 21:00,25-10-2004 22:00,25-10-2004 23:00,26-10-2004 00:00,26-10-2004 01:00,26-10-2004 02:00,26-10-2004 03:00,26-10-2004 04:00,26-10-2004 05:00,26-10-2004 06:00,26-10-2004 07:00,26-10-2004 08:00,26-10-2004 09:00,26-10-2004 10:00,26-10-2004 11:00,26-10-2004 12:00,26-10-2004 13:00,26-10-2004 14:00,26-10-2004 15:00,26-10-2004 16:00,26-10-2004 17:00,26-10-2004 18:00,26-10-2004 19:00,26-10-2004 20:00,26-10-2004 21:00,26-10-2004 22:00,26-10-2004 23:00,27-10-2004 00:00,27-10-2004 01:00,27-10-2004 02:00,27-10-2004 03:00,27-10-2004 04:00,27-10-2004 05:00,27-10-2004 06:00,27-10-2004 07:00,27-10-2004 08:00,27-10-2004 09:00,27-10-2004 10:00,27-10-2004 11:00,27-10-2004 12:00,27-10-2004 13:00,27-10-2004 14:00,27-10-2004 15:00,27-10-2004 16:00,27-10-2004 17:00,27-10-2004 18:00,27-10-2004 19:00,27-10-2004 20:00,27-10-2004 21:00,27-10-2004 22:00,27-10-2004 23:00,28-10-2004 00:00,28-10-2004 01:00,28-10-2004 02:00,28-10-2004 03:00,28-10-2004 04:00,28-10-2004 05:00,28-10-2004 06:00,28-10-2004 07:00,28-10-2004 08:00,28-10-2004 09:00,28-10-2004 10:00,28-10-2004 11:00,28-10-2004 12:00,28-10-2004 13:00,28-10-2004 14:00,28-10-2004 15:00,28-10-2004 16:00,28-10-2004 17:00,28-10-2004 18:00,28-10-2004 19:00,28-10-2004 20:00,28-10-2004 21:00,28-10-2004 22:00,28-10-2004 23:00,29-10-2004 00:00,29-10-2004 01:00,29-10-2004 02:00,29-10-2004 03:00,29-10-2004 04:00,29-10-2004 05:00,29-10-2004 06:00,29-10-2004 07:00,29-10-2004 08:00,29-10-2004 09:00,29-10-2004 10:00,29-10-2004 11:00,29-10-2004 12:00,29-10-2004 13:00,29-10-2004 14:00,29-10-2004 15:00,29-10-2004 16:00,29-10-2004 17:00,29-10-2004 18:00,29-10-2004 19:00,29-10-2004 20:00,29-10-2004 21:00,29-10-2004 22:00,29-10-2004 23:00,30-10-2004 00:00,30-10-2004 01:00,30-10-2004 
02:00,30-10-2004 03:00,30-10-2004 04:00,30-10-2004 05:00,30-10-2004 06:00,30-10-2004 07:00,30-10-2004 08:00,30-10-2004 09:00,30-10-2004 10:00,30-10-2004 11:00,30-10-2004 12:00,30-10-2004 13:00,30-10-2004 14:00,30-10-2004 15:00,30-10-2004 16:00,30-10-2004 17:00,30-10-2004 18:00,30-10-2004 19:00,30-10-2004 20:00,30-10-2004 21:00,30-10-2004 22:00,30-10-2004 23:00,31-10-2004 00:00,31-10-2004 01:00,31-10-2004 01:00,31-10-2004 02:00,31-10-2004 03:00,31-10-2004 04:00,31-10-2004 05:00,31-10-2004 06:00,31-10-2004 07:00,31-10-2004 08:00,31-10-2004 09:00,31-10-2004 10:00,31-10-2004 11:00,31-10-2004 12:00,31-10-2004 13:00,31-10-2004 14:00,31-10-2004 15:00,31-10-2004 16:00,31-10-2004 17:00,31-10-2004 18:00,31-10-2004 19:00,31-10-2004 20:00,31-10-2004 21:00,31-10-2004 22:00,31-10-2004 23:00,01-11-2004 00:00,01-11-2004 01:00,01-11-2004 02:00,01-11-2004 03:00,01-11-2004 04:00,01-11-2004 05:00,01-11-2004 06:00,01-11-2004 07:00,01-11-2004 08:00
    BL00002BB      ,35    ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,82,111,111,111,111,111,111,111,111,111,111,111,55,9,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,107,71,74,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,86,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
     
    LVL 7

    Expert Comment

    by:bvanderveen
That's kind of what I thought.  I think you need to parse this and output it into another file that is in the format you want to load.
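If it helps, a minimal Python sketch of that pre-processing step might look like the following. The file names are placeholders, and it assumes each quantity covers the interval from one header timestamp to the next (which appears to match the sample data above); neither assumption comes from the thread itself.

```python
import csv

def pivot_report(in_path, out_path):
    """Read the wide report and write one (start, end, item, resource, qty)
    row per time bucket, suitable for a fixed five-column control file."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = [h.strip() for h in next(reader)]
        times = header[2:]  # everything after the Item and Resource labels
        for row in reader:
            cells = [c.strip() for c in row]
            if not cells or not cells[0]:
                continue  # skip blank trailing lines
            item, resource, quantities = cells[0], cells[1], cells[2:]
            # Pair each quantity with consecutive header timestamps.
            for i, qty in enumerate(quantities):
                if i + 1 >= len(times):
                    break  # no end timestamp available for the final bucket
                writer.writerow([times[i], times[i + 1], item, resource, qty])
```

The output file then has exactly five fields per record regardless of how many date/time columns the report was run with, so a single fixed control file can load it.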
     

    Author Comment

    by:joehodge
    when you say parse it, what approach would you use?

I haven't much Java experience, but am fairly experienced in PL/SQL.

    my latest idea was to have a staging table with two columns

    col_data         blob
    col_number    number pk

and then sqlldr every line into the col_data field, putting a sequential number in col_number.

    then do some string manipulation on the blob.

Will this be incredibly slow, or should I use UTL_FILE.GET_LINE?
     

    Author Comment

    by:joehodge
    Hi,

    thanks for your help bvanderveen.

I've decided to use UTL_FILE.GET_LINE
     
    LVL 2

    Expert Comment

    by:muskware77
    Hi,

Happy to see you've got a solution. However, you may wish to upgrade to
Oracle9i R2 and use external tables in the future. This will certainly improve
performance and reduce the time you spend on these ETL (Extraction, Transformation and Load) efforts.

Please see: http://www.oracle.com/technology/products/oracle9i/daily/sept19.html

    Good luck.