Ideas on how to load multi-value fields into Oracle object table

I am looking to design and build some data-loading routines; however, the data being loaded has been extracted from a Pick database, which has multi-value fields.

The supplied data will be in a CSV file and I'm told there will only be a maximum of two levels of multi-valuing.  So a particular field may have many delimited values, each of which may itself have many delimited values.  The delimiters will be specific non-printable characters.

The obvious way to load this data would be to define an external table to match the CSV file layout, then use PL/SQL to split out the multi-value fields, which will be stored in child tables, i.e. in relational tables rather than object tables.  The net result will be a row created in a "parent" table for each record in the file; for multi-value fields, a row in a child table; and (for the 2nd level of multi-value) a row in a grand-child table, which is related to the child, which itself is related to the parent.

On the surface, this looks like a lot of string manipulation and procedural code.  Are there any alternative ways of doing this?

I've done a little research and determined that SQL*Loader will load into nested tables, but I think it will only do this to one level.  Unfortunately, external tables don't support nested tables or varrays, even in version 11g.
gatorvip Commented:
You might be able to use SQL if you use regular expressions, but I'm not sure it would work unless you knew in advance how many items you would have in each field.

You can do this in APEX (although it would involve some PL/SQL):
- use the apex_util.string_to_table function to split up the strings

Something like

array1 := apex_util.string_to_table( line , ',');  -- each returns an apex_application_global.vc_arr2
array2 := apex_util.string_to_table( field, ';');
array3 := apex_util.string_to_table( subfield, ':');

and loop through each array.

Another possibility would be to build a small application that pre-processes the data dump externally (leaving the work outside of the database) then either inserts the results back into the db, or creates CSV file that are importable with external tables.
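As a rough sketch of such a pre-processor (in Python for illustration; the \xFD value-mark and \xFC subvalue-mark delimiters are assumptions, since the question only says the delimiters are non-printable characters):

```python
# Hypothetical pre-processor sketch: explode one line of the Pick export
# into fields, values and subvalues. The delimiters below are assumptions;
# Pick conventionally uses \xFD (value mark) and \xFC (subvalue mark), but
# the actual export's non-printable characters may differ.
VM = "\xfd"   # assumed multi-value delimiter
SVM = "\xfc"  # assumed sub-multi-value delimiter

def explode(line):
    """Return a list of fields; each field is a list of values,
    and each value is a list of subvalues."""
    return [[value.split(SVM) for value in field.split(VM)]
            for field in line.split(",")]

line = "f1a" + VM + "f1b" + VM + "f1c1" + SVM + "f1c2" + SVM + "f1c3" + VM + "f1d,f2,f5"
fields = explode(line)
print(fields[1])  # [['f2']] -- a plain single-value field
print(fields[0])  # four values, the third of which holds three subvalues
```

From here each level of nesting maps directly onto an output file (or insert) for the parent, child or grand-child table.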
slightwv (Netminder) Commented:
Interesting question!

The only other thing I can think of is a kludge at best.

Can you do a string replace on the delimiters to get the CSV into an XML format?  You might be able to process/load it faster through the XML calls.
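To illustrate that delimiter-replacement idea (a sketch only: the \xFD/\xFC delimiters are assumed, and real data would also need XML-escaping of the values):

```python
# Sketch of turning a multi-value field into XML by replacing delimiters.
# \xfd / \xfc are assumed delimiters; values are assumed not to contain
# characters that need XML escaping.
def to_xml(field, vm="\xfd", svm="\xfc"):
    values = []
    for value in field.split(vm):
        subs = "".join("<sub>%s</sub>" % s for s in value.split(svm))
        values.append("<value>" + subs + "</value>")
    return "<field>" + "".join(values) + "</field>"

print(to_xml("f1a\xfdf1c1\xfcf1c2"))
# <field><value><sub>f1a</sub></value><value><sub>f1c1</sub><sub>f1c2</sub></value></field>
```

The resulting XML could then be shredded with XMLTABLE or the other XML DB calls on the database side.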

Then again, if you can replace the delimiters you might be able to also create your own defined objects and get sql*loader to treat the second level values as objects.
sdstuber Commented:
So, are you saying your CSV file might look like this?
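For example, using printable stand-in delimiters, something like:

```
f1a;f1b;f1c1:f1c2:f1c3;f1d,f2a,f3,f4,f5
```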


That is...

table has 5 fields (f1..f5) separated by commas
f1 has 4 sub-fields (f1a..f1d) separated by semicolons
f1c has 3 sub-sub-fields (f1c1..f1c3) separated by colons

something like that?
If not, can you post an excerpt, along with the target table structure?

Milleniumaire (Author) Commented:
Hi sdstuber, that's close to how I'm expecting the data to be supplied, apart from the fact that the sub-fields will use different delimiters; but your example shows the structure of the expected records.  I say "expected" as I haven't yet been provided with any data, only a description of the records for each file!

Just modifying your example slightly gives:
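That is, using the same stand-in delimiters:

```
f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5
```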


(You showed f2 as f2a.  I'm not sure if this was intentional, i.e. f2a being a multi-value field with only one value, but I've changed it to f2 to try to simplify things - if that's possible ;-)

In terms of the target table structure, this will be defined by metadata and I'm still thinking that through!  The metadata will be used to identify, for a given file type (which I suspect may be determined by the filename), which tables the data should be inserted into, on a field-by-field basis.  It will also indicate which fields contain multi-values or multi-multi-values!

The reason for using metadata is that there are many files on the legacy Pick system and, rather than creating a specific loader routine for each, a generic, data-driven solution is required.  This data is to be loaded into a data warehouse, which is still being designed, so again it isn't yet exactly clear what data is to be loaded.  Using metadata will allow the code to be written in advance of knowing exactly what the requirements are.  Dangerous, I know, but that is what the client wants.

From your example I would expect the data to be loaded as follows:


table_u:
  one row containing values f2 and f5 (the non-multi-value columns).

This would be the "parent" table for this type of file.

table_v:
  multiple rows; one for value f1a, one for value f1b, one for value f1d.

The other multi-values (f3a, f3c) would most likely be loaded into a different table, e.g. table_w.  Similarly, the multi-values f4a and f4b into table_x.  However, it may be that some of the multi-values are related to each other and should therefore be loaded into the same table, but populate different columns.  For example, one multi-value field may contain different forenames and another multi-value field different surnames; in this case the position of a forename in one multi-value field correlates to the position of the surname in the other.  The metadata would identify which multi-value fields were related.

The above would be the child tables, and they would also need one or more columns populated with some information from the parent (f2 or f5, or both) to identify the foreign key.  In addition, a sequence column may be required to produce a primary key (not sure about this yet).  It may be that some of the data inserted into the "child" table will be used along with the foreign key column to generate a primary key.  Again, it is likely that this will be defined in the metadata.

  multiple rows for the 2nd level of multi-values; one for values f1c1 and f3b1, one for values f1c2 and f3b2, and one for value f1c3 (no equivalent f3b3 value).

This would be the grand-child table.

In this case I've assumed that the 2nd level of multi values are related and so will be put in different columns on the same table.  This is probably unlikely for the 2nd level of multi-value but shows the concept.

It may be more likely (depending on how the metadata is defined) that f3b1 and f3b2 are inserted into a completely different table (table_z).

In terms of relating rows in the grand-child table to the child table, metadata would need to be used to identify which child columns should also be loaded into the grand-child table and which resulting columns would make up the primary key.

Sorry for the complexity; as I say, I'm still thinking about the design at the moment, and I'm trying to determine what the options are for implementing it.  Due to the likely complexity of using metadata to control how to process the fields in the file, I can't see any option other than using PL/SQL, possibly with bulk collection arrays to minimize database traffic.
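The parent/child/grand-child split described above can be sketched as follows (in Python purely for illustration; the table_u/table_v names and the ;/: stand-in delimiters come from the examples in this thread, while the real routine would be metadata-driven PL/SQL):

```python
# Illustrative two-level split into parent (table_u), child (table_v)
# and grand-child rows. Delimiters and names follow the thread's examples;
# the production version would be driven by metadata, not hard-coded.
def split_record(line):
    f1, f2, f3, f4, f5 = line.split(",")[:5]
    parent = {"f2": f2, "f5": f5}              # one table_u row
    children, grandchildren = [], []
    for seq, value in enumerate(f1.split(";"), start=1):
        subvalues = value.split(":")
        if len(subvalues) == 1:                # plain multi-value -> table_v
            children.append({"fk": f2, "seq": seq, "val": value})
        else:                                  # 2nd-level multi-value -> grand-child
            for subseq, sub in enumerate(subvalues, start=1):
                grandchildren.append({"fk": f2, "seq": seq,
                                      "subseq": subseq, "val": sub})
    return parent, children, grandchildren

p, c, g = split_record("f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5")
print(p)               # {'f2': 'f2', 'f5': 'f5'}
print(len(c), len(g))  # 3 3
```

The same loop would then be repeated (or table-driven) for f3 and f4, writing to table_w/table_x as discussed above.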

sdstuber Commented:
11gR1 or 11gR2?  I'm not positive it will matter, but I don't want to head down a path that you won't be able to follow.

I agree that some PL/SQL might be needed, but I'm not quite ready to give up on a "mostly" SQL approach to this.
By the way, you're both going to have to cover the Oracle questions for the rest of the day.  This one intrigues me; I don't anticipate getting much else done.  :)
Milleniumaire (Author) Commented:
A brand new 11gR2 database is to be created for the data warehouse.  Currently I only have access to 10gR2, but I'm looking to install a personal edition of 11gR2 until they have installed the hardware and software.  As you can tell, this is the early stages of the project.

Please don't spend too much time on this; I put the question out there in the hope that there was a better way than writing lots of "slow" PL/SQL.

I'm currently investigating the use of SQL*Loader to load the CSV data into nested tables and to process it from there.

I've been asking for some samples of data so that I can best analyse it, but it hasn't been forthcoming due to the time constraints of the Pick (Data Basic) developers.
Naveen Kumar (Production Manager / Application Support Manager) Commented:
Just a thought: why not use Unix (awk/perl) to come up with a script that first gets the data into the proper format in different files?  That way we are not using SQL/PL*SQL in the database to do all the work required here; instead we use some of the OS utilities to get the data formatting ready for SQL*Loader to go from there.

I mean if the data is like this...


table_u_file.dat ---> one row containing values f2 and f5 (the non-multi-value columns)

table_v_file.dat ---> multiple rows; one for value f1a, one for value f1b, one for value f1d, with any other fields you need for joins/FKs etc.

and so on....

Using SQL*Loader to load these individual files into whatever tables should then be a simple thing, right?
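For the load step, a conventional control file per pre-split file would suffice; a hypothetical one for table_v_file.dat (table and column names invented for illustration) might be:

```
-- Hypothetical SQL*Loader control file for the pre-split child file.
-- Table and column names are assumptions for illustration only.
LOAD DATA
INFILE 'table_v_file.dat'
APPEND
INTO TABLE table_v
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(parent_key, seq_no, mv_value)
```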

Naveen Kumar (Production Manager / Application Support Manager) Commented:
I vaguely remember (I think in the early stages of the Oracle 8i release) going through such a data-loading requirement, but our team at the time did not have Unix awk/perl experts; we were good in Pro*C.  So we used a Pro*C program to do this kind of loading with batch commits.

Milleniumaire (Author) Commented:
Thanks for the suggestions guys.

The way I achieved this was to use the metadata to generate a package containing a pipelined function per file.  Each function is passed a cursor (again generated from the description of the input file in the metadata), defines an output array, moves the data from the cursor to the output array, and writes it out using PIPE ROW.  The metadata identifies the multi-value fields, and the function parses these into arrays before writing to the output array and piping it out.

A view was created for each function call to hide the complexity of the call.  External tables were also created from the metadata to allow the SQL (passed to the function) to retrieve the data directly from the files.
It all seems to work and performance isn't too bad.
The suggestion by gatorvip was the best; however, I had already written my own parse function, which does the same as apex_util.string_to_table.  I may give the apex_util function a try and compare the performance.
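For reference, a stripped-down shape of that pipelined approach might look like this (the type, column and function names are invented for illustration; the real package is generated from the metadata):

```sql
-- Skeleton only: names are illustrative, and the real code is generated.
CREATE OR REPLACE TYPE load_row_t AS OBJECT (
  parent_key VARCHAR2(100),
  seq_no     NUMBER,
  mv_value   VARCHAR2(4000));
/
CREATE OR REPLACE TYPE load_tab_t AS TABLE OF load_row_t;
/
CREATE OR REPLACE FUNCTION split_mv (p_cur SYS_REFCURSOR)
  RETURN load_tab_t PIPELINED
IS
  l_key  VARCHAR2(100);
  l_mv   VARCHAR2(4000);
  l_vals apex_application_global.vc_arr2;
BEGIN
  LOOP
    FETCH p_cur INTO l_key, l_mv;
    EXIT WHEN p_cur%NOTFOUND;
    l_vals := apex_util.string_to_table(l_mv, ';');  -- or a hand-rolled parser
    FOR i IN 1 .. l_vals.COUNT LOOP
      PIPE ROW (load_row_t(l_key, i, l_vals(i)));
    END LOOP;
  END LOOP;
  RETURN;
END;
/
```

A view such as CREATE VIEW v_file_x AS SELECT * FROM TABLE(split_mv(CURSOR(SELECT key_col, mv_col FROM ext_file_x))) then hides the complexity of the call, as described.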
Question has a verified solution.
