Milleniumaire (United Kingdom) asked:

Ideas on how to load multi-value fields into Oracle object table

I am looking to design and build some data-loading routines; however, the data being loaded has been extracted from a Pick database, which has multi-value fields.

The supplied data will be in a csv file, and I'm told there will be a maximum of two levels of multi-valuing. So a particular field may have many delimited values, each of which may itself have many delimited values. The delimiters will be specific non-printable characters.

The obvious way to load this data would be to define an external table to match the csv file layout (a sketch of such an external table follows below), then use pl/sql to split out the multi-value fields, which will be stored in child tables, i.e. in relational tables rather than object tables. The net result will be a row created in a "parent" table for each record in the file; for multi-value fields, a row in a child table; and, for the 2nd level of multi-value, a row in a grandchild table (related to the child, which itself is related to the parent).

On the surface, this looks like a lot of string manipulation and procedural code.  Are there any alternative ways of doing this?

I've done a little research and determined that SQL*Loader will load into nested tables, but I think it will only do this to one level. Unfortunately, external tables don't support nested tables or varrays, even in version 11g.
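For illustration, a minimal sketch of the external table approach described above, assuming a hypothetical directory object data_dir, a hypothetical file pick_extract.csv, and a five-field layout; the multi-value fields are simply loaded as raw delimited strings to be split later:

CREATE TABLE pick_extract_ext (
  f1  VARCHAR2(4000),  -- multi-valued: still holds its embedded delimiters
  f2  VARCHAR2(4000),
  f3  VARCHAR2(4000),  -- multi-valued
  f4  VARCHAR2(4000),  -- multi-valued
  f5  VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('pick_extract.csv')
);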
Sean Stuber:
So, are you saying your csv file might look like this?


f1a;f1b;f1c1:f1c2:f1c3;f1d,f2a,f3a;f3b1:f3b2;f3c,f4a;f4b,f5


That is...

the table has 5 fields (f1..f5) separated by commas
f1 has 4 sub-fields (f1a-f1d) separated by semicolons
f1c has 3 sub-sub-fields (f1c1..f1c3) separated by colons

Something like that? If not, can you post an excerpt, along with the target table structure?

Milleniumaire (Asker):
Hi sdstuber, that's close to how I'm expecting the data to be supplied, apart from the fact that the sub-fields will use different delimiters; your example shows the structure of the expected records. I say "expected" as I haven't yet been provided with any data, only a description of the records for each file!

Just modifying your example slightly gives:

f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5

(You showed f2 as f2a. I'm not sure if that was intentional, i.e. f2a being a multi-value field with only one value, but I've changed it to f2 to try to simplify things - if that's possible! ;-)

In terms of the target table structure, this will be defined by metadata, and I'm still thinking that through! The metadata will be used to identify, for a given file type (which I suspect will be determined by the filename), which tables the data should be inserted into, on a field-by-field basis. It will also indicate which fields contain multi-values or multi-multi-values. The reason for using metadata is that there are many files on the legacy Pick system, and rather than creating a specific loader routine for each, a generic, data-driven solution is required. This data is to be loaded into a data warehouse, which is still being designed, so again it isn't yet exactly clear what data is to be loaded. Using metadata will allow the code to be written in advance of knowing exactly what the requirements are. Dangerous, I know, but that is what the client wants.
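As a rough illustration only (the actual design is still in flux, and every name here is hypothetical), the driving metadata might be shaped something like this:

CREATE TABLE load_field_map (
  file_type      VARCHAR2(30),  -- derived from the incoming filename
  field_position NUMBER,        -- position of the field within the csv record
  mv_level       NUMBER(1),     -- 0 = scalar, 1 = multi-value, 2 = multi-multi-value
  target_table   VARCHAR2(30),  -- parent, child or grandchild table
  target_column  VARCHAR2(30),
  mv_group       VARCHAR2(30)   -- groups related multi-value fields loaded into one table
);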

From your example I would expect the data to be loaded as follows:

f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5

table_u
  one row containing values f2 and f5 (non-multi-value columns).

This would be the "parent" table for this type of file.

table_v
  multiple rows; one for value f1a, one for value f1b, one for value f1d

The other multi-values (f3a, f3c) would most likely be loaded into a different table, e.g. table_w. Similarly, the multi-values f4a and f4b would go into table_x. However, it may be that some of the multi-values are related to each other and should therefore be loaded into the same table, populating different columns. For example, one multi-value field may contain forenames and another surnames; in this case the position of a forename in one multi-value field correlates with the position of the surname in the other. The metadata would identify which multi-value fields are related.

The above would be the child tables, and they would also need one or more columns populated with some information from the parent (f2 or f5, or both) to form the foreign key. In addition, a sequence column may be required to produce a primary key (I'm not sure about this yet). It may be that some of the data inserted into the "child" table will be used along with the foreign key column to generate a primary key. Again, it is likely that this will be defined in the metadata.

table_y
multiple rows for the 2nd level of multi-values: one for the values f1c1 and f3b1, one for f1c2 and f3b2, and one for f1c3 (there being no equivalent f3b3 value).

This would be the grand-child table.

In this case I've assumed that the 2nd level of multi values are related and so will be put in different columns on the same table.  This is probably unlikely for the 2nd level of multi-value but shows the concept.

It may well be the case (depending on how the metadata is defined) that f3b1 and f3b2 are inserted into a completely different table (table_z).

In terms of relating rows in the grandchild table to the child table, metadata would be needed to identify which child columns should also be loaded into the grandchild table and which resulting columns make up the primary key.
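To make the shape concrete, an illustrative version of the parent/child/grandchild DDL might look like this, assuming (purely for the sketch) that f2 acts as the record key:

CREATE TABLE table_u (    -- parent: one row per csv record
  f2  VARCHAR2(100) CONSTRAINT table_u_pk PRIMARY KEY,
  f5  VARCHAR2(100)
);

CREATE TABLE table_v (    -- child: one row per value of a multi-value field
  f2   VARCHAR2(100) REFERENCES table_u,  -- foreign key copied down from the parent
  seq  NUMBER,                            -- position within the multi-value field
  val  VARCHAR2(100),
  CONSTRAINT table_v_pk PRIMARY KEY (f2, seq)
);

CREATE TABLE table_y (    -- grandchild: one row per sub-value
  f2      VARCHAR2(100),
  seq     NUMBER,
  subseq  NUMBER,                         -- position within the sub-multi-value
  val     VARCHAR2(100),
  CONSTRAINT table_y_pk PRIMARY KEY (f2, seq, subseq),
  CONSTRAINT table_y_fk FOREIGN KEY (f2, seq) REFERENCES table_v
);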

Sorry for the complexity, but as I say, I'm still thinking about the design at the moment and trying to determine what the options are for implementing it. Given the likely complexity of using metadata to control how the fields in the file are processed, I can't see any option other than using pl/sql, possibly with bulk arrays to minimize database traffic.
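The bulk pattern referred to would be along these lines (a sketch only, reusing the hypothetical external and parent tables from above):

DECLARE
  TYPE t_vc IS TABLE OF VARCHAR2(4000);
  l_f2  t_vc;
  l_f5  t_vc;
  CURSOR c IS SELECT f2, f5 FROM pick_extract_ext;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_f2, l_f5 LIMIT 500;  -- fetch in batches
    EXIT WHEN l_f2.COUNT = 0;
    FORALL i IN 1 .. l_f2.COUNT                      -- one round trip per batch
      INSERT INTO table_u (f2, f5) VALUES (l_f2(i), l_f5(i));
    -- the child and grandchild rows would be parsed out of the multi-value
    -- fields here and inserted with further FORALL statements
  END LOOP;
  CLOSE c;
  COMMIT;
END;
/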


Sean Stuber:

11gR1 or 11gR2? I'm not positive it will matter, but I don't want to head down a path that you won't be able to follow.

I agree that some pl/sql might be needed, but I'm not quite ready to give up on a "mostly" sql approach to this.
By the way, you're both going to have to cover the Oracle questions for the rest of the day. This one intrigues me; I don't anticipate getting much else done. :)
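To give a flavour of what "mostly SQL" could mean here, splitting one multi-value field can be done with REGEXP_SUBSTR and CONNECT BY. A sketch, assuming the hypothetical external table above, ';' in place of the real value mark, f2 as a non-null record key, and 11g for REGEXP_COUNT (note '[^;]+' skips empty values):

SELECT f2,
       LEVEL AS seq,
       REGEXP_SUBSTR(f1, '[^;]+', 1, LEVEL) AS val
FROM   pick_extract_ext
CONNECT BY LEVEL <= REGEXP_COUNT(f1, ';') + 1
       AND PRIOR f2 = f2                  -- restart the hierarchy per source row
       AND PRIOR SYS_GUID() IS NOT NULL;  -- avoid connect-by-loop errors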
Milleniumaire (Asker):

A brand new 11gR2 database is to be created for the data warehouse. Currently I only have access to 10gR2, but I'm looking to install a personal edition of 11gR2 until they have installed the hardware and software. As you can tell, these are the early stages of the project.

Please don't spend too much time on this; I put the question out there in the hope that there was a better way than writing lots of "slow" pl/sql.

I'm currently investigating the use of SQL*Loader to load the csv data into nested tables and to process it from there.

I've been asking for some samples of data so that I can best analyse it, but they haven't been forthcoming due to time constraints on the Pick (Data BASIC) developers.
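For reference, the collection-loading examples in the Oracle Utilities guide take roughly this shape. This is a hedged sketch only: parent_tab and its VARRAY-of-object f1 column are hypothetical, ';' stands in for the real value mark, and it only reaches one level of multi-valuing, which is the limitation mentioned earlier:

LOAD DATA
INFILE 'pick_extract.csv'
INTO TABLE parent_tab
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(
  f1  VARRAY TERMINATED BY ';'  -- one level of multi-values into a collection column
  (
    f1  COLUMN OBJECT
    (
      val  CHAR(100)
    )
  ),
  f2  CHAR(100),
  f3  FILLER CHAR(4000),        -- skipped in this sketch
  f4  FILLER CHAR(4000),
  f5  CHAR(100)
)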
I vaguely remember (I think in the early days of the Oracle 8i release) going through such a requirement to load data into the database. Our team at the time didn't have Unix awk/Perl experts, but we were good with Pro*C, so we used a Pro*C program to do this kind of loading with batch commits.

Thanks
Milleniumaire (Asker):

Thanks for the suggestions, guys.

The way I achieved this was to use the metadata to generate a package containing a pipelined function per file. Each function is passed a cursor (again generated from the description of the input file in the metadata), defines an output array, moves the data from the cursor to the output array and writes it out using PIPE ROW. The metadata identifies the multi-value fields, and the function parses these into arrays before writing to the output array and piping it out.
A view was created for each function call to hide the complexity of the call. External tables were also created from the metadata, allowing the SQL passed to the function to retrieve the data directly from the files.
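A rough sketch of the shape of one such function (hand-simplified; the type, function and column names are placeholders rather than the real generated code, and ';' stands in for the real non-printable delimiter):

CREATE TYPE t_load_row AS OBJECT (f2 VARCHAR2(100), seq NUMBER, val VARCHAR2(4000));
/
CREATE TYPE t_load_tab AS TABLE OF t_load_row;
/

CREATE OR REPLACE FUNCTION split_file_a (p_cur SYS_REFCURSOR)
  RETURN t_load_tab PIPELINED
IS
  l_f2   VARCHAR2(100);
  l_f1   VARCHAR2(4000);
  l_pos  PLS_INTEGER;
  l_next PLS_INTEGER;
  l_seq  PLS_INTEGER;
BEGIN
  LOOP
    FETCH p_cur INTO l_f2, l_f1;
    EXIT WHEN p_cur%NOTFOUND;
    IF l_f1 IS NOT NULL THEN
      -- walk the delimited multi-value string, piping one row per value
      l_pos := 1;
      l_seq := 0;
      LOOP
        l_next := INSTR(l_f1 || ';', ';', l_pos);
        EXIT WHEN l_next = 0;
        l_seq := l_seq + 1;
        PIPE ROW (t_load_row(l_f2, l_seq, SUBSTR(l_f1, l_pos, l_next - l_pos)));
        l_pos := l_next + 1;
      END LOOP;
    END IF;
  END LOOP;
  CLOSE p_cur;
  RETURN;
END;
/

-- and the generated view hiding the call might look like:
CREATE VIEW file_a_child_v AS
  SELECT * FROM TABLE(split_file_a(CURSOR(SELECT f2, f1 FROM pick_extract_ext)));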
It all seems to work and performance isn't too bad.
The suggestion by gatorvip was the best; however, I had already written my own parse function, which does the same as apex_util.string_to_table. I may give the apex_util function a try, though, and compare the performance.
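For anyone comparing, apex_util.string_to_table (available where APEX is installed) splits on a single delimiter and returns a vc_arr2, e.g.:

DECLARE
  l_vals apex_application_global.vc_arr2;
BEGIN
  -- ';' again standing in for the real non-printable value mark
  l_vals := apex_util.string_to_table('f1a;f1b;f1d', ';');
  FOR i IN 1 .. l_vals.COUNT LOOP
    dbms_output.put_line(l_vals(i));
  END LOOP;
END;
/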