Solved

Ideas on how to load multi-value fields into Oracle object table

Posted on 2010-11-16
Last Modified: 2013-12-07
I am looking to design and build some data loading routines; however, the data being loaded has been extracted from a Pick database, which has multi-value fields.

The supplied data will be in a csv file and I'm told there will only be a maximum of 2 levels of multi-valuing.  So a particular field may have lots of delimited values, each of which may also have many delimited values.  The delimiters will be specific non-printable characters.

The obvious way to load this data would be to define an external table to match the csv file layout, then use pl/sql to split out the multi-value fields, which will be stored in child tables, i.e. in relational tables rather than object tables.  The net result will be a row created in a "parent" table for each record in the file and, for multi-value fields, a row in a child table, and (for the 2nd level of multi-value) a row in a grand-child table (related to the child, which itself is related to the parent).

On the surface, this looks like a lot of string manipulation and procedural code.  Are there any alternative ways of doing this?

I've done a little research and determined that sql*loader will load into nested tables, but I think it will only do this to one level.  Unfortunately, external tables don't support nested tables or varrays even in version 11g.
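
To make that concrete, a minimal sketch of such an external table might look like this (the directory object, file name and five-field layout are assumptions for illustration only; the multi-value delimiters inside each field would still be split later in pl/sql):

-- Hypothetical layout: five comma-separated fields per record
CREATE TABLE pick_ext (
  f1 VARCHAR2(4000),
  f2 VARCHAR2(4000),
  f3 VARCHAR2(4000),
  f4 VARCHAR2(4000),
  f5 VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir            -- assumed directory object
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('pick_extract.csv')         -- assumed file name
);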
Question by:Milleniumaire
10 Comments
 
LVL 76

Assisted Solution

by:slightwv (Netminder)
slightwv (Netminder) earned 167 total points
ID: 34145215
Interesting question!

The only other thing I can think of is a kludge at best.

Can you do a string replace on the delimiters to get the CSV into an XML format?  You might be able to process/load it faster through the XML calls.

Then again, if you can replace the delimiters, you might also be able to create your own defined objects and get sql*loader to treat the second-level values as objects.
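
As a rough sketch of the XML route (tag names and sample values are invented; it assumes the delimiters have already been replaced with tags), XMLTABLE could then shred the converted data:

-- Assume each record has been wrapped in <r>...</r> and the level-1
-- delimiter replaced so each value sits in its own <v> element
SELECT x.val
FROM   XMLTABLE('/r/v'
         PASSING XMLTYPE('<r><v>f1a</v><v>f1b</v><v>f1d</v></r>')
         COLUMNS val VARCHAR2(4000) PATH '.') x;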
 
LVL 73

Expert Comment

by:sdstuber
ID: 34145299
So, are you saying your csv file might look like this?


f1a;f1b;f1c1:f1c2:f1c3;f1d,f2a,f3a;f3b1:f3b2;f3c,f4a;f4b,f5


That is...

the table has 5 fields (f1..f5) separated by commas
f1 has 4 sub-fields (f1a..f1d) separated by semicolons
f1c has 3 sub-sub-fields (f1c1..f1c3) separated by colons

Something like that?
If not, can you post an excerpt, along with the target table structure?

 
LVL 16

Author Comment

by:Milleniumaire
ID: 34145917
Hi sdstuber, that's close to how I'm expecting the data to be supplied, apart from the fact that the sub-fields will use different delimiters; your example shows the structure of the expected records.  I say "expected" as I haven't yet been provided with any data, only a description of the records for each file!

Just modifying your example slightly gives:

f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5

(You showed f2 as f2a.  I'm not sure if this was intentional, i.e. f2a being a multi-value field with only one value, but I've changed it to f2 to try to simplify things - if that's possible ;-)

In terms of the target table structure, this will be defined by meta-data and I'm still thinking that through!  The meta-data will be used to identify, for a given file type (which I suspect may be determined by the filename), which tables the data should be inserted into, on a field by field basis.  It will also indicate which fields contain multi-values or multi-multi-values!!!

The reason for using meta-data is that there are many files on the legacy Pick system and, rather than creating a specific loader routine for each, a generic solution is required, which will be data driven.  This data is to be loaded into a data warehouse, which is still being designed, so again, it isn't yet exactly clear what data is to be loaded.  Using meta-data will allow the code to be written in advance of knowing exactly what the requirements are.  Dangerous I know, but that is what the client wants.

From your example I would expect the data to be loaded as follows:

f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5

table_u
  one row containing values f2 and f5 (non-multi-value columns).

This would be the "parent" table for this type of file

table_v
  multiple rows; one for value f1a, one for value f1b, one for value f1d

The other multi-values (f3a, f3c) would most likely be loaded into a different table, e.g. table_w.  Similarly, the multi-values f4a and f4b would go into table_x.  However, it may be that some of the multi-values are related to each other and should therefore be loaded into the same table, but populate different columns.  For example, one multi-value field may contain different forenames and another multi-value field different surnames; the position of a forename in one multi-value field correlates to the position of the surname in the other.  The meta-data would identify which multi-value fields were related.

The above would be the child tables, and they would also need one or more columns populated with some information from the parent (f2 or f5, or both) to identify the foreign key.  In addition, a sequence column may be required to produce a primary key (not sure about this yet).  It may be that some of the data inserted into the "child" table will be used along with the foreign key column to generate a primary key.  Again, it is likely that this will be defined in the meta-data.

table_y
  multiple rows for the 2nd level of multi-values; one for value f1c1 and f3b1, one for value f1c2 and f3b2 and one for value f1c3 (no equivalent f3b3 value).  

This would be the grand-child table.

In this case I've assumed that the 2nd level of multi values are related and so will be put in different columns on the same table.  This is probably unlikely for the 2nd level of multi-value but shows the concept.

It may more likely be the case (depending on how the meta-data is defined) that f3b1 and f3b2 would be inserted into a completely different table (table_z).

In terms of relating rows in the grand-child table to the child table, meta data would need to be used to identify which child columns should also be loaded into the grand-child table and which resulting columns would make up the primary key.

Sorry for the complexity, but as I say, I'm still thinking about the design at the moment and trying to determine what the options are for implementing it.  Due to the likely complexity of using meta-data to control how the fields in the file are processed, I can't see any option other than using pl/sql, possibly with bulk arrays to minimize database traffic.
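
To show the shape of that procedural code, here is a rough sketch of the parent/child split for field f1 only.  Everything in it is a placeholder: pick_ext, table_u, table_v and their columns are invented, parse_mv is a hypothetical helper that splits a string on a given delimiter and returns the collection type, and CHR(253) merely stands in for whatever non-printable value mark the extract actually uses:

DECLARE
  TYPE vc_tab IS TABLE OF VARCHAR2(4000);   -- would live at schema level in practice
  l_vals vc_tab;
BEGIN
  FOR r IN (SELECT f1, f2, f5 FROM pick_ext) LOOP
    -- one parent row per record in the file
    INSERT INTO table_u (u_key, u_val) VALUES (r.f2, r.f5);
    -- parse_mv is a hypothetical split helper returning vc_tab
    l_vals := parse_mv(r.f1, CHR(253));
    FOR i IN 1 .. l_vals.COUNT LOOP
      -- one child row per multi-value, keyed back to the parent
      INSERT INTO table_v (parent_key, seq, val)
      VALUES (r.f2, i, l_vals(i));
    END LOOP;
  END LOOP;
  COMMIT;
END;
/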


 
LVL 73

Expert Comment

by:sdstuber
ID: 34145948
11gR1 or 11gR2?  I'm not positive it will matter, but I don't want to head down a path that you won't be able to follow.

I agree that some pl/sql might be needed, but I'm not quite ready to give up on a "mostly" sql approach to this.
 
LVL 73

Expert Comment

by:sdstuber
ID: 34145968
By the way, you're both going to have to cover the Oracle questions for the rest of the day.  This one intrigues me; I don't anticipate getting much else done.  :)
 
LVL 16

Author Comment

by:Milleniumaire
ID: 34146196
A brand new 11gR2 database is to be created for the data warehouse.  Currently I only have access to 10gR2, but I'm looking to install a personal edition of 11gR2 until they have installed the hardware and software.  As you can tell, the project is at an early stage.

Please don't spend too much time on this; I put the question out there in the hope that there was a better way than writing lots of "slow" pl/sql.

I'm currently investigating the use of sql*loader to load the csv data into nested tables and to process it from there.

I've been asking for some samples of data so that I can best analyse it, but it hasn't been forthcoming due to the time constraints of the Pick (Data/BASIC) developers.
 
LVL 20

Accepted Solution

by:gatorvip
gatorvip earned 167 total points
ID: 34146878
You might be able to use SQL with regular expressions, but I'm not sure it would work unless you knew in advance how many items you would have in each field.

You can do this in APEX (although it would involve some PL/SQL):
- use the apex_util.string_to_table function to split up the strings ( http://www.sloug.org/i/doc/api073.htm )

Something like

-- the arrays here are of type apex_application_global.vc_arr2
array1 := apex_util.string_to_table( line, ',' );
array2 := apex_util.string_to_table( field, ';' );
array3 := apex_util.string_to_table( subfield, ':' );

and loop through each array.
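
Put together, a rough sketch of that nested loop might look like this (the one-column raw external table, its rec column and the insert step are assumptions; only the apex_util call is the real API):

DECLARE
  l_fields apex_application_global.vc_arr2;
  l_subs   apex_application_global.vc_arr2;
BEGIN
  FOR r IN (SELECT rec FROM pick_ext_raw) LOOP   -- hypothetical raw external table
    l_fields := apex_util.string_to_table(r.rec, ',');
    FOR i IN 1 .. l_fields.COUNT LOOP
      l_subs := apex_util.string_to_table(l_fields(i), ';');
      FOR j IN 1 .. l_subs.COUNT LOOP
        NULL;  -- insert into the appropriate child table here, driven by the meta-data
      END LOOP;
    END LOOP;
  END LOOP;
END;
/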

Another possibility would be to build a small application that pre-processes the data dump externally (keeping the work outside the database), then either inserts the results back into the db or creates CSV files that can be imported with external tables.
 
LVL 28

Assisted Solution

by:Naveen Kumar
Naveen Kumar earned 166 total points
ID: 34236716
Just a thought: why not use unix (awk/perl) to come up with a script that first gets the data into the proper format in different files?  That way we are not using SQL/PLSQL in the database to do all the work required here; instead we use some of the OS utilities to get the data formatted and ready for sql*loader to take it from there.

I mean if the data is like this...

f1a;f1b;f1c1:f1c2:f1c3;f1d,f2,f3a;f3b1:f3b2;f3c,f4a;f4b,f5

table_u_file.dat ---> will have one row containing values f2 and f5 (the non-multi-value columns)

table_v_file.dat ---> will have multiple rows: one for value f1a, one for value f1b, one for value f1d, with any other fields you need for joining/FKs etc.

and so on....

Using sql*loader to load these individual files into the relevant tables should then be a simple thing, right?
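
For example, a minimal sql*loader control file for the child file might look like this (the table and column names are assumptions carried over from the example above):

LOAD DATA
INFILE 'table_v_file.dat'
APPEND
INTO TABLE table_v
FIELDS TERMINATED BY ','
(parent_key, seq, val)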

Thanks
 
LVL 28

Expert Comment

by:Naveen Kumar
ID: 34236726
I vaguely remember (I think in the early days of the Oracle 8i release) going through such a data loading requirement, but our team at that time did not have unix awk/perl experts; we were good in Pro*C, so we used a Pro*C program to do this kind of loading with batch commits.

Thanks
 
LVL 16

Author Closing Comment

by:Milleniumaire
ID: 34237413
Thanks for the suggestions guys.

The way I achieved this was to use the meta-data to generate a package containing a pipelined function per file.  Each function is passed a cursor (again generated from the description of the input file in the meta-data), defines an output array, moves the data from the cursor into the output array and writes it out using PIPE ROW.  The meta-data identified the multi-value fields, and the function parsed these into arrays before writing to the output array and piping it out.

A view was created for each function call to hide the complexity of the call.  External tables were also created from the meta-data to allow the sql (passed to the function) to retrieve the data directly from the files.

It all seems to work and performance isn't too bad.

The suggestion by gatorvip was the best; however, I had already written my own parse function, which does the same as apex_util.string_to_table.  I may give the apex_util function a try and compare the performance.
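
By way of illustration, a much-condensed sketch of the shape such a generated pipelined function might take (the object/collection types, function name, cursor shape and delimiter are assumptions; the actual code was generated from the meta-data and used the author's own parse function rather than apex_util):

CREATE TYPE t_mv_row AS OBJECT (pk VARCHAR2(100), seq NUMBER, val VARCHAR2(4000));
/
CREATE TYPE t_mv_tab AS TABLE OF t_mv_row;
/
CREATE OR REPLACE FUNCTION split_mv (p_cur SYS_REFCURSOR)
  RETURN t_mv_tab PIPELINED
IS
  l_key  VARCHAR2(100);
  l_mv   VARCHAR2(4000);
  l_vals apex_application_global.vc_arr2;
BEGIN
  LOOP
    FETCH p_cur INTO l_key, l_mv;
    EXIT WHEN p_cur%NOTFOUND;
    -- CHR(253) is a stand-in for the real non-printable value mark
    l_vals := apex_util.string_to_table(l_mv, CHR(253));
    FOR i IN 1 .. l_vals.COUNT LOOP
      PIPE ROW (t_mv_row(l_key, i, l_vals(i)));
    END LOOP;
  END LOOP;
  CLOSE p_cur;
  RETURN;
END;
/

-- a view can then hide the call, e.g.:
-- CREATE VIEW table_v_vw AS
--   SELECT * FROM TABLE(split_mv(CURSOR(SELECT f2, f1 FROM pick_ext)));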
