Hi experts,
I'm currently facing an issue with a large temporary table (let's call it T_AGG_TEMP).
I'm working on a datawarehouse DB, this denormalized table is used to build new aggregate tables that share the same scope of analysis :
the temp table avoids repeating several filters and joins in all the subsequent aggregate tables.
The table is 4 Gigs in size, and has a regular index on COMPANY_CODE. It is truncated and reloaded each an every time.
Lately the team has been trying to parallelize the aggreagte tables computing.
The parallelised table creation scripts use a loop on COMPANY_CODE, and use INSERT INTO ... SELECT ... statements.
This works rather well, except that from time to time the create as select statements freeze and tend to execute in sequence rather than in parallel, due to heavy I/O activity (disk usage).
To optimize disk usage, my first idea was to partition the temp table, hence addressing a named partition and a list (or range) of COMPANY_CODEs rather than the whole table.
Unfortunately the customer does not have the Enterprise version (and hence no partitioning, no BITMAP indexes, ... :/ ), so I'll have to cope with the abilities of the basic version.
As I tend to think that segregating the data from the different companies is a serious candidate to reduce processing time in the second phase ( building aggregate tables based on T_AGG_TEMP),
I made an attempt by creating the T_AGG_TEMP table in a cluster (sized rougly 20% bigger than the T_AGG_TEMP table) indexed on COMPANY_CODE.
The problem is that loading the clustered version of the table takes at least 3 hours (I've canceled the creation statement each time, as the estimated remaining time in TOAD kept growing),
instead of about 45 minutes for the normal indexed version (the index is not dropped upon loading and recreated afterwards, for the moment).
I'm using the exact same script (an "insert into ... select ... ).
COMPANY_CODE has about 100 different values, which makes on average 40 Mo per COMPANY_CODE.
Does this disqualify the temp table for clustering ?
I tried several values for the SIZE parameter in my CREATE CLUSTER statement, but the books online say that this parameter is maxed with the OS block size (maybe 8k on this Win32 version ? )
Did I miss something ?
How should I try to segregate my data based on COMPANY_CODE ?
Any advice welcome ...
Thanks
Hilaire
Start Free Trial