Oracle 11g Subpartition

1. Can someone give me a quick introduction on Oracle 11g subpartition?
2. I want to partition the table by two columns.  The first column is a DATE column (I would like to use interval partitioning on it) and the second is a VARCHAR2 column (I want to subpartition on it).  Can someone give me a sample SQL statement to create such a table?

Thanks.
thomaszhwang Asked:
slightwv (Netminder) Commented:
What types of partitions are you thinking?  Range, Hash, List, ???

I'll assume the date column is primary and will be a range partition?


There are tons of examples out there on subpartitions.  The docs are always a good start:
http://docs.oracle.com/cd/E11882_01/server.112/e25523/part_admin001.htm#VLDBG1106
thomaszhwang (Author) Commented:
Actually the date is not the primary key.  There is no primary key on this table.

For the date column, I want to do an Interval Partition and for the string column, basically I want to partition it for each value in that string column.  If possible, I don't want to enumerate all the possible values in that string column.  Is it possible?

Thanks.
sventhan Commented:
Check this for syntax and a sample:

create table t
( id_num  number,
  data    varchar2(30),
  seqno   as (to_number(substr( to_char( id_num, 'fm00000000000'), 5, 4 )))
)
partition by range(seqno)
interval (1)
(partition empty values less than (0))
/

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5257352600346172725
slightwv (Netminder) Commented:
>>If possible, I don't want to enumerate all the possible values in that string column.  Is it possible?

I believe you have to enumerate for everything but a HASH partition.  Have you thought this idea through?  I'm hoping you have a set number of allowed strings or you will likely have some serious issues.

If you can provide more information about the system we might be able to offer better advice.

I assume since you want partitioning with subpartitions you are talking A LOT of data.

For the subpartitions might you be able to get by with some sort of substring? Maybe the alphabet (A-Z) as subpartitions?

If you have a set number of values for the varchar2 column, how many distinct values?
How many rows are we talking about (total, per primary partition, and per subpartition)?
thomaszhwang (Author) Commented:
Thanks.
thomaszhwang (Author) Commented:
@sventhan I want to partition by date and subpartition by the string column, but I don't see subpartition here.

@slightwv there are about a dozen unique values for the string column.  The reason I don't want to enumerate all the values is that I might want to add new values to this column later.  That is to say, the acceptable values for this column are not pre-defined.

The other reason I want to partition this table is that the data will always be accessed one subpartition at a time, and there will be frequent operations to drop a subpartition and reload it.

Thanks.
slightwv (Netminder) Commented:
You still never mentioned the amount of data you are dealing with.

Some estimates per primary partition and individual subpartitions would help a lot.

>>but I don't see subpartition here.

There is a subpartition example in the doc link I provided.  As I mentioned, there are TONS of examples on the Web.  Any of them should give you a solid foundation to build from.

>> I might want to add new value into this column.

Then make it a business process to add the new sub-partition before the 'new' row is ever inserted.

Even if you 'forget', this shouldn't be a big deal.  In later releases with automatic row movement, even if you add the new subpartition later, everything should be fine.

I've never tried this but in theory it should be possible:  Create a DML trigger for the table that would automatically add a subpartition when a new row is inserted with a new value.

Granted, with a heavy OLTP table this probably isn't feasible and would likely cause a bunch of unnecessary overhead.

A better solution might be a database job that runs every day/week/month and looks for new values in the table, creating the new subpartition when a new value is discovered.

You can query the partition views and run that against the column values.
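Something along these lines (a rough sketch only; table T, column METRIC, and the subpartition names are placeholders, and the generated subpartition names would need to be made unique before running the output):

```sql
-- Hypothetical sketch of the periodic check.  HIGH_VALUE is a LONG column,
-- so it is fetched through PL/SQL, which implicitly converts it to VARCHAR2.
declare
  type t_set is table of boolean index by varchar2(4000);
  v_known t_set;
begin
  -- collect the list values that already have a subpartition ('METRIC-1', ...)
  for r in (select high_value
            from   user_tab_subpartitions
            where  table_name = 'T') loop
    v_known(r.high_value) := true;
  end loop;

  -- print a SPLIT statement for any value with no subpartition of its own
  for m in (select distinct metric from t) loop
    if not v_known.exists('''' || m.metric || '''') then
      dbms_output.put_line(
        'alter table t split subpartition dp1_others into (' ||
        'subpartition dp1_new values (''' || m.metric || '''), ' ||
        'subpartition dp1_others);');
    end if;
  end loop;
end;
/
```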

>>and there will be frequent operations to drop a subpartition and reload it.

You typically don't drop a partition to reload it.  You may truncate it before a reload to save some time.  If it saves enough time over a simple delete depends on the amount of data involved.

You have to ask just how often this happens and whether a truncate will really be more efficient than a delete against the primary partition.
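For reference, the reload pattern under discussion would look roughly like this (the subpartition name, column list, and staging table are hypothetical):

```sql
-- Wipe one subpartition without touching the rest of the table.
alter table t truncate subpartition dp1_metric1;

-- Reload it, e.g. with a direct-path insert from a staging table.
insert /*+ append */ into t (dateid, userid, metric, value)
select dateid, userid, metric, value
from   staging_t
where  metric = 'METRIC-1';
commit;
```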

>>because the data on this table will always be accessed by each subpartition

I do have to ask:  Is a dozen subpartitions per primary partition really necessary?

What are you hoping to achieve by this or is it just a 'theory'?
thomaszhwang (Author) Commented:
>> You still never mentioned the amount of data you are dealing with.

The data volume is about 100,000 rows per day.

>> In terms of the new values to the string column.

Adding a business process is not always easy.  Sometimes I feel it is much more difficult than solving the problem technically.

Also, since I can only estimate the number of distinct values, I need to consider a way to make this solution scalable.  When you say "In later releases with automatic row movement, even if you add the new subpartition later, everything should be fine.", I suppose this feature is not available yet, I'm not sure when those releases will actually happen, and even if they do, I'm not sure the feature will be applied to my environment.  That is a big uncertainty for me.

I'm now considering a hash partition for the subpartition.  I suppose this will partition the table based on the hash value of the column.  If this is true, and as long as my data is not skewed on this column, I could always use hash partitioning, no matter how many distinct values I have, right?

I'm not sure what you mean by partition views.

>> You typically don't drop a partition to reload it.  You may truncate it before a reload to save some time.  If it saves enough time over a simple delete depends on the amount of data involved.  You have to ask just how often this happens and will a truncate really be more efficient that a delete against the primary partition?

I didn't know we can truncate a partition/subpartition.  I suppose truncate will be more efficient than delete.  At least this is true in Vertica.  I have never worked with Oracle before.

>> I do have to ask:  Is a dozen subpartitions per primary partition really necessary?

I'm not sure if this is necessary.  That's why I'm trying to investigate.  I assume that if we know we will only access a specific range of the table, it should be faster, right?  Although there will be overhead maintaining the partitions.

>> What are you hoping to achieve by this or is it just a 'theory'?

I'm hoping to make my storing and retrieving the data the most efficient way.  I know the data will be loaded by each subpartition and it will happen once a day and also be retrieved by each subpartition many times a day.
slightwv (Netminder) Commented:
There are other Experts on the site much stronger with partitioning than I am.  I hope the call for help will bring them in later.  If not, I'll try to reach out to them personally.

>>The data volume is about 100,000 rows per day.

Per main partition or per subpartition?

100,000 a day per main partition really isn't a lot.  If you have decent hardware then I might think twice about subpartitions.

I guess a lot depends on the DML you are planning on making based on your 'drop' statement above.  How many of these rows will you be deleting/adding on a daily/weekly/monthly basis?

>>, I suppose this feature is still not available yet

I was thinking enable row movement which has been around for a while.  Having re-read the docs, looks like that is only for KEY updates.  I'll need to set up some test cases to see if it works for non-key values.

>>comparing to solve the problem technically.
>> no matter how many distinct values do I have, right?

I will need to do some more testing/research but I would really look into hash partitioning with an unknown number of values if range will not work.

>>I'm not sure what do you mean by partition views.

The views that describe partitions and their limits.

Check out dba_tab_partitions.  Depending on the partition type, it will show you the range/limit values.  You can then possibly look for 'new' values and create new subpartitions for them.
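For example (a sketch; the DBA_ views need elevated privileges, but the USER_ equivalents work for your own schema):

```sql
-- Each list subpartition's HIGH_VALUE holds the quoted value it stores
-- (or the literal DEFAULT), so new column values can be checked against it.
select partition_name, subpartition_name, high_value
from   user_tab_subpartitions
where  table_name = 'T'
order  by partition_name, subpartition_position;
```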

Again, I'll need to do some testing to confirm this is possible.

>> I suppose truncate will be more efficient than delete.

Immensely more efficient.  No logging involved since it just moves the HWM (High Water Mark).  The problem with truncate is it cannot be rolled back.  If you make a mistake, you need to perform some form of recovery.

>> I assume if we know we are sure to only access a specific range of the table, it should be faster, right?  Although there will be overhead maintaining the partitions.

It will take testing.  It may or may not.  A lot of it depends...

>>retrieved by each subpartition many times a day.

Then do you need the primary partition?

You should also read up on partition indexes.  You might be able to achieve your desired results with a local partition index on the varchar2 column based on the partition on the date column.
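A minimal sketch of that (the index name is hypothetical):

```sql
-- LOCAL creates one index segment per partition of T, so a query that prunes
-- to a single date partition only touches that partition's slice of the index.
create index t_metric_ix on t(metric) local;
```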

Again, test, test, test...  Only after testing the variations will you be able to make a sound decision.

I think I have partitioning loaded on a development database where I can try to set something up.

I'll try to set up something tomorrow that will get you started.

I also hope some other Experts show up and can offer some additional advice.  If not, I'll break out the 'club' and go find them...
thomaszhwang (Author) Commented:
>> Per main partition or per subpartiion?

Per main partition

>> 100,000 a day per main partition really isn't a lot.  If you have decent hardware then I might think twice about subpartitions.

I know the row count is not much and I know Oracle should be able to handle it.  I just want to take care of this in the most optimal way.

>> I guess a lot depends on the DML you are planning on making based on your 'drop' statement above.  How many of these rows will you be deleting/adding on a daily/weekly/monthly basis?

Generally, deleting/dropping would only happen when something went wrong, which is absolutely possible.  When this happens, I will need to reload the data depending on how much bad data is there.  Ideally I want to only remove the bad data.

If everything is ok, then nothing will need to be deleted/dropped.

The data will be loaded every day.

>> I will need to do some more testing/research but I would really look into hash partitioning with an unknown number of values if range will not work.

I used to work with Vertica database and it handles hash partition pretty well with millions of distinct values.

>> Immensely more efficient.  No logging involved since it just moves the HWM (High Water Mark).  The problem with truncate is it cannot be rolled back.  If you make a mistake, you need to perform some form of recovery.

That's what I thought.  That's why I prefer to use truncate over delete.

>> Then do you need the primary partition?

I thought subpartitions are subordinate to primary partitions, aren't they?  You cannot have subpartitions without primary partitions, correct?

>> You should also read up on partition indexes.  You might be able to achieve your desired results with a local partition index on the varchar2 column based on the partition on the date column.

I think I really should, but the thing is I have other priorities.  However, the main reason I don't want to consider indexing is the maintenance overhead, especially in our "load once, read many times, and read one subpartition at a time" environment.

I do appreciate your help.  Thanks.
slightwv (Netminder) Commented:
>> You cannot have subpartitions without primary partitions, correct?

Correct.  That was a typo (it's what I get for posting later in the evening).  I meant: do you really need the subpartitions?

>>I don't want to consider indexing is because of the maintain overhead, especially in our "load once, read many times and read each subpartition at a time" environment.

That is actually perfect for indexing.  I'm not sure what overhead you think you will be incurring.  Most of the overhead is building/updating the index.  Once it's done, then it will likely remain cached as you continually access the same rows in the partition.

I'll try to get some time today to mock some things up.  I don't have the space to load up millions of rows to get a more accurate test but I should be able to provide some things to get you started and you can build your own tests from there.
thomaszhwang (Author) Commented:
>> do you really need the subpartitions?

I'm not sure if I really need the subpartitions, but the question would be: why not, if my access pattern is always per subpartition?

>>  That is actually perfect for indexing.  I'm not sure what overhead you think you will be incurring.  Most of the overhead is building/updating the index.  Once it's done, then it will likely remain cached as you continually access the same rows in the partition.

I think indexing is perfect when you need to find SOME SPECIFIC ROWS, but if I always need all rows in a subpartition, I don't see why I would want to build an index for that, no matter how trivial the overhead is.

Again, thanks for your help.
slightwv (Netminder) Commented:
>>but if I always need all rows in a subpartition

So you are planning on changing the query each time to use the partition clause?
thomaszhwang (Author) Commented:
>> So you are planning on changing the query each time to use the partition clause?

Well, I'm not quite sure what you mean by "use the partition clause".  In Vertica, when a user submits a query, the query engine examines the WHERE clause to see if the query can be satisfied by reading only one or more specific partitions; if it can, it won't touch the other, unrelated partitions.  That is to say, it depends entirely on the WHERE clause the user submits.
slightwv (Netminder) Commented:
The Oracle optimizer will try its best to see if a single partition can be used.
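One way to see that pruning is in the execution plan; for example (a sketch; the table and column names are hypothetical):

```sql
-- The Pstart/Pstop columns of the plan show which partition(s) Oracle decided
-- it needs to read, based on the WHERE clause alone.
explain plan for
  select count(*)
  from   t
  where  dateid = date '2012-06-15'
    and  metric = 'METRIC-1';

select * from table(dbms_xplan.display);
```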

In Oracle it's all about blocks read off disk and the fastest way to get the necessary blocks to return the requested data.

Where I'm struggling with this is:
Say the query can be resolved by reading 10 blocks from disk.  Does it really matter whether those 10 blocks are all part of the same subpartition or are located by an index on the main partition?

You will need to test based on real-world simulations but I bet it will be close.

I'll start working on some examples for you soon (I hope).

>>Well, I'm not quite sure what you mean by "use the partition clause".  

In Oracle you can select from a specific partition but this requires that you enter the partition by name.
thomaszhwang (Author) Commented:
>> In Oracle you can select from a specific partition but this requires that you enter the partition by name.

I don't think this is practical or necessary.  Oracle should be able to figure this out itself, although it's nice to know the user has this option for some special occasions.
slightwv (Netminder) Commented:
Here is the first in what I feel will be several examples.

It covers the basics you asked for with some of the 'magic' I suggested.  I just used a MONTH-based partition for the date.  I don't recall you saying how you were planning to split the date partitions, so I picked MONTH.

I'll continue working on followup examples to actually place some data in the partitions/subpartitions and play with indexes to see how they compare.

The below example creates the partitions you suggested.  Note the subpartition DEFAULT.  I only created the one METRIC1 subpartition but inserted 4 different METRIC rows.

I then 'magically' discovered I need new subpartitions so I split off a couple of new ones.

This is what I was suggesting could be automated to keep you from having to manually add new subpartitions.  Just have something that scans the data looking for new metrics and generate the SQL to split it out...

drop table t purge;
CREATE TABLE t (
    DATEID DATE,
    USERID NUMBER,
    METRIC VARCHAR2(2000),
    VALUE  NUMBER(38,20)
)
partition by range(dateid) interval(numtoyminterval(1, 'MONTH')) subpartition by list(metric)
(
	--create initial partition, others will be created automatically because of the interval
	partition dateid_part1 values less than (to_date('1/1/2012', 'MM/DD/YYYY'))
	(
		subpartition dp1_metric1 values('METRIC-1'),
		subpartition dp1_others values(default)
	)
)
/

--insert some data to show that even though there is no partition for this month in 2013, a new partition is created for you.
insert into t values(sysdate,1,'METRIC-1',1);
insert into t values(sysdate,1,'METRIC-2',1);
insert into t values(sysdate,1,'METRIC-3',1);
insert into t values(sysdate,1,'METRIC-4',1);
commit;



--OH, I discovered new metrics that I need to now make subpartitions...  Just split them out.
--  Notice, METRIC-4's row above will stay in the 'default' subpartition until one is created for it.
alter table t split subpartition dp1_others into
(
		subpartition dp1_metric2 values('METRIC-2'),
		subpartition dp1_metric3 values('METRIC-3'),
		subpartition dp1_others
);


--show the partitions
select partition_name, high_value
from dba_tab_partitions
where table_name='T'
order by partition_name;


--show the subpartitions
select partition_name, subpartition_name, high_value
from dba_tab_subpartitions
where table_name='T'
order by partition_name;

slightwv (Netminder) Commented:
Might be a small problem with the example above.  When I actually loaded up more sample data, it doesn't look like all the subpartitions are being automatically generated with each new partition.

I'll keep digging.
thomaszhwang (Author) Commented:
Thanks.
slightwv (Netminder) Commented:
Disclaimer:  I suggest using sqlplus for the code examples.  I know the 'set autotrace' works in it.  Not sure about whatever GUI you might be using.


I still haven't looked into if all subpartitions can be automatically created when a new partition is created...  I'm thinking they can't be.
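For what it's worth, a SUBPARTITION TEMPLATE is the documented way to have every automatically created interval partition stamped with the same set of subpartitions; a sketch, reusing the table layout from these examples:

```sql
CREATE TABLE t_tmpl (
    DATEID DATE,
    USERID NUMBER,
    METRIC VARCHAR2(2000),
    VALUE  NUMBER(38,20)
)
partition by range(dateid) interval(numtoyminterval(1, 'MONTH'))
subpartition by list(metric)
-- every new interval partition gets copies of these subpartitions
subpartition template
(
    subpartition sp_metric1 values ('METRIC-1'),
    subpartition sp_others  values (default)
)
(
    partition dateid_part1 values less than (to_date('1/1/2012', 'MM/DD/YYYY'))
);
```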

BUT...now is when your Oracle education takes a giant leap forward.

In the test case below, I created two tables: one with subpartitions and one without, but with a local index on METRIC.

I load them both up with the EXACT same data.

I then perform the same query against both and generate the execution plans with statistics.

On my machine with the amount of sample data I generated, the one without subpartitions is 'better'.  Your mileage may vary (depending on your equipment and the amount of data you may wish to load up).

I also added examples for selecting from specific partitions and subpartitions just because I mentioned it.

Let me know what else you would like and I'll see what I can come up with.

drop table t_sub purge;
CREATE TABLE t_sub (
    DATEID DATE,
    USERID NUMBER,
    METRIC VARCHAR2(2000),
    VALUE  NUMBER(38,20)
)
partition by range(dateid) interval(numtoyminterval(1, 'MONTH')) subpartition by list(metric)
(
	partition dateid_part1 values less than (to_date('1/1/2012', 'MM/DD/YYYY'))
	(
		subpartition dp1_metric1 values('METRIC-1'),
		subpartition dp1_metric2 values('METRIC-2'),
		subpartition dp1_metric3 values('METRIC-3'),
		subpartition dp1_metric4 values('METRIC-4'),
		subpartition dp1_metric5 values('METRIC-5'),
		subpartition dp1_metric6 values('METRIC-6'),
		subpartition dp1_metric7 values('METRIC-7'),
		subpartition dp1_metric8 values('METRIC-8'),
		subpartition dp1_metric9 values('METRIC-9'),
		subpartition dp1_metric10 values('METRIC-10'),
		subpartition dp1_metric11 values('METRIC-11'),
		subpartition dp1_metric12 values('METRIC-12'),
		subpartition dp1_others values(default)
	)
);

drop table t_nosub purge;
CREATE TABLE t_nosub (
    DATEID DATE,
    USERID NUMBER,
    METRIC VARCHAR2(2000),
    VALUE  NUMBER(38,20)
)
partition by range(dateid) interval(numtoyminterval(1, 'MONTH'))
(
	partition dateid_part1 values less than (to_date('1/1/2012', 'MM/DD/YYYY'))
);

create index t_nosub_idx on t_nosub(metric) local;

begin
	for i in 1..10 loop
		for q in 1..1000 loop
			insert into t_sub values(add_months(to_date('12/31/2011','MM/DD/YYYY'), (i-1)),1,'METRIC-' || mod(q,10),1);
			insert into t_nosub values(add_months(to_date('12/31/2011','MM/DD/YYYY'), (i-1)),1,'METRIC-' || mod(q,10),1);
		end loop;
	end loop;
end;
/

commit;

set autotrace traceonly
select count(userid) from t_nosub where dateid = to_date('12/31/2011','MM/DD/YYYY') and metric='METRIC-5';
select count(userid) from t_sub where dateid = to_date('12/31/2011','MM/DD/YYYY') and metric='METRIC-5';
set autotrace off


--select from specific partition example
select count(*) from t_sub partition(dateid_part1);

--select from specific subpartition example
select count(*) from t_sub subpartition(dp1_metric5);

slightwv (Netminder) Commented:
While I was Googling around working on my test cases I came across this Oracle White Paper that I thought was pretty good:
http://www.oracle.com/technetwork/database/enterprise-edition/partitioning-11g-whitepaper-159443.pdf
thomaszhwang (Author) Commented:
Thanks.  This paper is very helpful.
sdstuber Commented:
>>  If possible, I don't want to enumerate all the possible values in that string column

Range or hash is the way to go to avoid enumeration.
Based on your usage description, I'd go with hash.

EXCEPT...

>>but if I always need all rows in a subpartition

Using hash, this won't be possible.  However, as slightwv pointed out, it's probably not necessary, and it might even be beneficial NOT to try, because the optimizer can parallel-query multiple subpartitions at once to pull your data.

So, I'd still recommend hash.

EXCEPT...

>>there will be frequent operations to drop a subpartition and reload it.

This is pretty much a show-stopper for hash.  You can delete rows, but if you want to drop/truncate, then hash is not going to be a viable option.

So, that leaves list or range.

List sounds ideal, except you might be adding new values.  A default subpartition will catch those, but then it becomes a skewed set.  Unfortunately, you can't add a new list value if you have a default (you can split one off, though, as shown above).  But if you don't use a default, you might start rejecting data if a new value comes in before you get a new subpartition created.  And if new values might come along frequently, this could become a headache.

Thus leaving range.
This is sort of a kludge because you don't have wide ranges of values; you have a short list of values.  But a narrow range is just as legal as a wide range, and once you gather statistics the optimizer will know that each range only holds a single value, so it'll be "effectively" a list partition anyway.  You can add other values later and split subpartitions if a new value comes in unexpectedly and is dropped into a range where you don't want it.  Again, if new values might be frequent, continually re-splitting ranges isn't any more convenient than re-splitting values.

One minor advantage of range is that new values could be absorbed into different ranges rather than a single default, so the need to split might be less frequent.  If, however, you're going to drop the subpartitions, then you'll need to stay on top of the splitting.
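A sketch of that range-as-list idea (table layout borrowed from the examples above; the band boundaries are hypothetical):

```sql
-- Range subpartitions on the VARCHAR2 column: each band currently holds one
-- value, and a new value that appears later simply lands in the nearest band.
create table t_rng (
    dateid date,
    userid number,
    metric varchar2(2000),
    value  number(38,20)
)
partition by range(dateid) interval(numtoyminterval(1, 'MONTH'))
subpartition by range(metric)
(
    partition dateid_part1 values less than (to_date('1/1/2012', 'MM/DD/YYYY'))
    (
        -- note: string ranges sort lexically, so 'METRIC-10' also lands below 'METRIC-2'
        subpartition sp_lo  values less than ('METRIC-2'),
        subpartition sp_mid values less than ('METRIC-4'),
        subpartition sp_hi  values less than (maxvalue)
    )
);
```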
thomaszhwang (Author) Commented:
Thanks sdstuber for joining us.  I got most of what you said.  Just one part:

>> Using hash this won't be possible.

Since one string value will always be hashed into the same subpartition, how could dropping or retrieving data for a specific subpartition by value be inefficient?
thomaszhwang (Author) Commented:
Thanks.  This really helps me learn a lot about Oracle Partitioning.