Link to home
Start Free TrialLog in
Avatar of damienm
damienmFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Add number range dynamic partition

Hi,

I have a table

premium
-------
time_key     NUMBER(15,0)
amt          NUMBER(12,2)

Where time_key is in format YYYYMMDD (e.g.  if date was 01-Jul-17 the field would have 20170701).  This table has around 150 million rows and I would like to create a dynamic range partition by year.

Thanks for any help

Damien
Avatar of Tomas Helgi Johannsson
Tomas Helgi Johannsson
Flag of Iceland image

Hi,

You would need to create range partitions where each partition is less than 20YY0000 or YYYY0000 depending on what your data date range you have.
You could use interval of size 10000.
Have you queried the table to see how the data would be distributed across the partitions ?
You need also consider how your indexes are partitioned as well. Non partitioned indexes across the table / all partitions can produce maintenance problems.
Also consider subpartitioning if that would benefit your queries.
https://www.oracle.com/technetwork/database/partitioningguidev43-2703320.pdf
https://docs.oracle.com/database/121/VLDBG/GUID-FA29787B-8700-49EA-B80D-54D667CF2752.htm#VLDBG004
https://docs.oracle.com/database/121/VLDBG/GUID-0CAB4231-E7DB-4245-9C43-C9CA352EC298.htm#VLDBG1084

Regards,
    Tomas Helgi


Avatar of damienm

ASKER

I was thinking something like below but I am unsure how to make it dynamic so that the dba doesn't have to manually add a partition each year

alter table premium
  modify partition by range (GREATER_EFF_MONTH_KEY)
 ( PARTITION greater_dt_2011 VALUES LESS THAN (20120101)
    TABLESPACE tsa
    ,PARTITION greater_dt_2012 VALUES LESS THAN (20130101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2013 VALUES LESS THAN (20140101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2014 VALUES LESS THAN (20150101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2015 VALUES LESS THAN (20160101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2016 VALUES LESS THAN (20170101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2017 VALUES LESS THAN (20180101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2018 VALUES LESS THAN (20190101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2019 VALUES LESS THAN (20200101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2020 VALUES LESS THAN (20210101)
        TABLESPACE tsa
    ,PARTITION greater_dt_2021 VALUES LESS THAN (20220101)
        TABLESPACE tsa
)
Avatar of damienm

ASKER

Thanks Tomas

Originally I tried adding an index on date range but it was still slower than I was hoping.  Also there is an index already on agent_id.

Generally a lot of queries are YTD or last 12 months based on an agent or all agents in a region sort of thing.  I was thinking if I partitioned by year then the agent index would be used and return fairly quickly.

ASKER CERTIFIED SOLUTION
Avatar of Tomas Helgi Johannsson
Tomas Helgi Johannsson
Flag of Iceland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
You might create a before insert trigger to add partitions when needed.
Avatar of damienm

ASKER

INTERVAL partition is more or less exactly what I want

Just curious if you know how it chooses a name for the new partition, like in the below example would it choose t4 for the next partition or does it just create sales_interval00

CREATE TABLE sales_interval
  2  (product_id                NUMBER(6),
  3  customer_id                NUMBER,
  4  time_id                    DATE,
  5  channel_info               CHAR(1),
  6  promo_id                   NUMBER(6),
  7  qty_sold                   NUMBER(3),
  8  amt_sold                   NUMBER(10,2)
  9  )
 10  PARTITION BY RANGE (time_id)
 11  INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
 12  (PARTITION t0 VALUES LESS THAN (TO_DATE('1-1-2005','DD-MM-YYYY')),
 13  PARTITION t1 VALUES LESS THAN (TO_DATE('1-1-2006','DD-MM-YYYY')),
 14  PARTITION t2 VALUES LESS THAN (TO_DATE('1-7-2006','DD-MM-YYYY')),
 15  PARTITION t3 VALUES LESS THAN (TO_DATE('1-1-2007','DD-MM-YYYY')) )
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

If you declare partitions as automatic, it will generate a system name for new ones.

Your original question mentions partitioning by year and the latest code shows month?

If you can control the table, then store dates as dates NEVER as numbers!!!

It all depends on how you will be quering the data.  If you will query by "year" and partition by a "date", you will have to change all your queries to do "dates" to take advantage of partition pruning.

Might I suggest an alternative?  Create an invisble virtual year column and partition by that?

This can change if you can use dates over numbers...

An example:
drop table tab1 purge;
create table tab1(
	time_key NUMBER(15,0),
	amt NUMBER(12,2),
	year invisible as (to_number(substr(to_char(time_key),1,4)))
)
PARTITION BY list (year) AUTOMATIC
(PARTITION p0 VALUES (1900))
;

insert into tab1 values(20170701,1);

select partition_name from user_tab_partitions where table_name='TAB1';

Open in new window

Hi,

It's my opionion that for optimal query performance and then partition keys, primary keys and indexes needs to have some overlapping columns. Thus partition key needs to be subset of the primary key and subset of the partition key needs to be in every index. This helps the optmizer to quickly filter (partition pruning) out necessary table-partitions/index partions to access and exclude others.
Using other columns for partitioning that are not used in any query will cause some overhead in that filtering.
But I agree with slightwv that using date type as partitioning is far more easier to manipulate than dates stored as a number, decimals or strings.
In fact storing dates,time or timestamp not as a Date,Time or Timestamp datatypes in modern systems is basically wrong design strategy.

Regards,
    Tomas Helgi

Avatar of damienm

ASKER

Interesting on adding a new field i.e. year.  That would probably be possible

I can get the year by below though

ROUND(GREATER_EFF_MONTH_KEY/10000,0)

select (ROUND(20200601/10000,0))
from dual

gives 2020

I don't think I would be able to change the data type to a date.  This is a star/snowflake schema and the premium table only has dimension keys and amount fields.

To get the date we join the premium_fact table to the date_dim table similar to below.

So query might be

SELECT p.region,
               SUM(p.amt) tot_amt
FROM    premium_fact p,
               date_dim d
WHERE p.date_key = d.date_key
AND       d.cycle >= '01-Jan-2020'
GROUP BY
               p.region

I have asked the dbas to create similar to below so will see what they say

alter table premium_fact
  modify partition by range (MONTH_KEY)
INTERVAL(ROUND(MONTH_KEY/10000,0))
 (   PARTITION premium_fact01 VALUES LESS THAN (20120101)
   ...

    ,PARTITION premium_fact10 VALUES LESS THAN (20210101)
    ,PARTITION premium_fact11 VALUES LESS THAN (20220101)
)

You keep changing things?

Your last SQL shows a month_key column, the previous a time_key in a create table.

The query shows a where clause with a STRING.  NEVER rely on implicit data type conversions.

If you will be querying by a "date", you need to partiton by a "date".  The columns in the where need to at least be part of the partition key if not the primary part or you defeat one of the main purposes of partitioning.
Avatar of damienm

ASKER

We have an 8 column primary key on the premium table and there is a composite index on all 8 columns (including date_key).  I am not sure of the value of indexing on so many columns but when I suggested that this was not optimal there was a lot of pushback.  i.e. if it not broke don't fix it.

Avatar of damienm

ASKER

Every query will have date.  Some queries will have date and agent, some will have date and some other key that is why I think partition on date will help with speed of queries

>>Every query will have date.

Then you need a date somewhere in the mix.  Possibly partitioning.  Possibly a Function-Based index to index the number as an actual date.  Possibly an Index-Organized table (IOT) based on a new date column.  Haven't done much with invisible columns so I'm not sure if you can use them in an IOT.

>> that is why I think partition on date will help with speed of queries

Possibly some.  Probably not all.  A lot of other factors are involved in the optimizer choosing to do partition pruning.  If it cannot prune, then partitioning won't help.  Partitioning isn't a quick fix for bad queries/design.
Avatar of damienm

ASKER

I am not meaning to keep changing names, I am not putting the real table name or the date key name as it may be against company rules to put info like that on internet.  I am just trying to get general solution on range partition on a number field (which can be converted to a date in format YYYYMMDD).  
I get not posting real data but consistency helps for those of us that are easily confused and distracted...

In your case, you should partition based on how the data will be accessed the most.  There are other reasons to partition but they don't seem to apply here.

Your last query has date_key in the join, so maybe that would be a good option then possibly sub-partition by date?

If query performance is the final goal, as I said, partitioning may or may not help.  We really don't have any way of knowing.  There are alternatives that might achieve your tuning goal.
Since it appears that you may be locked into a time_key with a number data type, there is really no reason it can't be used in partitioning. There are lots of reasons why if it represents a date that it should be a date data type. Since the format is YYYYMMDD you can "get away with it" but no other date format would work. I have used something similar to slightwv's suggestion by creating a column, partition_key as a virtual column instead of the suggested year as an invisible column, with a data type of number. In my case the value came from a varchar2 column but in your case mod(time_key,10000) would work. So partitioning automatic with a "seed" partition name that includes any year prior to the earliest time_key year (e.g. P1990) works. I additionally found that, since we have reasons to drop outdated years, it's more efficient to drop a partition than to delete where a year equals a specific. To accommodate that you can create a procedure to rename the partitions to meaningful names (e.g. P1991, P1992, etc.) so that you can drop partitions without having to know which system generated names belong to which year.