DB2 table partitioning: Would you recommend this key as a partitioning key?

Posted on 2015-02-19
Medium Priority
Last Modified: 2015-03-09
Hi DB2 Gurus,
First, my business is retail in the Northeast, and a lot of what we do is "store" driven - sales, promos, delivery, purchasing, etc. I am about to embark on partitioning some of our large tables, and it was recommended that I partition the tables by store_key (INTEGER). The store_key is nothing special, just numbers beginning from 1 through 1200. That's it!
Question 1: Would you recommend the store_key as a partitioning key? If not, why not?
Question 2: If you accept store_key as a partitioning key, how would you code the partitioning details? For example, I have in mind to create the partitions as follows:
create table products
(xxx int, store_key int, storeno int....)
partition by range (store_key)
(partition p1 starting from (1) ending at (1),
 partition p2 starting from (2) ending at (2),
 partition p3 starting from (3) ending at (3),
 partition p4 starting from (4) ending at (4),
 partition p5 starting from (5) ending at (5),
 partition p6 starting from (6) ending at (6),
 ...
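I have also read that the EVERY clause might generate these ranges automatically rather than listing all 1,200 by hand - if I understand the syntax right (DB2 for LUW assumed), something like:

```sql
-- One range partition per store_key value, generated automatically
create table products
  (xxx int, store_key int, storeno int)
  partition by range (store_key)
  (starting from (1) ending at (1200) every (1));
```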


  Question 3: What recommendation can anyone give on how to distribute the table parts across tablespace partitions?
              Does it make sense to have 1200 partitions now, or 600 partitions now, etc.?
              Is there such a thing as auto-add partition? I don't mean partition-by-growth.

  Any assistance, recommendations, or links to usable partitioning information will be highly appreciated.

Question by:Enyimba
LVL 46

Expert Comment

by:Kent Olsen
ID: 40619796
Hi Enyimba,

There are several things that go into deciding when and how to partition a database.

-  Will this be the OLTP or OLAP database?  That is, is this the system of record with the full business accounting or a reporting system?

-  How many stores do you expect in the database?

-  How much data do you expect per store?  What is the range (highest and lowest volume) by store?

-  What's your expectation from partitioning?

LVL 62

Expert Comment

ID: 40620077
Nowadays you buy more RAM and SSD disks and don't hurt your brain with partitioning brain-damage.

Author Comment

ID: 40620423

Let me try to answer your questions:

1. This will be more of an OLAP system. There is a front-end third-party product that captures, enhances, and dumps various records into several of these tables.
2. Today there are about 900 stores and we expect to grow to 1700.
3. Data per store will vary, but only a few of the tables (9 tables to be exact) will have millions and millions of rows (the smallest is about 32 million rows, the largest about 972 million rows), and those nine tables are the ones that I need to partition.
4. What do I expect from partitioning? All the benefits of partitioning - managing table growth with respect to its maintenance (backup & restore come to mind), enabling better query performance, etc.

LVL 62

Expert Comment

ID: 40620708
1. OLAP - so you need RAM, not SSD, to keep maximum data in memory.
2. Partitioning does not impact scalability for client connections. A rule of thumb is 100 per CPU core, though 1000 works too.
3. If you run on a 64-bit system it will suffice for our lifetime.
4. Nothing, just extra hassle and a waste of your time.
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 1000 total points
ID: 40621042
I agree with gheist in that so far, I don't see anything that screams for partitioning.  It's not that we can't find something, but we're not there yet.

My suspicion is that you're not building a true data warehouse (i.e. star schema), but putting the combined data of all of the stores into a single data repository.

With that in mind, the only reason that I'd consider partitioning on the Store Id is if you completely rebuild the repository for a given store on a regular basis.  It's pretty efficient to drop and recreate the partition.  If you simply insert the new data, the partitioning doesn't help.  (And as a DBA, I really don't want to create and manage 1,700 partitions if I don't have to.)
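If you do go that route, the rebuild pattern is roughly this - a sketch only, with made-up partition and table names, DB2 for LUW assumed:

```sql
-- Roll the old copy of store 42 out as a metadata operation...
ALTER TABLE products DETACH PARTITION store_0042
  INTO TABLE products_store_0042_old;

-- ...reload the store's data into a staging table, then roll it back in
ALTER TABLE products ATTACH PARTITION store_0042
  STARTING FROM (42) ENDING AT (42)
  FROM TABLE products_store_0042_new;

-- Newly attached data must pass integrity checking before it's visible
SET INTEGRITY FOR products ALLOW WRITE ACCESS IMMEDIATE CHECKED;
```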

You mention that 9 of the tables are quite large.  30M to 1B rows per store, times 1,700.  That suddenly gets to be a huge table and a concern.  Store sales is probably one of these tables.  Store ID is one possible partitioning key.  Transaction date is another.  Without knowing your needs it's impossible to say which is the better choice, but my instincts would prefer the date (at least for transactions like sales, deliveries, employee hours, etc.).  Queries over time could see a bit of a performance boost over queries by store.  The big benefit comes when changes occur later.  If/when a decision is made to only keep 3 or 5 years' data in this database, it takes only a few minutes to recycle the oldest partitions.
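To make the date option concrete, here's a sketch (table, column, and partition names are hypothetical; DB2 for LUW assumed). Note that recycling the oldest month is a quick detach, not a mass DELETE:

```sql
-- Monthly range partitions over the retention window
CREATE TABLE store_sales
  (store_key INT, tx_date DATE, amount DECIMAL(9,2))
  PARTITION BY RANGE (tx_date)
  (STARTING FROM ('2010-01-01') ENDING AT ('2014-12-31') EVERY (1 MONTH));

-- Retiring the oldest month: detach it, then drop the detached table
ALTER TABLE store_sales DETACH PARTITION part0 INTO TABLE store_sales_2010_01;
DROP TABLE store_sales_2010_01;
```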

This database could easily be a stepping-stone to a true data warehouse.  What's the likelihood of management deciding to take it a step further and build a true data warehouse?

Author Comment

ID: 40621585

Thanks gentlemen for your responses...

You said "This database could easily be a stepping-stone to a true data warehouse...". That is precisely what management is gearing towards, but for other business reasons they want all store data accumulated in several tables in the meantime for a temporary forecasting need. The EDW will take up to a year to complete, but they want something right now.

In the QA environment where this work is being done, the following speaks to the server configuration that I am working with. I would think this is pretty much sufficient for my needs at this time, don't you agree?

Operating System Information:

OSName:   AIX
NodeName: rapmq02hdqra
Version:  7
Release:  1
Machine:  00F6EF184C00

CPU Information:
TotalCPU    OnlineCPU   ConfigCPU   Speed(MHz)  HMTDegree  Cores/Socket
80          80          80          3108        4           8

Physical Memory and Swap (Megabytes):
TotalMem    FreeMem     AvailMem    TotalSwap   FreeSwap
153600      65121       n/a         241920      241719

Virtual Memory (Megabytes):
Total       Reserved    Available   Free
395520      n/a         n/a         306840

Message Queue Information:
MsgSeg      MsgMax      MsgMap      MsgMni      MsgTql      MsgMnb      MsgSsz
n/a         4194304     n/a         1048576     n/a         4194304     n/a

Shared Memory Information:
ShmMax               ShmMin               ShmIds      ShmSeg
35184372088832       1                    1048576     268435456

Semaphore Information:
SemMap      SemMni      SemMns      SemMnu      SemMsl      SemOpm      SemUme      SemUsz      SemVmx      SemAem
n/a         1048576     n/a         n/a         65535       1024        1024        n/a         32767       16384

CPU Load Information:
Short     Medium    Long
1.265182  1.308929  1.324539

CPU Usage Information (percent):
Total     Usr       Sys       Wait      Idle
3.400000  1.187500  0.137500  2.075000  96.600000
LVL 62

Assisted Solution

gheist earned 500 total points
ID: 40621650
I'd swap half of the CPUs for memory, to cover at least 10-20% of the data.

HMT is bad, as only one HMT thread serves interrupts from devices - just disable it and use the 20 normal CPUs.

With the present configuration I'd suggest doubling the shared memory for the DB.
LVL 46

Expert Comment

by:Kent Olsen
ID: 40621850
*) It appears that once inserted, data will be static.  New data is added, but old data is not deleted.  And once inserted, data is unlikely to be modified.
*) On today's redundant hardware, a failure that takes down this database is unlikely.  In fact, the load process can be more dangerous than a potential hardware failure.

If I were going to build this, I'd be tempted to pick the largest tables (perhaps some or all of the 9 you referenced) and pick a file size threshold that makes you comfortable.  1TB, 10TB, 50TB, etc.  Make that your target partition size and then use the Store ID or transaction date as the partition key.  That keeps the number of partitions down while giving you a reasonable recovery method if you should have to rebuild part of the database.
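As a rough back-of-envelope on how a size threshold maps to a partition count (the row width here is an assumption on my part, as are the table and column names):

```sql
-- 972M rows x ~200 bytes/row ≈ 195 GB; a 10 GB target gives ~20 partitions,
-- i.e. roughly 85 of the 1,700 stores per partition
CREATE TABLE big_fact
  (store_key INT, tx_date DATE, amount DECIMAL(9,2))
  PARTITION BY RANGE (store_key)
  (STARTING FROM (1) ENDING AT (1700) EVERY (85));
```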

When the DW is built, will this repository become the source for it?   (It would make sense, as the data is already in one location, making the transformation relatively easy.)  If so, I'd give serious consideration to using the transaction date for the partition key where possible.  Your data will probably be loaded into the new structure 1 store at a time, as I would expect the stores' database backups to be the data source.  If so, the data will tend to be grouped by store, not by time.  Partitioning by transaction date can make loading the fact tables in the data warehouse a lot more efficient.

Author Comment

ID: 40622455
Thank you guys...you have given me a lot to chew on, and that is what I am going to do, and consult with my colleagues on your expressed thoughts. I will most probably come back with more questions.

Again, thanks and stay tuned...

LVL 62

Expert Comment

ID: 40622571
For DB inserts you must have an SSD mirror the size of, say, daily to weekly data, so you insert in an instant and let the DB chew on it.
Immediate advice is to add half of the free memory to the DB, doubling its memory.
LVL 46

Accepted Solution

Kent Olsen earned 1000 total points
ID: 40623080
gheist is dead right here.  More memory.  

This kind of database isn't ideal.  It's not designed with any kind of performance in mind, it's simply a way of holding all of the combined data with relatively little (human) work.  I've a similar database at my office, though it's much smaller than what you're working with.

If your AIX server is an LPAR, have the admin crank up the memory at least until most of the loading is complete.  Have him be extremely generous.  (It can always be reduced later.)  That will be a huge help for some of the tasks, not so much for others.  A REORG or clustered index build will go a lot faster, a bulk load not so much.  With enough memory, a lot of the index pages will remain cached so that consecutive queries on related items (common columns) could see a huge performance boost.

In a lot of models, it's faster to drop the indexes, load the new data, and recreate the indexes.  It's probably not more effective with your data due to the high number of stores.  (Rebuilding indexes because the database is growing 0.1% is probably not best.)  Load as many of the stores as you can before you first build your indexes.
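The load-then-index pattern looks roughly like this from the DB2 command line (the file, table, index, and schema names here are made up):

```sql
-- Bulk-load all stores first (LOAD bypasses per-row logging)...
LOAD FROM all_stores.del OF DEL INSERT INTO products NONRECOVERABLE;

-- ...then build the indexes once, and refresh optimizer statistics
CREATE INDEX ix_products_store ON products (store_key);
RUNSTATS ON TABLE myschema.products WITH DISTRIBUTION AND INDEXES ALL;
```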
LVL 62

Expert Comment

ID: 40623206
Memory is more for retrieval. Say, if you work on the latest data, it is best to fit most of that in RAM.
LVL 26

Assisted Solution

by:Tomas Helgi Johannsson
Tomas Helgi Johannsson earned 500 total points
ID: 40627723

One thing that partitioning does for your queries is so-called partition pruning:
eliminating the partitions that are not needed, limiting the scanned indexes and table rows to
as small a subset as possible - ergo lower CPU and execution time, as well as lower usage of other resources (bufferpools, workfiles, etc.).
Partition keys need to be chosen carefully, in the respect that those columns are used in all or most of your queries
and give you the most even distribution of data possible.
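A small illustration, using the earlier hypothetical store_key scheme - pruning only happens when the predicate references the partitioning column:

```sql
-- Prunes: the optimizer touches only the partition holding store_key 57
SELECT SUM(amount) FROM products WHERE store_key = 57;

-- Does not prune: storeno is not the partitioning column,
-- so every partition must be scanned
SELECT SUM(amount) FROM products WHERE storeno = 57;
```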

At work we have one table with around 2 billion rows in 207 partitions in an OLTP database on DB2 10 for z/OS. :)
Queries run with optimal performance. :)

    Tomas Helgi

Author Comment

ID: 40638707
Hello gheist,

Could you please expand on your comment about doubling memory. What do you mean by "...With present configuration I'd suggest doubling the shared memory for DB"? Buy more memory, or increase the DB shared memory configuration parameter?

I also don't understand what you are talking about in this statement: "HMT is bad as only one HMT thread serves interrupts from devices just disable it and use 20 normal CPUs". What is HMT, and why is it bad...?



Author Closing Comment

ID: 40655375
Thanks guys

Question has a verified solution.
