• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 349

DB2 table partitioning: Would you recommend this key as a partitioning key?

Hi DB2 Gurus,
First, my business is in the Northeast, in retail, and a lot of what we do is "store" driven - sales, promos, delivery, purchasing, etc. I am about to embark on partitioning some of our large tables, and it was recommended that I partition the tables by store_key (INTEGER). The store_key is nothing special, just numbers beginning from 1 through 1200. That's it!
Question 1: Would you recommend the store_key as a partitioning key? If not, why not?
Question 2: If you accept store_key as a partitioning key, how would you code the partitioning details? For example, I have in mind to create the partitions as follows:
create table products
  (xxx int, store_key int, storeno int, ...)
  partition by range (store_key)
  (partition p1 starting from (1) ending at (1),
   partition p2 starting from (2) ending at (2),
   partition p3 starting from (3) ending at (3),
   partition p4 starting from (4) ending at (4),
   partition p5 starting from (5) ending at (5),
   partition p6 starting from (6) ending at (6),
   ...and so on, one partition per store)


  Question 3: What recommendation can anyone give on how to distribute the table parts across tablespace partitions?
              Does it make sense to have 1,200 partitions now, or 600, etc.?
              Is there such a thing as an auto-add partition? I don't mean partition-by-growth.

  Any assistance, recommendations, or links to usable partitioning information will be highly appreciated.

4 Solutions
Kent OlsenData Warehouse Architect / DBACommented:
Hi Enyimba,

There are several things that go into deciding when and how to partition a database.

-  Will this be the OLTP or OLAP database?  That is, is this the system of record with the full business accounting or a reporting system?

-  How many stores do you expect in the database?

-  How much data do you expect per store?  What is the range (highest and lowest volume) by store?

-  What's your expectation from partitioning?

Nowadays you buy more RAM and SSD disks and don't hurt your brain with partitioning brain-damage.
EnyimbaAuthor Commented:

Let me try to answer your questions:

1. This will be more of an OLAP system. There is a front-end third-party product that captures, enhances, and dumps various records into several of these tables.
2. Today there are about 900 stores, and we expect to grow those to 1700.
3. Data per store will vary, but only a few of the tables (9 tables, to be exact) will have millions and millions of rows (the smallest is about 32 million rows, the largest about 972 million rows), and those nine tables are the ones I need to partition.
4. What do I expect from partitioning? All the benefits of partitioning - managing table growth with respect to its maintenance (backup & restore comes to mind), enabling better query performance, etc.

1. OLAP - so you need RAM, not SSD, to keep maximum data in memory.
2. Partitioning does not impact scalability for client connections. Rule of thumb is 100 per CPU core, though 1000 works too.
3. If you run on a 64-bit system, it will suffice for our lifetime.
4. Nothing, just extra hassle and a waste of your time.
Kent OlsenData Warehouse Architect / DBACommented:
I agree with gheist in that so far, I don't see anything that screams for partitioning.  It's not that we can't find something, but we're not there yet.

My suspicion is that you're not building a true data warehouse (i.e. star schema), but putting the combined data of all of the stores into a single data repository.

With that in mind, the only reason that I'd consider partitioning on the Store Id is if you completely rebuild the repository for a given store on a regular basis.  It's pretty efficient to drop and recreate the partition.  If you simply insert the new data, the partitioning doesn't help.  (And as a DBA, I really don't want to create and manage 1,700 partitions if I don't have to.)

You mention that 9 of the tables are quite large.  30M to 1B rows per store, times 1,700.  That suddenly gets to be a huge table and a concern.  Store sales is probably one of these tables.  Store ID is one possible partitioning key.  Transaction date is another.  Without knowing your needs it's impossible to say which is the better choice, but my instincts would prefer the date (at least for transactions like sales, deliveries, employee hours, etc.).  Queries over time could see a bit of a performance boost over queries by store.  The big benefit comes when changes occur later.  If/when a decision is made to keep only 3 or 5 years of data in this database, it takes only a few minutes to recycle the oldest partitions.
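To make the date idea concrete, here is a minimal sketch in DB2 for LUW syntax. The table and column names (store_sales, trans_date) are illustrative assumptions, not from this thread, and the date range is arbitrary:

```sql
-- Sketch only: one range partition per month of transaction date.
CREATE TABLE store_sales (
  store_key  INTEGER       NOT NULL,
  trans_date DATE          NOT NULL,
  amount     DECIMAL(11,2)
)
PARTITION BY RANGE (trans_date)
(STARTING FROM ('2015-01-01') ENDING AT ('2017-12-31')
 EVERY (1 MONTH));

-- Recycling the oldest month is then a fast, mostly-metadata operation.
-- (The generated partition name, e.g. PART0, can be looked up in
-- SYSCAT.DATAPARTITIONS.)
ALTER TABLE store_sales
  DETACH PARTITION part0 INTO TABLE store_sales_2015_01;
-- store_sales_2015_01 can afterwards be archived, exported, or dropped.
```

The DETACH at the end is exactly the "recycle the oldest partitions in a few minutes" operation described above.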

This database could easily be a stepping-stone to a true data warehouse.  What's the likelihood of management deciding to take it a step further and build a true data warehouse?
EnyimbaAuthor Commented:

Thanks, gentlemen, for your responses...

You said "This database could easily be a stepping-stone to a true data warehouse...". That is precisely what management is gearing up to do, but for other business reasons they want all store data accumulated in several tables in the meantime, for a temporary forecasting need. The EDW will take up to a year to complete, but they want something right now.

In the QA environment where this work is being done, the following describes the server configuration I am working with. I would think this is pretty much sufficient for my needs at this time, don't you agree?

Operating System Information:

OSName:   AIX
NodeName: rapmq02hdqra
Version:  7
Release:  1
Machine:  00F6EF184C00

CPU Information:
TotalCPU    OnlineCPU   ConfigCPU   Speed(MHz)  HMTDegree  Cores/Socket
80          80          80          3108        4           8

Physical Memory and Swap (Megabytes):
TotalMem    FreeMem     AvailMem    TotalSwap   FreeSwap
153600      65121       n/a         241920      241719

Virtual Memory (Megabytes):
Total       Reserved    Available   Free
395520      n/a         n/a         306840

Message Queue Information:
MsgSeg      MsgMax      MsgMap      MsgMni      MsgTql      MsgMnb      MsgSsz
n/a         4194304     n/a         1048576     n/a         4194304     n/a

Shared Memory Information:
ShmMax               ShmMin               ShmIds      ShmSeg
35184372088832       1                    1048576     268435456

Semaphore Information:
SemMap      SemMni      SemMns      SemMnu      SemMsl      SemOpm      SemUme      SemUsz      SemVmx      SemAem
n/a         1048576     n/a         n/a         65535       1024        1024        n/a         32767       16384

CPU Load Information:
Short     Medium    Long
1.265182  1.308929  1.324539

CPU Usage Information (percent):
Total     Usr       Sys       Wait      Idle
3.400000  1.187500  0.137500  2.075000  96.600000
I'd swap half of CPUs for memory to cover at least 10-20% of data

HMT is bad, as only one HMT thread serves interrupts from devices - just disable it and use the 20 normal CPUs.

With the present configuration I'd suggest doubling the shared memory for the DB.
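On the DB2 side, "doubling the shared memory" would normally mean raising the DATABASE_MEMORY configuration parameter (or letting the self-tuning memory manager handle it). A sketch, assuming DB2 for LUW and a connection to the target database; the page count is purely illustrative:

```sql
-- Sketch only. DATABASE_MEMORY is expressed in 4 KB pages,
-- so the value below (~76 GB) is just an example figure.
CALL SYSPROC.ADMIN_CMD('UPDATE DB CFG USING DATABASE_MEMORY 20000000');

-- Alternatively, let the self-tuning memory manager size it:
CALL SYSPROC.ADMIN_CMD('UPDATE DB CFG USING DATABASE_MEMORY AUTOMATIC');
```

Whether this means buying RAM or just reallocating it depends on what the OS currently has free, which is the question Enyimba asks below.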
Kent OlsenData Warehouse Architect / DBACommented:
*) It appears that once inserted, data will be static.  New data is added, but old data is not deleted.  And once inserted, data is unlikely to be modified.
*) On today's redundant hardware, a failure that takes down this database is unlikely.  In fact, the load process can be more dangerous than a potential hardware failure.

If I were going to build this, I'd be tempted to pick the largest tables (perhaps some or all of the 9 you referenced) and pick a file size threshold that makes you comfortable.  1TB, 10TB, 50TB, etc.  Make that your target partition size and then use the Store ID or transaction date as the partition key.  That keeps the number of partitions down while giving you a reasonable recovery method if you should have to rebuild part of the database.
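One way to follow that advice while keeping the partition count down is to range-partition over groups of stores rather than one partition per store. A hedged sketch in DB2 for LUW syntax, reusing the column names from the question; the group size of 100 is an assumption to tune against the file-size threshold above:

```sql
-- Sketch: ~100 stores per partition keeps 1,700 stores to ~17 partitions,
-- instead of 1,700 single-store partitions.
CREATE TABLE products (
  xxx       INTEGER,
  store_key INTEGER NOT NULL,
  storeno   INTEGER
)
PARTITION BY RANGE (store_key)
(STARTING FROM (1) ENDING AT (1700) EVERY (100));
```

Each partition still prunes on store_key predicates, but the DBA only has a manageable handful of objects to back up, reorganize, or rebuild.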

When the DW is built, will this repository become the source for it?   (It would make sense, as the data is already in one location, making the transformation relatively easy.)  If so, I'd give serious consideration to using the transaction date for the partition key where possible.  Your data will probably be loaded into the new structure one store at a time, as I would expect the stores' database backups to be the data source.  If so, the data will tend to be grouped by store, not by time.  Partitioning by transaction date can make loading the fact tables in the data warehouse a lot more efficient.
EnyimbaAuthor Commented:
Thank you, guys... you have given me a lot to chew on, and that is what I am going to do, consulting with my colleagues on your expressed thoughts. I will most probably come back with more questions.

Again, thanks and stay tuned...

For DB inserts you must have an SSD mirror sized for, say, daily to weekly data, so you insert in an instant and let the DB chew on it.
Immediate advice is to add half of the free memory to the DB, doubling its memory.
Kent OlsenData Warehouse Architect / DBACommented:
gheist is dead right here.  More memory.  

This kind of database isn't ideal.  It's not designed with any kind of performance in mind, it's simply a way of holding all of the combined data with relatively little (human) work.  I've a similar database at my office, though it's much smaller than what you're working with.

If your AIX server is an LPAR, have the admin crank up the memory at least until most of the loading is complete.  Have him be extremely generous.  (It can always be reduced later.)  That will be a huge help for some of the tasks, not so much for others.  A REORG or clustered index build will go a lot faster, a bulk load not so much.  With enough memory, a lot of the index pages will remain cached so that consecutive queries on related items (common columns) could see a huge performance boost.

In a lot of models, it's faster to drop the indexes, load the new data, and recreate the indexes.  It's probably not more effective with your data due to the high number of stores.  (Rebuilding indexes because the database is growing 0.1% is probably not best.)  Load as many of the stores as you can before you first build your indexes.
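The drop/load/recreate pattern sketched above could look like the following. All object names are hypothetical, and the RUNSTATS at the end is an addition of mine (fresh statistics after a bulk load help the optimizer), not something stated in the thread:

```sql
-- Sketch only: index and table names are illustrative.
DROP INDEX ix_sales_trans_date;

-- ...bulk-load the new stores' data here (e.g. via the LOAD utility)...

CREATE INDEX ix_sales_trans_date
  ON store_sales (trans_date);

-- Collect statistics so the optimizer sees the newly loaded data.
-- (Schema qualifier required when run through ADMIN_CMD; MYSCHEMA is a placeholder.)
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE MYSCHEMA.STORE_SALES WITH DISTRIBUTION AND INDEXES ALL');
```

As noted above, this only pays off when each load adds a meaningful fraction of the table, which is why batching many stores per load matters here.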
Memory is more for retrieval. Say, if you work on the latest data, it is best to fit most of that in RAM.
Tomas Helgi JohannssonCommented:

One thing that partitioning does for your queries is so-called partition pruning,
that is, eliminating the partitions that are not needed, limiting the scanned indexes and table rows to
as small a subset as possible - hence lower CPU and execution time, as well as lower resource usage (bufferpools, workfiles, etc.).
Partition keys need to be chosen carefully, in the respect that those columns are used in all or most of your queries
and give you the most even distribution of data possible.
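For illustration, assuming a table range-partitioned by month on a trans_date column (names are hypothetical, carried over from the earlier sketches), pruning looks like this:

```sql
-- Sketch: with monthly range partitions on trans_date, this query
-- only scans the partition(s) covering March 2017; all others are pruned.
SELECT store_key, SUM(amount) AS total_sales
FROM   store_sales
WHERE  trans_date BETWEEN '2017-03-01' AND '2017-03-31'
GROUP  BY store_key;
```

The pruning only happens because the WHERE predicate is on the partitioning column - which is exactly why the key must match the columns your queries actually filter on.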

At work we have one table with around 2 billion rows in 207 partitions in an OLTP database on DB2 10 for z/OS. :)
Queries run with optimal performance. :)

    Tomas Helgi
EnyimbaAuthor Commented:
Hello gheist,

Could you please expand on your comment about doubling memory? What do you mean by "...With present configuration I'd suggest doubling the shared memory for DB"? Buy more memory, or increase the DB shared memory configuration parameter?

I don't understand what you are talking about in this statement: "HMT is bad as only one HMT thread serves interrupts from devices just disable it and use 20 normal CPUs". Why is HMT bad...?


EnyimbaAuthor Commented:
Thanks guys
Question has a verified solution.
