Solved

DB2 table partitioning: Would you recommend this key as a partitioning key?

Posted on 2015-02-19
15
229 Views
Last Modified: 2015-03-09
Hi DB2 Gurus,
First, my business is in the Northeast and we are in retail, and a lot of what we do is "store" driven - sales, promos, delivery, purchasing, etc. I am about to embark on partitioning some of our large tables and it was recommended that I partition the tables by store_key (INTEGER). The store_key is nothing special, just numbers running from 1 through 1200. That's it!
Question 1: Would you recommend the store_key as a partitioning key? If not, why not?
Question 2: If you accept store_key as a partitioning key, how would you code the partitioning details? For example, I have in mind to create the partitions as follows:
create table products
(xxx int, store_key int, storeno int, ...)
partition by range (store_key)
(partition part1 starting from (1) ending at (1),
 partition part2 starting from (2) ending at (2),
 partition part3 starting from (3) ending at (3),
 partition part4 starting from (4) ending at (4),
 partition part5 starting from (5) ending at (5),
 partition part6 starting from (6) ending at (6),
....

  SELECT STORE_KEY, STORENO, ZONENO, REGIONNO FROM RAPM.STORES

  Question 3: What recommendation can anyone give on how to distribute the table parts over tablespace partitions?
              Does it make sense to have 1200 partitions now, or 600 partitions now, etc. (for example, something like the range-grouping sketch below)?
              Is there such a thing as auto-add partition? I don't mean partition-by-growth.
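
  For illustration, this is the kind of range grouping I have in mind as an alternative to one partition per store (a rough sketch only; the 100-store bucket size is arbitrary and I have not tested this):

  -- Sketch only: one partition per block of 100 stores, i.e. 12 partitions for 1200 stores.
  -- The EVERY clause generates the ranges automatically.
  create table products
  (xxx int, store_key int not null, storeno int)
  partition by range (store_key)
  (starting from (1) ending at (1200) every (100));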

  Any assistance, recommendations, or links to usable partitioning information will be highly appreciated.

 Enyimba
Question by:Enyimba
15 Comments
 
LVL 45

Expert Comment

by:Kdo
ID: 40619796
Hi Enyimba,

There are several things that go into deciding when and how to partition a database.

-  Will this be the OLTP or OLAP database?  That is, is this the system of record with the full business accounting or a reporting system?

-  How many stores do you expect in the database?

-  How much data do you expect per store?  What is the range (highest and lowest volume) by store?

-  What's your expectation from partitioning?


Kent
 
LVL 61

Expert Comment

by:gheist
ID: 40620077
Nowadays you buy more RAM and SSD disks and don't hurt your brain with partitioning brain-damage.
 

Author Comment

by:Enyimba
ID: 40620423
kdo,

Let me try to answer your questions:

1. This will be more of an OLAP system. There is a front-end third-party product that captures, enhances, and dumps various records into several of these tables.
2. Today there are about 900 stores and we expect to grow those to 1700.
3. Data per store will vary, but only a few of the tables (9 tables to be exact) will have millions and millions of rows (the smallest is about 32 million rows, the largest about 972 million rows), and those nine tables are the ones I need to partition.
4. What do I expect from partitioning? All the benefits of partitioning - managing table growth with respect to its maintenance (backup & restore comes to mind), enabling better query performance, etc.
 
LVL 61

Expert Comment

by:gheist
ID: 40620708
1. OLAP - so you need RAM, not SSD to keep maximum data in memory.
2. Partitioning does not impact scalability for client connections. Rule of thumb is 100 per CPU core, though 1000 works too.
3. If you run on a 64-bit system it will suffice for our lifetime.
4. Nothing, just extra hassle and waste of your time.
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 250 total points
ID: 40621042
I agree with gheist in that so far, I don't see anything that screams for partitioning.  It's not that we can't find something, but we're not there yet.

My suspicion is that you're not building a true data warehouse (i.e. star schema), but putting the combined data of all of the stores into a single data repository.

With that in mind, the only reason that I'd consider partitioning on the Store Id is if you completely rebuild the repository for a given store on a regular basis.  It's pretty efficient to drop and recreate the partition.  If you simply insert the new data, the partitioning doesn't help.  (And as a DBA, I really don't want to create and manage 1,700 partitions if I don't have to.)

You mention that 9 of the tables are quite large.  30M to 1B rows per store, times 1,700.  That suddenly gets to be a huge table and a concern.  Store sales is probably one of these tables.  Store ID is one possible partitioning key.  Transaction date is another.  Without knowing your needs it's impossible to say which is the better choice, but my instincts would prefer the date (at least for transactions like sales, deliveries, employee hours, etc.).  Queries over time could see a bit of a performance boost over queries by store.  The big benefit comes when changes occur later.  If/when a decision is made to keep only 3 or 5 years' data in this database, it takes only a few minutes to recycle the oldest partitions.
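
To make the recycling point concrete, here is a minimal sketch of date-range partitioning (DB2 LUW syntax; the table, column, and partition names are made up for illustration, not something from your schema):

-- Sketch only: one partition per month of sales data.
create table store_sales
(store_key int not null, trans_date date not null, amount decimal(11,2))
partition by range (trans_date)
(starting from ('2013-01-01') ending at ('2016-01-01') every 1 month);

-- Recycling the oldest month later is a quick DETACH instead of a mass DELETE.
-- (part0 is assumed to be the generated name of the first range; check SYSCAT.DATAPARTITIONS.)
alter table store_sales detach partition part0 into store_sales_2013_01;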

This database could easily be a stepping-stone to a true data warehouse.  What's the likelihood of management deciding to take it a step further and build a true data warehouse?
 

Author Comment

by:Enyimba
ID: 40621585
Kdo/gheist,

Thanks, gentlemen, for your responses...

kdo,
you said "This database could easily be a stepping-stone to a true data warehouse...". Precisely what management is gearing towards to do but for other business reason wants all store data accumuated in several tables in the meantime for temporary forecasting need. The EDW will take up to a year to complete but they want something right now.

gheist,
In the QA environment where this work is being done, the following is the server configuration I am working with. I would think this is pretty much sufficient for my needs at this time, don't you agree?

Operating System Information:

OSName:   AIX
NodeName: rapmq02hdqra
Version:  7
Release:  1
Machine:  00F6EF184C00

CPU Information:
TotalCPU    OnlineCPU   ConfigCPU   Speed(MHz)  HMTDegree  Cores/Socket
80          80          80          3108        4           8

Physical Memory and Swap (Megabytes):
TotalMem    FreeMem     AvailMem    TotalSwap   FreeSwap
153600      65121       n/a         241920      241719

Virtual Memory (Megabytes):
Total       Reserved    Available   Free
395520      n/a         n/a         306840

Message Queue Information:
MsgSeg      MsgMax      MsgMap      MsgMni      MsgTql      MsgMnb      MsgSsz
n/a         4194304     n/a         1048576     n/a         4194304     n/a

Shared Memory Information:
ShmMax               ShmMin               ShmIds      ShmSeg
35184372088832       1                    1048576     268435456

Semaphore Information:
SemMap      SemMni      SemMns      SemMnu      SemMsl      SemOpm      SemUme      SemUsz      SemVmx      SemAem
n/a         1048576     n/a         n/a         65535       1024        1024        n/a         32767       16384

CPU Load Information:
Short     Medium    Long
1.265182  1.308929  1.324539

CPU Usage Information (percent):
Total     Usr       Sys       Wait      Idle
3.400000  1.187500  0.137500  2.075000  96.600000
 
LVL 61

Assisted Solution

by:gheist
gheist earned 125 total points
ID: 40621650
I'd swap half of the CPUs for enough memory to cover at least 10-20% of the data.

HMT is bad because only one HMT thread serves interrupts from devices; just disable it and use 20 normal CPUs.

With the present configuration I'd suggest doubling the shared memory for the DB.
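
Where that setting lives in DB2, assuming the shared memory in question is the database's DATABASE_MEMORY / self-tuning allocation (an assumption; SAMPLE is a placeholder for your database name; run as a CLP script or prefix each line with "db2" at a shell prompt):

-- Sketch only; SAMPLE is a placeholder database name.
GET DB CFG FOR SAMPLE;
-- Let DB2 size database shared memory itself via the self-tuning memory manager:
UPDATE DB CFG FOR SAMPLE USING SELF_TUNING_MEM ON;
UPDATE DB CFG FOR SAMPLE USING DATABASE_MEMORY AUTOMATIC;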

 
LVL 45

Expert Comment

by:Kdo
ID: 40621850
*) It appears that once inserted, data will be static.  New data is added, but old data is not deleted.  And once inserted, data is unlikely to be modified.
*) On today's redundant hardware, a failure that takes down this database is unlikely.  In fact, the load process can be more dangerous than a potential hardware failure.

If I were going to build this, I'd be tempted to pick the largest tables (perhaps some or all of the 9 you referenced) and pick a file size threshold that makes you comfortable.  1TB, 10TB, 50TB, etc.  Make that your target partition size and then use the Store ID or transaction date as the partition key.  That keeps the number of partitions down while giving you a reasonable recovery method if you should have to rebuild part of the database.

When the DW is built, will this repository become the source for it?  (It would make sense, as the data is already in one location, making the transformation relatively easy.)  If so, I'd give serious consideration to using the transaction date for the partition key where possible.  Your data will probably be loaded into the new structure 1 store at a time, as I would expect the stores' database backups to be the data source.  If so, the data will tend to be grouped by store, not by time.  Partitioning by transaction date can make loading the fact tables in the data warehouse a lot more efficient.
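
As a sketch of what the roll-in side can look like on a date-partitioned table (the table, partition, and staging-table names are hypothetical), this also touches the earlier "auto-add" question: as far as I know, new ranges are added manually, there is no automatic add for range partitions:

-- Sketch only: extend the table with a new month, then roll pre-loaded data in with ATTACH.
alter table store_sales add partition part_2016_01
  starting from ('2016-01-01') inclusive ending at ('2016-02-01') exclusive;

alter table store_sales attach partition part_2016_02
  starting from ('2016-02-01') inclusive ending at ('2016-03-01') exclusive
  from store_sales_stage;
set integrity for store_sales immediate checked;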
 

Author Comment

by:Enyimba
ID: 40622455
Thank you guys... you have given me a lot to chew on, and that is what I am going to do, along with consulting my colleagues on your expressed thoughts. I will most probably come back with more questions.

Again, thanks and stay tuned...

Enyimba
 
LVL 61

Expert Comment

by:gheist
ID: 40622571
For DB inserts you should have an SSD mirror sized for, say, daily to weekly data, so you insert in an instant and let the DB chew on it.
Immediate advice is to add half of the free memory to the DB, doubling its memory.
 
LVL 45

Accepted Solution

by:Kdo
Kdo earned 250 total points
ID: 40623080
gheist is dead right here.  More memory.  

This kind of database isn't ideal.  It's not designed with any kind of performance in mind; it's simply a way of holding all of the combined data with relatively little (human) work.  I have a similar database at my office, though it's much smaller than what you're working with.

If your AIX server is an LPAR, have the admin crank up the memory at least until most of the loading is complete.  Have him be extremely generous.  (It can always be reduced later.)  That will be a huge help for some of the tasks, not so much for others.  A REORG or clustered index build will go a lot faster, a bulk load not so much.  With enough memory, a lot of the index pages will remain cached so that consecutive queries on related items (common columns) could see a huge performance boost.

In a lot of models, it's faster to drop the indexes, load the new data, and recreate the indexes.  It's probably not more effective with your data due to the high number of stores.  (Rebuilding indexes because the database is growing 0.1% is probably not best.)  Load as many of the stores as you can before you first build your indexes.
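
A minimal sketch of the "build the indexes once, at the end" approach (the index and table names are hypothetical; RUNSTATS is a DB2 CLP command and the RAPM schema is borrowed from your STORES query):

-- Sketch only: after the bulk of the store loads are in, build the index once
-- and refresh optimizer statistics, instead of rebuilding after every small load.
create index ix_sales_store_date on rapm.store_sales (store_key, trans_date);
runstats on table rapm.store_sales with distribution and detailed indexes all;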
 
LVL 61

Expert Comment

by:gheist
ID: 40623206
Memory is more for retrieval. Say you work mostly on the latest data; it is best to fit most of that in RAM.
 
LVL 24

Assisted Solution

by:Tomas Helgi Johannsson
Tomas Helgi Johannsson earned 125 total points
ID: 40627723
Hi!

One thing that partitioning does for your queries is so-called partition pruning,
that is, eliminating the partitions that are not needed, limiting the scanned indexes and table rows to
as small a subset as possible, and therefore lowering CPU and execution time as well as other resource usage (bufferpools, workfiles, etc.).
Partition keys need to be chosen carefully so that those columns are used in all or most of your queries
and give you the most even distribution of data possible.
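
A small sketch of what that pruning looks like in practice, assuming a table range-partitioned on a date column (the names are hypothetical):

-- Because the predicate is on the partitioning column, the optimizer can limit
-- the scan to the partitions covering January 2015; the access plan from
-- EXPLAIN/db2expln should show data partition elimination.
select store_key, sum(amount)
from store_sales
where trans_date between '2015-01-01' and '2015-01-31'
group by store_key;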

At work we have one table with around 2 billion rows in 207 partitions in an OLTP database on DB2 10 for z/OS. :)
Queries run with optimal performance. :)

Regards,
    Tomas Helgi
 

Author Comment

by:Enyimba
ID: 40638707
Hello gheist,

Could you please expand on your comment about doubling memory? What do you mean by "...With the present configuration I'd suggest doubling the shared memory for the DB"? Buy more memory, or increase the DB shared memory config parameter?

I also don't understand this statement: "HMT is bad because only one HMT thread serves interrupts from devices; just disable it and use 20 normal CPUs". What is HMT, and why is it bad?

Thanks

Enyimba
 

Author Closing Comment

by:Enyimba
ID: 40655375
Thanks guys
