Solved

Updating Partition Key Columns

Posted on 2006-10-30
4
2,526 Views
Last Modified: 2008-01-09
I am designing a database in IBM DB2 UDB version 8 for z/OS and OS/390. The database will have several very large tables that will require partitioning. We intend to use a single SMALLINT column, PARTN_ID (Partition ID), as the partition key column on each partitioned table. The column value would determine the partition to which INSERTed rows would be put.

Now, for my questions.

1. If we UPDATE column PARTN_ID for existing rows via static SQL, will DB2 automatically redistribute the updated rows to the partitions indicated by the new values in PARTN_ID, or will the redistribution only happen at the next REORG?

2. Will a RUNSTATS be required or merely advisable after the redistribution or REORG?

3. Will plans need to be rebound after the redistirbution or REORG?
0
Comment
Question by:dlmedici
4 Comments
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 100 total points
ID: 17837080

1)  The DBMS (in this case DB2) will locate data according to data partitioning key.  If the new key indicates that the row should exist in another partition, DB2 will move it for you.

2)  Not sure what you mean by redistribution.  If you mean will the statistics be updated simply because a row was moved to another partition, the answer is "no".

3) No.


Good Luck,
Kent


0
 
LVL 13

Assisted Solution

by:ghp7000
ghp7000 earned 200 total points
ID: 17837332
yes, thats exactly correct. If you update the partitioning key, and the partitioning map says they key belongs in another partition, it will be moved there, but the runstats is now out of balance, because the row count is now different among the partitions. This may or may not matter, especially if you are frequently updating they keys-what matters is the current distribution among the partitions. When the row distribution gets out of balance, meaning the rows are not more or less evenly distributed among the partitions (for very large tables), that is when you need to to redistribute the partitions.
All this is possible, but it begs the question of why would you choose a column as the partitioning key if you know in advance that you are going to frequently update it? Not only will this create loads of additional work for you, but from a design point of view its a poor choice. The key should be chosen for its ability to satisfy sql where clause and group by queries and by its very nature should not be updateable, even though DB2 permts it. By frequently updating they key, you will run into poor access plans and loads of broadcasting as well as mkaing it next to impossible to co locate the users with their correct partitions, which will require additional network resources for the FCM blocks. So, I think you would be advised to rethink your strategy, your heading into a deep abyss from which there is no return.
0
 

Author Comment

by:dlmedici
ID: 17837698
Ok, please confirm my point-by-point understanding below...

1. Updating the partition key will cause the rows to redistribute (i.e., move) immediately.
2. A RUNSTATS would be required because the data is now located in different partitions.
3. No REORG is required.

Does not #2 above imply that a rebind of plans/packages would also be required, to take into consideration the new stats?

Now, to ghp7000's comments. We do not anticipate frequent updates of the partition key. However, it is possible that after some time we may discover that a client is now sending us more data than we anticipated, and that we would want to relocate his data into a an empty partition (assuming he originally shared a partition with other small clients), or spread his data over two or more parititions. In that case, the algorithm that populates PARTN_ID would be updated, static SQL UPDATEs of existing PARTN_ID columns for the client would be executed (thus moving the data), and RUNSTATS would be executed.

I agree that a partitioning key should satisfy well the SQL predicates, but we are also targeting overall performance. By creating a partition key that is flexible (via an algorithm) we can adjust performance for particular clients, without redefining partition ranges.

Does this make sense?
0
 
LVL 37

Accepted Solution

by:
momi_sabag earned 200 total points
ID: 17840464
Hi

all the things you are saying are correct.
db2 will move a row between partitions when you update the partitioning key.
it is recommended to run runstats regulary in order to keep the statistics updated, but you should consider the beneift when doing so. if you only move 5% of the rows from one partition to another partition, then maybe you should not run runstats, especially if your partitions are very big, and runstats will consume alote of resources.
if the total number of rows in a partition (either the one you delete from or the one you insert to) does not change in more than 10% i would not run runstats especially for that (assuming you are running runstats on some regular basis so the new statistics of the tables would be updated eventually).

beside for the case that you describe here, it is a good thing to rebind plans after you run runstatns and reorgs so db2 will be able to re-evaluate the access path

for the specific scenario you described,
when you set the partitioning key the way you said you do get the maximum flexability, and i understand why you would like to move some clients between partitions now and then, but maybe you should take another approache
in db2 for zos you can alter the partitioning key (does not matter if you used index based or table based partitioning) and after you do that run a reorg tablespace that will rebalance the rows.
this is a better way to do the update if you are about to update many rows, because when you update many rows using a sql statement you will cause massive logging, you will probably hit the maxium lock allowed and lock the entire partition, and it's a bit complex to implement commit logic into this massive update. instead if you rebalance using the reorg utility, it would go faster,
you would be able to recluster the data, reallocate the free space, and run runstats while running reorg.

so for the bottom line,
if you think you will be moving a large amount of rows between partitions, i would recommend choosing a different partitioning column (for example client code), and when you want to move the rows, instead of updating the partitioning key, just alter the partitioning key and run a reorg

momi sabag
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Recursive SQL in UDB/LUW (you can use 'recursive' and 'SQL' in the same sentence) A growing number of database queries lend themselves to recursive solutions.  It's not always easy to spot when recursion is called for, especially for people una…
Recursive SQL in UDB/LUW (it really isn't that hard to do) Recursive SQL is most often used to convert columns to rows or rows to columns.  A previous article described the process of converting rows to columns.  This article will build off of th…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now