Updating Partition Key Columns

I am designing a database in IBM DB2 UDB version 8 for z/OS and OS/390. The database will have several very large tables that will require partitioning. We intend to use a single SMALLINT column, PARTN_ID (Partition ID), as the partition key column on each partitioned table. The column value would determine the partition into which INSERTed rows would be placed.
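For concreteness, the design above might look like the following table-controlled partitioning DDL. This is only a sketch: the table name, column list, tablespace name, number of partitions, and limit values are all illustrative assumptions, not part of the original design.

```sql
-- Hypothetical sketch of the described design (all names illustrative).
-- DB2 V8 for z/OS table-controlled partitioning on a SMALLINT column:
CREATE TABLE CLIENT_DATA
      (PARTN_ID   SMALLINT     NOT NULL,
       CLIENT_ID  INTEGER      NOT NULL,
       PAYLOAD    VARCHAR(200))
  IN DBNAME.TSNAME
  PARTITION BY (PARTN_ID)
      (PARTITION 1 ENDING AT (1),
       PARTITION 2 ENDING AT (2),
       PARTITION 3 ENDING AT (3),
       PARTITION 4 ENDING AT (32767));
```

An INSERT is routed to the partition whose limit key range covers the supplied PARTN_ID value, which is what makes the "algorithm populates PARTN_ID" approach work.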

Now, for my questions.

1. If we UPDATE column PARTN_ID for existing rows via static SQL, will DB2 automatically redistribute the updated rows to the partitions indicated by the new values in PARTN_ID, or will the redistribution only happen at the next REORG?

2. Will a RUNSTATS be required or merely advisable after the redistribution or REORG?

3. Will plans need to be rebound after the redistribution or REORG?

momi_sabag Commented:

All the things you are saying are correct.
DB2 will move a row between partitions when you update the partitioning key.
It is recommended to run RUNSTATS regularly to keep the statistics current, but you should weigh the benefit of doing so. If you only move 5% of the rows from one partition to another, then maybe you should not run RUNSTATS, especially if your partitions are very big and RUNSTATS will consume a lot of resources.
If the total number of rows in a partition (either the one you delete from or the one you insert into) does not change by more than 10%, I would not run RUNSTATS just for that (assuming you are running RUNSTATS on some regular basis, so the table statistics will be updated eventually).
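When statistics do need refreshing, a RUNSTATS limited to the affected partitions keeps the cost down. A sketch of the utility control statement, where the database and tablespace names and the partition number are assumptions:

```sql
-- Partition-level RUNSTATS (run as a utility job step; names illustrative).
-- Restricting to PART 3 avoids rescanning the whole tablespace:
RUNSTATS TABLESPACE DBNAME.TSNAME PART 3
         TABLE(ALL)
         INDEX(ALL)
```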

Beyond the case you describe here, it is good practice to rebind plans after you run RUNSTATS and REORG, so DB2 will be able to re-evaluate the access paths.
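The rebind itself is a pair of DSN subcommands; the plan and package names below are placeholders:

```sql
-- Issued through the DSN command processor after RUNSTATS completes,
-- so the optimizer can pick access paths based on the new statistics:
REBIND PLAN(MYPLAN)
REBIND PACKAGE(MYCOLL.MYPKG)
```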

For the specific scenario you described:
When you define the partitioning key the way you said, you do get maximum flexibility, and I understand why you would like to move some clients between partitions now and then, but maybe you should take another approach.
In DB2 for z/OS you can alter the partitioning limit keys (it does not matter whether you used index-based or table-based partitioning), and after you do that, run a REORG TABLESPACE that will rebalance the rows.
This is a better way to do the update if you are about to update many rows. When you update many rows with an SQL statement, you will cause massive logging, you will probably hit the maximum number of locks allowed and end up locking the entire partition, and it is a bit complex to build commit logic into such a massive update. If instead you rebalance using the REORG utility, it will go faster,
you will be able to recluster the data and reallocate the free space, and you can run RUNSTATS as part of the REORG.
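This alternative can be sketched as follows. The object names, partition numbers, and limit values are illustrative, and the exact ALTER syntax depends on the DB2 release and on whether partitioning is index- or table-controlled:

```sql
-- Change a partition's limit key (online schema change, sketched for
-- table-controlled partitioning; names and values are illustrative):
ALTER TABLE CLIENT_DATA ALTER PARTITION 2 ENDING AT (5);

-- The affected partitions are placed in a REORG-pending state; a
-- partition-level REORG redistributes the rows across the new ranges
-- and can gather inline statistics in the same pass:
REORG TABLESPACE DBNAME.TSNAME PART 2:3
      STATISTICS TABLE(ALL) INDEX(ALL)
```

Because the data movement happens inside the utility rather than through SQL, there is no massive logging of individual row updates and no application commit logic to write.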

So, the bottom line:
If you think you will be moving a large number of rows between partitions, I would recommend choosing a different partitioning column (for example, a client code), and when you want to move the rows, instead of updating the partitioning key, just alter the partition limit keys and run a REORG.

momi sabag
Kent Olsen (Data Warehouse Architect / DBA) Commented:

1)  The DBMS (in this case DB2) will locate data according to the partitioning key.  If the new key value indicates that the row belongs in another partition, DB2 will move it for you.

2)  Not sure what you mean by redistribution.  If you mean will the statistics be updated simply because a row was moved to another partition, the answer is "no".

3) No.

Good Luck,

ghp7000 Commented:
Yes, that's exactly correct. If you update the partitioning key, and the partitioning map says the key belongs in another partition, the row will be moved there, but the statistics are now out of balance, because the row counts now differ among the partitions. This may or may not matter, especially if you are frequently updating the keys; what matters is the current distribution among the partitions. When the row distribution gets out of balance, meaning the rows are no longer more or less evenly distributed among the partitions (for very large tables), that is when you need to redistribute the partitions.
All this is possible, but it begs the question: why would you choose a column as the partitioning key if you know in advance that you are going to update it frequently? Not only will this create loads of additional work for you, but from a design point of view it's a poor choice. The key should be chosen for its ability to satisfy SQL WHERE-clause and GROUP BY queries, and by its very nature it should not be updatable, even though DB2 permits it. By frequently updating the key, you will run into poor access plans and loads of broadcasting, as well as making it next to impossible to co-locate the users with their correct partitions, which will require additional network resources for the FCM blocks. So I think you would be advised to rethink your strategy; you're heading into a deep abyss from which there is no return.
dlmedici (Author) Commented:
Ok, please confirm my point-by-point understanding below...

1. Updating the partition key will cause the rows to redistribute (i.e., move) immediately.
2. A RUNSTATS would be required because the data is now located in different partitions.
3. No REORG is required.

Does not #2 above imply that a rebind of plans/packages would also be required, to take into consideration the new stats?

Now, to ghp7000's comments. We do not anticipate frequent updates of the partition key. However, it is possible that after some time we may discover that a client is now sending us more data than we anticipated, and that we would want to relocate his data into an empty partition (assuming he originally shared a partition with other small clients), or spread his data over two or more partitions. In that case, the algorithm that populates PARTN_ID would be updated, static SQL UPDATEs of existing PARTN_ID values for the client would be executed (thus moving the data), and RUNSTATS would be executed.
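The relocation described here could be as simple as the following UPDATE; the table, columns, and literal values are illustrative, and per momi_sabag's warning about logging and lock escalation, a large relocation would need to be broken into committed batches:

```sql
-- Illustrative static-SQL relocation of one client's rows.
-- DB2 moves each updated row to the partition implied by the new
-- PARTN_ID value (names and literals are assumptions):
UPDATE CLIENT_DATA
   SET PARTN_ID = 7
 WHERE CLIENT_ID = 12345;
COMMIT;
```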

I agree that a partitioning key should satisfy the SQL predicates well, but we are also targeting overall performance. By creating a partition key that is flexible (via an algorithm) we can adjust performance for particular clients without redefining partition ranges.

Does this make sense?
