Solved

DB2 Performance - SQL select taking 1 hr 10minutes to get a single valu(row). Can I get some helP?

Posted on 2012-03-23
16
1,306 Views
Last Modified: 2012-04-17
Hi all,

I have a table with 460+ million plus rows. This table is read "once" a day to retrieve a single maxrownum used for downstream processing. The problem I am having is that this select SQL is taking about 1 hr 10-20 minutes to retrieve this one value! How can that be? I have only three indexes one of which is keyed off of the maxrownum.

Can anyone suggest/recommend a performance tip/strategy that I can use to improve my retrieval process? This may not be the best plan, but I was thinking perhaps I could create a tiny whinny table to store the maxrownum value and read from the small table for theis value. But, then I have to introduce another table, additional I/O, extra storage and maintenance, etc, etc in to the equation...So, what does anyone think about that?

Please do send in your comments, suggestion and recommendations. Articles and other pointers are very much welcomed.
0
Comment
Question by:Enyinnaya
  • 4
  • 3
  • 3
  • +4
16 Comments
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 200 total points
ID: 37758158
Hi Enyinnaya,

Several questions.  :)

Is there a primary key?  
Is there an identity column?  Is there an index on the identity column?
Are the statistics on the table current?
What column(s) are indexed?


Kent
0
 
LVL 18

Expert Comment

by:daveslash
ID: 37758351
Greetings, Enyinnaya

Let me see if I understand. You have a table with over 460 million rows, and you’re doing a SELECT max(someColumn) on that table, and it takes over an hour for the result to come back, right?

If you have an index on that table over the specified column, then the performance you’re seeing doesn’t make sense.

Perhaps the query doesn’t see the index. I know on DB2 for i, the sort-sequence of the index has to match the sort-sequence of the SQL session. Is there a similar concept in DB2 for Unix?

-- DaveSlash
0
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 200 total points
ID: 37758570
Hi Dave,

There's no equivalent on LUW.

My guess is that there's an identity column that's not indexed.  Most people don't realize that an identity column in DB2 isn't automatically indexed.  (Nor is there a constraint against duplicate values unless there is a unique index on the column, but that's another discussion.)


Kent
0
 

Author Comment

by:Enyinnaya
ID: 37759333
Hello kdo, all

Some info about my table definition:

"ROW_NUM" DECIMAL(15,0) NOT NULL GENERATED BY DEFAULT AS IDENTITY (
                    START WITH +1
                    INCREMENT BY +1
                    MINVALUE +1
                    MAXVALUE +999999999999999
                    NO CYCLE
                    CACHE 20
                    NO ORDER ) ,

Here is what I have for indices:

CREATE INDEX "SIEBEL  "."RTL_HIST_IDX" ON "SIEBEL  "."RTL_TRAN_HIST"
                ("ROW_NUM" ASC,
                 "TR_REC_TYPE" ASC,
                 "TR_TR" ASC,
                 "TR_FIELD_NO" ASC,
                 "TR_KEY" ASC)
                COMPRESS NO DISALLOW REVERSE SCANS;


CREATE INDEX "SIEBEL  "."RTL_HIST_J1" ON "SIEBEL  "."RTL_TRAN_HIST"
                ("ROW_NUM" ASC)
                COMPRESS NO DISALLOW REVERSE SCANS;


CREATE INDEX "SIEBEL  "."RTL_TRAN_WK" ON "SIEBEL  "."RTL_TRAN_HIST"
                ("TR_KEY" ASC)
                COMPRESS NO DISALLOW REVERSE SCANS;

ALTER TABLE "SIEBEL  "."RTL_TRAN_HIST" ALTER COLUMN "ROW_NUM" RESTART WITH 461899799;

Observe none is unique or primary key.
0
 
LVL 18

Expert Comment

by:daveslash
ID: 37759374
Just for kicks, try adding a primary key on the identity column and see if it makes your query faster.

e.g.
alter table MyTable
add constraint MyTable0
primary key ("ROW_NUM")

Open in new window

0
 

Author Comment

by:Enyinnaya
ID: 37760070
...and here is the insert, select statement that references the big table with row_num as identity.

insert into maxrownum (trgt_tbl, run_dt, start_row) select 'BIG_TABLE', current date, coalesce(max(row_num),0) from BIG_TABLE"
0
 
LVL 1

Assisted Solution

by:AngocA
AngocA earned 100 total points
ID: 37760882
Hi,

There are several options you can try to improve your performance:
You can change your isolation level, probably you are using a high level, and there are many locks in the table, making the process slower: Select .. From ... WITH ur
Put a lock at table level, instead of putting locks in several rows.
Create a Materialized Query Table to compute that value asynchronically.
Check the disks where the table is stored, more spindles is better to retrieve the data.
If you have to read the whole table make you are retrieving the values at a good rate. You should check the Extent size, the prefetch size (according to the strip), and even the page size.
Make sure the agent that is attending your application is not performing a lot of IO directly, and all rows are retrieved asynchronically with IO servers (Prefetchers)
The buffer pool associated with that table should have enough space to reduce page cleaning synchronically.
Probably, your operation needs sort, make sure the sortheap is enough, or if this sort becomes overflow, make sure the temporary table space is over fast disks.
Check the access plan for bottlenecks, which is the process with more timerons? Also, try to chanche to optimization level, to see if other strategies are better, or you could force te optimizer to use a specific strategy at a certain step by creating an optimizer profile.
For a good analysis (for the optimizer) keep statistics updated
Does your calculation could be realized with the values of the index? if it does, make sure the access plan is doing an index sargable, instead of going to the table.
Is the data reorged?

These are some ideas to improve your query.
0
 
LVL 45

Expert Comment

by:Kdo
ID: 37761228
Hi Enyinnaya,

This just doesn't look right.  Can you post an explain plan of this query?


SELECT MAX(ROW_NUM) FROM "SIEBEL"."RTL_TRAN_HIST"

Open in new window



Thanks,
Kent
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 
LVL 24

Assisted Solution

by:Tomas Helgi Johannsson
Tomas Helgi Johannsson earned 100 total points
ID: 37771840
Hi!

For a table as big as this 460+ million rows and having indexes where row_num is in ASC order then issuing MAX(row_num) is not a surprise that the time is more than 1 hour when DB2 reads the index from top to bottom.
If you create the index where row_num is in DESC order then you would (should) see dramatic decrease in the query time.

Regards,
    Tomas Helgi
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37771857
Either change the row_num from ASC to DESC
or
allow reverse scan on the index.

Regards,
    Tomas Helgi
0
 
LVL 45

Expert Comment

by:Kdo
ID: 37771937
Hi Tomas,

I think that he's got more issues than just this.  Selecting the highest value in an indexed column should be the (relatively) simple matter of the DBMS following the "right son" pointer of each leaf node.  If the indexes are anywhere near balanced, the number of pages examined will be around log(f(n)).  Since the query is taking so long, either a full table scan is being used or database activity has resulted in the index being horrible out of balance.  I'm hoping that the explain plan shows index usage.  If so, recreating or reorganizing the index should then solve the performance issue, and improve virtually every query that uses the column.

But you're right.  Sorting descending will get the highest value from a single I/O (the root index page).  As long as that's compatible with his other usage of the table.  :)


Kent
0
 
LVL 16

Accepted Solution

by:
theo kouwenhoven earned 100 total points
ID: 37774224
Hi Enyinnaya,

A Select within an Insert is working a little diferent then an "standalone" Select, don't aske me why, I saw that several times. Normaly SQL is taking the optimal index, within an insert, this isn't allways the case.

You can try these options:
- Remove the coalesce() part, I don't think you need it with 460+ mln rows
- Use an Order by to force SQL to look to that index first.
- Try to use the "FETCH FIRST 1 ROW ONLY"
- Create a View on that record. accessing the table will use combination of indexes and views to get the optimal result.

(btw, did you also try to run the Select without the insert part? what was the result?)

Regards,
Murph
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37776017
Hi!

Doing a query without a where clause or FETCH FIRST statement
ALWAYS results in a TABLESCAN no matter what the size of the table is.

Ergo large table results in long query time.

Regards,
     Tomas Helgi
0
 
LVL 4

Expert Comment

by:kemot1000
ID: 37834900
Hi,

To save storage (and gain performance in reorg and runstats) loose this index as it's surplus:

CREATE INDEX "SIEBEL  "."RTL_HIST_J1" ON "SIEBEL  "."RTL_TRAN_HIST"
                ("ROW_NUM" ASC)
                COMPRESS NO DISALLOW REVERSE SCANS;

every query will use this index if you remove above as ROW_NUM is the first index column.

CREATE INDEX "SIEBEL  "."RTL_HIST_IDX" ON "SIEBEL  "."RTL_TRAN_HIST"
                ("ROW_NUM" ASC,
                 "TR_REC_TYPE" ASC,
                 "TR_TR" ASC,
                 "TR_FIELD_NO" ASC,
                 "TR_KEY" ASC)
                COMPRESS NO DISALLOW REVERSE SCANS;

And for sure recreate your indexes to allow reverse scan.


T.
0
 
LVL 4

Expert Comment

by:kemot1000
ID: 37834919
I would change your insert to:

insert into maxrownum (trgt_tbl, run_dt, start_row) select 'BIG_TABLE', current date, row_num from BIG_TABLE order by row_num desc fetch first row only
0
 

Author Closing Comment

by:Enyinnaya
ID: 37859001
Excellent responses all
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Java performance on Solaris - Managing CPUs There are various resource controls in operating system which directly/indirectly influence the performance of application. one of the most important resource controls is "CPU".   In a multithreaded…
Why Shell Scripting? Shell scripting is a powerful method of accessing UNIX systems and it is very flexible. Shell scripts are required when we want to execute a sequence of commands in Unix flavored operating systems. “Shell” is the command line i…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
In a previous video, we went over how to export a DynamoDB table into Amazon S3.  In this video, we show how to load the export from S3 into a DynamoDB table.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now