COUNT(1) Performance

1) We have a huge table with a 4 column composite PK and 2 more columns.
2) SELECT COUNT(1) FROM MyTable; is fast in iseries AS/400 V5R3 20 year old OS and DB i.e. 5 seconds
3) But it is extremely slow in Linux 10.2 DB2 UDB 9.1 both of which are modern i.e. 100 seconds.
4) Data is this table is about 80,00,000 rows. Can you explain me various reasons for this strange behaviour?
LVL 17
k_murli_krishnaAsked:
Who is Participating?
 
Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Hi k_murl_krishna,

DB2 will actually count the items, either via an appropriate index or by doing a full table scan.  Since the table has a primary key, DB2 should do an index scan and count the number of items based on the number of items in the index.

As far as I know, the statistics are used only to allow DB2 to optimize your query.  DB2 does not rely on the statistics values as row counts.


Kent
0
 
MrPhenixConnect With a Mentor Commented:
This could have several reasons:

DB2 is more advanced but not necessary better in speed on huge tables.

Is the new db an exact copy? (meaning, indexes, PK, constraints, is the same)
Is there any statistics generated on the old vs the new server?
0
 
k_murli_krishnaAuthor Commented:
Both database schemas are exact copy of each other. Do REORG and RUNSTATS not done quickly enough create the 5 and 100 secs difference? We found that there is no exclusive lock which got released later giving rise to so much wait time. Could it be more queries are already fired on that table and there is I/O contention now that COUNT(1) is also fired. Can hardware configuration differences create so much difference?

Can there be any other special reasons for COUNT(1) on SUSE Linux 10.2 DB2 9.1 to be muich slower than iseries AS/400 DB2 V5R3? Could OS or its state/status have an impact? Could the linkage between a particular OS and a particular DB2 make the difference? Please think from all possible angles.
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
MrPhenixCommented:
Do you still have the ISeries? Could you in that case do a explain plan on both databases?
Or if you just have the new one, could you please post the explain plan?
0
 
Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Hi k_murli_kris,

Depending on your linux server, you may have a huge difference in I/O performance.  IDE and SATA drives usually can't match the performance of SCSI or the native AS/400 hardware.

That said...

Why are you doing a count(1)?  There is a subtle difference between count(*) and count(1) in that count(*) counts the number of rows, count(1) should count the number of non-nulls.  I've not seen any IBM documentation that describes the difference, but Oracle warns against using count(1) as it forces a table scan.  Oracle and DB2 are different entities so the warning may not apply to DB2.  Still, it's worth checking out.

On UDB/LUW count(*) should use the index(es) and return a count quite quickly.  count(1) has a slightly different evaluation process, but should come up with the same number of rows.

Try using a count(*) instead of a count(1).


Good Luck,
Kent
0
 
momi_sabagConnect With a Mentor Commented:
if you table only has 80,000 rows , and it performs an index scan, it shouldn't take 100 seconds, no matter how lame your hardware is
you have some other problem there,
are you sure it's not a lock problem?
did you try to add "with ur" at the end of the query?
0
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi momi,

I was confused by his row count.  I took it to be 80,000,000 -- a rather large table.

You're right though.  Unless he has large, poorly blocked rows, 80,000 should count quite quickly.


Kent
0
 
cmaslenConnect With a Mentor Commented:
In addition to the above advice particularly in getting the query plans, have a look at the following link on how DB2 handles parallelism and how you can see if you're taking advantage of it. Despite being 20 years old there is a good chance the AS/400 is optimally configured for DB2 and your Linux box is not.

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/c0005287.htm

Christian.
0
 
k_murli_krishnaAuthor Commented:
Thanks all of you. I have no admin access to either DB2 on Linux or AS/400. But in local database schema table with as many records, I did db2expln on following 3 queries and I got same cost and identical explain plans with index scan occuring over composite PK.

SEELCT COUNT(1) FROM <schema>.<table>;
SEELCT COUNT(*) FROM <schema>.<table>;
SELECT COUNT(first_col_of_pk) FROM <schema>.<table>;
0
 
k_murli_krishnaAuthor Commented:
Also, the problem is resolved. When I did COUNT(1) it took 45 secs and after REORG on this local table it just took 3 secs. This means the table is fragmented or the index since index scan is what is happening?

Also, does REORG TABLE <Schema>.<Table>; reorganize the indexes of the table as well? Or we need to specify like for runstats, RUNSTATS ON TABLE <Schema>.<Table> WITH DISTRIBUTION AND DETAILED INDEXES ALL; How much & how does runstats influence queries in general and in particular this simple COUNT(1) one?
0
 
Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Hi k_murli_kris,

In this case, a full index scan of the primary key may be almost as costly as a full table scan.  The table has only 6 columns, 4 of which are the primary key.  Depending on the makeup of the key and the two data columns, there may be as many index pages as there are data pages.  If so, the index scan is just as resource demanding as a table scan.

Can you verify the row count?  Is it 80,000 or 80,000,000?


Kent
0
 
Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Wow.

You really did have a fragmented table.  I had thought that the LUW version was built from the AS/400 version so fragmentation wouldn't have been an issue.  (Obviously, I was wrong....)

The REORG process reorganizes the table and indexes.  One of the options is to reorganize just the indexes or a subset of them.

RUNSTATS collects statistics about the table and stores it in a "header" area.  DB2 later uses these statistics to determine how to best process a query.  The collected statistics could cause DB2 to prefer index1 over index2, or even to perform a full table scan if it determined that the use of an index is redundant.


Kent
0
 
tliottaConnect With a Mentor Commented:
Minor note -- If V5R3 is the operating system release level on the AS/400, then "20 years old" is close to a couple decades off the mark. For V5R3, going back even five years puts it somewhere close to IBM development work.

Now, if the age refers to when the tables, views and indexes were generated, then even those have been somewhat updated automatically by upgrades as the system was brought forward to V5R3. Further, if those are indeed 20 years old, then there is no possibility of the two databases being identical. A 20-year old set of files on an AS/400, even one at V5R3, was not a SQL database.

It's rare to find even 5-year old SQL databases because the native (non-SQL) data management was easier for most developers and could perform much better in most circumstances. Nor did SQL have a number of capabilities that native data management could provide. (It still doesn't, but other considerations started to outweigh them.)

In any case, be aware that benchmark comparisons between the two might be far from what could be achieved. Either or both platforms can almost certainly do better than what's shown.

Tom
0
 
k_murli_krishnaAuthor Commented:
Thanks all.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.