SYBASE - A very different duration for 'select count(*)' for 2 tables that have similar rowcount


I have  2 tables with similar number of rows, but 1 took 5 mins only and the other took about 1.5 hours.
The tables about 7M rows.
The first table,  took 5 mins and it has non-clustered index.   The second table, took 1.5 hours, and it has a clustered index.

Showplan for the first table:  
        The type of query is SELECT.
        Evaluate Ungrouped COUNT AGGREGATE.

        FROM TABLE
        Nested iteration.
        Index : ixordprpsl2
        Forward scan.
        Positioning at index start.
        Index contains all needed columns. Base table will not be read.
        Using I/O Size 16 Kbytes for index leaf pages.
        With MRU Buffer Replacement Strategy for index leaf pages.

Showplan for the second table:
        The type of query is SELECT.
        Evaluate Ungrouped COUNT AGGREGATE.

        FROM TABLE
        Nested iteration.
        Table Scan.
        Forward scan.
        Positioning at start of table.
        Using I/O Size 16 Kbytes for data pages.
        With MRU Buffer Replacement Strategy for data pages.

Why the big difference and what can be done to improve the query on the second table.
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

maria_rossiAuthor Commented:
Also, why did the 2nd table (the one with the clustered index) do a table scan?  Thanks.
Joe WoodhousePrincipal ConsultantCommented:
Ok, firstly this is correct and expected, and it's all due to the structure of the two index types.

A non-clustered index is a B-tree that is a separate data structure. The leaves of the tree are index pages, not data pages, but they point to the data pages. The data pages are not sorted according to the nonclustered key. This is like a library catalogue on Title, Author, Subject, etc. When you've found your book in the catalogue, all you have is a pointer to an address on the shelves where you can go to find your actual book.

A clustered index is a B-tree where the leaves of the tree are the actual data pages, which are physically ordered by the clustered key. This is like the catalog number of the books - by the time you find a particular catalog number on the shelf, you physically have the book that you were looking for, and the books are ordered by that key.

If all you have is a clustered index, this is not only useless to help answer a COUNT(*) (I'm assuming no WHERE clause here), it's actually *worse* than useless. This is because the clustered index is not a separate data structure - it's an attribute of the table. To use this, we'd end up reading every index page as well as every data page - and if we're doing that, we may as well just read every data page, i.e. table scan.

But a non-clustered index... this *is* a separate data structure. Since there is a one to one correspondance between rows on the leaf index pages and data rows, we can just count the index pages instead and completely ignore the data pages. This is good because the index key is a lot narrower than the entire data row, so we can fit far more index rows on an index page than we can fit data rows on a data page. So counting index pages will be much faster.

What you've seen in your comparison is basically best case vs. worst case - and all correct and expected.

If you want to tune for this query specifically - add a nonclustered index. This will slow down all writes, some. I don't have enough information to decide whether you should also keep the clustered index or not.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
maria_rossiAuthor Commented:

Thanks for the response.   Now I understand why the big difference.  I will look into adding a nonclustered index.
1 more question, though.   I did an optdiag on the 2nd table and found that the data page cluster ratio is 0.37.  Did this have an effect on  the runtime?.   IE, even if it did a table scan, would it have run faster if the DPCR was higher?

Become a Microsoft Certified Solutions Expert

This course teaches how to install and configure Windows Server 2012 R2.  It is the first step on your path to becoming a Microsoft Certified Solutions Expert (MCSE).

Joe WoodhousePrincipal ConsultantCommented:
With your second question - maybe.

The DPCR is a measure of how "unclustered" the clustered index is. This can be either fragmentation in terms of rows not filling pages fully, or (for datapages or datarows locked tables) how many rows aren't "where they should be" according to the clustered index.

With a COUNT(*) without any WHERE clause though, we have to look at every row in the table anyway. It won't matter if rows are "out of order" (only possible for data-only locked - or DOL - tables) since we have to visit them all anyway.

It *could* make some difference if rows are not packing the pages fully, since it will affect how many pages we have to read to read those rows. And hmm, 0.37 is a pretty low DPCR, so yes, in best case you might see 2-3 times faster if the table was perfectly defragmented. I'm getting that estimate by dividing a perfect DPCR of 1.0 by your DPCR of 0.37... but you won't see 1.0. 0.9 isn't unreasonable to aim for though.

That DPCR suggests some table maintenance is in order. If this is a DOL table then it's time for a "reorg rebuild". If not, then dropping & recreating the clustered index will perfectly clean it up. Warning - this needs an exclusive table lock and free space equal to ~120% of the table size.
maria_rossiAuthor Commented:
You've answered my questions.  Thanks a lot for  the clear explanation.  This can be closed.
Joe WoodhousePrincipal ConsultantCommented:
Glad it was helpful! Good luck.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Sybase Database

From novice to tech pro — start learning today.