?
Solved

SYBASE - A very different duration for 'select count(*)' for 2 tables that have similar rowcount

Posted on 2006-03-27
6
Medium Priority
?
771 Views
Last Modified: 2012-06-21
Hi,

I have  2 tables with similar number of rows, but 1 took 5 mins only and the other took about 1.5 hours.
The tables about 7M rows.
The first table,  took 5 mins and it has non-clustered index.   The second table, took 1.5 hours, and it has a clustered index.

Showplan for the first table:  
 STEP 1
        The type of query is SELECT.
        Evaluate Ungrouped COUNT AGGREGATE.

        FROM TABLE
            q2cprpsl
        Nested iteration.
        Index : ixordprpsl2
        Forward scan.
        Positioning at index start.
        Index contains all needed columns. Base table will not be read.
        Using I/O Size 16 Kbytes for index leaf pages.
        With MRU Buffer Replacement Strategy for index leaf pages.

Showplan for the second table:
 STEP 1
        The type of query is SELECT.
        Evaluate Ungrouped COUNT AGGREGATE.

        FROM TABLE
            q2cquote
        Nested iteration.
        Table Scan.
        Forward scan.
        Positioning at start of table.
        Using I/O Size 16 Kbytes for data pages.
        With MRU Buffer Replacement Strategy for data pages.

Why the big difference and what can be done to improve the query on the second table.
Thanks.
0
Comment
Question by:maria_rossi
  • 3
  • 3
6 Comments
 

Author Comment

by:maria_rossi
ID: 16302699
Also, why did the 2nd table (the one with the clustered index) do a table scan?  Thanks.
0
 
LVL 24

Accepted Solution

by:
Joe Woodhouse earned 400 total points
ID: 16309489
Ok, firstly this is correct and expected, and it's all due to the structure of the two index types.

A non-clustered index is a B-tree that is a separate data structure. The leaves of the tree are index pages, not data pages, but they point to the data pages. The data pages are not sorted according to the nonclustered key. This is like a library catalogue on Title, Author, Subject, etc. When you've found your book in the catalogue, all you have is a pointer to an address on the shelves where you can go to find your actual book.

A clustered index is a B-tree where the leaves of the tree are the actual data pages, which are physically ordered by the clustered key. This is like the catalog number of the books - by the time you find a particular catalog number on the shelf, you physically have the book that you were looking for, and the books are ordered by that key.

If all you have is a clustered index, this is not only useless to help answer a COUNT(*) (I'm assuming no WHERE clause here), it's actually *worse* than useless. This is because the clustered index is not a separate data structure - it's an attribute of the table. To use this, we'd end up reading every index page as well as every data page - and if we're doing that, we may as well just read every data page, i.e. table scan.

But a non-clustered index... this *is* a separate data structure. Since there is a one to one correspondance between rows on the leaf index pages and data rows, we can just count the index pages instead and completely ignore the data pages. This is good because the index key is a lot narrower than the entire data row, so we can fit far more index rows on an index page than we can fit data rows on a data page. So counting index pages will be much faster.

What you've seen in your comparison is basically best case vs. worst case - and all correct and expected.

If you want to tune for this query specifically - add a nonclustered index. This will slow down all writes, some. I don't have enough information to decide whether you should also keep the clustered index or not.
0
 

Author Comment

by:maria_rossi
ID: 16310493
Joe,

Thanks for the response.   Now I understand why the big difference.  I will look into adding a nonclustered index.
1 more question, though.   I did an optdiag on the 2nd table and found that the data page cluster ratio is 0.37.  Did this have an effect on  the runtime?.   IE, even if it did a table scan, would it have run faster if the DPCR was higher?

Maria
0
New feature and membership benefit!

New feature! Upgrade and increase expert visibility of your issues with Priority Questions.

 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 16312916
With your second question - maybe.

The DPCR is a measure of how "unclustered" the clustered index is. This can be either fragmentation in terms of rows not filling pages fully, or (for datapages or datarows locked tables) how many rows aren't "where they should be" according to the clustered index.

With a COUNT(*) without any WHERE clause though, we have to look at every row in the table anyway. It won't matter if rows are "out of order" (only possible for data-only locked - or DOL - tables) since we have to visit them all anyway.

It *could* make some difference if rows are not packing the pages fully, since it will affect how many pages we have to read to read those rows. And hmm, 0.37 is a pretty low DPCR, so yes, in best case you might see 2-3 times faster if the table was perfectly defragmented. I'm getting that estimate by dividing a perfect DPCR of 1.0 by your DPCR of 0.37... but you won't see 1.0. 0.9 isn't unreasonable to aim for though.

That DPCR suggests some table maintenance is in order. If this is a DOL table then it's time for a "reorg rebuild". If not, then dropping & recreating the clustered index will perfectly clean it up. Warning - this needs an exclusive table lock and free space equal to ~120% of the table size.
0
 

Author Comment

by:maria_rossi
ID: 16318576
You've answered my questions.  Thanks a lot for  the clear explanation.  This can be closed.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 16318720
Glad it was helpful! Good luck.
0

Featured Post

[Webinar] Cloud and Mobile-First Strategy

Maybe you’ve fully adopted the cloud since the beginning. Or maybe you started with on-prem resources but are pursuing a “cloud and mobile first” strategy. Getting to that end state has its challenges. Discover how to build out a 100% cloud and mobile IT strategy in this webinar.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Currently, there is an issue with being able to copy values from an external application to a dropdown list in Project Web Access (PWA).  The standard copy and paste methods don't seem to work properly. Here is a way to accomplish this task to s…
If anyone asked you to network diagram of the internet, it was drawn in the form of a fluffy cloud which further became known as cloud computing. Popularly cloud computing is defined as workloads that run over the internet in a commercial provider’s…
This Micro Tutorial will teach you how to add a cinematic look to any film or video out there. There are very few simple steps that you will follow to do so. This will be demonstrated using Adobe Premiere Pro CS6.
Whether it be Exchange Server Crash Issues, Dirty Shutdown Errors or Failed to mount error, Stellar Phoenix Mailbox Exchange Recovery has always got your back. With the help of its easy to understand user interface and 3 simple steps recovery proced…
Suggested Courses

807 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question