• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 773
  • Last Modified:

SYBASE - A very different duration for 'select count(*)' for 2 tables that have similar rowcount

Hi,

I have  2 tables with similar number of rows, but 1 took 5 mins only and the other took about 1.5 hours.
The tables about 7M rows.
The first table,  took 5 mins and it has non-clustered index.   The second table, took 1.5 hours, and it has a clustered index.

Showplan for the first table:  
 STEP 1
        The type of query is SELECT.
        Evaluate Ungrouped COUNT AGGREGATE.

        FROM TABLE
            q2cprpsl
        Nested iteration.
        Index : ixordprpsl2
        Forward scan.
        Positioning at index start.
        Index contains all needed columns. Base table will not be read.
        Using I/O Size 16 Kbytes for index leaf pages.
        With MRU Buffer Replacement Strategy for index leaf pages.

Showplan for the second table:
 STEP 1
        The type of query is SELECT.
        Evaluate Ungrouped COUNT AGGREGATE.

        FROM TABLE
            q2cquote
        Nested iteration.
        Table Scan.
        Forward scan.
        Positioning at start of table.
        Using I/O Size 16 Kbytes for data pages.
        With MRU Buffer Replacement Strategy for data pages.

Why the big difference and what can be done to improve the query on the second table.
Thanks.
0
maria_rossi
Asked:
maria_rossi
  • 3
  • 3
1 Solution
 
maria_rossiAuthor Commented:
Also, why did the 2nd table (the one with the clustered index) do a table scan?  Thanks.
0
 
Joe WoodhousePrincipal ConsultantCommented:
Ok, firstly this is correct and expected, and it's all due to the structure of the two index types.

A non-clustered index is a B-tree that is a separate data structure. The leaves of the tree are index pages, not data pages, but they point to the data pages. The data pages are not sorted according to the nonclustered key. This is like a library catalogue on Title, Author, Subject, etc. When you've found your book in the catalogue, all you have is a pointer to an address on the shelves where you can go to find your actual book.

A clustered index is a B-tree where the leaves of the tree are the actual data pages, which are physically ordered by the clustered key. This is like the catalog number of the books - by the time you find a particular catalog number on the shelf, you physically have the book that you were looking for, and the books are ordered by that key.

If all you have is a clustered index, this is not only useless to help answer a COUNT(*) (I'm assuming no WHERE clause here), it's actually *worse* than useless. This is because the clustered index is not a separate data structure - it's an attribute of the table. To use this, we'd end up reading every index page as well as every data page - and if we're doing that, we may as well just read every data page, i.e. table scan.

But a non-clustered index... this *is* a separate data structure. Since there is a one to one correspondance between rows on the leaf index pages and data rows, we can just count the index pages instead and completely ignore the data pages. This is good because the index key is a lot narrower than the entire data row, so we can fit far more index rows on an index page than we can fit data rows on a data page. So counting index pages will be much faster.

What you've seen in your comparison is basically best case vs. worst case - and all correct and expected.

If you want to tune for this query specifically - add a nonclustered index. This will slow down all writes, some. I don't have enough information to decide whether you should also keep the clustered index or not.
0
 
maria_rossiAuthor Commented:
Joe,

Thanks for the response.   Now I understand why the big difference.  I will look into adding a nonclustered index.
1 more question, though.   I did an optdiag on the 2nd table and found that the data page cluster ratio is 0.37.  Did this have an effect on  the runtime?.   IE, even if it did a table scan, would it have run faster if the DPCR was higher?

Maria
0
Cloud Class® Course: Microsoft Azure 2017

Azure has a changed a lot since it was originally introduce by adding new services and features. Do you know everything you need to about Azure? This course will teach you about the Azure App Service, monitoring and application insights, DevOps, and Team Services.

 
Joe WoodhousePrincipal ConsultantCommented:
With your second question - maybe.

The DPCR is a measure of how "unclustered" the clustered index is. This can be either fragmentation in terms of rows not filling pages fully, or (for datapages or datarows locked tables) how many rows aren't "where they should be" according to the clustered index.

With a COUNT(*) without any WHERE clause though, we have to look at every row in the table anyway. It won't matter if rows are "out of order" (only possible for data-only locked - or DOL - tables) since we have to visit them all anyway.

It *could* make some difference if rows are not packing the pages fully, since it will affect how many pages we have to read to read those rows. And hmm, 0.37 is a pretty low DPCR, so yes, in best case you might see 2-3 times faster if the table was perfectly defragmented. I'm getting that estimate by dividing a perfect DPCR of 1.0 by your DPCR of 0.37... but you won't see 1.0. 0.9 isn't unreasonable to aim for though.

That DPCR suggests some table maintenance is in order. If this is a DOL table then it's time for a "reorg rebuild". If not, then dropping & recreating the clustered index will perfectly clean it up. Warning - this needs an exclusive table lock and free space equal to ~120% of the table size.
0
 
maria_rossiAuthor Commented:
You've answered my questions.  Thanks a lot for  the clear explanation.  This can be closed.
0
 
Joe WoodhousePrincipal ConsultantCommented:
Glad it was helpful! Good luck.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

  • 3
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now