Sybase: Large I/O efficiency / cluster ratio value keeps going down

In one of my tables (not very big, just 700 MB), performance for some of the queries goes bad within a few days. I did some analysis and found that for one of the indexes on that table, the large I/O efficiency value keeps going down and drops to 0.4. When we do a full reorg rebuild it goes back to 1 and performance improves.
I want to know why the large I/O efficiency value goes down. There are around 1 million DML operations on that table on a daily basis.

Sybase experts, please throw some light on this as you always do.
learning_sybase (Author) asked:

Raja Jegan R (SQL Server DBA & Architect, EE Solution Guide) commented:
>> There are around 1 million DML operations on that table on a daily basis.

If you have 1 million DML operations on a particular table on a daily basis, then it is recommended to rebuild the index (if fragmentation is high) or reorganize it (if fragmentation is low).
Once rebuilt or reorganized, the optimizer can generate a good execution plan, making the queries perform better.

To put it more generally, it is recommended to perform index maintenance on tables that have heavy DML activity on a daily basis.
For tables with less activity, index maintenance once per week or so is usually sufficient.
learning_sybase (Author) commented:
Why is only one index getting impacted? And it's not used much either.
Raja Jegan R (SQL Server DBA & Architect, EE Solution Guide) commented:
>> Why is only one index getting impacted? And it's not used much either.

May I know how many indexes there are on that particular table?
Every DML operation (INSERT/UPDATE/DELETE) fragments your indexes, and once fragmented, they affect all future DML operations as well as the SELECT operations on those tables.
If you have a clustered index along with a few non-clustered indexes, then every data INSERT/DELETE affects the clustered index along with the mappings to the non-clustered indexes.
If only UPDATE operations are happening, they will impact the non-clustered indexes that use those particular columns.
Joe Woodhouse (Principal Consultant) commented:
So let's go back to the fundamentals here.

"Large I/O efficiency" is a measure, from 0 to 1 (multiple by 100 to make it a percentage) of how suitable the data currently is for large I/Os, which is another way of saying sequential prefetches. This only applies to reads; writes are only ever done one page at a time.

The whole point of sequential prefetch is to do I/O with less overhead. Imagine you have a bathtub where the drain is totally blocked and you have to bail out the water by hand. You have the same total amount of water to move no matter how you do it, but obviously it will take you fewer trips if you use a bucket than if you use a spoon.

It is exactly the same with disk I/O. While some of the cost of I/O is simply the size of the data to be moved, some of the cost is also based on the number of I/Os (the number of trips to and from the bathtub in my analogy above). The reasons for this are complex and I won't dive into them fully here, but let's just say there is overhead at the ASE level, at the O/S level, and at the storage level. All that adds up to it making perfect sense to do larger and fewer I/Os instead of smaller and more numerous I/Os.

The best possible candidate for a large read is when we are reading pages in sequential order. If I know you want to read every page of an online book through a browser, in order of page number, I can "fetch" several pages at once to have them waiting for you when you get to them. It is exactly the same with streaming audio or video - I prefetch data you don't need right now but will need very soon, on the theory that since you are watching video I know which video frames you want next.

It works the same in the database, but a database introduces more than one way of accessing the data. Unlike watching streaming video, we might be accessing the data in ways that aren't sequential. It's even worse because the logical order of data doesn't necessarily match the physical order. This can happen in many ways - some of them obvious, like a non-clustered index where the logical order of the index leaf pages will almost never match the physical order of the data pages themselves. (If that isn't clear, imagine searching a library catalogue by title or author. The search results will be ordered by title or author, but the search results are simply an address pointing to where the books are physically stored on shelves... and those books will most definitely not be sorted on the shelves by title or by author.)

But it can also happen in ways that aren't as obvious. You might start out with data that is perfectly well ordered - perhaps because you just recreated the index or ran a reorg rebuild - but over time data can be moved around. Again this happens for complex reasons that depend on a lot of things like whether the table is allpages locked or data-only locked (datapages or datarows); whether the table has any columns that allow NULLs or are variable length; whether there are any UPDATEs or DELETEs; and how and where newly INSERTed rows end up.

A very simple example: suppose there are variable-length columns. Varchars are the obvious ones, but any column that allows a NULL is also variable length, since it can store either a NULL (one byte) or its full value (which is probably more than one byte). If you UPDATE a row in a way that changes the size of that row, it might not fit anymore in the place it was previously stored. What happens next depends on the locking scheme (allpages or DOL), but the end result is that at least one row moves to a different page. And that means that the physical order of the data no longer perfectly matches the logical order.
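A tiny illustration of that row-growth effect, with made-up table and column names (a sketch, not something from this thread):

    -- hypothetical table: the nullable varchar makes the rows variable length
    create table demo_frag (id int not null, note varchar(200) null)

    insert demo_frag values (1, 'x')   -- short row, fits comfortably on its page

    -- later the row grows; it may no longer fit where it was originally stored
    update demo_frag set note = replicate('x', 200) where id = 1

Whether the grown row is forwarded elsewhere (DOL) or the page splits (APL) depends on the locking scheme, but either way the physical placement starts to drift away from the logical order.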

INSERTs are more complicated, and again depend on indexing and locking schemes. This is already a long answer so I'll summarise and say that if we INSERT a new row then sometimes that must be inserted into the middle of existing data, and if there isn't room there for the new row, either existing rows move to make room for it (this is a page split on an allpages locking table), or the row ends up somewhere else it isn't "supposed" to be (this is DOL behaviour).

Any way you look at it, it is entirely correct and expected that as a table is written to, it will slowly become less and less perfect in terms of physical placement of each row. Rows will end up "out of order" (DOL) or rows will be moved to different pages (APL and DOL). And any time we have to allocate a new page to a table, for any reason at all, there is no guarantee that this new page contiguously follows on in sequence from the last page in the table.

All of this is one form or another of fragmentation. There are other types we haven't really touched on here. If you take away only one thing from this answer, it is that writing almost always increases fragmentation in a database table. If you do UPDATEs or DELETEs, it is worse than if you only do INSERTs, but INSERTs that "want" to go into the middle of a table (because of a clustered index) will also cause fragmentation.

As the table becomes more and more fragmented, its logical order matches its physical order less and less, and the pages themselves become less full (less packed) with data. Go back to the bathtub analogy... we might prefer to drain the water by moving it in a bucket, but if we can for some reason only fill the bucket halfway or less then we might think it wasn't worth using a large bucket at all and maybe we should use something smaller.

It's exactly the same in ASE. As the data fragmentation increases, sequential prefetch becomes less and less useful, and after a point the optimiser will just stop attempting it. The stat you're looking at is one measure of this fragmentation. Another is the data page cluster ratio, which applies to the table as a whole, and each index also has an index page cluster ratio.
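If you want to watch those ratios without running a full optdiag, one option (assuming ASE 15.0.2 or later, with mytable and nc_myindex as placeholder names) is the derived_stat() built-in:

    -- cluster ratios for one index of one table (hypothetical names)
    select derived_stat("mytable", "nc_myindex", "data page cluster ratio")  as dpcr,
           derived_stat("mytable", "nc_myindex", "index page cluster ratio") as ipcr

Values close to 1 mean the physical layout still matches the index order well; values drifting towards 0 mean the kind of fragmentation described above.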

One of your questions was why does this seem to happen mostly to one index? I would guess that it is because that index key includes one or more columns that are being frequently UPDATEd, or because it is a clustered index and therefore new rows INSERTed must be inserted according to where the clustered index key says they should be. It is possible that both these things are true: it is a clustered index that directs where new rows are INSERTed as well as having one or more columns that are being UPDATEd. And of course if you are frequently DELETEing rows too, the index must be updated to reflect that.

TL;DR - all writes make fragmentation worse in a table over time. How much worse depends on what kinds of writes, what kind(s) of index(es), how many writes, and the table's locking scheme. For these reasons good Sybase housekeeping involves monitoring space usage and fragmentation, and fixing them as required.
learning_sybase (Author) commented:
I have no words to say... this explanation is just awesome. Many thanks, Joe.

Also, this is not a clustered index but a non-clustered index, with the data very much skewed on the 2 columns of this table. The index is not big either, only around 10 MB. The table size is 500 MB.
There are 4 indexes on this table and all are non-clustered. The table does not have any clustered index.

Every day I have to do a reorg rebuild for this table and the performance improves. But it is only the large I/O values that I see go bad, and only for this one index (non-clustered, only 10 MB in size).

Thanks in advance for providing more information.
Joe Woodhouse (Principal Consultant) commented:
You're welcome! I'm ex-Sybase and taught every Sybase class and helped write the Sybase ASE certifications from 12.0 onwards so this is kinda in my blood. :)

NC indexes will always show worse fragmentation than CL. This makes sense if you think about it - the CL index is like the library reference number in my library example, so we expect that the data should be physically sorted according to the CL index key(s). But NC indexes are like the indexes by title or author - we will just about never see the data physically sorted according to the NC index key(s).

This is so true that it's actually kinda rare to see the ASE optimiser try to do a large I/O when reading a table via a NC index. There is one big exception which is when the NC index "covers the query", which means every column appearing anywhere in the query is contained in the NC index, so we don't even bother going to the table at all and read all data purely from the NC index leaf level. But other than this one exception we are almost never able to prefetch via a NC index, or not as effectively - we probably know which index pages you want and we might prefetch those, but prefetching the data pages is unlikely to be very helpful.
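To make the covering case concrete, a made-up example (table, index and column names are hypothetical):

    -- NC index that carries every column the query needs
    create nonclustered index nc_orders_cover
        on orders (cust_id, order_date, order_total)

    -- ASE can answer this entirely from the index leaf pages,
    -- without touching the data pages at all
    select order_date, order_total
    from   orders
    where  cust_id = 42

(ASE has no separate INCLUDE clause for indexes; to cover a query you simply add the extra columns to the index key as above.)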

If the NC index keys are very skewed that means that the optimiser statistics are very sensitive to data changes. If I have a million rows with the data evenly distributed it doesn't greatly matter what I do to the table as it will tend to remain evenly distributed. But data skew might change substantially based on a heavy load of writes.

If the table is only 500 MB then reorg will run quickly. If you're running ASE 16.0.x, look into "reorg rebuild ... online", which makes it even more friendly. If you can live with a daily reorg then my advice is to just shrug and say "this is a table extremely sensitive to data changes". Some tables are like that. I've had plenty of sites where there was one magic table we had to rebuild each night because of its locking scheme, its indexes, and how volatile it was (how much it changed).
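For concreteness, the syntax is roughly as follows (table and index names are placeholders; check the manuals for your exact version and SP level):

    -- 15.7: rebuild just the problem index rather than the whole table
    reorg rebuild mytable nc_myindex

    -- 16.0: the online variant mentioned above, which keeps the table available
    reorg rebuild mytable with online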

Honestly the large I/O effectiveness stat isn't that important. I wouldn't be basing my reorg decisions on its value, for example; I'd be looking at DPCR and IPCR as I mentioned above.

But truly we don't need to over-think this. If you can continue to reorg it daily without pain, just keep doing that. It doesn't mean anything is broken. :)

If it is causing pain then we can talk about how to make it less intrusive. #1 method is the ASE 16+ "online" version of reorg rebuild. Even if you're not on ASE 16 there's a lot we can do to speed up reorg rebuild.

Summary: I don't think anything needs to be fixed here. Some tables are exceptionally sensitive to data change and that doesn't mean anything is necessarily wrong.
learning_sybase (Author) commented:
>> Summary: I don't think anything needs to be fixed here. Some tables are exceptionally sensitive to data change and that doesn't mean anything is necessarily wrong.
I agree, but it is very difficult to explain this to the client.
I am on version 15.7 and doing the reorg rebuild daily, and yes, it takes just a minute to complete, but over time the table will grow and it may take longer. It may also cause blocking for other processes - not sure though.

Kindly let me know the ways to speed this up on 15.7.

Once again, many thanks... it's great to work with you.
Raja Jegan R (SQL Server DBA & Architect, EE Solution Guide) commented:
>> There are 4 indexes on this table and all are non-clustered. The table does not have any clustered index.

If you can create a clustered index on the table, it would help improve performance a little compared to your current scenario.
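For completeness, the syntax would be along these lines (table and column names are made up; choosing the right key is the hard part):

    create clustered index cix_mytable on mytable (trade_date, cust_id)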
Joe Woodhouse (Principal Consultant) commented:
With respect to client expectations: all databases need some form of regular DBA housekeeping. Many sites just apply a one-size-fits-all scheme and reorg everything at the same time. You can tell your client they're getting more intelligent, bespoke housekeeping which only does nightly tasks on the tables that need it, and so far only one table needs it nightly.

If it's only taking minutes to complete right now you're in great shape. Things to look for later on if this starts becoming a problem include:

  • increasing "number of sort buffers" (don't go above 10,000, and you may need to increase procedure cache some also)
  • ensuring you have large I/O buffer pools in default data cache (or whatever cache this table is bound to)
  • confirming you're not hurting for memory generally

All of those are written up well in the manuals and are probably beyond the scope of this question.
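As a rough sketch of the first two items above (cache name, pool size and sort buffer count are illustrative only; check them against your own memory configuration first):

    -- raise the sort buffer limit used by reorg rebuild / create index
    exec sp_configure "number of sort buffers", 10000

    -- add or resize a large-I/O pool in the cache this table is bound to
    -- (assumes a 2K page server, where the large-I/O pool is 16K = 8 pages)
    exec sp_poolconfig "default data cache", "64M", "16K"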

Also, with respect to my learned colleague who recommends a clustered index: we don't know enough to say whether that would improve matters. If the clustered index key is strictly increasing then it will be no change at all from the current scenario. If the clustered index key distributes INSERTs and UPDATEs through the table, then per my explanation earlier it will actually make fragmentation problems worse, not better.

There are many reasons to prefer clustered over nonclustered indexes and indeed I think most tables should have one, but I don't think that is the answer to this specific problem.
