index scan not happening every time -- doing table scan

learning_sybase used Ask the Experts™ on Sybase
One of my SQL queries picks up the index sometimes, and at other times it does a table scan. I can see in the showplan output that it sometimes picks up the index but other times does a table scan.
I have run update statistics on the table many times and even done a reorg rebuild, but the issue persists.
Can you please advise what to check or change so that the optimizer always picks the right index? (I don't want to force the index on the table.)
Please advise.
Thanks
lcohan, Database Analyst

Commented:
Did you look at all the potential reasons why the index is not used in certain conditions, such as recompiled code or data not fitting the index pattern? Please have a look at all possible causes as described here: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1570/html/queryprocessing/BHCHADCJ.htm
and evaluate your code when the index is not being used to find the reason why that happens, and obviously to find the fix/workaround.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
I'm not familiar with Sybase and how its optimizer works, but in most databases it picks what it thinks is the best execution plan.

Not all full table scans (FTS) are bad and not all index uses are good.  There are many articles out there on this topic.  That is the reason most database vendors took the time to create optimizers.  They don't write code just for fun.

You mention forcing the index. Try executing the query letting the optimizer do the FTS, then force the index and compare the execution times and stats. My guess is forcing index use will take longer.
Joe Woodhouse, Principal Consultant
Most Valuable Expert 2012

Commented:
You can't make the optimiser always use the index without forcing. That's the point of forceindex.

If the optimiser is not choosing the index then it is choosing a table scan for what it thinks is the right reason. Find that reason. It may even be correct. :)

Is your code being run directly or through a stored procedure?
Does it use local variables?
Is there any EXEC ( [...] ) involved? (Also called "exec immediate".)
What is different about the times it picks an index and the times it doesn't? Same data in the table? Same rows returned? Exactly the same things going on in the ASE at the same time? Are the query plans identical except for this one index becoming a table scan instead?

We'll start with those questions - next set of questions depends on the answers. It would also be helpful to see the SQL, the definitions of every table involved, and the query plans when it does and doesn't.
Author

Commented:
Hi Joe,
Thanks for your reply. I was waiting for you in particular, as I have seen a lot of your valuable inputs.

1. Yes, the SQL is from a stored proc.
2. The data in one of the columns, the primary column of the index, is skewed. When the SQL picks up the index scan the logical I/Os are very low, but with a table scan they are huge, which is obvious.
3. I need to know how to interpret the optdiag output below, if you can help. These are the columns of the index that is not getting picked up.

Statistics for column group:                                                    "Site", "DealNew", "TicketNew", "System"
Last update of column statistics:                                               Jan 25 2019  9:28:30:263AM

     Range cell density:                                                        0.0000008299671250
     Total density:                                                             0.0000008299671250
     Range selectivity:                                                         default used (0.33)
     In between selectivity:                                                    default used (0.25)
     Unique range values:                                                       0.0000008299671250
     Unique total values:                                                       0.0000009318153440
     Average column width:                                                      default used (3.00)
     Rows scanned:                                                              1204867.0000000000000000
     Statistics version:                                                        4

There is one more index which gets picked up and which reduces the LIO too.

Statistics for column group:                                                    "Site", "System"
Last update of column statistics:                                               Jan 25 2019  9:28:31:263AM

     Range cell density:                                                        0.0000008299671250
     Total density:                                                             0.3946322370918582
     Range selectivity:                                                         default used (0.33)
     In between selectivity:                                                    default used (0.25)
     Unique range values:                                                       0.0000008299671250
     Unique total values:                                                       0.0909090909090909
     Average column width:                                                      default used (3.00)
     Rows scanned:                                                              1204867.0000000000000000
     Statistics version:                                                        4

The range and total density for this one are very different. Can this be the cause? How should I interpret these?
Joe Woodhouse, Principal Consultant
Most Valuable Expert 2012

Commented:
If optdiag shows range cell density different to total density, that means the stats contain both range and frequency cells. (You will see rows in the histogram using both "<=" and rows separately using "<" and "=").

Specifically, range cell density shows the average duplicate values within the range cells only for this column(s), and total density shows average duplicates for the column(s). If you had no range cells at all for these stats, the range cell density would be zero.

That does support your idea that the data is quite skewed. I have a theory too now: can you share the exact "update statistics" command(s) you've run on this table?

I am wondering if you have used the default number of steps for the histograms. Generally a mix of both types of histogram cells implies more steps would have been helpful, but we don't want to have too many. 1000 is the sweet spot on big tables.

May I also see the values of the sp_configure parameters "histogram tuning factor" and "number of histogram steps"?

What I would dearly love to see, if you're running an ASE version that supports it, is whether hash-based stats do better for you. If you aren't running ASE 16.0+ then the next best thing would be to set "histogram tuning factor" to 20 (if it isn't already) and "number of histogram steps" to 50, then run "update index statistics [table]".

Author

Commented:
Hi, First of all thanks for the detailed info.

I ran update index statistics "tablename".

 sp_configure parameters "histogram tuning factor" and "number of histogram steps"  : both are 20

version :  ASE 15.7 SP136

I cannot change the server-level parameters. Is there any way the number of histogram steps can be changed for this table only, so I can test it in a non-prod environment?




Statistics for column:                                                          "System"
Last update of column statistics:                                               Jan 25 2019  8:28:34:263AM

     Range cell density:                                                        0.0000000000000000
     Total density:                                                             0.5104560333032000
     Range selectivity:                                                         default used (0.33)
     In between selectivity:                                                    default used (0.25)
     Unique range values:                                                       0.0000000000000000
     Unique total values:                                                       0.1428571428571428
     Average column width:                                                      default used (3.00)
     Rows scanned:                                                              151269.0000000000000000
     Statistics version:                                                        4
     Sampled Density.
     Sampling used.

Histogram for column:                                                           "System"
Column datatype:                                                                char(3)
Requested step count:                                                           20
Actual step count:                                                              14
Sampling Percent:                                                               15
Tuning Factor:                                                                  20
Out of range Histogram Adjustment is DEFAULT.
Sticky sampling.

     Step     Weight                    Value

        1     0.00000000        <       "ABC"
        2     0.69294435        =       "ABC"
        3     0.00000000        <       "DEF"
        4     0.00325909        =       "DEF"
        5     0.00000000        <       "GHI"
        6     0.02945746        =       "GHI"
        7     0.00000000        <       "JKL"
        8     0.03542696        =       "JKL"
        9     0.00000000        <       "MNO"
       10     0.12212681        =       "MNO"
       11     0.00000000        <       "PQR"
       12     0.11516570        =       "PQR"
       13     0.00000000        <       "STU"
       14     0.00161963        =       "STU"
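
Incidentally, the reported "Total density" can be sanity-checked directly from these weights: it is approximately the probability that two randomly chosen rows share the same column value, i.e. the sum of the squared "=" (frequency-cell) weights in the histogram. A small Python sketch using the "System" weights above (the small gap from the reported 0.5104560333032 is expected, since sampling was used):

```python
# Cross-check: "Total density" is roughly the chance that two randomly
# picked rows share the same value, i.e. the sum of the squared
# frequency-cell ("=") weights. Weights copied from the "System"
# histogram output above.
frequency_weights = [
    0.69294435,  # "ABC"
    0.00325909,  # "DEF"
    0.02945746,  # "GHI"
    0.03542696,  # "JKL"
    0.12212681,  # "MNO"
    0.11516570,  # "PQR"
    0.00161963,  # "STU"
]

total_density = sum(w * w for w in frequency_weights)
print(round(total_density, 4))  # ≈ 0.5105, close to the reported value
```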



Histogram for column:                                                           "Site"
Column datatype:                                                                integer
Requested step count:                                                           20
Actual step count:                                                              5
Sampling Percent:                                                               15
Tuning Factor:                                                                  20
Out of range Histogram Adjustment is DEFAULT.
Sticky sampling.

     Step     Weight                    Value

        1     0.00000000        <       1
        2     0.88179523        =       1
        3     0.00000083       <=       14
        4     0.00000000        <       21
        5     0.11820392        =       21

Can you please explain the above values? I want this index, with both columns, to be used. It is used sometimes, and when it is, performance is better.
Joe Woodhouse, Principal Consultant
Most Valuable Expert 2012

Commented:
You can request steps in the update statistics command:

update index statistics [table] using 50 values

I see from the stats that you've used sampling also, which might be creating worse statistics. Try running it just as I have suggested here.

Ultimately it's still up to the optimiser to decide if the index looks useful or not...
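
To see why the optimiser can legitimately flip between plans on data this skewed, here is a deliberately simplified sketch in Python. It is only the shape of the decision, not ASE's actual cost model; the weights come from the "Site" histogram in the question, and the 10% cut-over is invented purely for illustration:

```python
# Toy plan choice: if a predicate value matches a large fraction of the
# table, an index lookup can cost more logical I/O than a scan.
# NOT ASE's real costing -- just the shape of the decision.

# Frequency-cell ("=") weights from the "Site" histogram above.
site_weights = {1: 0.88179523, 14: 0.00000083, 21: 0.11820392}

def chosen_plan(site_value, scan_cheaper_above=0.10):
    """Return the plan a toy optimiser would pick for "Site = value".
    The 10% cut-over is a made-up illustration, not an ASE constant."""
    selectivity = site_weights.get(site_value, 0.33)  # 0.33 = default selectivity
    return "table scan" if selectivity > scan_cheaper_above else "index"

print(chosen_plan(1))   # ~88% of rows match -> a table scan is reasonable
print(chosen_plan(14))  # tiny fraction of rows -> index
```

So identical SQL can quite correctly get an index for one parameter value and a scan for another, which is consistent with the behaviour described in the question.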

Author

Commented:
what is the difference between :
update index statistics [table] using 50 values
and
update index statistics [table]

I do not want to take a risk and run this in prod. I will do it in non-prod and test, but that can take some time.

Thanks for your help. Anything more you can add ?
Joe Woodhouse, Principal Consultant
Most Valuable Expert 2012
Commented:
The first form explicitly requests histograms be created with an initial 50 steps. The second form uses the global default (in your case 20 steps).

ASE will look at what happens when it generates stats using that many steps. It is allowed to expand any one step by up to the value of "histogram tuning factor" (also currently 20), meaning the total number of steps in the final histogram can be as many as num_steps * tuning_factor. By default that means 400 total. My suggested command can allow expansion up to 1000 steps.
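
As a quick worked example of that arithmetic, using the settings discussed in this thread (these are the thread's values, not universal constants):

```python
# Upper bound on final histogram steps = requested steps * tuning factor.
tuning_factor = 20                  # "histogram tuning factor" in this thread

default_max = 20 * tuning_factor    # current default of 20 requested steps
suggested_max = 50 * tuning_factor  # with "using 50 values"

print(default_max, suggested_max)   # 400 1000
```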

That is an upper bound. ASE will only expand using the histogram tuning factor if it improves the quality of the stats - this is most likely the case when stats are skewed, as you have suggested.

ASE can also use fewer than requested steps. If for example there is a very low cardinality column, perhaps a Y/N flag that doesn't allow NULLs, then there are at most only two unique values in that column and you will only see two histogram steps no matter how many you actually request. (If it allows NULLs you will see a third, showing the weight or what percentage are NULL.)
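
That step-counting behaviour is easy to mimic: a frequency-cell histogram can never have more "=" steps than the column has distinct values, however many steps you request. A rough Python sketch of the counting argument (not ASE's real histogram builder):

```python
from collections import Counter

def frequency_histogram(values, requested_steps):
    """Build (value, weight) frequency cells, capped at requested_steps.
    Mimics only the step-counting behaviour, not ASE's algorithm."""
    counts = Counter(values)
    if len(counts) > requested_steps:
        raise ValueError("would need range cells; not modelled here")
    n = len(values)
    return [(v, c / n) for v, c in sorted(counts.items())]

flags = ["Y"] * 7 + ["N"] * 3  # low-cardinality Y/N column, no NULLs
hist = frequency_histogram(flags, requested_steps=20)
print(hist)  # only 2 steps appear even though 20 were requested
# [('N', 0.3), ('Y', 0.7)]
```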

It is always possible to use "optdiag" to copy out stats and save them as a backup, if you're nervous about making live changes.

I think best practice for update statistics is to do this:

- Backup the stats using optdiag.
- When the backup is confirmed successful, delete all current stats (so no stale stats can be left hanging around).
- Generate new stats along the lines I've suggested: index stats, with 50 values, preferably with hashing, preferably with "out of bound" functionality.
