Solved

Performance optimization, Sybase

Posted on 2012-04-13
23
1,258 Views
Last Modified: 2012-06-27
I have to improve the query performance of the below query as it is taking more than 4 minutes to return data

LabResult has 46 million records!

Index
trnnx_IE1LabResult       is on two columns (PatientId, Test) and it is clustered


set showplan on
select      CollectionDate,
            ResultValue,
            TestAbbreviation,
            TestAbbreviation,
            NonMayoLab,
            LastChangedDate,
            Test,
            AccessionId,
            Patientid,
            Comment
from LabResult

where  
LastChangedDate >=  '10/01/2011'
and LastChangedDate <=  getdate()
and ResultValue is not null
and CollectionDate is not null

and patientid in
(id1,.....,id200)

------------------------------------------------------------------------------------------------------------------------------------

QUERY PLAN FOR STATEMENT 1 (at line 0).


    STEP 1
        The type of query is DECLARE.

Total estimated I/O cost for statement 1 (at line 0): 0.


QUERY PLAN FOR STATEMENT 2 (at line 2).
Optimized using Serial Mode


    STEP 1
        The type of query is SELECT.

    2 operator(s) under root

       |ROOT:EMIT Operator (VA = 2)
       |
       |   |RESTRICT Operator (VA = 1)(4)(0)(0)(0)(0)
       |   |
       |   |   |SCAN Operator (VA = 0)
       |   |   |  FROM TABLE
       |   |   |  LabResult
       |   |   |  Using Clustered Index.
       |   |   |  Index : trnnx_IE1LabResult
       |   |   |  Forward Scan.
       |   |   |  Positioning by key.
       |   |   |  Keys are:
       |   |   |    PatientId ASC
       |   |   |  Using I/O Size 16 Kbytes for data pages.
       |   |   |  With LRU Buffer Replacement Strategy for data pages.


Total estimated I/O cost for statement 2 (at line 2): 10178.

Let me know how can I proceed.
0
Comment
Question by:rbhargaw
  • 10
  • 5
  • 4
  • +1
23 Comments
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37846037
Hi!

For this query create an index on these columns
(LastChangedDate, patientid, ResultValue, CollectionDate)

And you will see some performance improvements.

Remember. Either use the columns in an index in your queries where clauses or
create index that matches those columns.

Regards,
     Tomas Helgi
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 37846106

1.

Put the ~200 patient IDs in a table (temp table is ok) and join to that table. Put a clustered index on the temp table too.

2.

Make sure you've run update index statistics on LabResult. On 46 million rows this won't be fast but it won't take all weekend either.

3.

Per the answer above, if you create that second index, make sure it is nonclustered.

4.

Why would ResultValue or CollectionDate ever be NULL? If they are, are they really adding any information here to this table? Consider making this into two tables, one containing only fully useful data, and the other containing rows that have a NULL in either of these columns. You could always build a VIEW that does a UNION ALL on the two tables if you really do need to be able to see both data sets at the same time.

5.

What's your performance target?
0
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 37846350
Clustered indexes should only be placed on columns that are perpetually ascending - I don't know much about your database, but I suspect that PatientID isn't a good candidate for clustering. If the clustering key isn't perpetually ascending, it can cause a large amount of data movement as you add values later, or can lead to a large amount of index fragmentation as values are added to the "middle" of the index, because they'll assumably spread across the PateintID range.

Though this article is Microsoft SQL Server specific, the considerations about clustered indexes hold across any database system, and you should consider the suggestions when choosing the cluster key:

http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx#p3

That said, the first expert recommended clustering starting with LastChangedDate, which is presumably much more upward-marching than just PatientID alone. While still not an ideal cluster key, it would help keep the data much more organized than the current key. In addition to better organization, the recommended key would quicken this query quite a bit because it makes filtering on PatientID quick as well.
0
 

Author Comment

by:rbhargaw
ID: 37846391
Joe,

I have recently joined this company 1 week back and this is the task I have been assigned to.
Actually this query is part of bigger problem. Basically there is one application which  calls a query to get the list of patient id and then each patient id will call with another query(labresult) to get the data.This is taking long time , around 8 minutes. So I suggested to combine 2 queries into Stored procedure so to get all data with one hit to database , rather than 200 times and then rewriting the poorly written 2nd query properly(labresult)

The second query (labresult) was used under  union at 5 places,so I can combine all 5 labresult table union into one temp table and then iterate through lesser number of rows, To get less rows is the query I have placed in this question!! So I was thinking of non clustered indexes on all columns as suggested and putting patientid in temp table and looping the query to get the lab results faster.So I am trying to get idea of ways to optimize the queries without touching much business logic . I need to check about NULLS in check value/Collection date, since this is old code so probably they had null values. I will check on Monday about the index statistics as I don't have weekend access :) Let me know if I need to post Stored proc too?
0
 

Author Comment

by:rbhargaw
ID: 37846399
ryan,

 I am assuming patientid is incremental but will let you know on Monday as I cant check in weekend and thanks for the link, I will go though them!
0
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 37846949
PatientID is probably incremented with each new patient, but in a table that contains test results, new results can belong to any patient, so the new rows aren't always added to the end of the table. In the "Patients" table, it might make sense to cluster based on PatientID since it's constantly increasing in that context and new rows aren't ever added using smaller PatientID values.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 37847515
Recall, too, that clustered indexes are the best for supporting joins, and I am guessing PatientID is the most frequent join column here.
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37847869
Hi!

If I would have recommended to change the clustering index
I would have said so in my first comment.  
I only recommended adding an index (non clustering) on these
columns (LastChangedDate, patientid, ResultValue, CollectionDate)  to speed up this and hopefully other queries.
So I think ryanmccauley has misunderstood my first comment and I would have to
agree with Joe_Woodhouse's last comment.

Regards,
    Tomas Helgi
0
 

Author Comment

by:rbhargaw
ID: 37853468
Even though I have created non clustered index on (LastChangedDate, patientid, ResultValue, CollectionDate), running them inside the stored procedure is calling the clustered index instead and giving below information.

 -- Optimized at runtime using Deferred Compilation.

If I run just the select query, I can use non clustered index, do I need to move the small query to another procedure?
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37853687
Either in create procedure or execute procedure
you should use the parameter "with recompile" if you have created
the procedure before the index to instruct the Adaptive Server to compile a new plan
which is used in subsequent executions.

Regards,
      Tomas Helgi
0
 

Author Comment

by:rbhargaw
ID: 37853870
Tomas,

I do have "with recompile" option in the create stored procedure
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 37854277
Try the other suggestions we've made in this thread. So long as update index statistics has been run recently on the table, ASE is telling you this is the best index to use. You're almost certainly looking at a SQL rewrite...
0
 

Author Comment

by:rbhargaw
ID: 37857543
As per the Database guys , update index statistics is run on every weekend.

I have asked them to put non clustered index, lets see if that improves any performance, will know by tomorrow, as the changes will be done in night
0
 

Author Comment

by:rbhargaw
ID: 37881125
Hey Guys, The index on these columns did not make much difference, Do you want me to put down the stored procedure?
0
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 37881851
The index may not change performance much, but can you get a new copy of the execution plan so you can see if it's even being used? I'm not sure about the specifics of Sybase, but I know SQL Server is particular about the tipping points where it shifts from an index seek/scan into a table scan, which can lead to some unexpected results. I'd be curious to see if the execution plan is the same with the index in place, meaning the DB is ignoring it - maybe a query hint or a slight restructure of the query could improve things.
0
 

Author Comment

by:rbhargaw
ID: 37882041
The newly created index is getting used, however I am checking the stored proc and problem could be in different area. There is one more table with 26 million rows which could be the bottleneck. I initially assumed that has only thousands of rows, should have checked in production!Sorry new to this environment and taking time.I will get back on this issue.
0
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 37882335
a table with 10,000x as many rows as you expected could definitely cause some problems :) That said, even billions of rows isn't too many as long as your query is selecting them appropriately - in this case, doing an index seek for the rows it wants as opposed to doing a index/table scan to find rows that match the criteria given.
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37890814
Hi!

I agree, joining two large tables together requires that suitable indexes are on both tables (A and B ) otherwise you will end up having a tablescan on the one that doesn't have the right index matching your query (let say it is the table B).
In the above case you have a tablescan in table B for each row that is fetched in table A
resulting in poor performance.

Regards,
    Tomas Helgi
0
 

Author Comment

by:rbhargaw
ID: 37893965
I have attached the Stored procedure, can you please have a look and let me know if this can be improved.

#temp_Patient  will contain around 150-200 rows
trntb_LabResult  contain 46 million rows
trntb_TxEventOrder  contain 26 million rows



Index on trntb_TxEventOrder

trnnx_IF1220TxEventOrder       PatientId      clustered
trnnx_IF1025TxEventOrder       ChecklistInstanceItemId      nonclustered
trnnx_IF1219TxEventOrder       TxEventId      nonclustered
trnnx_PKTxEventOrder       TxEventOrderId      nonclustered, unique
trnnx_TxEventOrderDate       OrderDate      nonclustered


Index on trntb_LabResult

trnnx_IE1LabResult       PatientId, Test      clustered
trnnx_IF1021LabResult       TxEventOrderId      nonclustered
trnnx_PKLabResult       LabResultId      nonclustered, unique
trnnx_TestAbbrevCollectionDate       TestAbbreviation, CollectionDate      nonclustered
trnnx_TestCollectionDate       Test, CollectionDate      nonclustered
trnnx_LastChangedDate       LastChangedDate      nonclustered
IDX_LastChangedDate       LastChangedDate, PatientId, ResultValue, CollectionDate      nonclustered
0
 

Author Comment

by:rbhargaw
ID: 37893973
Attaching the stored procedure
trnsp-GetPatientLabNotes-Comb-Mo.txt
0
 
LVL 24

Accepted Solution

by:
Joe Woodhouse earned 500 total points
ID: 37895429
General comments:

Don't use derived tables when you don't need to. (SELECT FROM (SELECT FROM ...) That forces a join order (optimiser must resolve the derived table first) which may not be ideal. Every time the code does that, it doesn't need to.

Which version of ASE is this? If you're using anything earlier than ASE 15.0.2, the optimiser isn't going to have much information about those temp tables and the indexes on them. If you're in an old version there are workarounds we should talk about.

Lots of hard-coded IN lists... I don't know how much that hurts, but I know it's a serial lookup on each row and for the rowcounts you're talking about, saving even 0.01s per row is something you want.

Can those UNIONs be made into UNION ALLs? Is there any possibility of duplicates between each arm of the UNIONs?

Suggest CREATE TABLE #temp table... INSERT SELECT be instead turned into SELECT INTO.

Those are the quick easy comments without knowing what the proc does and why. :)
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 37895986
Hi!

Do you have an index like this (PatientId, TxEventId) on trntb_TxEvent.
If not then I suggest you add one to improve the query that you use in
the insert in #temp_patient.

And if you are going to use this query

select      tlb.CollectionDate,
            tlb.ResultValue,
            tlb.TestAbbreviation,
            tlb.TestAbbreviation,
            tlb.NonMayoLab,
            tlb.LastChangedDate,
            tlb.Test,
            tlb.AccessionId,
            tlb.Patientid,
            tlb.Comment
           
from trntb_LabResult tlb, #temp_Patient tp
where  tlb.Patientid = tp.patientID
and tlb.LastChangedDate >=  @startDate
and tlb.LastChangedDate <=  getdate()
and tlb.ResultValue is not null
and tlb.CollectionDate is not null

 instead of the query that you first provided then the index which I suggested earlier should be a little bit modified
( patientid,LastChangedDate, ResultValue, CollectionDate) nonclustered

And on the table trntb_TxEventOrder there should be an index like this

(PatientId,OrderResultCollectionDate, OrderResultValue,CheckListTestCode) nonclustered

Regards,
    Tomas Helgi
0
 

Author Closing Comment

by:rbhargaw
ID: 38009209
The above recommendation helped a lot in reducing couple of minutes! I could not proceed much with the issue as this went into backlog from business but got a fair amount of idea from EE to work later on the issue. Thanks to everyone involved!
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

Foreword In the years since this article was written, numerous hacking attacks have targeted password-protected web sites.  The storage of client passwords has become a subject of much discussion, some of it useful and some of it misguided.  Of cou…
Having an SQL database can be a big investment for a small company. Hardware, setup and of course, the price of software all add up to a big bill that some companies may not be able to absorb.  Luckily, there is a free version SQL Express, but does …
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now