Link to home
Start Free TrialLog in
Avatar of ser_berto
ser_bertoFlag for Germany

asked on

SQL 2005 Blocking issue in busy system

I have quite busy production system. I noticed that we have a lot of blockings during business hours. When I check processes I see a lot of information like:

PAGELATCH_UP                          2:7:1                                                                                                                                                                                                                                                          
LATCH_EX                              ACCESS_METHODS_SCAN_RANGE_GENERATOR (000000000BEE1760)                                                                                                                                                                                                          
LCK_M_U                               PAG: X:1:999999
And a lot of stuff like this:                          
                                                                                                                                                                                                             
LCK_M_U                               PAG: X:1:999999            
                                                                                                                                                                                                                                   
LCK_M_S                               PAG: X:1:999999                  
                                                                                                                                                                                                                             
LCK_M_S                               PAG: X:1:999999                        
                                                                                                                                                                                                                       
LCK_M_S                               PAG: X:1:999999                            
                                                                                                                                                                                                                   
LCK_M_S                               PAG: X:1:999999                                
                                                                                                                                                                                                               
LCK_M_S                               PAG: X:1:999999                          
                                                                                                                                                                                                                     
LCK_M_S                               PAG: X:1:999999                                  
                                                                                                                                                                                                             
LCK_M_U                               PAG: X:1:999999          
                                                                                                                                                                                                                                     
LCK_M_S                               PAG: X:1:999999          
                                                                                                                                                                                                                                     
LCK_M_S                               PAG: X:1:9999999

I am able to identify PAG: X:1:99999 but it points to different tables.

Number of blocked processes changes from XX to XXX causing in 4 times problems.
Any tips what I can do to fix this issue?
Avatar of Anuj
Anuj
Flag of India image

Make sure that your table has indexes (both clustered and non-clustered index), and your qureies can make use of the indexes. Find out the Blocked Process and tune it.
When you see the following, it says that there is some latch on tempdb
PAGELATCH_UP                          2:7:1    

So, is there any work you are doing with global tables in one process and trying to use the same in other process?
Avatar of ser_berto

ASKER

Tables have indexes both clustered and non-clustered. On tempdb there is some activity but main issue is in user DB. The problem is that it happenes with not only one table and this started 4 days ago. I think that there is a lot of inserts which causes contention on last page in each table.
Avatar of lcohan
How often do you run UPDATE STATISTICS and REINDEX (REBUILD with ONLINE=ON) all your tables (and databases) for that matter. Also jsut a simple SP_RECOMPILE on fiew tables may help flush bad plans from SQL cache
Do you have SQL own Performance dashboard reports installed? If yes they can help a lot to debug your issues:

http://blogs.technet.com/b/rob/archive/2009/02/18/performance-dashboard-reports-for-sql-server-2008.aspx
I don't use Update statistics ad hoc. Reindexing is done weekly basis. This is haevy stored_procedure load system. sp_recompile any of often used sp can cause a lot of issues.
Regarding Dashbord this is SQL 2005 system, so this article is not applicable.
You said "I think that there is a lot of inserts which causes contention on last page in each table."

Then: "I don't use Update statistics ad hoc."

But in my opinion that could be the issue if the performance problem is not permanent and if it is not caused by locking/blocking in your database.

"Regarding Dashbord this is SQL 2005 system, so this article is not applicable."
Again, in my opinion these SQL own reports can help you tremendously to find and solve issues in your prod environment without impacting at all running them live against production server(s)
And when I mentioned update statistics - I meant to run it only against the table(s) where you suspect the performance issue may be. Also if you just run "sp_recompile tablename" against only those tables thsi should not be an issue as all code related to those objects will be recompiled next time is executed. I do the same thing against a live SQL db behind a busy 24/7 e-commence system without impacting but improving performance.
I don't think that the issue is with not optimal execution plan. CPU is quite low (20 %) and it's difficult to find a pattern. Affected are both well indexed table with clustered index andd a lot of nonclustered and heap without indexes. Pages with latches wait changes frequently in blocked monitor.
From what you described you may have some IO issues - did you looked at average disk queue length where your sql files are? Are sql t-log files split from data files?
I think as well that it might be I/O related issue. All disks are on SAN. Of course data files and logs are splitted. I noticed that there is quite heavy activity on data file and some on tempdb. Additionally tempdb is splitted as well to 8 data files. My first guess it contention on last page in a table. This idea is quite good supported as both tables i found are heap and clustered index ascending based on ID. In heavy insert environment in both cases there will be contention on last page in table. I plan to add something to clustered index to avoid always hitting one page and to implement some indexes on heap table. THe issue is quite heavy to track as there is time where we have blocking and we don't have at all. Additionally application looses connection to database even thought there is completely no errors in SQL Server Log, only quite heavy blocking (200 processes) but without one blocking source and with a lot of dynamic (different blocking processes). If you have any idea or comments to my fix plan please let me know. I think as well about involving Ms as we have premier support but you know how much effort it requires and usually they propose what is well known and require to deliver tons of logs.
What FILLFACTOR you use if any? We started with 80 and now in some cases we have even 50/60 due to massive imports.
And are your tables partitioned? especialy thinking large volumes and large inserts. Just moving process off may not entirely fix your issue if its IO and/or index (clustered or not) related. You could also split all indexes out in their own filegroup(s) sitting on separate LUNs if possible.
Fill factor is 0. Table is not partitioned and is quite small. It inserts row in user session there and deletes afterwards. If I will add additional column to clustered index which one should it be:
Column A  1 million uniques
Column B  250 tousend
Column C 300

Clustered index [A,B,C], ID

In this case it would be necessary to change FF to different valuse to avoid page splitts.
"It inserts row in user session there and deletes afterwards"
I think this is your issue especialy on a table with clustered index right?
What datatype is the ID on it? hopefully int or bigint not GUID...but even with that I think that table will always look (heavyly) fragmented just because of the cluster and the fact that records are inserted/deleted therefor gaps in sequence for cluster right?
 That index will rebuild with every insert/delete so in my opinion that table needs very good index and query(ies) but not clustered. By very good index and queries I mean try to have as few indexes on it as possible (ideally only one) to fully cover all queries against it.
ID is INT of course. Table is small and usually keeps only latest clients activity.
I forgot to mention that after failover everything is fine for some time. Currently this issue stopped. (Last occurance oin WED) So it is difficult to investigate further at that moment.
ASKER CERTIFIED SOLUTION
Avatar of lcohan
lcohan
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial