Solved

CPU & I/O Affinity Masking

Posted on 2014-01-14
6
1,910 Views
Last Modified: 2014-02-25
I have a Server that hosts 2 instances of SQL Server 2005 SP4 on Windows 2003 R2 SP2.

It is a 32 bit physical machine with 16GB of RAM and 16 processors.

The first instance (the default instance) is for a ERP solution.  It should be considered the "primary" instance that is given the most resources.  It is mostly OLTP with mainly short transactions.  There's some batch processing but not as  much as the OLTP component.

The second instance is a named instance an supports another application that presents an alternate front end to the ERP solution.  It pulls all sorts of data from the ERP database daily, and there seems to be all sorts of read activity on this database the rest of the day (this comes from another vendor and we don't know too much about the internals of how this app works).

The secondary app is CPU intensive as well.

In order to separate out the resources on the box and try to partition out the secondary database as much as possible, affinity masking was set up for the named instance (this was setup 3 or 4 years ago, but recent calls about performance have me researching this more now).  This instance is limited to 4 processors.  It also runs in SQL Server 2000 compatibility mode (the version of the app on this one is a real dinosaur).

The default instance is currently set up to automatically set  CPU and I/O Affinity masking for all processors.   SQL Server priority boosting is turned on (and based upon research today will be turned off during our maintenance window this weekend.

The secondary instance has both CPU and I/O affinity masks set for CPU's 12-15 (leaving 0-11 for everything else).

For a start, from what I can see, Affinity masking hard binds schedulers to specific CPU's.  However there's a startup trace flag 8002 that seems like it might address part of this by letting the schedulers reassign load among the processors defined in a affinity mask.

The other thing that seems to be bad is that we have both I/O and CPU checked for the 4 CPU's (which seems to be a bad thing because you then have I/O and CPU contention in the affinity group).

What I'm looking for here, then, is advice on how to configure this secondary instance so that it can't hog all the CPU's on the server (because it is a CPU pig) without changing the code (it's a third party product).  Traceflag 8002 seems like it might be helpful, but with the 4 processors I am allowed to spread out over this instance, how should I assign the CPU's ?  3 for CPU and 1 for I/O ?  2 and 2 ?

The docs say that if you're using CPU affinity you also need to use I/O affinity and just make sure that for each CPU, they are both off, or one on and one off.  I was thinking of trying to set up individual CPU's for processing and auto assigning I/O for the instance but that seems to fly in the face of the docs.

Or am I reading that wrong and I can set up just CPU masking for CPU load and leave I/O affinity as system defined for both ?  The presence of two check boxes (auto allocate CPU and auto allocate I/O) seems to indicate that this may be OK.

Also, is affinity masking a "hot" change (taking affect as soon as the setting is made) or does it require a restart ?  The docs (link below) seem to reference that a restart is required.

References:
Affinity Mask option: http://technet.microsoft.com/en-us/library/ms187104%28v=sql.90%29.aspx
Affinity I/O Mask option: http://technet.microsoft.com/en-us/library/ms189629%28v=sql.90%29.aspx
Trace Flag 8002: http://support.microsoft.com/kb/818769

My gut feel here is to do the following (but I lack experience in this area to confirm that it is the right solution and I only get one maintenance window a month):

Default instance: Turn off Priority Boost, leave both CPU and IO set to auto affinity
Named instance:  Set I/O to auto affinity, leave CPU set for CPU's 12-15, set startup flag 8002

(Suggestions to tune code, modify indexes etc aren't the answer here.  Both applications are pre packaged from third party vendors and changing code or database structures is just not an option).

Any assistance here is greatly appreciated.
0
Comment
Question by:Steve Wales
  • 3
  • 3
6 Comments
 
LVL 42

Expert Comment

by:EugeneZ
ID: 39781880
< how should I assign the CPU's ?  3 for CPU and 1 for I/O ?  2 and 2 ?
if you have no option to get more CPU\RAM\ new box with more  resources..
you can try any combination.

consider to do all changes off hours..
<In Microsoft SQL Server 2005, the affinity mask option can be configured without requiring a restart of the SQL Server instance>
----

but before of doing  these "not regular in DBA world" changes: normally default "auto"
is good- but depends of situation there can be steps to adjust..


please clarify what is your sql server 2005 edition
how often do you run db maints?
what is the exact issue ?


check the max\min memory settings
and adjust as needed to give OS at least 2GB RAM
check max degree of parallelism
you may need to switch it of on both servers

max degree of parallelism Option
http://technet.microsoft.com/en-us/library/ms181007(v=sql.90).aspx
0
 
LVL 22

Author Comment

by:Steve Wales
ID: 39782221
SQL Server 2005 Enterprise is the Edition
DB Maintenance is monthly for index maintenance, other things like stats updates run weekly.

Exact issue is that the users have been complaining recently about performance issues.  This server was set up 4 or 5 years ago when I was brand new with SQL Server and hasn't been changed much since.  There has been growth over time and the last few month we've had more and more complaints about performance.  

Seeing that the affinity settings fly in the face of documented configuration (using the GUI to set affinity bits automatically does RECONFIGURE WITH OVERRIDE so you don't even see the warning messages normally generated by just doing RECONFIGURE) I found myself wondering if this is a major contributing factor.

Reading more into how affinity works, I found out that you shouldn't set the affinity bits on for both CPU and IO on the same CPU (which is how it's set at the moment).

Memory allocation is:
Default instance: 8GB (using AWE)
Named instance: 4GB (using AWE)
This leaves 4 GB for the OS.

MAXDOP for Default instance is 0
MAXDOP for named instance is 0
0
 
LVL 42

Expert Comment

by:EugeneZ
ID: 39784009
is it Windows 2003 R2 SP2 64 or 32 bit?
try to switch off MAXDOP for both == set 1 for start or

---about update stats \reindex  you may need to revisit schedule
if the data has a lot DML you may need to run daily at least sp_updatestats
--

check:
<users have been complaining recently about performance issues.>
what was changed?
when did it start?

etc
try to identify  "slow" code that maybe optimized by e.g. index
or code rewriting
0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 
LVL 22

Author Comment

by:Steve Wales
ID: 39784072
This app has had performance issues on and off for quite a while.   There's no option to rewrite code or add indexes (even though I've done some missing index analysis and provided the details to the vendor, nothing has changed yet.

However, my main concern at the moment is based on the documentation (and some topics on other forums) where setting affinity for both IO and CPU on the same CPU's is supposedly a terribly bad thing to do performance wise.

If, based on that and the docs, I'd like to change that first and see where we are at after that.

So, specifically for just this point in time, do you have any experience with Traceflag 8002 and CPU affinity set (to limit this CPU hog to a portion of the CPU's available on the server and leaving I/O affinity set to "Auto".

Thanks
0
 
LVL 42

Accepted Solution

by:
EugeneZ earned 500 total points
ID: 39784356
I would put back to the default "auto"
remove trace
and set MAXDOP or off or 2-4
for start
0
 
LVL 22

Author Closing Comment

by:Steve Wales
ID: 39887386
I will accept this as the solution.

General guidance seems to be to set affinity back to auto for I/O especially.  I don't have approval to proceed yet, but that's what I want to do.

CPU affinity will remain so going to leave the traceflag in place since it is specifically designed to overcome the shortcomings of the default behavior.

If I have continued issues down the road, I'll come back with another question.

Thanks for  your input.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Via a live example, show how to shrink a transaction log file down to a reasonable size.

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now