Clustered SQL 2008 R2 patching specifics

We have a 2-server clustered (Active/Passive) SQL 2008 R2 RTM environment we need to update to SP2 along with Cumulative Update package 1.  While updating SQL Server, it is essential we minimize the downtime to our users.  For planning, I am trying to determine when and for how long SQL services will be offline.

I have already reviewed the MS KB article describing the failover cluster rolling patch/SP process (KB958734), but it has left me with questions.

This article, as I read it, says to apply the updates/patches to the nodes that are no longer possible owners of the clustered resources. In that case, the node being patched would no longer have access to the quorum or the shared disk storage where all database files are located (master, msdb, model, and all user DBs).
Referring to the system databases, MS also states that "Installing a SQL Server 2008 R2 update changes these databases, making them incompatible with earlier versions of SQL Server 2008 R2".

Based on the steps in the cluster instructions (KB958734 that I referred to earlier):
 - Step 2 is removing passive node(s) from the possible owners of clustered services.
 - Step 3 applies the updates to the nodes that are no longer possible owners of the services.
 - Step 4 is adding back the now-updated nodes to the cluster services owners list.
 - Step 5 is moving the cluster services to a node that was updated.
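The possible-owners mechanics behind steps 2-5 can be sketched as a small model. This is purely illustrative (on the cluster itself these changes are made through Failover Cluster Manager or cluster.exe, and the node names here are made up), but it shows why a node removed from the possible-owners list cannot receive a failover while it is being patched:

```python
# Minimal model of a clustered resource's possible-owners list, illustrating
# KB958734 steps 2 (remove owner), 4 (re-add), and 5 (move). Node names are
# hypothetical; this is a sketch, not cluster tooling.

class ClusteredResource:
    def __init__(self, owner, possible_owners):
        self.owner = owner
        self.possible_owners = set(possible_owners)

    def remove_possible_owner(self, node):   # step 2
        self.possible_owners.discard(node)

    def add_possible_owner(self, node):      # step 4
        self.possible_owners.add(node)

    def move_to(self, node):                 # step 5 (failover)
        if node not in self.possible_owners:
            raise RuntimeError(f"{node} is not a possible owner")
        self.owner = node

sql = ClusteredResource(owner="NODE1", possible_owners=["NODE1", "NODE2"])
sql.remove_possible_owner("NODE2")   # patch NODE2 while it cannot own SQL
# sql.move_to("NODE2") would raise here: no failover target while patching
sql.add_possible_owner("NODE2")      # patching done, re-add the node
sql.move_to("NODE2")                 # bring SQL up on the patched node
print(sql.owner)                     # -> NODE2
```

The point of the restricted-owners window is safety (an unpatched failover target can't accidentally receive a half-upgraded instance), at the cost of having no failover capability for its duration.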
1. At this point (Step 5) when the SQL Services come online on the updated node, is this when it writes updates to the system databases?  If so, is this when there will be a delay in all services coming online while the appropriate data updates are performed?
2. Will the number and/or size of the user databases on the server affect how long the upgrade takes and how soon services come back online?

FYI Each cluster node is running Windows Server 2008 R2 Datacenter 64-bit, SP1 with 32 GB memory.
Ryan McCauley (Data and Analytics Manager) commented:
When we apply patches like this (ones that modify the databases and make them incompatible with previous versions), we usually follow the process of patching the passive node, failing the database over to the newly patched node, and then patching the now-passive (previously active) node. Once you've applied the patch, there's no issue until you move the database onto a patched node - from that point on, the instance will fail to come online on an unpatched node.

If you want to be especially rigorous, you can patch the passive node and move the instance to the patched node. Once it's moved, you can alter the cluster resource to remove the unpatched node from possible owners, and then patch that node. Once it's patched, add it back to the possible owners and fail the instance over to test that it comes online as expected.

You'd experience two periods of downtime associated with the two failovers (from original to newly-patched, and then from first-patched to second-patched). I'm not sure what you experience when you do a failover, but ours are around 25 seconds of client connectivity loss. After applying the patch and moving the instance to the newly patched node (first failover), you may have some additional delay while the instance brings the databases up for the first time and modifies their schema to comply with the upgrade. You'd have to test this in your environment to see what the associated delay is, but I've usually seen it around another 30 seconds (though this obviously varies pretty dramatically).

Does that address your question about process and alleviate concerns, or are there any segments you're still unsure about? If you have a test cluster, your best course of action would be to stage your databases and do a dry run on applying the patch, timing the downtime associated with each step.
oromm (Author) commented:
We are getting a clustered test environment together, but it will be a week before it is ready for my use. Meanwhile, I need to provide some guidance to our management team who, of course, are not patient about getting answers.

I think you've hit on my major question/concern which was just when to experience any additional connectivity delay during the process.  We can tolerate a minute or two of downtime, but not much more than that without very detailed preparation and timing.  We have only a couple of small windows each week that we can accommodate any lengthy downtime, and since we're patching to get a fix to a problem that caused an unplanned outage a few weeks ago, there is heightened sensitivity to any downtime, planned or other.
Ryan McCauley (Data and Analytics Manager) commented:
I thought I'd posted a follow-up here, but clearly I didn't - my apologies.

You can test the amount of time it takes to bring the upgraded databases online using a non-clustered test server - the schema upgrade steps are the same between a clustered and a stand-alone instance (as long as your storage is comparable - that's the big factor). You can even test the upgrade with copies of your databases on a local test box to get a ballpark of the time required.

Then you'll end up with the following steps and expectations of downtime. Assume the SQL instance lives on node 1 when you start:

1. Patch node 2 (passive node) - no downtime
2. Fail the instance from node 1 to node 2 - downtime of a regular failover plus the schema upgrade (probably close to a minute, but test the upgrade elsewhere to be sure)
3. Patch node 1 (now passive) - no downtime
4. Fail the instance from node 2 back to node 1 - downtime of a regular failover only, no schema upgrade (30 seconds or so)
5. All done! The instance is back where it started, if that matters.
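Those per-step estimates can be rolled into a rough downtime budget. A sketch using the figures above - the ~25-second failover and ~30-second schema upgrade are numbers from my environment, so re-measure both in yours before presenting a plan:

```python
# Rough downtime budget for the rolling patch, using the estimates above.
# FAILOVER_S and SCHEMA_UPGRADE_S are environment-specific assumptions.

FAILOVER_S = 25        # typical client-connectivity loss per failover
SCHEMA_UPGRADE_S = 30  # one-time schema upgrade on first patched start

steps = [
    ("patch passive node (node 2)",  0),
    ("failover to patched node",     FAILOVER_S + SCHEMA_UPGRADE_S),
    ("patch former active (node 1)", 0),
    ("failover back (optional)",     FAILOVER_S),
]

for name, downtime in steps:
    print(f"{name:32s} {downtime:3d} s")
print(f"{'total':32s} {sum(d for _, d in steps):3d} s")
```

With these assumptions the total client-facing outage is about 80 seconds, split across the two failovers; only the first failover carries the schema-upgrade penalty.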
oromm (Author) commented:
As a follow-up to this (and for documentation in case someone someday finds this thread useful), I just completed the process on a test cluster we brought up for this purpose. I did follow the MS KB 958734 procedure to disable node ownership ("parenting"), just in case.

General steps:
 * disable passive node from parenting the SQL services
 * apply all patches to passive node
 * backup system databases natively, and all user databases using whatever normal means
 * enable passive node to parent SQL services
 * failover to the patched passive node
     ** it is at this point that, in addition to the usual 25-30 seconds of database inaccessibility during startup, the upgrade scripts are applied to all system and user databases. Going from 2008 R2 RTM to 2008 R2 SP2 w/CU11, this added roughly 45 seconds of inaccessibility (on sub-production-level servers), during which any connection attempt failed with: "Login failed for user 'xxxxxxxxxxx'. Reason: Server is in script upgrade mode. Only administrator can connect at this time."
 * validate, test, bring up application/services
 * repeat process for the 2nd node that is now passive.
    ** Once this starts, there is no rolling back.
    ** include a final failover to ensure all services can still be owned by both nodes again
    ** no additional time during failover, just the usual startup time
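Because the "script upgrade mode" login failure above is transient, client code can treat it as a wait-and-retry condition rather than a hard error. A sketch of that idea in Python - `connect` here is a hypothetical stand-in for whatever database driver you actually use (e.g. pyodbc), and the retry counts are illustrative:

```python
import time

# While the patched node applies upgrade scripts, logins fail with
# "...Server is in script upgrade mode. Only administrator can connect
# at this time." A client-side retry loop can wait that window out
# instead of treating it as a fatal error.

UPGRADE_MARKER = "Server is in script upgrade mode"

def is_upgrade_in_progress(error_message: str) -> bool:
    """True if the error text indicates the transient upgrade window."""
    return UPGRADE_MARKER in error_message

def connect_with_retry(connect, retries=12, delay_s=5):
    """Retry while the instance is still running upgrade scripts."""
    for _ in range(retries):
        try:
            return connect()
        except Exception as exc:
            if not is_upgrade_in_progress(str(exc)):
                raise            # some other failure - surface it
            time.sleep(delay_s)  # transient: wait out the upgrade
    raise TimeoutError("instance stayed in script upgrade mode")

msg = ("Login failed for user 'app'. Reason: Server is in script "
       "upgrade mode. Only administrator can connect at this time.")
print(is_upgrade_in_progress(msg))  # -> True
```

With the ~45-second window measured above, a dozen retries at 5-second intervals gives comfortable headroom without masking genuinely broken logins.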

In a 2-node cluster scenario, from the start of the process (prepping to patch the passive node) until the 2nd node is patched and re-enabled to parent services, there is a window with no failover capability. Keeping that window as short as possible is critical.

Next for me is to schedule and execute this in our production environment.  Joy.

That is all.
Ryan McCauley (Data and Analytics Manager) commented:
Thanks for posting the follow-up including the steps you took - it's a good walk-through of what's necessary to patch SQL cleanly, and it's good to hear that the database upgrade itself was relatively quick.
Question has a verified solution.
