oromm asked:

Clustered SQL 2008 R2 patching specifics

We have a 2-server clustered (Active/Passive) SQL 2008 R2 RTM environment we need to update to SP2 along with Cumulative Update package 1.  While updating SQL Server, it is essential we minimize the downtime to our users.  For planning, I am trying to determine when and for how long SQL services will be offline.

I have already reviewed the MS KB article describing the failover cluster rolling patch/SP process (http://support.microsoft.com/kb/958734), but that has left me with questions.

This article, as I read it, says to apply the updates/patches to the nodes that are no longer possible owners of the clustered resources. In that case, the node being patched no longer has access to the quorum or the shared disk storage where all the database files are located (master, msdb, model, and all user databases).
Referring to the system databases, MS also states that "Installing a SQL Server 2008 R2 update changes these databases, making them incompatible with earlier versions of SQL Server 2008 R2".  (ref: http://msdn.microsoft.com/library/dd638062(SQL.105).aspx)
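For reference, a quick way to confirm the patch level the instance reports before and after the update is a SERVERPROPERTY query (a minimal sketch; the build numbers in the comments are the commonly published ones for 2008 R2 RTM and SP2):

    -- Build and patch level the instance currently reports.
    -- 2008 R2 RTM generally reports ProductVersion 10.50.1600.x and ProductLevel 'RTM';
    -- after SP2 it should report 10.50.4000.x (or higher once a CU is applied) and 'SP2'.
    SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
           SERVERPROPERTY('ProductLevel')   AS ProductLevel,
           SERVERPROPERTY('Edition')        AS Edition;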

Based on the steps in the cluster instructions (KB958734 that I referred to earlier):
 - Step 2 is removing passive node(s) from the possible owners of clustered services.
 - Step 3 is applying the updates to the nodes that are no longer possible owners of the services.
 - Step 4 is adding back the now-updated nodes to the cluster services owners list.
 - Step 5 is moving the cluster services to a node that was updated.
 
QUESTIONS:
1. At this point (Step 5), when the SQL services come online on the updated node, is this when the updates are written to the system databases? If so, is this when there will be a delay in all services coming online while those updates are applied?
2. Will the number and/or size of the user databases on the server affect how long the upgrade takes to complete and how long it takes for services to come back online?


FYI: Each cluster node is running Windows Server 2008 R2 Datacenter 64-bit with SP1 and 32 GB of memory.
ASKER CERTIFIED SOLUTION by Ryan McCauley (solution text not shown; available to Experts Exchange members only).

oromm (Asker) replied:
We are getting a clustered test environment together, but it will be a week before it is ready for my use. In the meantime, I need to provide some guidance to our management team who, of course, are not patient about waiting for answers.

I think you've hit on my major question/concern, which was exactly when during the process we would experience any additional connectivity delay. We can tolerate a minute or two of downtime, but not much more than that without very detailed preparation and timing. We have only a couple of small windows each week in which we can accommodate any lengthy downtime, and since we're patching to get a fix for a problem that caused an unplanned outage a few weeks ago, there is heightened sensitivity to any downtime, planned or otherwise.
Ryan McCauley replied:

I thought I'd posted a follow-up here, but clearly I didn't - my apologies.

You can test the amount of time it takes to bring the upgraded databases online using a non-clustered test server - the schema upgrade steps are the same between a clustered and a stand-alone instance (as long as your storage is the same - that's the big factor). You can even test the upgrade with copies of your databases on a local test box to get a ballpark of the time required.
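One way to put a number on that on the test box is to read the SQL Server error log after the patched instance brings the databases online - the script upgrade steps are logged there, so the first and last timestamps give a rough duration. A minimal sketch (sp_readerrorlog is an undocumented but commonly used procedure; the search string is just a filter and may need adjusting):

    -- Read the current SQL Server error log (log 0, type 1 = SQL Server log)
    -- and keep only the entries mentioning 'upgrade'; the timestamps bracket
    -- the script upgrade work that ran when the databases came online.
    EXEC sp_readerrorlog 0, 1, N'upgrade';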

Then you'll end up with the following steps and expectations of downtime. Assume the SQL instance lives on node 1 when you start:

1. Patch node 2 (passive node) - no downtime
2. Fail the instance from node 1 to node 2 - downtime of a regular failover + schema upgrade (probably close to a minute, but test the upgrade elsewhere to be sure)
3. Patch node 1 (now passive) - no downtime
4. Fail the instance from node 2 back to node 1 - downtime of a regular failover, no schema upgrade (30 seconds or so)
5. All done! The instance is back where it started, if that matters. (The queries below are one way to confirm which node is hosting the instance at each step.)
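If it helps while stepping through those failovers, these two queries (a sketch; nothing beyond what ships with 2008 R2) show which physical node is currently hosting the instance and which nodes the instance can see in the cluster:

    -- Physical node currently hosting the clustered SQL Server instance.
    SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentHostNode;

    -- Nodes in the failover cluster, as reported by this instance.
    SELECT NodeName FROM sys.dm_os_cluster_nodes;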
oromm (Asker) replied:

As a follow-up to this (and for documentation in case someone someday finds this thread useful), I just completed the process on a test cluster we brought up for just this purpose. I did follow the MS KB 958734 procedures to disable node parenting, just in case.

General steps:
 * disable passive node from parenting the SQL services
 * apply all patches to passive node
 * back up the system databases natively, and all user databases using whatever normal means
 * enable passive node to parent SQL services
 * failover to the patched passive node
     ** it is at this time that, in addition to the 25-30 second period of inaccessible databases during startup, the upgrade scripts are applied to all system and user databases. In our case of going from 2008 R2 RTM up to 2008 R2 SP2 w/CU11, there was an additional 45 seconds of inaccessibility (on sub-production level servers), during which any connection attempt resulted in the error message: "Login failed for user 'xxxxxxxxxxx'. Reason: Server is in script upgrade mode. Only administrator can connect at this time."
 * validate, test, bring up application/services (a quick validation query is sketched after this list)
 * repeat process for the 2nd node that is now passive.
    ** Once this starts, there is no rolling back.
    ** include a final failover to ensure all services can still be owned by both nodes again
    ** no additional time during failover, just the usual startup time
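For the "validate" step above, one quick check (a sketch; run it once regular connections are accepted again) is to confirm that every database has come back online, along with re-running a SERVERPROPERTY('ProductLevel') query to confirm the instance now reports the patched level:

    -- Any database that has not come back ONLINE (still recovering, suspect, etc.)
    -- shows up here; an empty result set is the goal.
    SELECT name, state_desc
    FROM sys.databases
    WHERE state_desc <> 'ONLINE';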

In a 2-node cluster scenario, from the time you start the process (prepping to patch the passive node) until the 2nd node is patched and enabled to parent services again, there is a risk of not having any failover capability. So taking as little time as possible from start to finish is critical.

Next for me is to schedule and execute this in our production environment.  Joy.

That is all.
Ryan McCauley replied:

Thanks for posting the follow-up including the steps you took - it's a good walk-through of what's necessary to patch SQL cleanly, and it's good to hear that the database upgrade itself was relatively quick.