Solved

Replace SQL cluster node in a 2-node cluster

Posted on 2015-01-14
Last Modified: 2015-02-09
I have a two-node active-passive SQL cluster on Windows Server 2008 R2.  One of the servers is a VM (VMware ESXi 5.5) and the other is a physical server.  The physical server is being retired, and we have created its replacement in VMware.

I've never replaced a node in a cluster so I'm a bit worried about the steps necessary to do this without causing problems.  The application running on this cluster is the most visible and critical to our organization so I can't make mistakes.

Does anyone have step-by-step instructions for replacing a SQL cluster node in a two-node cluster?

My thought was to add the new server to the cluster and then just remove the old node, but I'm not sure that is the best way to do this.  I'm very new to working with clusters, so I wanted to get some advice before proceeding.

Thanks...
Question by:dspjones
20 Comments
 
LVL 77

Expert Comment

by:arnold
ID: 40550673
Which is the active node? Presumably the VM is the current active node?
First, make sure that the SQL-related storage is properly configured to be accessible on the new VM.
If the storage comes from a SAN: are both VMs running on the same ESX server, or is each VM on a separate physical ESX server?
Make sure the network resources you are configuring in the new VM match the existing ones.  If the network team restricts IP-to-MAC mappings, check with them that the new VM is allowed to use the cluster IP as well as the SQL application IP.
http://support.microsoft.com/kb/244331

First, you should add the new VMware VM as a third node in the rotation, marked as least preferred.
What does the new VM (the replacement node) have installed?
Is it at the same level as the current VM/physical system in terms of updates and applications?
http://msdn.microsoft.com/en-us/library/ms191545%28v=sql.105%29.aspx

Do you have enough resources to practice, i.e. to create a new two-node VM cluster and then practice adding a third VM and removing one?

If possible get the setup of the test VM as close as possible to the setup of the production one in terms of constraints, etc.
 

Author Comment

by:dspjones
ID: 40585084
I found the procedure for adding and removing nodes in a SQL 2005 cluster.  I have successfully removed and evicted the node to be retired, and I am now attempting to add the replacement node.

Based on these instructions:
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-1.html
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-2.html


When I get to step # 20 I get the following error:

Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

From everything I have checked this problem occurs because there is an open RDP session on the passive node being installed.  Simply logging off the machine (or rebooting it) should fix the problem and allow the install to finish.

My issue is that I AM NOT LOGGED ON TO ANY NODES other than the one doing the install.  So what else do I need to do?
 
LVL 77

Expert Comment

by:arnold
ID: 40585390
The only thing that needs to start is msiexec, either to run the install or to deal with updates.

Are you using a domain account wherever a login is requested?
Is this account an administrator on the domain, or only on the local system?
 

Author Comment

by:dspjones
ID: 40586495
The account is the domain administrator account.  After the error, I can go to the remote node and it will show the task as either running or it will show it did run and complete, but nothing is installed.

I tried a couple of things on the active node (where the process is initiated from).  It generates a task on the remote node with something like \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe /qn /ENDCMD [some long binary number here]

I went to the active node and tried to run setup.exe in that folder and it fails with:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.

Perhaps there is something wrong there?
 
LVL 77

Expert Comment

by:arnold
ID: 40586700
Did you install SQL on the node that you are looking to add?
Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?

From the node that is a member of the cluster but not yet a member of the application (SQL) cluster,
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
 

Author Comment

by:dspjones
ID: 40586754
Did you install SQL on the node that you are looking to add?
No, as I understand it, the process that is failing is what will perform the installation


Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?
Yes it does


From the node that is a member of the cluster but not yet a member of the application (SQL) cluster,
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
Yes I can
 
LVL 77

Expert Comment

by:arnold
ID: 40586834
You need to be running the install on the node that you are adding, not from the node that is currently active/clustered.

https://msdn.microsoft.com/en-us/library/ms191545.aspx#Add

The node needs the SQL install media, which the task is supposed to trigger.
Often, one has to bring the OS/SQL as close as possible to the existing active node's version before joining it to the cluster.
The instance is local and temporary.
 

Author Comment

by:dspjones
ID: 40586854
The instructions you posted are for SQL 2014
 

Author Comment

by:dspjones
ID: 40586855
 
LVL 77

Expert Comment

by:arnold
ID: 40586914
How about you try this: install a local instance of SQL on the node that currently does not have SQL.
Then update it to as close as possible to the version you have running on the active node.

Then follow the instructions anew.

Alternatively, look at services.msc on your active node: do you have two references to SQL Server there, one disabled (the local instance) and one set to manual and running (the clustered instance)?

The bootstrap setup.exe initiates the process, but there has to be an installed SQL with SQL-related files, which it triggers to reconfigure and add the clustered instance to the system.

I suggested you not detach the existing failover node until the new node can be added,
i.e. you sell your old car/home once you have the new one.
 

Author Comment

by:dspjones
ID: 40587025
OK, here's what I did:

Installed SQL on the new node
On the active node, added the new node to the cluster
On the active node, initiated Change on SQL Server from Programs and Features
Chose the new node when prompted to add a node in the SQL Maintenance Wizard
Ran through the prompts and kicked off the install process (setup.exe from bootstrap)

result:
Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

Same as before.  I'm at a total loss here.
 
LVL 77

Expert Comment

by:arnold
ID: 40587034
I am guessing you are missing a step.

Are you opening and running the control panel using the clustered sql server's login/service account?

Can you try the procedure to re-add the former node into the cluster?

Look at the former node to see what is going on there. On the current node is the local mssql server instance running or stopped?

Does the new node have "access" to the shared storage?
 

Author Comment

by:dspjones
ID: 40587064
Well, the same thing is occurring with the old server.  SQL is installed, and the shared disks are accessible by both the active node and the new/former node.  But the error is the same.

This has me thinking that something is wrong with the setup files in the \90\setup bootstrap\ folder on the active node.  As I stated before, attempting to run setup from that folder on the active node results in:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.
 
LVL 77

Expert Comment

by:arnold
ID: 40587115
What is recorded in the event log on the node to be added?
There has to be a log file that would shed light on what is going on,
though I am not sure where that file would be.
Look on the active/new node in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\log for a file (detail.log) with the date/time of the run, to see what issue it is running into.

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/26b41ece-de77-45f5-8695-93af7adb45a6/adding-second-node-to-sql-server-cluster-fails?forum=sqldisasterrecovery

The reference is for sql 2008, but see whether the log on your 2005 still includes this log.
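
A minimal sketch of how that log search could be done, assuming Python is available on the node. The function names are hypothetical, and the folder to pass in is an assumption: the path above is the SQL 2008 location, and a SQL 2005 install would presumably use the corresponding `90` folder seen elsewhere in this thread. The sketch lists lines mentioning "error" or "failed" in log files modified within the last day.

```python
from datetime import datetime, timedelta
from pathlib import Path


def read_log_text(path):
    """Read a log file, handling UTF-16 files that start with a byte-order mark."""
    raw = path.read_bytes()
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16", errors="replace")
    return raw.decode("utf-8", errors="replace")


def recent_log_hits(log_dir, hours=24, needles=("error", "failed")):
    """Return (file name, line number, line) for suspicious lines in recent logs.

    log_dir is whatever setup log folder applies to your install (an assumption
    here); hours and needles are illustrative defaults, not values from setup.
    """
    cutoff = datetime.now() - timedelta(hours=hours)
    hits = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # Skip files that were last written before the time window of interest.
        if datetime.fromtimestamp(path.stat().st_mtime) < cutoff:
            continue
        for number, line in enumerate(read_log_text(path).splitlines(), start=1):
            if any(needle in line.lower() for needle in needles):
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling `recent_log_hits(r"C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG")` right after a failed attempt would narrow the output to that run, provided that folder is where your version writes its logs.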
 

Author Comment

by:dspjones
ID: 40587331
I found logs, but they basically say "Task appears not to have run."  They give an error code of 1602, which I looked up and found means the user cancelled the process, which I did after it failed.  So no help there.  There are some other log files which I'm still going through, though.
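
For reference, a small lookup sketch for a few standard Windows Installer exit codes, which may help when reading setup logs like these. The table is only a selection and the function name is hypothetical; the description for 1619 resembles the "installation package could not be opened" message quoted earlier in this thread, though the thread never confirms which code that attempt returned.

```python
# A selection of documented Windows Installer (msiexec) exit codes.
MSI_EXIT_CODES = {
    0: "ERROR_SUCCESS: the action completed successfully",
    1602: "ERROR_INSTALL_USEREXIT: the user cancelled the installation",
    1603: "ERROR_INSTALL_FAILURE: a fatal error occurred during installation",
    1618: "ERROR_INSTALL_ALREADY_RUNNING: another installation is already in progress",
    1619: "ERROR_INSTALL_PACKAGE_OPEN_FAILED: the installation package could not be opened",
    3010: "ERROR_SUCCESS_REBOOT_REQUIRED: a restart is required to complete the install",
}


def describe_msi_exit_code(code):
    """Return a readable description for an exit code, or a fallback message."""
    return MSI_EXIT_CODES.get(code, f"unrecognized exit code {code}")


if __name__ == "__main__":
    print(describe_msi_exit_code(1602))
```

A 1602 that was logged only after the wizard was cancelled by hand says nothing about the original failure, so the entries written before it are the ones worth reading.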
 

Author Comment

by:dspjones
ID: 40587336
I also ran the validation tests for the cluster and found that the SQL install on the new node is at SP1 while the active node is at SP4, so I am installing the service pack on the new node and rebooting to see if that has an effect.  I will let you know.
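
A minimal sketch of that service pack comparison, assuming you have each node's version string (for example from `SELECT SERVERPROPERTY('ProductVersion')`). The function names and the node names in the usage note are hypothetical; the build numbers are the published SQL Server 2005 RTM and service pack builds.

```python
# Published SQL Server 2005 builds: any build at or above an entry is at
# least at that service pack level.
SQL2005_BUILDS = [
    ((9, 0, 1399), "RTM"),
    ((9, 0, 2047), "SP1"),
    ((9, 0, 3042), "SP2"),
    ((9, 0, 4035), "SP3"),
    ((9, 0, 5000), "SP4"),
]


def parse_version(text):
    """Turn a string such as '9.00.2047.00' into (9, 0, 2047)."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def service_pack_level(version_text):
    """Map a SQL Server 2005 version string to its service pack level."""
    version = parse_version(version_text)
    if version[:2] != (9, 0):
        return "not SQL Server 2005"
    level = "unknown (below RTM build)"
    for build, name in SQL2005_BUILDS:
        if version >= build:
            level = name
    return level


def nodes_match(versions_by_node):
    """Return (all nodes at the same level?, level per node)."""
    levels = {node: service_pack_level(v) for node, v in versions_by_node.items()}
    return len(set(levels.values())) == 1, levels
```

For example, `nodes_match({"NodeA": "9.00.5000.00", "NodeB": "9.00.2047.00"})` reports a mismatch between SP4 and SP1, which is the situation described above.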
 

Author Comment

by:dspjones
ID: 40587415
SP levels now the same, still no dice.
 
LVL 77

Expert Comment

by:arnold
ID: 40587550
Before the user-cancel entry, there has to be something in the log, or something about the account you are using, that may shed light on the issue.
It may be something simple; I just cannot think of where or what it might be.
 

Accepted Solution

by:dspjones
ID: 40590351
Well, it turns out that all I needed to do was reboot the active node.  I did that, and when it came back up the cluster had failed over to the new node, successfully I might add.  All app functions based on this cluster/DB worked perfectly.

I am not going to even begin to wonder exactly what was "fixed" by the reboot of the active node.  Hopefully this entire system will be upgraded to SQL 2014 in the next 6 months and I can drop kick this current system to the curb.  Thanks for all of your help.
 

Author Closing Comment

by:dspjones
ID: 40597971
While appreciated, the other comments did not resolve the problem.  A reboot of the active server ultimately fixed the issue.  My reading of this MS article:

http://support.microsoft.com/kb/910851/en-us

is that, although it talks about a user being logged on to one or more nodes of the cluster without specifically mentioning the active node, it most likely applies to the active node as well.  Thus rebooting that node solved the issue.