
replace sql cluster node in 2 node cluster

I have a 2 node active-passive sql cluster in Windows 2008 R2.  One of the servers is a VM (vmware esxi 5.5) and the other is a physical server.  The physical server is being retired and we have created its replacement in vmware.

I've never replaced a node in a cluster so I'm a bit worried about the steps necessary to do this without causing problems.  The application running on this cluster is the most visible and critical to our organization so I can't make mistakes.

Does anyone have step-by-step instructions for replacing a sql cluster node in a 2 node cluster?

My thought was to add the new server to the cluster and then just remove the old node but I'm not sure if that is the best way to do this.  I'm very new to working with clusters so I wanted to get some advice before proceeding.

Thanks...
Asked by: dspjones
 
arnold Commented:
Which is the active node? Presumably the VM is the current active node?
First, make sure that the SQL-related storage is properly configured to be accessible on the new VM.
If a SAN is the source of the storage, are the VMs running on the same ESX server, or is each VM on a separate physical ESX server?
Make sure the network resources you configure in the new VM match the existing ones. If your network team restricts IP-to-MAC mappings, check with them that the new VM is allowed to take the cluster IP as well as the SQL application IP.
http://support.microsoft.com/kb/244331

First, you should add the new VM as a third node in the rotation, marked least preferred.
What does the new VM (the replacement node) have installed?
Is it at the same level as the current VM/physical system in terms of updates and applications?
http://msdn.microsoft.com/en-us/library/ms191545%28v=sql.105%29.aspx

Do you have enough resources to practice, i.e. create a new two-node VM cluster, then practice adding a third VM and removing one?

If possible get the setup of the test VM as close as possible to the setup of the production one in terms of constraints, etc.
 
dspjones (Author) Commented:
I found the procedure for adding and removing nodes in a SQL 2005 cluster. I have successfully removed and evicted the node to be retired, and I am now attempting to add the replacement node.

Based on these instructions:
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-1.html
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-2.html


When I get to step # 20 I get the following error:

Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

From everything I have checked this problem occurs because there is an open RDP session on the passive node being installed.  Simply logging off the machine (or rebooting it) should fix the problem and allow the install to finish.

My issue is that I AM NOT LOGGED ON TO ANY NODES other than the one doing the install.  So what else do I need to do?
 
arnold Commented:
The only thing that needs to start is msiexec, either to run the install or to deal with updates.

Are you using a domain account in the login places?
Is this account administrative on the domain, or only on the local system?
 
dspjones (Author) Commented:
The account is the domain administrator account.  After the error, I can go to the remote node and it will show the task as either running or it will show it did run and complete, but nothing is installed.

I tried a couple of things on the active node (where the process is initiated from).  It generates a task on the remote node with something like \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe /qn /ENDCMD [some long binary number here]

I went to the active node and tried to run setup.exe in that folder and it fails with:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.

Perhaps there is something wrong there?
 
arnold Commented:
Did you install SQL on the node that you are looking to add?
Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?

From the node that is a member of the cluster but not yet a member of the application cluster (SQL),
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
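
If you want to check this from a script rather than Explorer, here is a minimal Python sketch. It assumes Python is available on the node being added, which the thread does not say, and it only proves that the account running it can read the file, not that the scheduled setup task will succeed. ServerA is the placeholder name used above, and `can_read` is a hypothetical helper name.

```python
# Path quoted in the thread; "ServerA" is the poster's placeholder name.
SETUP_PATH = r"\\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe"


def can_read(path):
    """Return True if the file can be opened and read by the current account."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)  # force an actual read, not just a directory lookup
        return True
    except OSError:
        # Covers a missing file, a missing share, and access-denied errors.
        return False


if __name__ == "__main__":
    print(SETUP_PATH, "readable:", can_read(SETUP_PATH))
```

Run it while logged on with the same account the SQL setup uses, since share permissions are evaluated per account.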
 
dspjones (Author) Commented:
Did you install SQL on the node that you are looking to add?
No, as I understand it, the process that is failing is what will perform the installation


Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?
Yes it does


from the node that is a member of a cluster but is yet not a member of the application cluster (SQL)
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
Yes I can
 
arnold Commented:
You need to be running the install on the node that you are adding, not from the node that is currently active/clustered.

https://msdn.microsoft.com/en-us/library/ms191545.aspx#Add

The node needs the sql install media which the task is supposed to trigger.
Often, one has to bring the OS/SQL as close as possible to the existing active node's version prior to joining it into the cluster.
The instance is local and temporary.
 
dspjones (Author) Commented:
The instructions you posted are for SQL 2014
 
arnold Commented:
How about you try installing a local instance of SQL on the node that currently does not have SQL?
Then update it to as close as possible to the version you have running on the active node.

Then follow the instructions anew.

Alternatively, look at services.msc on your active node: do you have two SQL Server entries, one with startup type Disabled (the local instance) and one set to Manual and running (the clustered instance)?

The bootstrap setup.exe initiates the process, but there has to be an installed SQL with SQL-related files, which it triggers to reconfigure and add the clustered instance to the system.

I suggested you not detach the existing failover node until the new node can be added,
i.e. you sell your old car/home when you have the new one.
 
dspjones (Author) Commented:
OK here's what I did:

installed sql on new node
on active node, added new node to the cluster
on active node, initiated CHANGE on sql server from programs and features
Chose new node when prompted to add the node in the SQL Maintenance Wizard
ran through the prompts and kicked off the install process (setup.exe from bootstrap)

result:
Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

Same as before.  I'm at a total loss here.
 
arnold Commented:
I am guessing you are missing a step.

Are you opening and running the control panel using the clustered sql server's login/service account?

Can you try the procedure to re-add the former node into the cluster?

Look at the former node to see what is going on there. On the current node is the local mssql server instance running or stopped?

Does the new node have "access" to the shared storage?
 
dspjones (Author) Commented:
Well, the same thing is occurring with the old server.  SQL is installed, and the shared disks are accessible by both the active and the new/former node.  But the error is the same.

This has me thinking that something is wrong with the setup files in the \90\setup bootstrap\ folder on the active node.  As I stated before, attempting to run setup from that folder on the active node results in:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.
 
arnold Commented:
What is recorded in the event log on the node to be added?
There has to be a log file that would shed light on what is going on, though I am not sure where that file would be.
On the active/new node, look in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\log for a file with the date/time of the run, and check detail.log to see what issue it is running into.

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/26b41ece-de77-45f5-8695-93af7adb45a6/adding-second-node-to-sql-server-cluster-fails?forum=sqldisasterrecovery

The reference is for SQL 2008, but see whether your 2005 install still writes this log.
 
dspjones (Author) Commented:
I found logs, but they basically say "Task appears not to have run."  They give an error code of 1602, which I looked up and found means the user cancelled the process.  I did cancel, but only after it failed, so no help there.  There are some other log files which I'm still going through, though.
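
For anyone else decoding these logs, here is a small Python lookup of a few standard Windows Installer (msiexec) exit codes. It is only a sketch and assumes the 1602 in the log really is a Windows Installer code, which the log excerpt does not confirm. The table is deliberately short and `describe` is a hypothetical helper name. If the "installation package could not be opened" message from earlier came from Windows Installer, it would correspond to 1619, but the thread never shows a code for it.

```python
# A handful of well-known Windows Installer exit codes (not a complete list).
MSI_EXIT_CODES = {
    0: "ERROR_SUCCESS: completed successfully",
    1602: "ERROR_INSTALL_USEREXIT: the user cancelled the installation",
    1603: "ERROR_INSTALL_FAILURE: fatal error during installation",
    1618: "ERROR_INSTALL_ALREADY_RUNNING: another installation is in progress",
    1619: "ERROR_INSTALL_PACKAGE_OPEN_FAILED: the installation package could not be opened",
    3010: "ERROR_SUCCESS_REBOOT_REQUIRED: a restart is required to complete the install",
}


def describe(code):
    """Return a short description for an msiexec exit code."""
    return MSI_EXIT_CODES.get(code, "unknown code %d" % code)


if __name__ == "__main__":
    for code in (1602, 1619):
        print(code, "->", describe(code))
```

A 1602 recorded after a manual cancel only describes the cancel, so the entries written before it are the ones worth reading.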
 
dspjones (Author) Commented:
I also ran the validation tests for the cluster and found that the SQL install on the new node is at SP1 while the active node is at SP4, so I am installing the service pack on the new node and rebooting to see if that has an effect.  I will let you know.
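
A quick way to compare service pack levels without the validation wizard is to run SELECT SERVERPROPERTY('ProductVersion') on each instance and compare the build numbers. The Python sketch below maps a SQL 2005 version string to its service pack baseline. The version strings in it are illustrative values, not taken from these servers, and `sp_level` and `same_sp_level` are hypothetical helper names.

```python
# SQL Server 2005 service pack baseline builds, highest first.
SQL2005_BASELINES = [
    (5000, "SP4"),
    (4035, "SP3"),
    (3042, "SP2"),
    (2047, "SP1"),
    (1399, "RTM"),
]


def sp_level(version):
    """Map a version string such as '9.00.5000.00' to a service pack name."""
    parts = version.strip().split(".")
    if len(parts) < 3 or parts[0] != "9":
        raise ValueError("not a SQL Server 2005 version string: %r" % version)
    build = int(parts[2])
    # Hotfixes raise the build above the baseline, so use thresholds.
    for baseline, name in SQL2005_BASELINES:
        if build >= baseline:
            return name
    raise ValueError("build %d is older than RTM" % build)


def same_sp_level(version_a, version_b):
    """Return True if both version strings are on the same service pack."""
    return sp_level(version_a) == sp_level(version_b)


if __name__ == "__main__":
    active_node = "9.00.5000.00"  # illustrative value, SP4
    new_node = "9.00.2047.00"     # illustrative value, SP1
    print(sp_level(active_node), sp_level(new_node),
          same_sp_level(active_node, new_node))
```

SERVERPROPERTY('ProductLevel') returns the service pack name directly, so the mapping is mainly useful when all you have is a build number from a log.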
 
dspjones (Author) Commented:
SP levels now the same, still no dice.
 
arnold Commented:
Before the user-cancel entry, there has to be something in the log, or something about the account you are using, that may shed light on the issue.
It could be something simple, though I cannot think of where or what it might be.
 
dspjones (Author) Commented:
Well, it turns out that all I needed to do was reboot the active node.  I did that, and when it came back up the cluster had failed over to the new node, successfully I might add.  All app functions based on this cluster/db worked perfectly.

I am not going to even begin to wonder exactly what was "fixed" by the reboot of the active node.  Hopefully this entire system will be upgraded to SQL 2014 in the next 6 months and I can drop kick this current system to the curb.  Thanks for all of your help.
 
dspjones (Author) Commented:
While appreciated, the other comments did not resolve the problem.  A reboot of the active server ultimately fixed the issue.  I would think that the MS article listed here:

http://support.microsoft.com/kb/910851/en-us

which talks about a user being logged on to one or more nodes of the cluster, while not specifically mentioning the active node, could most likely be construed to mean the active node as well.  Thus rebooting that node solved the issue.