Solved

Replace SQL cluster node in 2-node cluster

Posted on 2015-01-14
Last Modified: 2015-02-09
I have a 2-node active-passive SQL Server cluster on Windows Server 2008 R2. One of the servers is a VM (VMware ESXi 5.5) and the other is a physical server. The physical server is being retired, and we have created its replacement in VMware.

I've never replaced a node in a cluster, so I'm a bit worried about the steps needed to do this without causing problems. The application running on this cluster is the most visible and critical one in our organization, so I can't afford mistakes.

Does anyone have step-by-step instructions for replacing a SQL cluster node in a 2-node cluster?

My thought was to add the new server to the cluster and then remove the old node, but I'm not sure that is the best way to do it. I'm very new to working with clusters, so I wanted to get some advice before proceeding.

Thanks...
Question by:dspjones

Expert Comment

by:arnold
ID: 40550673
Which is the active node? Presumably the VM is the current active node?
First, make sure that the SQL-related storage is properly configured to be accessible from the new VM.
If a SAN is the source of the storage: is it running on the same ESX server, or is each VM on a separate physical ESX server?
Make sure the network resources you configure in the new VM match the existing ones. Check with the network team: if they restrict IP-to-MAC mappings, make sure the new VM is added for the cluster IP as well as for the SQL application IP.
http://support.microsoft.com/kb/244331

First, you should add the new VMware VM as a third node, marked least preferred.
What does the new VM (the replacement node) have installed?
Is it at the same level (updates/applications) as the current VM/physical system?
http://msdn.microsoft.com/en-us/library/ms191545%28v=sql.105%29.aspx

Do you have enough resources to practice, i.e. create a new two-node VM cluster, then practice adding a third VM and removing one?

If possible, get the setup of the test VMs as close as possible to the production setup in terms of constraints, etc.
 

Author Comment

by:dspjones
ID: 40585084
I found the procedure for adding and removing nodes in a SQL 2005 cluster. I have successfully removed and evicted the node being retired, and I am now attempting to add the replacement node.

Based on these instructions:
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-1.html
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-2.html


When I get to step 20, I get the following error:

Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

From everything I have checked, this problem occurs when there is an open RDP session on the passive node being installed. Simply logging off that machine (or rebooting it) should fix the problem and allow the install to finish.

My issue is that I AM NOT LOGGED ON TO ANY NODES other than the one doing the install.  So what else do I need to do?
 

Expert Comment

by:arnold
ID: 40585390
The only thing that needs to start is msiexec, to perform the install (or to deal with updates).

Are you using a domain account for the logins?
Is this account an administrator on the domain, or only on the local system?
 

Author Comment

by:dspjones
ID: 40586495
The account is the domain administrator account. After the error, I can go to the remote node and it shows the task either as running or as having run and completed, but nothing is installed.

I tried a couple of things on the active node (where the process is initiated). It creates a task on the remote node with something like \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe /qn /ENDCMD [some long binary number here]

I went to the active node and tried to run setup.exe in that folder, and it fails with:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.

Perhaps there is something wrong there?
 

Expert Comment

by:arnold
ID: 40586700
Did you install SQL on the node that you are looking to add?
Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?

From the node that is a member of the cluster but is not yet a member of the application (SQL) cluster,
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
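The access check asked about above can be scripted from the node being added, so the answer reflects that node's account and network path. A minimal Python sketch, assuming Python is available on the node; the UNC path is the one quoted in this thread, and `check_setup_path` is a hypothetical helper name, not part of any SQL Server tooling:

```python
import os

# Path quoted in this thread; "ServerA" stands for the active node.
SETUP_EXE = r"\\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe"


def check_setup_path(path: str) -> str:
    """Report whether a setup file can be seen and opened from this machine."""
    if not os.path.exists(path):
        return "missing or share not reachable"
    if not os.path.isfile(path):
        return "exists but is not a file"
    try:
        # Reading a couple of bytes confirms the current account can open the file.
        with open(path, "rb") as handle:
            handle.read(2)
    except OSError as exc:
        return f"not readable: {exc}"
    return "ok"


if __name__ == "__main__":
    print(SETUP_EXE, "->", check_setup_path(SETUP_EXE))
```

Run it under the same account the cluster setup uses; a result of "ok" only shows the file is reachable, not that the remote scheduled task will succeed.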
 

Author Comment

by:dspjones
ID: 40586754
Did you install SQL on the node that you are looking to add?
No. As I understand it, the process that is failing is what performs the installation.


Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?
Yes it does


From the node that is a member of the cluster but is not yet a member of the application (SQL) cluster,
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
Yes I can
 

Expert Comment

by:arnold
ID: 40586834
You need to be running the install on the node that you are adding, not from the node that is currently active/clustered.

https://msdn.microsoft.com/en-us/library/ms191545.aspx#Add

The node needs the SQL install media, which the task is supposed to trigger.
Often, one has to bring the OS/SQL on the new node as close as possible to the existing active node's version before joining it to the cluster.
The instance is local and temporary.
 

Author Comment

by:dspjones
ID: 40586854
The instructions you posted are for SQL 2014

Expert Comment

by:arnold
ID: 40586914
How about you try this: install a local instance of SQL on the node that currently does not have SQL.
Then update it to as close as possible to the version running on the active node.

Then follow the instructions anew.

Alternatively, look at your active node's services.msc: do you have two references to SQL Server, one a disabled local instance and one a manual, running cluster instance?

The bootstrap setup.exe initiates the process, but there has to be an installed SQL Server with the SQL-related files, which it triggers to reconfigure and add the clustered instance to the system.

I suggested that you not detach the existing failover node until the new node could be added,
i.e. you sell your old car/home when you have the new one.
 

Author Comment

by:dspjones
ID: 40587025
OK here's what I did:

installed SQL on the new node
on the active node, added the new node to the cluster
on the active node, initiated CHANGE on SQL Server from Programs and Features
chose the new node when prompted to add a node in the SQL Maintenance Wizard
ran through the prompts and kicked off the install process (setup.exe from bootstrap)

result:
Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

Same as before.  I'm at a total loss here.
 

Expert Comment

by:arnold
ID: 40587034
I am guessing you are missing a step.

Are you opening and running Control Panel using the clustered SQL Server's login/service account?

Can you try the procedure to re-add the former node into the cluster?

Look at the former node to see what is going on there. On the current node, is the local MSSQL Server instance running or stopped?

Does the new node have "access" to the shared storage?
 

Author Comment

by:dspjones
ID: 40587064
Well, the same thing is occurring with the old server. SQL is installed, and the shared disks are accessible from both the active node and the new/former node, but the error is the same.

This makes me think something is wrong with the setup files in the \90\setup bootstrap\ folder on the active node. As I stated before, attempting to run setup from that folder on the active node results in:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.
 

Expert Comment

by:arnold
ID: 40587115
What is recorded in the event log on the node being added?
There has to be a log file that would shed light on what is going on,
though I am not sure where that file would be.
On the active and new nodes, look in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\log for a detail log file whose date/time matches the failed run, to see what issue it is running into.

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/26b41ece-de77-45f5-8695-93af7adb45a6/adding-second-node-to-sql-server-cluster-fails?forum=sqldisasterrecovery

The reference is for SQL 2008, but check whether your 2005 installation still writes this log.
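Finding the log that matches a failed run mostly means sorting the setup log folder by timestamp. A minimal Python sketch; the folder path is an assumption (the `90` path for SQL 2005, swap in `100` for SQL 2008 as in the link above), and `newest_logs` is a hypothetical helper:

```python
import os
from datetime import datetime

# Assumption: SQL Server 2005 setup log folder; use ...\100\... for SQL 2008.
LOG_DIR = r"C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG"


def newest_logs(log_dir: str, count: int = 5) -> list[tuple[str, datetime]]:
    """Return up to `count` files under log_dir (recursively), newest first."""
    entries = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            entries.append((path, datetime.fromtimestamp(os.path.getmtime(path))))
    # Sort by modification time so the files from the latest failed run come first.
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries[:count]


if __name__ == "__main__":
    for path, stamp in newest_logs(LOG_DIR):
        print(stamp.isoformat(sep=" ", timespec="seconds"), path)
```

A missing folder simply yields an empty list, so the same script can be run unchanged on both nodes.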
 

Author Comment

by:dspjones
ID: 40587331
I found logs, but they basically say the task appears not to have run. They give error code 1602, which I looked up and found means the user cancelled the process, which I did after it failed. So no help there. There are some other log files I'm still going through, though.
 

Author Comment

by:dspjones
ID: 40587336
I also ran the cluster validation tests and found that the SQL install on the new node is at SP1 while the active node is at SP4, so I am installing the service pack on the new node and rebooting to see if that has an effect. I will let you know.
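Each node's build can be read in SQL with `SELECT SERVERPROPERTY('ProductVersion'), SERVERPROPERTY('ProductLevel')`, which makes the SP mismatch easy to confirm before and after patching. A hedged Python sketch that compares two such build strings; the helper names and the build strings in the usage line are illustrative only:

```python
def parse_build(version: str) -> tuple[int, ...]:
    """Turn a build string such as '9.00.5000.00' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def compare_nodes(active: str, new_node: str) -> str:
    """Describe how the new node's build compares to the active node's build."""
    a, n = parse_build(active), parse_build(new_node)
    if n == a:
        return "same build"
    return "new node is behind" if n < a else "new node is ahead"


if __name__ == "__main__":
    print(compare_nodes("9.00.5000.00", "9.00.2047.00"))
```

Comparing integer tuples rather than raw strings avoids the trap where a string comparison would sort "10" before "9".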
 

Author Comment

by:dspjones
ID: 40587415
SP levels now the same, still no dice.
 

Expert Comment

by:arnold
ID: 40587550
Before the user-cancel entry, there has to be something in the log (or something about the user account you are using) that may shed light on the issue.
It could be something simple; I just cannot think of where or what it might be.
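Following the suggestion to read what the log says before the cancel, a short filter can pull out suspicious lines that precede the first 1602 entry. A minimal Python sketch; the keyword list is an assumption, "Return value 3" is the marker Windows Installer verbose logs use for a failed action, and `lines_before_cancel` is a hypothetical helper:

```python
import re
import sys

# Assumption: keywords worth flagging. "return value 3" is how Windows Installer
# verbose logs mark a failed action; the others are generic guesses.
SUSPECT = re.compile(r"error|fail|return value 3|access is denied", re.IGNORECASE)


def lines_before_cancel(log_text: str, cancel_code: str = "1602") -> list[str]:
    """Return suspicious lines that appear before the first line containing cancel_code."""
    cancel = re.compile(rf"\b{re.escape(cancel_code)}\b")
    hits = []
    for line in log_text.splitlines():
        if cancel.search(line):
            break  # stop at the cancel entry; only earlier lines are of interest
        if SUSPECT.search(line):
            hits.append(line.strip())
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py <path-to-log>; UTF-8 is an assumption, and some
    # setup logs may need a different encoding (for example UTF-16).
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for hit in lines_before_cancel(handle.read()):
            print(hit)
```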
 

Accepted Solution

by:dspjones
ID: 40590351
Well, it turns out that all I needed to do was reboot the active node. When it came back up, the cluster had failed over to the new node... successfully, I might add. All app functions based on this cluster/DB worked perfectly.

I am not even going to begin to wonder what exactly was "fixed" by rebooting the active node. Hopefully this entire system will be upgraded to SQL 2014 in the next 6 months and I can drop-kick the current system to the curb. Thanks for all of your help.
 

Author Closing Comment

by:dspjones
ID: 40597971
While appreciated, the other comments did not resolve the problem. A reboot of the active server ultimately fixed the issue. Consider the MS article listed here:

http://support.microsoft.com/kb/910851/en-us

It talks about a user being logged on to one or more nodes of the cluster. While it does not specifically mention the active node, it could most likely be construed to include the active node as well, which would explain why rebooting that node solved the issue.