Solved

replace sql cluster node in 2 node cluster

Posted on 2015-01-14
Last Modified: 2015-02-09
I have a 2 node active-passive sql cluster in Windows 2008 R2.  One of the servers is a VM (vmware esxi 5.5) and the other is a physical server.  The physical server is being retired and we have created its replacement in vmware.

I've never replaced a node in a cluster so I'm a bit worried about the steps necessary to do this without causing problems.  The application running on this cluster is the most visible and critical to our organization so I can't make mistakes.

Does anyone have a step-by-step instruction of how to replace a sql cluster node in a 2 node cluster?

My thought was to add the new server to the cluster and then just remove the old node but I'm not sure if that is the best way to do this.  I'm very new to working with clusters so I wanted to get some advice before proceeding.

Thanks...
Question by:dspjones
20 Comments
 
LVL 76

Expert Comment

by:arnold
ID: 40550673
Which is the active node? Presumably the VM is the current active node?
First, make sure that the SQL-related storage is properly configured to be accessible on the new VM.
If a SAN is the source of the storage: are both VMs running on the same ESX server, or is each VM on a separate physical ESX server?
Make sure the network resources you are configuring in the new VM match the existing ones. If the network team restricts IP-to-MAC mappings, check with them so they know you are adding another VM that will answer for the cluster IP as well as the SQL application IP.
http://support.microsoft.com/kb/244331

First, you should add the new VMware VM as a third node in the rotation, marked least preferred.
What does the new VM (the replacement node) have installed?
Is it at the same level as the current VM/physical system in terms of updates and applications?
http://msdn.microsoft.com/en-us/library/ms191545%28v=sql.105%29.aspx

Do you have enough resources to practice? I.e., create a new two-node VM cluster, then practice adding a third VM and removing one.

If possible get the setup of the test VM as close as possible to the setup of the production one in terms of constraints, etc.
 

Author Comment

by:dspjones
ID: 40585084
I found the procedure for adding and removing nodes to sql 2005 cluster...  I have successfully removed and evicted the node to be retired.  I am now attempting to add the replacement node.

Based on these instructions:
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-1.html
http://www.databasejournal.com/features/mssql/adding-node-sql-server-failover-cluster-2.html


When I get to step # 20 I get the following error:

Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

From everything I have checked this problem occurs because there is an open RDP session on the passive node being installed.  Simply logging off the machine (or rebooting it) should fix the problem and allow the install to finish.

My issue is that I AM NOT LOGGED ON TO ANY NODES other than the one doing the install.  So what else do I need to do?
 
LVL 76

Expert Comment

by:arnold
ID: 40585390
The only thing that needs to start is msiexec, either to install or to deal with updates.

Are you using a domain account in the login places?
Is this account administrative on the domain, or only on the local system?
 

Author Comment

by:dspjones
ID: 40586495
The account is the domain administrator account.  After the error, I can go to the remote node and it will show the task as either running or it will show it did run and complete, but nothing is installed.

I tried a couple of things on the active node (where the process is initiated from).  It generates a task on the remote node with something like \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe /qn /ENDCMD [some long binary number here]

I went to the active node and tried to run setup.exe in that folder and it fails with:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.

Perhaps there is something wrong there?
 
LVL 76

Expert Comment

by:arnold
ID: 40586700
Did you install SQL on the node that you are looking to add?
Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?

From the node that is a member of the cluster but is not yet a member of the application cluster (SQL),
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
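
The access check asked about here could be scripted along the following lines. This is only a sketch in Python: `check_setup_path` is a hypothetical helper name, the path is the one quoted in this thread (with `ServerA` as the thread's placeholder), and a successful read only shows that the file can be opened from the machine running the script, not that setup itself will run.

```python
import os


def check_setup_path(path):
    """Report whether `path` exists, is a regular file, and can be opened for reading."""
    exists = os.path.exists(path)
    is_file = os.path.isfile(path)
    readable = False
    if is_file:
        try:
            # Reading one byte proves the share and file permissions allow access.
            with open(path, "rb") as handle:
                handle.read(1)
            readable = True
        except OSError:
            readable = False
    return {"path": path, "exists": exists, "is_file": is_file, "readable": readable}


# Path as quoted in the thread; "ServerA" is the thread's placeholder server name.
UNC_PATH = r"\\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe"
print(check_setup_path(UNC_PATH))
```

Running it on the node being added, under the same account used for the install, would be the closest match to what the remote task sees.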
 

Author Comment

by:dspjones
ID: 40586754
Did you install SQL on the node that you are looking to add?
No, as I understand it, the process that is failing is what will perform the installation


Does \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe exist?
Yes it does


From the node that is a member of the cluster but is not yet a member of the application cluster (SQL),
can you access \\ServerA\C$\program files\microsoft sql server\90\setup bootstrap\setup.exe?
Yes I can
 
LVL 76

Expert Comment

by:arnold
ID: 40586834
You need to be running the install on the node that you are adding, not from the node that is currently active/clustered.

https://msdn.microsoft.com/en-us/library/ms191545.aspx#Add

The node needs the sql install media which the task is supposed to trigger.
Often, one has to bring the os/sql to as close to the existing active nodes version prior to joining into the cluster.
The instance is local and temporary.
 

Author Comment

by:dspjones
ID: 40586854
The instructions you posted are for SQL 2014
 

Author Comment

by:dspjones
ID: 40586855
 
LVL 76

Expert Comment

by:arnold
ID: 40586914
How about you try this: install a local instance of SQL on the node that currently does not have SQL.
Then update it to as close as possible to the version you have running on the active node.

Then follow the instructions anew.

Alternatively, look at services.msc on your active node: do you have two SQL Server entries, one disabled (the local instance) and one set to manual and running (the clustered instance)?

The bootstrap setup.exe initiates the process, but there has to be an installed SQL with its related files, which setup triggers to reconfigure and add the clustered instance to the system.

I suggested that you not detach the existing failover node until the new node could be added,
i.e. you sell your old car/home only once you have the new one.
 

Author Comment

by:dspjones
ID: 40587025
OK here's what I did:

installed sql on new node
on active node, added new node to the cluster
on active node, initiated CHANGE on sql server from programs and features
Chose new node when prompted to add the node in the SQL Maintenance Wizard
ran through the prompts and kicked off the install process (setup.exe from bootstrap)

result:
Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine.

Same as before.  I'm at a total loss here.
 
LVL 76

Expert Comment

by:arnold
ID: 40587034
I am guessing, you are missing a step

Are you opening and running the control panel using the clustered sql server's login/service account?

Can you try the procedure to re-add the former node into the cluster?

Look at the former node to see what is going on there. On the current node is the local mssql server instance running or stopped?

Does the new node have "access" to the shared storage?
 

Author Comment

by:dspjones
ID: 40587064
Well, the same thing is occurring with the old server.  SQL is installed, and the shared disks are accessible by both the active node and the new/former node.  But the error is the same.

This has me thinking that something is wrong with the setup files in the \90\setup bootstrap\ folder on the active node.  As I stated before, attempting to run setup from that folder on the active node results in:

The installation package could not be opened.  Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package.
 
LVL 76

Expert Comment

by:arnold
ID: 40587115
What is recorded in the event log on the node to be added?
There has to be a log file that would shed light on what is going on,
though I am not sure where that file would be.
On the active/new node, look in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\log for an entry with the date/time of the run, and check its detail.log to see what issue it is running into.

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/26b41ece-de77-45f5-8695-93af7adb45a6/adding-second-node-to-sql-server-cluster-fails?forum=sqldisasterrecovery

The reference is for sql 2008, but see whether the log on your 2005 still includes this log.
 

Author Comment

by:dspjones
ID: 40587331
I found logs, but they basically say "Task appears not to have run."  They give an error code of 1602, which I looked up and found means the user cancelled the process, which I did after it failed.  So no help there.  There are some other log files which I'm still going through, though.
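
For anyone sifting through similar logs, here is a small Python sketch that picks out Windows Installer return codes from log text and prints their meaning. The table holds a few standard msiexec codes, including the 1602 seen here. The helper name `explain_codes` and the sample log line are hypothetical, and the real SQL 2005 setup log format may differ. As an aside, the "installation package could not be opened" message quoted earlier in this thread appears to match the standard text for code 1619.

```python
import re

# A few standard Windows Installer (msiexec) return codes.
MSI_CODES = {
    1602: "ERROR_INSTALL_USEREXIT: the user cancelled the installation",
    1603: "ERROR_INSTALL_FAILURE: fatal error during installation",
    1618: "ERROR_INSTALL_ALREADY_RUNNING: another installation is already in progress",
    1619: "ERROR_INSTALL_PACKAGE_OPEN_FAILED: the installation package could not be opened",
    3010: "ERROR_SUCCESS_REBOOT_REQUIRED: a restart is required to complete the install",
}


def explain_codes(log_text):
    """Return (code, meaning) pairs for known codes, in order of first appearance."""
    found = []
    seen = set()
    # Naive scan: any standalone 4-digit number that matches a known code counts.
    for match in re.finditer(r"\b(\d{4})\b", log_text):
        code = int(match.group(1))
        if code in MSI_CODES and code not in seen:
            seen.add(code)
            found.append((code, MSI_CODES[code]))
    return found


# Hypothetical log line, modelled on what is described in this thread.
sample = "Task appears not to have run. Error code: 1602"
for code, meaning in explain_codes(sample):
    print(code, meaning)
```

Because the scan is naive, a 4-digit number in a timestamp or path could be misread as a code, so any hit should be confirmed against the surrounding log lines.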
 

Author Comment

by:dspjones
ID: 40587336
I also ran the validation tests for the cluster and found that the SQL install on the new node is at SP1 while the active node is at SP4, so I am installing the service pack on the new node and rebooting to see if that has an effect.  I will let you know.
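
A mismatch like this can also be spotted by comparing version strings from each node, for example the value returned by `SELECT SERVERPROPERTY('ProductVersion')`. The Python sketch below maps a SQL Server 2005 build number to its service pack level using the baseline builds as I understand them (1399 RTM, 2047 SP1, 3042 SP2, 4035 SP3, 5000 SP4). The helper names and the example version strings are hypothetical, not taken from this system.

```python
# Baseline builds for SQL Server 2005 (version 9.00.x); later hotfix builds
# fall under the highest baseline they exceed.
SQL2005_BASELINES = [
    (1399, "RTM"),
    (2047, "SP1"),
    (3042, "SP2"),
    (4035, "SP3"),
    (5000, "SP4"),
]


def sp_level(version):
    """Map a version string such as '9.00.5000.00' to 'RTM', 'SP1', ... 'SP4'."""
    parts = version.strip().split(".")
    if len(parts) < 3 or int(parts[0]) != 9:
        raise ValueError("not a SQL Server 2005 version string: %r" % version)
    build = int(parts[2])
    level = None
    for baseline, name in SQL2005_BASELINES:
        if build >= baseline:
            level = name
    if level is None:
        raise ValueError("build %d is below the RTM baseline" % build)
    return level


def levels_match(version_a, version_b):
    """True when both nodes report the same service pack level."""
    return sp_level(version_a) == sp_level(version_b)


# Hypothetical values standing in for the new node (SP1) and active node (SP4).
print(sp_level("9.00.2047.00"), sp_level("9.00.5000.00"))
print(levels_match("9.00.2047.00", "9.00.5000.00"))
```

Matching service pack levels is only a coarse check, since two nodes on the same service pack can still differ in hotfix builds.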
 

Author Comment

by:dspjones
ID: 40587415
SP levels now the same, still no dice.
 
LVL 76

Expert Comment

by:arnold
ID: 40587550
Before the user-cancel entry, there has to be something in the log, or something about the account you are using, that may shed light on the issue.
It could be something simple, but I cannot think of where or what it might be.
 

Accepted Solution

by:
dspjones earned 0 total points
ID: 40590351
Well, it turns out that all I needed to do was reboot the active node.  I did that, and when it came back up the cluster had failed over to the new node... successfully, I might add.  All app functions based on this cluster/db worked perfectly.

I am not going to even begin to wonder exactly what was "fixed" by the reboot of the active node.  Hopefully this entire system will be upgraded to SQL 2014 in the next 6 months and I can drop kick this current system to the curb.  Thanks for all of your help.
 

Author Closing Comment

by:dspjones
ID: 40597971
While appreciated, the other comments did not resolve the problem.  A reboot of the active server ultimately fixed the issue.  The MS article listed here:

http://support.microsoft.com/kb/910851/en-us

talks about a user being logged on to one or more nodes of the cluster.  While it does not specifically mention the active node, it could most likely be construed to cover the active node as well, which would explain why rebooting that node solved the issue.