Solved

SQL won't fail over with MS 2003 Cluster

Posted on 2010-11-23
Last Modified: 2012-05-10
I have an active/passive cluster that runs a SQL database. When the SQL group is owned by one server (WDB06) it runs fine. When I try to fail over to the second server in the group (WDB07), everything in the cluster's MSCS group comes online (DTC, cluster name, cluster IP, and quorum; the DTC log is also on the quorum drive). Then, when I try to fail over the SQL Server group, a few things happen. At first SQL Server would not start and was throwing this error:
----------------------------------------------------------------------------
Event Type:      Error
Event Source:      MSSQLSERVER
Event Category:      (3)
Event ID:      19019
Date:            11/23/2010
Time:            9:02:02 PM
User:            N/A
Computer:      DA-GHN-WDB07
Description:
[sqsrvres] OnlineThread: service stopped while waiting for QP.
----------------------------------------------------------------------------
What I did for this problem was run the following command on the server that had control of the quorum drive: MSDTC -resetlog
After that I did not get this error again for SQL Server. But I am now having a problem with SQL Server Agent coming online, and again it is only when I try to move the SQL group onto the server WDB07. The odd thing is that when I move the group to this physical server, everything comes online, then within 3 minutes the SQL Server Agent resource turns from online to failed. Event Viewer has the following entries for this:
------------------------------------------------------------------
Event Type:      Error
Event Source:      SQLSERVERAGENT
Event Category:      Service Control
Event ID:      103
Date:            11/23/2010
Time:            9:20:36 PM
User:            N/A
Computer:      DA-GHN-WDB08
Description:
SQLServerAgent could not be started (reason: Unable to connect to server '(local)'; SQLServerAgent cannot start).
-----------------------------------------------------------------------------
Event Type:      Information
Event Source:      SQLSERVERAGENT
Event Category:      Service Control
Event ID:      102
Date:            11/23/2010
Time:            9:20:37 PM
User:            N/A
Computer:      DA-GHN-WDB08
Description:
SQLServerAgent service successfully stopped.
------------------------------------------------------------------------------------------
Event Type:      Error
Event Source:      SQLSERVERAGENT
Event Category:      Failover
Event ID:      53
Date:            11/23/2010
Time:            9:20:38 PM
User:            N/A
Computer:      DA-GHN-WDB07
Description:
[sqagtres] CheckServiceAlive: Service is dead
----------------------------------------------------------------------------------
I am not too sure why the agent will not stay online when the SQL group is run on server WDB07, but runs fine when server WDB06 is handling it. I'm not an expert with MS Server 2003 clustering, and I only know a few basic things about SQL, so any detailed help would be great. Let me know if you need any more information.

thank you
Steven
Question by:sdmarek
7 Comments
 
LVL 4

Expert Comment

by:rlog
ID: 34205924
Seems like the SQL Agent can't connect to the SQL Server service. Do you use separate accounts for the server and the agent?
 
LVL 2

Author Comment

by:sdmarek
ID: 34206428
Under system services, both SQL Server and SQL Server Agent (on both servers) run logged in as: production\prdsqlsrvc
On the servers, when I go to Computer Management, this domain account is in the Administrators group on both servers.

-steven
 
LVL 2

Author Comment

by:sdmarek
ID: 34206512
Another note: I'm at a new job and this is a network I inherited, so I did not set any of this up, but I assume it was working before. When I got to this job, the systems admin before me had started to move this cluster's physical disk resources onto a new SAN. He did the hard part: he attached the LUNs and moved the SQL database and log drives over to the new SAN. All I did was follow:
http://support.microsoft.com/default.aspx?scid=kb;en-us;280353
to move the quorum drive, and then reset the DTC log to the new quorum drive. I don't think that would have broken any permissions, because the domain user that runs SQL Server and Agent is in the admin group on both servers, but that's a little more history in case it helps.

-steven
 
LVL 4

Accepted Solution

by:
rlog earned 500 total points
ID: 34209134
Have you ever tried the filemon.exe utility from MS (Sysinternals.com)? In Cluster Administrator, set the SQL Agent resource so that it does not affect the group if it fails. Move the cluster group over to the faulty node and the SQL Agent will fail. Start filemon, which logs all disk activity, then start SQL Agent (it will fail). Stop filemon and assess the file operations. Look for "File not found" and "Access denied".

You can see which files it tries to open or can't find. Try locating these files. I've come across misspelled paths in the registry as well (the SQL Agent path), so correcting the path in a registry key has often done the trick.

If you're out of options you could uninstall SQL Server from the faulty node (start the installer on the active working node and uninstall the passive node). Once it's uninstalled you can evict the node from the cluster and add it again (maybe with a fresh install). Once it has rejoined the cluster, you can install SQL Server (and service packs) on the passive node from the active node.
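To make the "look for File not found and Access denied" step less tedious, here is a minimal Python sketch that filters a capture for failed file operations. It assumes the capture was exported from Process Monitor as CSV with its default column headers ("Process Name", "Operation", "Path", "Result"); an older Filemon log is laid out differently and would need its own parsing. The function name and the "sqlagent" process prefix are illustrative choices, not anything taken from this thread.

```python
import csv
import io

# Result values worth a look. Process Monitor reports a missing file as
# "NAME NOT FOUND" or "PATH NOT FOUND" (assumption: default CSV export).
SUSPECT_RESULTS = {"NAME NOT FOUND", "PATH NOT FOUND", "ACCESS DENIED"}


def suspect_operations(csv_text, process_prefix="sqlagent"):
    """Return (process, operation, path, result) tuples for failed file operations.

    Only rows whose process name starts with process_prefix (case-insensitive)
    and whose result is one of SUSPECT_RESULTS are kept.
    """
    prefix = process_prefix.lower()
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        process = row.get("Process Name") or ""
        result = (row.get("Result") or "").strip().upper()
        if process.lower().startswith(prefix) and result in SUSPECT_RESULTS:
            hits.append((process, row.get("Operation") or "", row.get("Path") or "", result))
    return hits


def load_export(path):
    """Read an exported CSV file; utf-8-sig tolerates a leading byte order mark."""
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return handle.read()
```

Something like `suspect_operations(load_export("capture.csv"))` (the file name is a placeholder) would then list only the paths the agent could not open, which is where a bad registry path or a missing permission tends to show up.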
 
LVL 2

Author Comment

by:sdmarek
ID: 34209688
I've used Filemon many times (it's now rolled into Process Monitor), so I can run that, then start messing around with the cluster if I can't find errors there. One little thing: since the SQL DB is running (even without failover right now) and this is my production environment, I'm not going to take it offline until next Tuesday between 8pm and 12am, when I can work on it (that's my weekly server maintenance window). So this ticket is going to sit for a little. I'd love any other tips you guys have, but I'm not going to touch the running production environment until Tuesday. I'll keep in touch when I make changes; in that window I can crash the DB for all I care, as long as it's running by 12.

thank you, talk to you guys by tuesday
-steven
 
LVL 2

Author Comment

by:sdmarek
ID: 34319927
Sorry I haven't been to this post; I got distracted for the last couple of weeks. I will not actually be trying to get this done this weekend, and I will keep you posted.

-steven
 
LVL 2

Author Closing Comment

by:sdmarek
ID: 34493328
We did have a read/write permission problem: the agents were not set to run as the same user on both nodes of the cluster. Filemon showed that.

thanks,
steven
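Since the root cause was a service account that differed between the two nodes, a quick check like the following could catch it before a failover test. This is a minimal Python sketch, assuming it runs on Windows with rights to query both nodes and that `sc \\node qc <service>` prints its usual SERVICE_START_NAME line. The service names MSSQLSERVER and SQLSERVERAGENT are the default-instance names (they match the event sources in the logs above); the function names and the example accounts are hypothetical.

```python
import re
import subprocess


def parse_start_name(sc_output):
    """Extract the logon account from `sc qc` output, or None if it is absent."""
    match = re.search(r"^\s*SERVICE_START_NAME\s*:\s*(.+?)\s*$", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def query_start_name(node, service):
    """Ask a remote node which account a service is configured to log on as."""
    completed = subprocess.run(
        ["sc", rf"\\{node}", "qc", service],
        capture_output=True, text=True, check=True,
    )
    return parse_start_name(completed.stdout)


def mismatched_services(accounts_by_node):
    """Given {node: {service: account}}, return services whose account differs.

    The result maps each mismatched service to {node: account}.
    """
    services = set()
    for accounts in accounts_by_node.values():
        services.update(accounts)
    mismatches = {}
    for service in sorted(services):
        seen = {node: accounts.get(service) for node, accounts in accounts_by_node.items()}
        # Windows account names are case-insensitive, so compare lowercased.
        if len({(name or "").lower() for name in seen.values()}) > 1:
            mismatches[service] = seen
    return mismatches
```

To use it, build the input with `query_start_name` for each node (for example WDB06 and WDB07) and each of the two services, then pass the dictionary to `mismatched_services`; an empty result means both nodes agree. Note that this only compares the configured logon account. It does not verify file or registry permissions for that account.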