Akulsh
asked on
SQL Cluster Resource Group 'Failed' but Cluster working fine
Our SQL cluster has shown a "Failed" status under Roles > Resource Group for many months, but the cluster is working fine!
In the System event log, every hour, there is an Event ID 1069 error (Source: FailoverClustering): "Cluster resource 'SQL Server CEIP (MSSQL2016)' of type 'Generic Service' in clustered role 'RsrcGrp-MSSQL2016' failed." I have taken corrective steps given in MSKB: #883732, but it has made no difference.
On running Cluster Validation tests, we see only a Warning. It simply repeats what is in the event log.
BTW, this cluster is based on 2 VMs in VMware infrastructure.
How can I get rid of the Failed state of the resource group?
SQL-Cluster-ResourceGrp-Failed.JPG
Try running the SQL installation and selecting "Repair" and point it at that specific DB.
ASKER
Thanks Coolie. Will it cause any interruption in the functionality of the SQL server?
Yes, it will more than likely restart that instance so you're going to want to schedule some downtime.
ASKER
Not sure when management would allow that.
The Customer Experience Improvement Service is designed to help Microsoft improve its products over time. This program collects information about computer hardware and how people use the product, without interrupting the users in their tasks at the computer. The information that is collected helps Microsoft identify which features to improve.
If you want to disable them, you can stop the services in the Service Manager, or you can set all CEIP registry keys to 0. If you change the registry keys, please back up the registry first.
CEIP is not required to run, but you can also start those services.
ASKER
65td,
Yes, SQL CEIP is being used. Is the error "Cluster resource 'SQL Server CEIP (MSSQL2016)' of type 'Generic Service' in clustered role 'RsrcGrp-MSSQL2016' failed." solely due to this CEIP feature?
Your link for deactivating CEIP is good. I will follow its instructions.
BTW, only the standby node has been restarted since the corrective actions of KB883732. However, I doubt a reboot would help, as I have since noticed that one of our other SQL cluster servers does not have those two registry entries (suggested by KB883732) but is working fine.
Thanks.
Rafa, thanks to you also for explaining what CEIP is.
ASKER
65td,
I found that on both nodes, the SQLTELEMETRY$MSSQL2016 service was already Disabled & Stopped. The other two services (SQL Analysis Services CEIP, SQL Server Integration Services CEIP service 13.0) were not found.
The registry entries had to be modified. The Resource Group CEIP still does not come online.
Do I need to reboot the servers after registry modifications for new registry values to go into effect? Thanks.
AKK
Hello,
If you have already made all the changes at the registry level, and do not observe changes, I recommend that you restart the server.
Regards...
ASKER
After rebooting both nodes, the error event ID 1069 remains. I have searched through the registry, and under the CPE keys, CustomerFeedback=0 and EnableErrorReporting=0 are set. However, my servers have only 2 instances of CPE -- under MSSQL**.<instance> and MSRS**.<instance>. We don't have MSAS**.<instance>.
Please advise. Thanks.
Yes a restart is required.
ASKER
Restart has not helped, as mentioned earlier.
Try this from a command prompt: type "cluster log /g" and then examine C:\Windows\Cluster\Reports\cluster.log
ASKER
I installed cluster.exe tool and ran the command. The resulting log is 198 MB. I searched for "failed" in it and saw just repeat of Event log errors. What should I look for? Should I attach the log? Thanks.
please attach log.
ASKER
Please review the following:
[System] 00000b68.0000228c::2019/01/30-10:40:36.822 ERR Cluster resource 'SQL Server CEIP (MSSQL2016)' of type 'Generic Service' in clustered role 'RsrcGrp-MSSQL2016' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
[System] 00000b68.00000b10::2019/01/30-10:40:37.320 ERR Cluster resource 'SQL Server CEIP (MSSQL2016)' of type 'Generic Service' in clustered role 'RsrcGrp-MSSQL2016' failed.
ASKER
I was seeing all this in Event logs also but no way to know what exactly is broken. Thanks.
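A 198 MB log is easier to read once it is narrowed to the entries that mention the failing resource. The following is a minimal Python sketch, not a tested tool. It assumes each entry has the shape of the lines quoted above (an optional [System] tag, hexadecimal process.thread IDs, a timestamp, a level such as ERR, then the message). The function name entries_for and the choice of levels to keep are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed entry shape, taken from the excerpt quoted in this thread:
# [System] 00000b68.0000228c::2019/01/30-10:40:36.822 ERR Cluster resource ...
LINE_RE = re.compile(
    r"^(?:\[\w+\]\s+)?"                      # optional leading tag, e.g. [System]
    r"(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<msg>.*)$"
)


def entries_for(
    lines: Iterable[str],
    needle: str,
    levels: Tuple[str, ...] = ("ERR", "WARN"),
) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, level, message) for entries whose message
    contains `needle` (case-insensitive) and whose level is in `levels`.
    Lines that do not match the assumed shape are skipped."""
    wanted = needle.lower()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        if match["level"] in levels and wanted in match["msg"].lower():
            yield match["ts"], match["level"], match["msg"]
```

One way to use it would be to pass an open file handle, for example `entries_for(open(path, errors="replace"), "SQL Server CEIP")`, where `path` points at the generated cluster.log. The file encoding is an assumption here, hence `errors="replace"`. Passing a wider `levels` tuple may help surface the entries logged just before each failure, which is usually where the actual cause shows up.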
Have you reviewed the failure policies for the resource and role, via cluster manager?
ASKER
65td,
I really appreciate you sticking with me on this.
If by 'Failure policies' you mean the Policies and Advanced Policies tabs in the Properties, then yes I have seen them but they are all at default -- same as in the other two SQL clusters that I manage. Thanks.
https://mssqlfun.com/2018/04/26/sql-server-2016-sql-server-telemetry-ceip-services/
Here is another site to review.
ASKER
Dear 65td,
On our SQL cluster, I do not find 'SQLTELEMETRY' services, only 'SQL Server CEIP service (MSSQL2016)' service, which is disabled. All those registry entries are already '0'. Thanks for sticking with me.
ASKER
Dear 65td,
I enabled 'SQL Server CEIP service (MSSQL2016)' service, which was somehow disabled, and all is well now. Thanks for sticking with me. I will give you credit. Thanks.
ASKER
I enabled 'SQL Server CEIP service (MSSQL2016)' service, which was somehow disabled, and all is well now. Thanks for sticking with me. I am giving credit to those who tried to help me. Thanks.
Good to hear.
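For anyone landing here with the same Event ID 1069: the clustered 'Generic Service' resource kept failing because the Windows service behind it was disabled, so a quick check of the service start type on each node can save a lot of time. Below is a minimal Python sketch of that check. It assumes the service short name is SQLTELEMETRY$MSSQL2016 (the name mentioned earlier in this thread; yours may differ) and that `sc.exe qc` prints an English START_TYPE line. The helper names parse_start_type and query_start_type are hypothetical.

```python
import re
import subprocess
from typing import Optional

# `sc.exe qc <service>` prints a line such as:
#         START_TYPE         : 4   DISABLED
# (English-language output is assumed; localized systems may differ.)
START_TYPE_RE = re.compile(r"^\s*START_TYPE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_start_type(sc_qc_output: str) -> Optional[str]:
    """Return the start type word (e.g. 'DISABLED') from `sc qc` output,
    or None if no START_TYPE line is present."""
    match = START_TYPE_RE.search(sc_qc_output)
    return match.group(2) if match else None


def query_start_type(service: str) -> Optional[str]:
    """Run `sc.exe qc` for `service` and parse the result (Windows only)."""
    result = subprocess.run(
        ["sc.exe", "qc", service],
        capture_output=True,
        text=True,
        check=False,  # a missing service simply yields no START_TYPE line
    )
    return parse_start_type(result.stdout)
```

Calling `query_start_type("SQLTELEMETRY$MSSQL2016")` on each node and looking for a result of DISABLED would flag the condition that caused the failure in this thread. If the service is meant to stay off, the other way to clear the error would presumably be to remove the CEIP resource from the clustered role, but that was not tried here.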