We lost a cluster service for several minutes yesterday
(and this is a repeat incident); please refer to the attached screenshots.
What could be the cause of the issue?
Is there any resolution or workaround for it?
Currently the heartbeat goes through the production network.
Would setting up a dedicated heartbeat link (i.e. a direct crossover
cable between the two member servers of the cluster) help?
Or can we tune the heartbeat interval and the number
of missed heartbeats allowed, to make the cluster more resilient
and so address this issue? If so, can you point me to a link or
provide instructions on how to tune these?
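
To make the tuning question concrete, here is a minimal sketch of how I understand the two settings to interact: a node is declared down once the allowed number of consecutive heartbeats has been missed, so the tolerated outage is roughly the interval multiplied by the missed-heartbeat threshold. The function names and the 1000 ms / 5-miss values are hypothetical illustrations, not defaults or APIs of any particular cluster product.

```python
def detection_window_ms(interval_ms: int, missed_threshold: int) -> int:
    """Approximate time of continuous heartbeat loss before a node is
    declared down (assumed model: interval x allowed missed heartbeats)."""
    if interval_ms <= 0 or missed_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_ms * missed_threshold


def survives_outage(outage_ms: int, interval_ms: int, missed_threshold: int) -> bool:
    """True if a network blip of outage_ms is shorter than the detection window."""
    return outage_ms < detection_window_ms(interval_ms, missed_threshold)


if __name__ == "__main__":
    # Hypothetical settings: 1000 ms interval, 5 missed heartbeats allowed.
    print(detection_window_ms(1000, 5))
    # An 8-second blip under the hypothetical settings, then with a
    # doubled threshold.
    print(survives_outage(8000, 1000, 5))
    print(survives_outage(8000, 1000, 10))
```

Under this assumed model, raising either value widens the window and rides out longer network blips, at the cost of slower failover when a node really is down.
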
Could this be due to differences in the firmware
(UEFI, etc.) versions between the two member servers? IBM
found the versions to be different but could not confirm that
the issue is caused by the firmware mismatch between
the member nodes.