mylogo asked:

Exchange storage groups stuck on initializing

I have clustered Exchange servers. Security STIGs were applied to one set. Mail is flowing, but the passive server is unable to start the Cluster service. The storage groups no longer show as healthy; all of them show Mounted and Initializing. I have tried dismounting, suspending and restoring the database via the GUI, restarting services, and rebooting, but nothing works.

Event Viewer shows Event ID 1009 (Cluster service could not join an existing server cluster...). All services are up and running except for the Cluster service and the Microsoft Exchange Information Store. IIS is up and running as well.

I have not hard booted this box and, since today is a holiday, I'd like to get this fixed before tomorrow morning. I would appreciate any help anyone can provide.
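
For anyone following along, one quick way to confirm which of those two services are down is to run `sc query` against each and read the STATE line. The sketch below is only an illustration and is not from this thread: it assumes Python is available on an admin machine, that the service short names are `ClusSvc` (Cluster service) and `MSExchangeIS` (Information Store), and the helper names `parse_sc_state` and `query_service_state` are made up for the example.

```python
import re
import subprocess

# Short service names (assumed) for the two services reported as not running.
SERVICES = ["ClusSvc", "MSExchangeIS"]


def parse_sc_state(sc_output):
    """Return the state word (e.g. 'RUNNING', 'STOPPED') from `sc query`
    output, or None if no STATE line is present."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def query_service_state(service, server=None):
    """Run `sc query` for one service (Windows only) and return its state."""
    cmd = ["sc"]
    if server:
        # sc accepts an optional \\server argument for a remote machine.
        cmd.append("\\\\" + server)
    cmd += ["query", service]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_sc_state(result.stdout)


if __name__ == "__main__":
    for name in SERVICES:
        print(name, query_service_state(name))
```

Passing `server="clust2"` would query the passive node remotely, which only works if RPC to that node is reachable.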
OctInv

Are there any other errors in the event log? How many resources are there in the cluster? What has changed lately (was it working before)? Are any services not starting in the correct order (network services, for example, which the passive node needs in order to see the cluster)?
mylogo (ASKER)

Two cluster servers (clust1: active, clust2: passive).
It was working before the security STIGs were applied to the active server. The STIGs were applied to the active node, which forced the passive to become the active. Once that completed, I started the Cluster service on the original active server (clust1).

As of 5:28 pm CST, here's what I see:
Application Log: Event ID: 7005

I've attached some screenshots of additional things I see in the System log as well.
Storage-groups.docx
I'm not in the office or near a PC for most of the day. If no one else can help out, I'll have a look later.


mylogo (ASKER)

Thank you. Ideally, I think I should have had the STIGs applied to both clustered servers together rather than individually. I will try stopping the Cluster service on the active server, rebooting both servers, bringing the active up first, starting the Cluster service (if it doesn't start on its own) and then bringing up the passive server. Maybe then it will rejoin the cluster.

For anyone who might ask, here is the definition of a STIG: A Security Technical Implementation Guide, or STIG, is a methodology for standardized secure installation and maintenance of computer software and hardware. The term was coined by DISA, which creates configuration documents in support of the United States Department of Defense (DoD). The implementation guidelines include recommended administrative processes and span the lifecycle of the device.
An example where STIGs would be of benefit is in the configuration of a desktop computer. Most operating systems are ordinarily usable in a wide range of environments. This leaves them open to being easily controlled by malicious people, such as identity thieves and computer hackers. Therefore, a STIG describes what needs to be done to minimize network-based attacks and also to stop system access if a computer criminal is next to the device. Lastly, a STIG may also be used to describe the processes and lifecycles for maintenance (such as software updates and vulnerability patching).
A STIG is basically a methodology. Can you please explain how you implemented it on the cluster? What did you change while following your STIG?

Thanks.
What type of cluster is this (single copy, CCR, etc.)? Looking at the screenshots, and without knowing your setup, I think the node that is now passive no longer has the correct state of the cluster. You may want to remove it from the cluster and then try re-adding it from the active node, using Cluster Manager. If you are using CCR, do you have a file share witness server to maintain the state of the cluster?
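
Before removing anything, it can help to see how the cluster itself reports each node. The sketch below is a hedged illustration rather than the procedure anyone in this thread ran: it only builds and prints `cluster.exe` command lines (the `node ... /status` and `node ... /evict` forms) and does not execute them. The cluster name `EXCLUSTER` is a made-up placeholder, `clust2` is the passive node named earlier in the thread, and `cluster_node_command` is a hypothetical helper.

```python
def cluster_node_command(cluster_name, action, node=None):
    """Build a cluster.exe argument list for a node-level action.

    Only 'status' (read-only) and 'evict' (removes the node from the
    cluster) are covered in this sketch.
    """
    if action not in ("status", "evict"):
        raise ValueError("unsupported action: " + action)
    if action == "evict" and node is None:
        raise ValueError("evict needs an explicit node name")
    cmd = ["cluster", "/cluster:" + cluster_name, "node"]
    if node:
        cmd.append(node)  # omit the node name to list every node's status
    cmd.append("/" + action)
    return cmd


if __name__ == "__main__":
    # EXCLUSTER is a placeholder; substitute the real cluster name.
    print(" ".join(cluster_node_command("EXCLUSTER", "status")))
    print(" ".join(cluster_node_command("EXCLUSTER", "evict", "clust2")))
```

The commands are printed rather than run on purpose: eviction changes cluster membership, and with CCR there may be Exchange-side steps to take first, so it is worth confirming the sequence before executing anything.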

mylogo (ASKER)

This is a CCR and you're right, the passive node no longer has the correct state. Should I evict the node first and then re-add it using Cluster Manager?
mylogo (ASKER)

Over the weekend I shut down both servers, then powered the active back on, allowing all of its servers to come back up, and then powered on the passive. The Cluster service was missing from Services and the node was missing from Win Cluster Manager, but it still shows up in the cluster group. I continue to get a space error and "RPC server is unavailable". The node was not evicted, so I'm not sure why it didn't come up, unless there were other underlying issues with this server that went unnoticed before. This is my first time working with the Cluster service. I've decided I need to call MS for help in getting the passive node back to a healthy state and reattached to the cluster. I need to get the Cluster service reinstalled without having to reinstall Exchange.
ASKER CERTIFIED SOLUTION
mylogo
This solution is only available to members.
mylogo (ASKER)

Continued work on my part to fix the issue.