A host of strange issues after drive replacement.


Back in November I had an issue with a failed drive in a mirror. Since the drive was replaced, there has been a host of odd issues that leave the server only mostly functional. The details:

1. The system drive is a mirror and the data drive is a RAID 5 array, both on one hardware controller. One of the system drives failed and the server experienced a major slowdown in network response and service availability.

2. Server shut down, failed drive removed, and server rebooted with no noticeable issues.

3. Added replacement drive, reestablished mirror on controller, and brought system back up. Server was incredibly slow, Netlogon and other services failed to start, Explorer.exe failed whenever I attempted to open any program or browse, and the server network adapter took about 30 minutes to come up, causing a host of AD error messages. Tried changing to the secondary adapter, a different cable, and a different switch port. No difference with each reboot. Once the server was back up I was able to start Netlogon and the Exchange services that had failed, and things seemed to function normally afterwards. The next couple of reboots did not recreate the issue.

4. Periodic checks of the logs occasionally show a failed CRC check for the system drive. I'll have to dig back to paste the specifics.

5. The last reboot exhibited the same problems as when I had installed the drive. What would normally take 20 minutes to reboot took two hours to get everything running again. Same issues with the network adapter not coming up right away, Netlogon and AD-related services not starting due to the lack of network communications, and Explorer hanging on anything. I've left the server up since then because this is the one and only server that runs all the applications the company uses. It's "functional mostly".

6. Current issues are:

- Server seems to be very slow to respond. Some programs will not open. Opening mstsc.exe on the server will show it as a process under task manager, but it never opens. Other programs will open but not be responsive. For example, I can open and start a backup, but the progress will sit at 0% indefinitely.

- There is not as much in the error logs as I'd hope there would be. These are the messages I am seeing repeatedly over the last day or two since reboot.
Event ID 4: The print spooler failed to reopen an existing printer connection because it could not read the configuration information from the registry key S-1-5-18\Printers\Connections. The print spooler could not open the registry key. This can occur if the registry key is corrupt or missing, or if the registry recently became unavailable.

Event ID 2501: Process MSEXCHANGEADTOPOLOGY (PID=1464). The site monitor API was unable to verify the site name for this Exchange computer - Call=DsctxGetContext Error code=8007077f. Make sure that Exchange server is correctly registered on the DNS server.

Event ID 2601: Process MSEXCHANGEADTOPOLOGY (PID=1464). When initializing a remote procedure call (RPC) to the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the SID for account <WKGUID=DC1301662F547445B9C490A52961F8FC,CN=Microsoft Exchange,CN=Services,CN=Configuration,...> - Error code=8007077f.
 The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.

Event ID 9782: Exchange VSS Writer (instance 4aefa16d-ee52-4885-91e8-9794b66074c5:1) has unsuccessfully completed the backup of storage group 'First Storage Group'. No log files have been truncated for this storage group.

EVENT ID 10016: The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID
 to the user NT AUTHORITY\NETWORK SERVICE SID (S-1-5-20) from address LocalHost (Using LRPC). This security permission can be modified using the Component Services administrative tool.
I have no idea where to start. Some things seem to point to DNS, though DNS seems to be functioning correctly on the server. Others make me wonder if the failed drive corrupted the mirror before it went. I'm wondering if this is fixable or if I should be looking at a restore/rebuild over the weekend. Sorry there is not a lot of detail here, but it's the best I can get posted for now.
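The error code 8007077f that appears in Event IDs 2501 and 2601 is an HRESULT, and splitting it into its fields can narrow things down. The following is a minimal Python sketch of that decoding, based on the standard HRESULT bit layout; the function name is made up for illustration. The low 16 bits come out as Win32 error 1919, which as far as I know is listed as ERROR_NO_SITENAME ("No site name is available for this machine"). That would match the wording of Event ID 2501 and fits a network adapter that was not up yet when the service started.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its standard fields.

    Bit 31 is the severity (1 = failure), bits 16-26 are the facility,
    and bits 0-15 are the error code within that facility.
    """
    return {
        "failure": bool((value >> 31) & 0x1),
        "facility": (value >> 16) & 0x7FF,
        "code": value & 0xFFFF,
    }


if __name__ == "__main__":
    fields = decode_hresult(0x8007077F)
    # Facility 7 is FACILITY_WIN32, so "code" is a plain Win32 error number.
    print(fields)
```

Facility 7 means the code is an ordinary Win32 error wrapped in an HRESULT, so it can be looked up in the Win32 system error code list directly.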
David (President) Commented:
The same make/model doesn't guarantee the same firmware settings. For example, the HDD write cache is most likely disabled on Dell firmware, because that is the default, but it may very well be enabled on the one you bought. There are dozens (well, hundreds) of configurable parameters. (The number of retries and the ECC-related settings are also important, and the thresholds on when to give up are vital.)

So, for example, the Dell firmware typically sets disks to give up after 2-3 seconds, and most controllers allow up to 7 before they think a disk died. What if the programmable firmware setting on a disk is 8 seconds? (Factory default on Seagate Constellation SAS is typically 13 retries x 100 milliseconds + allowance per retry. I just know that off the top of my head.) But who knows what this other disk is set to. It could give up too fast or too late.
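To put numbers on that, here is a rough Python sketch of the timing comparison. The 7-second controller limit, the 2-3 and 8 second examples, and the 13 x 100 ms retry figure are taken from the comment above. The per-retry allowance is not stated, so the 50 ms used below is purely a placeholder assumption, and the function names are hypothetical.

```python
def worst_case_recovery_s(retries: int, per_retry_ms: float, allowance_ms: float) -> float:
    """Worst-case time (seconds) a disk keeps retrying before it gives up."""
    return retries * (per_retry_ms + allowance_ms) / 1000.0


def controller_drops_disk(disk_recovery_s: float, controller_timeout_s: float = 7.0) -> bool:
    """True if the disk is still retrying when the controller's patience runs out."""
    return disk_recovery_s > controller_timeout_s


if __name__ == "__main__":
    # 13 retries x 100 ms, plus an ASSUMED 50 ms allowance per retry.
    factory_default = worst_case_recovery_s(13, 100.0, 50.0)
    for label, seconds in [
        ("OEM firmware (upper end of 2-3 s)", 3.0),
        ("hypothetical 8 s setting", 8.0),
        ("13 x 100 ms + assumed allowance", factory_default),
    ]:
        verdict = "controller may fail the disk" if controller_drops_disk(seconds) else "within limit"
        print(f"{label}: {seconds:.2f} s -> {verdict}")
```

Only the 8-second case exceeds the 7-second limit here, which is the scenario where a single slow sector recovery can look like a dead disk to the controller.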

Anyway, this is the danger of mixing and matching just any drive in the same RAID set. Operational parameters then differ within the same RAID group, and that is just bad on so many levels.
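If the settings of each member disk have been read out by some means, spotting the mismatches is a simple comparison. The sketch below assumes the values are already available as plain dictionaries; the disk labels, parameter names, and values are invented for illustration and do not come from any real tool's output.

```python
def mismatched_settings(disks: dict) -> dict:
    """Return every parameter whose value is not identical across all disks.

    `disks` maps a disk label to a {parameter: value} dictionary.
    A parameter missing on one disk is reported with the value None.
    """
    names = set()
    for settings in disks.values():
        names.update(settings)

    diffs = {}
    for name in sorted(names):
        values = {disk: settings.get(name) for disk, settings in disks.items()}
        if len(set(values.values())) > 1:
            diffs[name] = values
    return diffs


if __name__ == "__main__":
    # Hypothetical readings for an original disk and a third-party replacement.
    raid_set = {
        "original": {"write_cache": False, "recovery_limit_s": 3},
        "replacement": {"write_cache": True, "recovery_limit_s": 8},
    }
    for name, values in mismatched_settings(raid_set).items():
        print(name, values)
```

An empty result would mean the members agree on every parameter that was read, which is the state you want before trusting the set.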
Mandr1ch (Author) Commented:
Working on the backup part of it now. WBADMIN shows the following when attempting to back up:

Consistency Check Failed for component ffb241c3-cf88-449c-a50b-12879fd622a5 (Microsoft Exchange Server\Microsoft Information Store\BPCSBS\ffb241c3-cf88-449c-a50
Backup of application Exchange failed.
Detailed Error: Class not registered
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).

The Exchange message store was located on the mirror when the mirror failed. Since then I have moved it to the RAID array along with the logs. Exchange is functional from the users' standpoint.
Are you using enterprise-class or cheap desktop drives? If the latter, then that will almost always be the root cause.
Mandr1ch (Author) Commented:
Thanks for the response. Enterprise class, or at least it should be. The replacement drive was no longer available from the manufacturer (Dell), though the server is only 1.5 years old, but the replacement sourced from a third party was the same make and model number as the original.
Mandr1ch (Author) Commented:
Ah. I didn't know that. Next step seems to be to pull the replacement drive and see if these issues are still evident after reboot. I'll give it a try this weekend and reply back. Thank you.
No prob. Note there is software you can buy to view/edit the settings on both the Dell and non-Dell disks. Google the string "Mode Page Editor". But you need a JBOD controller to hook up to the disks temporarily, and any firmware build is free to hard-code a parameter to a certain value or range, including one that you may need to change to match the other disks.

So if you aren't a storage pro and can't justify the expense and time to get over the learning curve, then just buy a replacement Dell disk with same make/model/firmware rev.

Be sure to do a data consistency check/repair before yanking the drive. This will clean up any problems you have before you degrade the system.  Otherwise you could end up with some data loss.
Mandr1ch (Author) Commented:
Just an update. I've been waiting to get the okay to bring the server down. Finally scheduled for this weekend. I will be able to follow up with results then.
Mandr1ch (Author) Commented:
That was exactly it. Thank you!
Question has a verified solution.