Solved

A host of strange issues after drive replacement.

Posted on 2013-01-17
Last Modified: 2013-02-12
Hello,

Back in November I had an issue with a failed drive in a mirror. After the drive was replaced, there have been a host of odd issues that leave the server mostly functional. The details:

1. The system drive is a mirror and the data drive is a RAID 5 array, both on one hardware controller. One of the system drives failed and the server experienced a major slowdown in network response and service availability.

2. Server shut down, failed drive removed, and server rebooted with no noticeable issues.

3. Added replacement drive, reestablished mirror on controller, and brought system back up. Server was incredibly slow, Netlogon and other services failed to start, Explorer.exe failing whenever attempting to open any program or browse, server network adapter taking about 30 minutes to come up causing a host of AD error messages. Tried changing to secondary adapter, different cable, different switch port. No difference with each reboot. After I got the server back up I was able to start Netlogon and the Exchange services that failed and things seemed to be functioning normally after the fact. Next couple of reboots did not recreate the issue.

4. Periodic checks of the logs occasionally show a failed CRC check for the system drive. I'll have to dig back to paste the specifics.

5. The last reboot exhibited the same problems as when I had installed the drive. What would normally take 20 minutes to reboot took two hours to get everything running again. Same issues with the network adapter not coming up right away, Netlogon and AD-related services not starting due to the lack of network communications, and Explorer hanging on anything. I've left the server up since then, since this is the one and only server that runs all the applications the company uses. It's "functional mostly".

6. Current issues are:

- Server seems to be very slow to respond. Some programs will not open. Opening mstsc.exe on the server will show it as a process under task manager, but it never opens. Other programs will open but not be responsive. For example, I can open and start a backup, but the progress will sit at 0% indefinitely.

- There is not as much in the error logs as I'd hope there would be. These are the messages I am seeing repeatedly over the last day or two since reboot.
-----------------------------------------------------------------------------------------------
Event ID 4: The print spooler failed to reopen an existing printer connection because it could not read the configuration information from the registry key S-1-5-18\Printers\Connections. The print spooler could not open the registry key. This can occur if the registry key is corrupt or missing, or if the registry recently became unavailable.

Event ID 2501: Process MSEXCHANGEADTOPOLOGY (PID=1464). The site monitor API was unable to verify the site name for this Exchange computer - Call=DsctxGetContext Error code=8007077f. Make sure that Exchange server is correctly registered on the DNS server.

Event ID 2601: Process MSEXCHANGEADTOPOLOGY (PID=1464). When initializing a remote procedure call (RPC) to the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the SID for account <WKGUID=DC1301662F547445B9C490A52961F8FC,CN=Microsoft Exchange,CN=Services,CN=Configuration,...> - Error code=8007077f.
 The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.

EVENT ID 9782: Exchange VSS Writer (instance 4aefa16d-ee52-4885-91e8-9794b66074c5:1) has unsuccessfully completed the backup of storage group 'First Storage Group'. No log files have been truncated for this storage group.

EVENT ID 10016: The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID
{61738644-F196-11D0-9953-00C04FD919C1}
 to the user NT AUTHORITY\NETWORK SERVICE SID (S-1-5-20) from address LocalHost (Using LRPC). This security permission can be modified using the Component Services administrative tool.
-----------------------------------------------------------------------------------------
I have no idea where to start. Some things seem to point to DNS, though DNS seems to be functioning correctly on the server. Others make me wonder if the failed drive corrupted the mirror before it went. I'm wondering if this is fixable or if I should be looking at a restore/rebuild over the weekend. Sorry there is not a lot of detail there, but it's the best I can get posted for now.
Question by:Mandr1ch

Author Comment

by:Mandr1ch
ID: 38789861
Working on the backup part of it now. WBADMIN shows the following when attempting to back up:

Consistency Check Failed for component ffb241c3-cf88-449c-a50b-12879fd622a5 (Microsoft Exchange Server\Microsoft Information Store\BPCSBS\ffb241c3-cf88-449c-a50b-12879fd622a5)
Backup of application Exchange failed.
Detailed Error: Class not registered
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
Running backup of volume DATAPART2(D:), copied (0%).
....

The Exchange message store was located on the mirror when the mirror failed. Since then I have moved it to the RAID array along with the logs. Exchange is functional from the user's standpoint.
 
LVL 47

Expert Comment

by:dlethe
ID: 38790206
Are you using enterprise class or cheap desktop drives? If the latter, then that will almost always be the root cause.
 

Author Comment

by:Mandr1ch
ID: 38790221
Thanks for the response. Enterprise class or at least it should be. The replacement drive was no longer available from the manufacturer (Dell) though the server is only 1.5 years old, but the replacement sourced from third party was same make and model number as the original.
 
LVL 47

Accepted Solution

by:
dlethe earned 500 total points
ID: 38790348
Same make/model doesn't guarantee the same firmware settings. For example, the HDD write cache is most likely disabled on the Dell firmware, because that is the default, but it may very well be enabled on the one you bought. There are dozens (well, hundreds) of configurable parameters. (The number of retries and the ECC-related settings are also important, and the thresholds on when to give up are vital.)

So, for example, the Dell firmware typically sets disks to give up after 2-3 seconds, and most controllers allow up to 7 before they think a disk died. What if the programmable firmware setting on a disk is 8 seconds? (Factory default on Seagate Constellation SAS is typically 13 retries x 100 milliseconds + allowance per retry. Just know that off the top of my head.) But who knows what this other disk is set to. It could give up too fast or too late.
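To make the timing argument concrete, here is a rough Python sketch of the arithmetic. The function names are made up for illustration, the per-retry allowance is left as a parameter because its real value isn't given here, and the numbers plugged in are only the ones mentioned above (2-3 seconds, 7 seconds, 8 seconds, 13 retries x 100 ms). Treat it as a way to reason about the mismatch, not as how any particular controller actually decides.

```python
def worst_case_recovery_s(retries, per_retry_ms, allowance_ms=0.0):
    """Upper bound, in seconds, on how long a disk keeps retrying a bad read.

    allowance_ms is the extra time per retry; its real value is
    firmware-specific and unknown here, so it defaults to 0 (assumption).
    """
    return retries * (per_retry_ms + allowance_ms) / 1000.0


def outcome(disk_recovery_s, controller_timeout_s):
    """Compare the disk's give-up time against the controller's patience."""
    if disk_recovery_s >= controller_timeout_s:
        # The controller stops waiting before the disk reports anything.
        return "controller drops disk"
    return "disk reports error in time"


if __name__ == "__main__":
    controller_timeout_s = 7.0  # "most controllers allow up to 7"
    for label, seconds in [
        ("Dell-style firmware (3 s)", 3.0),
        ("hypothetical mismatched disk (8 s)", 8.0),
        ("13 retries x 100 ms, no allowance", worst_case_recovery_s(13, 100)),
    ]:
        print(f"{label}: {outcome(seconds, controller_timeout_s)}")
```

The point of the comparison is that a disk which is perfectly healthy can still look dead to the controller if its own retry window is longer than the controller is willing to wait.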

Anyway, this is the danger of mixing & matching just any drive in the same RAID set. Operational parameters end up different within the same RAID group, and that is just bad on so many levels.
 

Author Comment

by:Mandr1ch
ID: 38790445
Ah. I didn't know that. Next step seems to be to pull the replacement drive and see if these issues are still evident after reboot. I'll give it a try this weekend and reply back. Thank you.
 
LVL 47

Expert Comment

by:dlethe
ID: 38793476
No prob. Note there is software you can buy to view/edit the settings on both the Dell and non-Dell disks.  Google the string, "Mode Page Editor".  But you need a JBOD controller to hook up to the disks temporarily, and any firmware build is free to hard-code a parameter to a certain value or range that you may need to change to match the other disks.  

So if you aren't a storage pro and can't justify the expense and time to get over the learning curve, then just buy a replacement Dell disk with same make/model/firmware rev.
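If you do end up dumping the settings from both disks, the comparison itself is simple. Below is a minimal Python sketch that lists every setting that differs between two disks. The setting names and values are hypothetical placeholders picked to mirror the examples in this thread (write cache off vs. on, a short vs. long recovery limit); they are not real mode page field names and not values read from any actual drive.

```python
def mismatched_settings(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for each setting that differs.

    A setting present on only one disk is reported with None for the other.
    """
    names = sorted(set(reference) | set(candidate))
    return {
        name: (reference.get(name), candidate.get(name))
        for name in names
        if reference.get(name) != candidate.get(name)
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    dell_disk = {
        "write_cache_enabled": False,
        "read_retry_count": 3,
        "recovery_time_limit_ms": 3000,
    }
    third_party_disk = {
        "write_cache_enabled": True,
        "read_retry_count": 13,
        "recovery_time_limit_ms": 8000,
    }
    diffs = mismatched_settings(dell_disk, third_party_disk)
    for name, (want, got) in diffs.items():
        print(f"{name}: original disk={want}, replacement={got}")
```

An empty result would mean the two disks agree on everything you dumped, which is what you want before putting them in the same RAID set.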

Be sure to do a data consistency check/repair before yanking the drive. This will clean up any problems you have before you degrade the system.  Otherwise you could end up with some data loss.
 

Author Comment

by:Mandr1ch
ID: 38839178
Just an update. I've been waiting to get the okay to bring the server down. Finally scheduled for this weekend. I will be able to follow up with results then.
 

Author Closing Comment

by:Mandr1ch
ID: 38882204
That was exactly it. Thank you!