Solved

Problematic server... Intel Raid to another controller?

Posted on 2010-08-23
Last Modified: 2013-11-14
This is a production SQL server: Windows 2003 R2 x64, Supermicro motherboard, 24 GB RAM, Quad-Xeon.
It has on-board Intel RAID, configured as RAID 1 (for the OS) and RAID 5 (for data).
Configuration is as follows:
Drive 0--RAID 1
Drive 1--RAID 1
Drive 2--RAID 5
Drive 3--RAID 5
Drive 4--RAID 5
Drive 5--RAID 5
All drives are the same model of Seagate SATA disk.
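
For sizing the target arrays on the new controller, the layout above works out roughly as sketched below. The 250 GB per-drive figure is an assumption taken from the drive model quoted later in this thread, and the function name is made up for illustration. The sketch ignores controller metadata overhead and GB vs. GiB differences.

```python
def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Rough usable capacity of a RAID 1 or RAID 5 set (hypothetical helper)."""
    if level == 1:
        if drive_count != 2:
            raise ValueError("this sketch only models a two-drive mirror")
        # A mirror stores the same data on both drives.
        return drive_size_gb
    if level == 5:
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # One drive's worth of space is consumed by parity.
        return (drive_count - 1) * drive_size_gb
    raise ValueError("only RAID 1 and RAID 5 are modeled here")


# Drives 0-1 as RAID 1, drives 2-5 as RAID 5, assuming 250 GB each.
os_array_gb = usable_capacity_gb(1, 2, 250)    # 250
data_array_gb = usable_capacity_gb(5, 4, 250)  # 750
```

Whatever arrays are created on the replacement controller need at least this much usable space for a straight image restore.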

The Intel Raid Manager has been problematic, and I have found it just as troublesome on regular workstations. At random, it pops up a "raid degraded" message. This can be quickly fixed by right-clicking the drive (in the Intel Manager software) and selecting "Normal". Poof, the problem is fixed and Intel proceeds to rebuild the drive. The same happens when Intel Raid Manager claims that a hard drive has failed: one click and it's back online. (Buggy software?) This machine has also been rebooting spontaneously. I checked the logs, but the reboots happen too quickly for even the Event Log to catch the problem, so there is no trace.

But this is not the main issue. I am trying to move this machine to a different RAID controller, an Adaptec to be specific. The data on the server is crucial. In past RAID migrations I have always been able to make an Acronis image (.tib) and restore it to another pre-configured RAID. With this machine, however, even Acronis fails: a "cannot load linux kernel" message comes up. (The CD is not scratched and works on all other RAID machines.)

Are there any other ways that I could move this machine from the Intel to the Adaptec controller without loss of data? The Intel Raid Manager is a mess, and it's not reliable enough to be used further. Any suggestions?

Thank you very much!
Question by:94704
12 Comments
 
LVL 47

Expert Comment

by:dlethe
ID: 33498816
Specifically, what is the make/model of the disks, and what is the model of the Adaptec controller?
 
LVL 20

Expert Comment

by:wolfcamel
ID: 33498891
I use StorageCraft IT Edition for this. You can get a 2-week license, which is a bit cheaper than a full license.

 
LVL 20

Expert Comment

by:wolfcamel
ID: 33498896
The Adaptec controller probably won't boot until you disable the onboard Intel RAID.
 
LVL 87

Accepted Solution

by:
rindi earned 500 total points
ID: 33498903
Try using Paragon backup to create the image files. The advantage is that it boots to a WinPE environment, which includes more drivers for RAID controllers, and if necessary you can add drivers via USB stick. There is also an adaptive restore option, so you can restore the images to different hardware (I think it is included in the new software, but you can also get it as a standalone CD, free).

http://www.paragon-software.com/index.html
 

Author Comment

by:94704
ID: 33498920
@ Wolfcamel: I am aware of that. I'll try StorageCraft and post results soon.
@ dlethe: ST3250310NS Seagate 250 GB SATA
The new RAID card would be an ADAPTEC-SUPERMICRO AOC-LPZCR2 rev 3.00 (the card has been suggested by the manufacturer as 100% tested and approved for the motherboard).
 

Author Comment

by:94704
ID: 33498933
@ rindi: I'll give it a try, sounds like miracle software.
LVL 47

Expert Comment

by:dlethe
ID: 33498955
You probably have a much bigger problem. The symptoms are a classic indication of a TLER issue with the Intel controller, assuming you are using consumer-class disks and not enterprise/server drives. The crux of the issue is that consumer-class disks go into a deep recovery cycle when they encounter a bad block. This can take 10-30 secs, depending on the specific model. (The drive basically freezes and dedicates 100% of its effort to recovering the block.)

Unfortunately, the Intel controller only allows around 7 seconds before it figures the drive died, so it kicks the drive out of the RAID set. This is WHY you are having such a problem: bad block -> deep recovery -> disk "locks up" for too long -> controller thinks it died -> degraded RAID -> you manually reset & rebuild -> repeat.
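
To make that chain concrete, here is a toy model of the timeout race. It is only a sketch: the 7-second limit and the 10-30 second recovery range are the approximate figures quoted above, the function names are hypothetical, and nothing here is a real controller or drive API.

```python
from typing import Sequence

# Approximate controller patience quoted above (an assumption, not a spec value).
CONTROLLER_TIMEOUT_S = 7.0


def drive_is_dropped(recovery_time_s: float,
                     timeout_s: float = CONTROLLER_TIMEOUT_S) -> bool:
    """True if the controller gives up before the drive finishes its recovery."""
    return recovery_time_s > timeout_s


def array_state(recovery_times_s: Sequence[float]) -> str:
    """Classify a single-redundancy array (RAID 1 or RAID 5) for one event.

    Each entry is how long that member drive stalls on a bad block;
    0.0 means the drive answered normally.
    """
    dropped = sum(drive_is_dropped(t) for t in recovery_times_s)
    if dropped == 0:
        return "normal"
    if dropped == 1:
        return "degraded"
    return "failed"


# One drive of four stalls for 15 s in deep recovery; the rest respond at once.
state = array_state([0.0, 0.0, 15.0, 0.0])  # "degraded"
```

A drive with time-limited error recovery would report the bad block before the controller's limit expires, so in this model it would never be dropped.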

Now MANY, but not all, of the Adaptec RAID controllers also require enterprise drives. So I would hate for you to go through all of this trouble and continue to see the issue.

You can read more about it here (among other related issues regarding disk/data reliability in general).
http://www.experts-exchange.com/Storage/Misc/A_2757-Disk-drive-reliability-overview.html

So my suggestion is to first step back and consider the root cause before making changes.
 
LVL 47

Expert Comment

by:dlethe
ID: 33498981
Never mind, you have enterprise-class drives. But still, this should not happen. Make sure you don't have acoustic (quiet) mode turned on, as this affects timing and performance. Also, if these are OEM, not retail-labeled, they could have been programmed with different firmware to make them behave better in another config.

Run full diagnostics, including media verify. What you are experiencing is still an indication of an inherent drive issue. Are these retail disks with standard firmware?
 

Author Comment

by:94704
ID: 33504178
@ dlethe: the disks are enterprise level, just as you noticed. We purchased them directly from Seagate, specified for server use. Acoustic mode is not turned on, and no "quiet" or "power-save" setting is enabled anywhere. Also, the drives have the original manufacturer firmware.
 
LVL 47

Expert Comment

by:dlethe
ID: 33504317
Good, acoustic should be off ... The other tunable parameter on enterprise drives that is sometimes enabled and messes things up is the power-saving "green" stuff. Make sure they aren't going all tree-hugger on you and spinning down ;)

If you have NOT been running regular (weekly at least) data consistency checks, then please look at the event log and start running them. The check runs within the firmware: it reads all blocks on all disks, looks for parity XOR errors as well as unreadable blocks, and fixes them. Drives go offline for a reason: if a drive hits a few consecutive bad blocks, it will time out. A rebuild will repair the blocks and move on.
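
For anyone unfamiliar with what a parity XOR check means on RAID 5, here is a minimal sketch of the idea. It assumes one stripe with a single parity block, all blocks the same length; the function names are made up for illustration, and real firmware works on whole stripes across rotating parity positions rather than on short byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """A stripe is consistent when the stored parity equals the XOR of its data."""
    return xor_blocks(data_blocks) == parity_block


def rebuild_block(surviving_blocks, parity_block):
    """Recover one missing data block from the survivors plus the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# Three data blocks (as on a 4-drive RAID 5 stripe) and their parity.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Pretend the middle block is unreadable and rebuild it from the rest.
recovered = rebuild_block([data[0], data[2]], parity)
```

A consistency check walks every stripe doing the first comparison; a rebuild does the second calculation for every block on the replaced or re-added drive.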

Personally, I would take a downtime window and run extensive diags on the disks through a NON-RAID controller. It is foolish to throw drives at a new controller when you may very well have bad drives. Windows scandisk and other tests won't run true diags, especially with that RAID controller in the way.
 

Author Comment

by:94704
ID: 33505970
The event logs do not show anything out of the ordinary. Sometimes a failed service at most.

Drives are not treehugger-friendly, so I don't suspect that they would spin down. Besides, the server is used constantly nearly 24/7, so I'd doubt that it would have the time to sleep the drives even if it did have treehugger option enabled.

The goal, at this moment, is to get it off the Intel raid. We bought enterprise-level WD5001AALS drives, just to make sure we don't trash the production drives. At this moment, even an Acronis backup messes up the Intel Raid and I have to rebuild after doing a simple backup.

 
 

Author Closing Comment

by:94704
ID: 33723522
Paragon software worked where Acronis failed miserably. The software also had an option to inject new RAID drivers into the OS, which allowed me to transfer from the Intel to the Adaptec RAID.