Wayne Barron

Need Quick Advice - Raid 5 - Rebuilding Array - Adaptec 6805T Controller, shows a drive grayed out, all drives are active

Hello All;

I was adding another VM in ESXi when I lost connection to it.
Looking over at the server, I noticed all the lights were off of the hard drives, except for 1 drive.
I THINK it was the 2nd drive, but I did not pay much attention to it once I noticed the other lights were off.
I opened the Remote Management console, shut down the server, and then booted it back up.
I then went into the Adaptec 6805T Config properties, and from there into Manage Arrays.
The screenshot below is what I am looking at.
User generated image
The questions I have are as follows.
Q1: The grayed out drive, is this the drive that could potentially be bad?
Q2: They are out of sync in following the regular number pattern. (00,01,02,03)

When I connected the cables to the board, I connected them like so:
1 - First (Top first drive)
2 - Second
3 - Third
4 - Fourth (the bottom drive of the second row; not really sure why I did that)
(I removed the cover and checked my cables and they are in fact connected as described above)

The rebuilding % went pretty quickly from 1 to 7, in about 10 minutes.
However, it has been on 7% for about 10 minutes on its own, which kind of worries me.
So, it might be that the greyed out drive is in fact bad.
In that case, I have two spares I can replace it with.
I just need some guidance on this one.
Which drive would I remove from the tray and replace?
And should I shut down the server first, or, keep it on, while it is still rebuilding, and do a hot-swap?
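
A rough way to reason about the "stuck at 7%" worry is to log the percentage at intervals and check how long it has been flat. The sketch below is only an illustration: the function name, the sample format, and the 10-minute window are assumptions, and a flat percentage alone does not prove the rebuild has died (the drive activity lights matter too).

```python
from typing import List, Tuple

def is_stalled(samples: List[Tuple[float, float]], window_min: float = 10.0) -> bool:
    """Return True if the reported percentage has not changed for at least
    window_min minutes.

    samples: (minutes_since_start, percent_complete) pairs in time order,
    noted down by hand from the controller screen.
    """
    if not samples:
        return False
    last_time, last_pct = samples[-1]
    stall_start = last_time
    # Walk backwards to find when the current percentage was first seen.
    for minutes, pct in reversed(samples):
        if pct != last_pct:
            break
        stall_start = minutes
    return last_time - stall_start >= window_min

# Readings from the post: 1% at the start, 7% after ~10 minutes, still 7% ~10 minutes later.
readings = [(0, 1), (10, 7), (20, 7)]
print(is_stalled(readings))                 # flat for 10 minutes
print(is_stalled(readings, window_min=60))  # not yet flat for a full hour
```

A longer window is the more cautious choice here, since the controller may not report progress linearly.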

Wayne
Dr. Klahn

I have not dealt with a failure on this particular controller before; but if the controller thinks it can rebuild, I wouldn't attempt a swap until the array either rebuilds or the controller states that rebuilding failed.  Within a reasonable time, of course.  2700 gigabytes is a lot of data to process on a corrupt RAID set.

The controller's definition of 7% may not be "I am 7% done with the whole thing", but rather something like "Out of 28 steps, I have completed two."
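
To make the "28 steps" idea concrete, here is a small arithmetic sketch. The 28-step figure is only the hypothetical from the sentence above, not something documented for the 6805T, and the truncating display is likewise an assumption.

```python
def step_progress(steps_done: int, steps_total: int) -> int:
    """Whole-number percentage for a counter that advances in steps
    (truncated, as a simple progress display might show it)."""
    return (100 * steps_done) // steps_total

# With 28 hypothetical steps, the display would jump between values
# and sit on each one for as long as the next step takes.
print(step_progress(2, 28))  # 7
print(step_progress(3, 28))  # 10
```

Under that assumption, a display sitting at 7% for a long time would simply mean the third step is still running.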

ASKER

Thanks, Klahn.
I guess it decided to do this when I swapped the live sites over to the Maintenance Server.
Not even 20 minutes had passed when it did this.

I will leave it alone for the night and see what is happening in the morning.
Hopefully, I don't lose anything.
But, the good thing about it is this.
I did a backup of all websites and databases the other day, and the primary DC is not in the Big Server, as it is in its own machine.
So, there is that.
And it would take about two days to build the VM network back up again, which I hope it does not come to.
But, we will have to wait and see what happens with this.
In case everything works out, don't replace drives, but add the 2 spare drives as hot spares.
Also, after ESXi boots, please take a look at the logs around the time it got stuck/reboot was needed (would be nice to know the reason it got stuck).
Also, if you need uptime, make sure your system is fully compatible (make sure the card is compatible with your ESXi version), because if it is, you can continue booting to ESXi and manage it further while the system is running (meaning it would rebuild while everything was still running, with a performance hit depending on hardware specs). If it isn't, consider investing in a fully compatible solution, so you won't have downtime in the future (for example a series 7 or 8 card, if you have a newer ESXi version).
Yes, I installed the drivers for ESXi for the Raid card when I first jumped into doing this in 2018.
So, you are saying I can boot the system into ESXi, and it will not affect the rebuilding process?
The screenshot I supplied, that is where it is stuck at.
I tried to hit ESC, but it will not budge from the area it is in.
And still at 7%, going on an hour.
If the correct drivers AND the correct compatible MANAGEMENT package from Adaptec are installed, you could've booted to ESXi first. The rebuilding process would already have started in the background while ESXi was running. It would not have affected it (except for a performance hit).
In your current state though, already at the 7% mark, I'm not so sure if it's wise to force a power down (though the HDD leds should give you an indication if there's really activity or not).
Come Monday's deadline though, these could be the next steps:
- force power down
- install 2 spares
- start ESXi
- Adaptec management, assign 2 new drives as hot spares. Hopefully in the GUI, it still states Dev00 as suspect/defective, go into the properties to get the serial number
- pray to God system is still running, let the rebuild decide what to do (rebuild again with 4 existing, or rebuild with one of the hot spares).
- if rebuild with 4 existing still doesn't work after a full day or so, you now have the serial number to physically remove the drive. Start ESXi again, the rebuild now probably WILL work with the new spares.
Lights are still busy.
I will give it until tomorrow when I wake.
If it is still at 7%, I will put the two spares in.

How will I be able to boot into ESXi, AND, Adaptec Management at the same time?
(or) are you telling me to boot into ESXi first and check the logs, and THEN, do the Adaptec Management?
One question that was not answered in this.
The grayed drive, would that be in fact, the FIRST drive?
It should be the drive with no activity showing on its activity LED.  Now which one that is in the rack, that's another question.  I'd expect it to be the far left or far right drive, but what I expect doesn't matter.
All drives were showing activity.
They all just went down again, so I am going to put the two spares in and go from there.
I will post back a little later.
OK, I just probed the Topology of the Adaptec Controller.
And it shows me all six drives.
The first 4
CN0 : Dev 00 (Wire 1)
CN0 :Dev 01 (Wire 2)
CN0 :Dev 02 (Wire 3)
CN0 :Dev 03 (Wire 4)
The two extras.
CN1 : 00 (Wire 1)
CN1 : 01 (Wire 2)

We know now that it is going to fail after about an hour, and stop at 7%.
So it is safe to say, that the Drive that is in location 0 (First Drive)
Needs to be replaced.
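
The link between the controller's device numbers and the physical bays can be written down as a small lookup table. This is only a sketch built from the topology listing and the cabling order described in the original question; it assumes that "Wire 1" in the topology is the same cable that was plugged into the first (top) drive, which the thread does not confirm.

```python
# (connector, device) -> wire number, as listed in the topology probe.
TOPOLOGY = {
    ("CN0", 0): 1, ("CN0", 1): 2, ("CN0", 2): 3, ("CN0", 3): 4,
    ("CN1", 0): 1, ("CN1", 1): 2,
}

# Wire -> bay for the four array drives, from the cabling description
# in the original question (assumed to apply to connector CN0).
BAY_BY_WIRE = {
    1: "first (top) drive",
    2: "second drive",
    3: "third drive",
    4: "fourth drive (bottom of second row)",
}

def locate(connector: str, device: int) -> str:
    """Describe where a controller device should sit physically."""
    wire = TOPOLOGY[(connector, device)]
    if connector == "CN0":
        bay = BAY_BY_WIRE[wire]
    else:
        bay = "spare (bay not described in the post)"
    return f"{connector} Dev {device:02d}: wire {wire}, {bay}"

print(locate("CN0", 0))
```

Matching the drive's serial number in the management tool against the label on the drive would still be the safer way to confirm which one to pull.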

So, what I am going to do. (As stated, if I screw up, I can build everything back up in a couple of days)
Is swap out the bad drive for a good drive and then see what happens.
I will then leave the extra drive in, for use of a hot spare.

So, that is the plan.
I will post back in a little while on what is going on.
Cannot choose the 1TB drives as a Global Host Spare.
It has to be greater than the other drive(s) of which it would take over.
I do not have any drives larger than the ones I have.
So, I am going to have to swap out the 0 drive with a good drive and HOPE that it is the right drive.
If not, I will have to continue until I find the bad drive.
I am in ESXi.
And looking through the DEVICE Manager log, one line told me to look in the VMKernel Log.
So, here is what I found.

2020-03-28T23:02:09.873Z cpu1:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:09.873Z cpu1:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:09.873Z cpu1:65594)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:09.873Z cpu1:65594)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:09.873Z cpu1:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:09.873Z cpu1:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:09.889Z cpu7:65871)WARNING: NetDVS: 681: portAlias is NULL
2020-03-28T23:02:11.429Z cpu4:65950)WARNING: ScsiPath: 7482: Adapter Invalid does not exist
2020-03-28T23:02:11.429Z cpu5:65952)WARNING: PCI: 1207: 0000:00:1d.7 is nameless
2020-03-28T23:02:11.574Z cpu0:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:12.328Z cpu4:65950)WARNING: ScsiPath: 7482: Adapter Invalid does not exist
2020-03-28T23:02:12.328Z cpu4:65950)WARNING: ScsiPath: 7482: Adapter Invalid does not exist
2020-03-28T23:02:12.329Z cpu4:65950)WARNING: ScsiPath: 7482: Adapter Invalid does not exist
2020-03-28T23:02:12.330Z cpu4:65950)WARNING: ScsiPath: 7482: Adapter Invalid does not exist
2020-03-28T23:02:12.340Z cpu1:65593)WARNING: ScsiUid: 274: Path 'vmhba33:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:12.340Z cpu5:65594)WARNING: ScsiUid: 274: Path 'vmhba34:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:12.341Z cpu1:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:12.341Z cpu5:65594)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:12.342Z cpu3:65593)WARNING: ScsiUid: 274: Path 'vmhba35:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:12.342Z cpu5:65594)WARNING: ScsiUid: 274: Path 'vmhba36:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:12.342Z cpu3:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:12.343Z cpu5:65594)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:17.034Z cpu4:65871)WARNING: ScsiUid: 274: Path 'vmhba36:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:17.037Z cpu0:65871)WARNING: ScsiUid: 274: Path 'vmhba33:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:17.040Z cpu0:65871)WARNING: ScsiUid: 274: Path 'vmhba34:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:17.071Z cpu0:65871)WARNING: ScsiDeviceIO: 9480: Mode Sense cmd reported block size 0, does not match the current logical block size 512(with physical block size 512) for device.
2020-03-28T23:02:17.071Z cpu0:65871)WARNING: ScsiDeviceIO: 9482: The device mpx.vmhba2:C0:T1:L0 is marked format corrupt.
2020-03-28T23:02:17.073Z cpu0:65871)WARNING: ScsiUid: 274: Path 'vmhba35:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:34Z mark: storage-path-claim-completed
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: KcsRegPort_Init:86: ipmi: Error registering IO resource for KCS port address at 0xca2.. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiSysIntKcs_Init:769: ipmi: Failure to inialize KCS registers. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiDriver_Init:195: ipmi: Failed to initialize IPMI system interface. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: CreateIpmiDrivers:1245: ipmi: Failed to initialize IPMI driver. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: KcsRegPort_Init:86: ipmi: Error registering IO resource for KCS port address at 0xca2.. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiSysIntKcs_Init:769: ipmi: Failure to inialize KCS registers. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiDriver_Init:195: ipmi: Failed to initialize IPMI system interface. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: CreateIpmiDrivers:1245: ipmi: Failed to initialize IPMI driver. Error: Bad address range
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: SanityCheckRegs:111: ipmi: Reading the BT Control Register produced an invalid value: 0xFF
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiSysIntBt_Init:107: ipmi: Failed to initialize BT registers. Error: Failure
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiDriver_Init:195: ipmi: Failed to initialize IPMI system interface. Error: Failure
2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: CreateIpmiDrivers:1245: ipmi: Failed to initialize IPMI driver. Error: Failure
2020-03-28T23:02:29.097Z cpu5:66385)WARNING: Elf: 3017: Kernel based module load of ipmi failed: Failure <Mod_LoadDone failed>
2020-03-28T23:02:29.909Z cpu1:66408)WARNING: FTCpt: 875: Using IPv4 address to start server listener
2020-03-28T23:02:41.450Z cpu2:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:41.450Z cpu2:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.628Z cpu2:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:41.628Z cpu2:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.628Z cpu3:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.762Z cpu3:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.763Z cpu0:65750)WARNING: ScsiUid: 274: Path 'vmhba33:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:41.765Z cpu3:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.766Z cpu5:65754)WARNING: ScsiUid: 274: Path 'vmhba34:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:41.767Z cpu3:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.769Z cpu0:65753)WARNING: ScsiUid: 274: Path 'vmhba35:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:41.770Z cpu3:66599)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:41.771Z cpu0:65752)WARNING: ScsiUid: 274: Path 'vmhba36:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:46.010Z cpu5:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:46.010Z cpu5:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.192Z cpu4:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection
2020-03-28T23:02:46.192Z cpu4:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.192Z cpu4:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.322Z cpu4:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.324Z cpu5:65754)WARNING: ScsiUid: 274: Path 'vmhba33:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:46.325Z cpu5:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.327Z cpu3:65753)WARNING: ScsiUid: 274: Path 'vmhba34:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:46.330Z cpu5:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.331Z cpu2:65752)WARNING: ScsiUid: 274: Path 'vmhba35:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
2020-03-28T23:02:46.333Z cpu5:67007)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 1: No connection
2020-03-28T23:02:46.334Z cpu6:65751)WARNING: ScsiUid: 274: Path 'vmhba36:C0:T0:L0' : supports ANSI version 0x2 and a UID could not be extracted from the INQUIRY info. In order to be used with ESX a device must support the SCSI 3 protocol.
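
Most of the lines above are repeats, so grouping the warnings by component makes the one-off entries easier to spot. The following is a minimal sketch, assuming the line layout seen in this listing (timestamp, `cpuN:id)`, `WARNING:`, then a component name ending in a colon); the function name and regular expression are illustrative, not part of any VMware tool.

```python
import re
from collections import Counter

# Matches e.g. "2020-03-28T23:02:09.873Z cpu1:65593)WARNING: NetDVS: 681: ..."
# and captures the component name that follows "WARNING: ".
WARNING_RE = re.compile(r"^\S+ cpu\d+:\d+\)WARNING: ([^:]+):")

def count_warnings(lines):
    """Count WARNING lines per component; lines in other formats are skipped."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the listing above.
sample = [
    "2020-03-28T23:02:09.873Z cpu1:65593)WARNING: xpt_scsi_adapter_discover:1224: unable to find target 0: No connection",
    "2020-03-28T23:02:17.071Z cpu0:65871)WARNING: ScsiDeviceIO: 9482: The device mpx.vmhba2:C0:T1:L0 is marked format corrupt.",
    "2020-03-28T23:02:29.096Z cpu5:66385)WARNING: ipmi: IpmiDriver_Init:195: ipmi: Failed to initialize IPMI system interface. Error: Failure",
    "2020-03-28T23:02:34Z mark: storage-path-claim-completed",
]

for component, count in count_warnings(sample).most_common():
    print(component, count)
```

Run against the full log, the same function would show how many entries each component produced, leaving the rare ones (such as the ScsiDeviceIO lines) to read individually.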


I hope this is helpful.

As for the ARRAY Build.
All lights are flickering like they are supposed to.
The drive I swapped out is lit up and has a slight flicker to it, just not as active as the other lights.
If it goes down again, I am going to swap out the cables.
It is at 21% now.
So that is awesome.
I will let it chug along throughout the night and check back in the morning.
But, so far, so good.
How will I be able to boot into ESXi, AND, Adaptec Management at the same time?
(or) are you telling me to boot into ESXi first and check the logs, and THEN, do the Adaptec Management?
 
If you have the correct card/ESXi combo, it means you can start the Adaptec management tools after ESXi is fully booted.
Sometimes, it's just the command line (ARCCONF), sometimes it's a GUI (Adaptec or Maxview Storage Manager, either directly or through web interface). The 6805T seems to support installing the GUI though. What I'm talking about IS NOT THE SAME as your screenshots. The whole point of this construction (to be able to manage your RAID array at all times) is ease of use, AND to prevent unnecessary downtime.

In case you can't remember ever installing it, please do so now (can be installed on a separate PC, as long as you're in the same network as the ESXi): https://storage.microsemi.com/en-us/speed/raid/storage_manager/msm_vmware_v2_05_22932_zip.php 

I am downloading it right now.
When I got up a little while ago, all the drive lights were off.
I booted into the Raid Controller, and everything looked good and was identical to the last screenshot I posted.
So, not sure what is going on, or rather, whether it is a cable issue or a hardware issue.
Swapped out the cables, as the original cables were showing all lights on and not blinking, no activity.
Once I booted back up, and into ESXi, I started two of the VM's (Mail and Backup DC)
The lights started showing activity, and a few minutes later, the 01 drive (Second drive) stopped blinking.
I logged on to the DC to see if it would start showing Activity and nothing.
Lost Drive 01
Hot Swapped it out, with the last spare.
Going to place an order for some SAS drives today, in hopes they will be in by mid-week.

@Kimputer
I installed the Maxview - Web Client.
How do I access it?
I viewed a webpage that showed adding
192.168.2.101:8443
But that did not work.
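
Before troubleshooting the login itself, it can help to check whether anything is listening on that address and port at all. This is a minimal sketch using Python's standard `socket` module; the function name is made up for illustration, and whether 8443 is the right port for this maxView install is an assumption taken from the webpage mentioned above.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_open("192.168.2.101", 8443)` from the PC running the browser would return `False` if the web service is not running or a firewall is blocking it, which narrows down where to look next.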

OK, the one I am needing to install is.
Guest_OS/Windows_x64/Setup_maxView_GOS_x64.exe
After the installation, I could not find any icons to launch it with.
So I did a search and it was in the PUBLIC/DESKTOP
I grabbed the shortcut and added it to my VMware folder.
Now I am just trying to login to the system now, and it is not working.
So, I guess I will open a new TA about this issue, as I REALLY want to get this working.