Upgrade of ESXi 6.5 - no longer can see internal server RAID drives

Stan J asked
Last Modified: 2020-07-16
After upgrading our Supermicro server to the latest ESXi 6.5 release, ESXi is no longer seeing the internal SAS drives.

Internal SAS controller is not seeing the drives.  

Prior to the ESXi update, vCenter was showing under Storage Adapters, MegaRAID SAS Invader Controller with Storage Device Local AVAGO disk.

I booted the ESXi server and went into the RAID utility.
All of the drives show on-line and active with no failures.

I checked with the vendor and we tried a new backplane riser card.  
Did not help.  He said if the drives are seen by the controller, then he thinks the problem is not a hardware issue.


He thinks there is some disconnect in the ESXi somewhere …especially since the only two hardware pieces it could be at this point are backplane and motherboard.

However, the internal SATA drives are working fine, so that really rules out the backplane and motherboard.

Has anyone seen anything like this after upgrading ESXi 6.5?


Anyone see something similar?

 

thanks

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
Has anyone seen anything like this after upgrading ESXi 6.5?


Anyone see something similar?

Yes, it always happens when VMware has DROPPED a device from the HCL.

I would check with VMware HCL and SuperMicro for support of 6.5 on this server, or the update used.

Has the version of the driver changed between versions of 6.5?
Louis LIETAERSystem Infrastructure Architect
CERTIFIED EXPERT

Commented:
It may be a disk driver issue. Have you checked hardware compatibility with ESXi 6.5?
Stan JVirtualization Engineer

Author

Commented:
yep,,already moving that direction...

possibly a driver update is needed
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
That is a discussion with the Vendor, and check VMware HCL.

OR ROLLBACK

If the datastore is MISSING.

Can you see the devices (or LUN if it's a RAID array)?

How was it upgraded ?
Stan JVirtualization Engineer

Author

Commented:
yes,,already contacted the vendor..

Avago is on the HCL...
I see the below at the VMware download site, which looks to be newer than what is currently installed.
I would have to check next time i am back on-site

VMware ESXi 6.5
  lsi-mr3 7.708.09.00-1OEM
 SAS Driver for Avago Megaraid SAS 12Gbps Based SAS Adapters - 5/27/2019

the server was delivered at least 3 years ago, so an update is due.
hopefully, this will correct the issue
Louis LIETAERSystem Infrastructure Architect

Commented:
hopefully
Paul SolovyovskySenior IT Advisor
CERTIFIED EXPERT
Top Expert 2008

Commented:
In addition to checking driver you may also want to check firmware as well.  I have seen firmware being an issue on older raid controllers and HBAs.  Check the HCL for the specific Megaraid controller to ensure that it is supported.

Please check the MegaRaid controller to ensure that the raid is still intact.  If that's the case you should be able to roll back to the previous version.  Just in case verify your backups, if backups are good you can continue to troubleshoot, otherwise rollback should be your primary choice.


Stan JVirtualization Engineer

Author

Commented:
yes..waiting  on the vendor...will be looking at driver, firmware, and BIOS

thanks 
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
That should always be the FIRST procedure before upgrading/updating.
Paul SolovyovskySenior IT Advisor

Commented:
Supermicro is typically stable, considering how many storage products run on them. What MegaRaid model do you have installed?
Stan JVirtualization Engineer

Author

Commented:
The vendor wants to verify if it was the update before applying the updated drivers and firmware.

So, to revert back, I know the process as defined here https://kb.vmware.com/s/article/1033604 will  get back to the previous version and would work since I used CLI and the vib update command.

However, based on what i recall, the ESXi host was at build 13932383 (U3).
I applied 14320405 (3a) 
Then, 15256549 at a later date.

So, is it possible to run the revert consecutive times to go from build 15256549 back to 13932383 (U3)?

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
You should be able to revert, but sometimes it does not work.

It is not designed as a toggle to switch between builds....

but the way ESXi is designed, there are TWO bootbanks on the installation, which exist in two different partitions and are defined in the boot.cfg.

When you update ESXi...

if you are currently running from bootbank1, bootbank2 is updated, and on the next boot bootbank2 is selected to boot from. On revert, bootbank1 is selected (but the change is permanently written...)

Do you follow ?

So to toggle you would have to edit the boot.cfg

look at

/bootbank/boot.cfg

and

/altbootbank/boot.cfg

these should be different to rollback.
Stan JVirtualization Engineer

Author

Commented:


The model of the RAID is ... MegaRAID SAS Invader Controller

i will not be back on-site until possibly Wednesday to check,,
However, if i follow the revert sequence, below is a summary and it looks like i cannot get back to the version of ESXi 6.5 that did not produce the issue

Bootbank  3/1/2020
     ESXi 6.5 U2c (8294253)  -- no RAID issue
Altbootbank
      Empty
 
Upgrade
Bootbank  3/20/2020
    ESXi 6.5 U3 (13932383)   --  RAID issue
Altbootbank 3/1/2020
    ESXi 6.5 U2c (8294253)   -- no RAID issue
 
Update
Bootbank  4/30/2020
   ESXi 6.5 patch 4 (15256549)  --  RAID issue
Altbootbank 3/20/2020
   ESXi 6.5 U3 (13932383)  --  RAID issue
 

if the above is accurate, i cannot get back and either have to go with one of the two options below?

1.  re-install ESXi 6.5 U2c (8294253)
2.   update the firmware and drivers for MegaRAID SAS Invader Controller 
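
The summary above can be replayed with a toy model of the two banks. This is only a sketch, assuming each update writes the new image to the inactive bank and the roles swap on reboot; the Banks type and the function names are invented for the illustration and are not ESXi APIs. The build numbers are the ones from the summary.

```python
from typing import NamedTuple, Optional


class Banks(NamedTuple):
    """Toy model of the two ESXi boot banks (hypothetical helper, not an ESXi API)."""
    bootbank: Optional[str]     # image the host currently boots from
    altbootbank: Optional[str]  # previous image, the only rollback target


def apply_update(banks: Banks, new_build: str) -> Banks:
    # Assumption: the update lands in the inactive bank, and after the reboot
    # the image that was running becomes the alternate one. Anything older
    # than that is overwritten.
    return Banks(bootbank=new_build, altbootbank=banks.bootbank)


def rollback_target(banks: Banks) -> Optional[str]:
    # A revert can only reach whatever is held in the alternate bank.
    return banks.altbootbank


state = Banks(bootbank="6.5 U2c (8294253)", altbootbank=None)
state = apply_update(state, "6.5 U3 (13932383)")
after_first = rollback_target(state)
state = apply_update(state, "6.5 patch 4 (15256549)")
after_second = rollback_target(state)

print("rollback target after first update :", after_first)
print("rollback target after second update:", after_second)
```

Under that assumption, the second update pushes U2c out of both banks, which matches the conclusion that a revert alone cannot get back to the last working build.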
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
You can only rollback to a previous version.

So, is it possible to run the revert consecutive times to go from build 15256549 back to 13932383 (U3)?

No.

Re-install. (not an upgrade) and select Preserve VMFS (do not overwrite).

and then re-configure from screenshots, and you'll be done, register Vms with the Inventory.
Stan JVirtualization Engineer

Author

Commented:

If i do the re-install, do i lose the folder structure defined in vCenter that house all of the VMs on that ESXi host?

There are over 150 VMs , so that would take a long time to register

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
If i do the re-install, do i lose the folder structure defined in vCenter that house all of the VMs on that ESXi host?

stored in vCenter Server DB, nothing to do with ESXi compute.

are these 150 VMs stored on shared storage or local datastore ?

do you not have vMotion to move VMs to other hosts ?
Stan JVirtualization Engineer

Author

Commented:
for the 150 Vms
some  or most are on NetApp Shared storage.

If I recall, the ESXi local storage has about 20 of the VMs.  
I would need to check when i get back in to the lab this week.
These are the ones showing as unreachable from vCenter

Also, Update 3 release notes show

   VMW_bootbank_lsi-mr3_7.708.07.00-3vmw.650.3.96.13932383

This patch updates the lsi-mr3 VIB.
•   Update to the lsi_mr3 driver
        The lsi_mr3 driver is updated to version 7.708.07.00-3.


I assume this is the issue and 6.5 U2 had a different lsi_mr3 driver  ?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Migrate all the VMs off - simple.

Rebuild from scratch.
Stan JVirtualization Engineer

Author

Commented:
Do you mean migrate the 20 or so VMs from internal storage to NAS
or
Migrate VMs to another ESXi host?


Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
MIGRATE EVERYTHING.

or are the internal VMs currently missing ?
Stan JVirtualization Engineer

Author

Commented:
the internal VMs are on the RAID drive that is not seen by ESXi,
so, i can't move them.


I will look at updating the  SAS RAID driver.

What driver is required for the Broadcom/Avago/LSI 3108 based controller ?
      Vendor pointed me to VMware ESXi 6.5 lsi-mr3 7.712.05.00-1OEM 

However, the U3 release notes say:
This patch updates the lsi-mr3 VIB.
•   Update to the lsi_mr3 driver
        The lsi_mr3 driver is updated to version 7.708.07.00-3.

How would moving to lsi-mr3 7.712.05.00-1OEM correct the issue if the current driver is 7.708.07.00-3 or something different?

this command should list the current driver?
# vmkload_mod -s mptspi |grep Version
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
LSI MegaRAID SAS invader controller uses lsi_mr3 module, but it could have been using the  Legacy megaraid driver!!!

Do you know which ?

In that case

1. Move ALL VMs from this host that can be migrated. That leaves no VMs present.

2. Document the build, by taking screenshots.

3. Re-install the build you require when it was all working from ISO or CDROM.

4. During the installation, select Preserve VMFS, do not overwrite.

5. On reconfigure, add 20 VMs local from the storage back to inventory.

You are then done, and back to where you were.
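
For step 5, adding the local VMs back by hand can be shortened by generating the registration commands. The sketch below walks a datastore for .vmx files and prints one vim-cmd solo/registervm line per file. It assumes a Python interpreter is available wherever the datastore path is visible; DATASTORENAME is a placeholder and the helper names are made up for the example. It only prints the commands so they can be reviewed before anything is run.

```python
import os
from typing import Iterable, List


def find_vmx_files(datastore_root: str) -> List[str]:
    """Return every .vmx file below datastore_root, sorted for stable output."""
    found = []
    for folder, _subdirs, files in os.walk(datastore_root):
        for name in files:
            if name.lower().endswith(".vmx"):
                found.append(os.path.join(folder, name))
    return sorted(found)


def registration_commands(vmx_paths: Iterable[str]) -> List[str]:
    """Build one 'vim-cmd solo/registervm' line per .vmx path (quoted for spaces)."""
    return ['vim-cmd solo/registervm "{}"'.format(path) for path in vmx_paths]


if __name__ == "__main__":
    # Placeholder datastore name; replace with the real local datastore.
    for line in registration_commands(find_vmx_files("/vmfs/volumes/DATASTORENAME")):
        print(line)
```

VMs registered directly on the host this way still have to be placed into the right vCenter folders afterwards.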
Stan JVirtualization Engineer

Author

Commented:
unfortunately, the other host is not available to our project, so we cannot migrate VMs.



Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
So if you have 120 VMs on this currently..

they are going to need to be registered at some time to this host after re-install.

I'm surprised when upgrading you didn't notice that storage had disappeared, to immediately roll back.
Stan JVirtualization Engineer

Author

Commented:
i have updated other SM servers and did not see this issue
i didn't notice the storage issue as there were no alerts and didn't see it until i went to the storage view in vCenter

it sounds like i need to install the older megaraid driver
the other esxi server has the same exact setup, but has not been updated.

i should be able to check the driver version with something like......
   esxcfg-scsidevs -a
   vmkload_mod -s driver-name | grep Version

i would then need to get that driver and load it?

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
esxcli software vib install -v="/vmfs/volumes/DATASTORENAME/scsi-megaraid-sas_6.610.15.00-1OEM.600.0.0.2494585.vib"
Stan JVirtualization Engineer

Author

Commented:
ok,,,where is the  scsi-megaraid-sas_6.610.15.00-1OEM.600.0.0.2494585.vib   ?

vib install
or
vib update
or
profile update

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
an example, download it, get it from another host etc

as written.
Stan JVirtualization Engineer

Author

Commented:
so that driver is an example from here , https://thevirtualist.org/replacing-driver-for-megaraid-sas-9361-8i-on-esxi-6/ 

i can check the other host, but where can i get the vib 
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
You can also grab the driver which is in use from another host which you have working.

I would make sure that the driver you use is the same version. If you have the same exact hardware, it makes sense to use the exact same one from another server.
Stan JVirtualization Engineer

Author

Commented:
ok,

where would the driver be located on the other esxi host..?

is the driver in vib format on the other host?

 how is it installed to replace the current lsi driver
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
it's located in the bootbank.

But it's already installed, the vib is an installation bundle.

The easiest method, is to find the driver your working hosts are using... and grab a vib from internet or vendor.

I detail the process here

http://andysworld.org.uk/2011/09/20/tweaking-esxi-50-adding-un-supported-hardware-to-vmware-vsphere-esxi-50-adding-a-qle-220-to-esxi-50/

it's long and complicated and not for the faint hearted.

Summary.

1. Re-install from scratch the actual ESXi build that worked. Register those missing VMs on the local datastore which is missing.

2. Check the version currently installed on working hardware, obtain same driver (VIB) from vendor website and install as above.
Stan JVirtualization Engineer

Author

Commented:
The admin does not recall the root account password and has to check with his supervisor, so he can't issue the commands to find the driver on the other host.

I can see this info on the other ESXi server in vcenter and the below info for the internal SAS
vmhba2     MegaRaid SAS Invader Controller     5003048018cffe02
Local Avago Disk    naa.6003048018cffe02223170808c15ee

The other ESXi  server is running ESXi build 5969303 (Update 01)
I downloaded Update 01 and extracted the contents.
Under vib20, in the lsi_mr3 folder, I see the below:
    VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib
But, i cannot check the other host yet.

Looking at the HCL and plugging in Avago, SAS, 6.5U1, i get to the below driver
lsi_mr3 version 6.912.12.00-1OEM
  which unzips to,,
lsi-mr3-6.912.12.00-1OEM.650.0.0.4240417.x86_64.vib

i could not find the  lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib 
i also could not locate the driver at the Broadcom site.

Closest

lsi_mr3 version 6.912.12.00-1OEM
lsi_mr3 version 6.913.05.00-1OEM
lsi_mr3 version 6.913.06.00-1OEM
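
Because the bundled VIB and the HCL downloads use different file name layouts, it can help to pull out just the driver version and compare the numbers. The sketch below does that for the names listed in this post. The regular expression and the helper name are assumptions made for the illustration, not a VMware naming specification.

```python
import re
from typing import Tuple

# Matches e.g. "6.910.18.00-1vmw" or "6.912.12.00-1OEM" inside a longer name.
_VERSION = re.compile(r"(\d+(?:\.\d+)+)-(\d+)(vmw|OEM)")


def driver_version(text: str) -> Tuple[Tuple[int, ...], str]:
    """Extract ((major, ...), 'vmw' or 'OEM') from a VIB file name or version string."""
    match = _VERSION.search(text)
    if match is None:
        raise ValueError("no driver version found in {!r}".format(text))
    numbers = [int(part) for part in match.group(1).split(".")]
    numbers += [0] * (4 - len(numbers))  # pad so 6.910.18 equals 6.910.18.00
    return tuple(numbers), match.group(3)


wanted, _ = driver_version("VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib")
candidates = [
    "lsi-mr3-6.912.12.00-1OEM.650.0.0.4240417.x86_64.vib",
    "lsi_mr3 version 6.913.05.00-1OEM",
    "lsi_mr3 version 6.913.06.00-1OEM",
]
for name in candidates:
    numbers, flavour = driver_version(name)
    if numbers == wanted:
        relation = "same"
    elif numbers > wanted:
        relation = "newer"
    else:
        relation = "older"
    print(name, "->", numbers, flavour, relation)
```

None of the three candidates matches the bundled version exactly, and all are OEM builds rather than the inbox vmw build.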


Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
The admin does not recall the root account password and has to check with his supervisor, so he can't issue the commands to find the driver on the other host.

oh! that's a croc of ****!

You need to establish the version on a working version of ESXi, assuming all the hardware is the same, and if you didn't install any additional drivers, then the driver in that version of ESXi should work.

if you look under Host > Configuration > Software Packages

it will list all the packages and drivers installed, with versions.

So if you have access to that host in vCenter Server, you will be able to find the version info.

In one of the later versions of ESXi 6.5 it's 7.708.07.00-3vmw.650.3.96.13932383

In July 2019, there was a release and change to your driver of enhancements...

lsi-mr3      7.708.07.00-3vmw.650.3.96.13932383      VMW      Updates the ESX 6.5.0 lsi-mr3      enhancement      important      ESXi650-201907209-UG


You are probably going to have to discuss with VMware and Supermicro/Avago what changes were made, and whether you have to update firmware on your HBA.

if this is the issue.

because moving forward to a current version of 6.5 may cause you issues.
Stan JVirtualization Engineer

Author

Commented:
i am working on getting a support call opened.

in VC, i do not see an option  to Host > Configuration > Software Packages
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Host > Configuration > System > Packages

This is via the web client (HTML5)

You should always apply new firmware updates before updating ESXi to later builds.

It is strange that this HBA has disappeared off the HCL, or it's not been updated with this driver version.
Stan JVirtualization Engineer

Author

Commented:
yes...this whole event is strange,,,i never have seen this when updating servers ..
even if the firmware was updated, this drive issue may still have existed
it will be interesting to see what VMware support has to offer!

i will check the software listing tomorrow morning and report back
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
is the actual hba missing from storage devices ?
Stan JVirtualization Engineer

Author

Commented:
i still do not see Host > Configuration > System > Packages in vCenter using  HTML5?

in vCenter, under host > configure > Storage Adapter
the vmhba2 is not displayed

it is displayed for the other host

the admin is here and ran some commands to check the driver info on the other host
i will try to get that info posted

....in the meantime, if you want me to tell him any specific commands to run, let me know 
Stan JVirtualization Engineer

Author

Commented:
here is info from the other node

esxcfg-scsidevs -a
vmhba2  lsi_mr3  sas.5003048018cffe02  0000:01:00.0  Avago (LSI) Megaraid SAS Invader Controller

vmkload_mod  -s lsi_mr3  |  grep  Version
Version: 6.910.18-1vmw.650.0.0.4564106

this is the driver i could not locate on VMware's site
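
To compare the two nodes without retyping, the two outputs can be reduced to an adapter-to-driver map and a version string. A minimal sketch, assuming the command output has been saved as text; only the first two columns of the adapter list are used, and the sample strings are adapted from the output posted above (with the adapter written as vmhba2). The function names are invented for the example.

```python
from typing import Dict


def adapter_drivers(scsidevs_output: str) -> Dict[str, str]:
    """Map adapter name (vmhbaN) to driver name from 'esxcfg-scsidevs -a' text."""
    drivers = {}
    for line in scsidevs_output.splitlines():
        fields = line.split()
        # Only the first two columns are used: adapter, then driver module.
        if len(fields) >= 2 and fields[0].startswith("vmhba"):
            drivers[fields[0]] = fields[1]
    return drivers


def module_version(vmkload_output: str) -> str:
    """Return the text after 'Version:' from 'vmkload_mod -s <driver>' output."""
    for line in vmkload_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Version":
            return value.strip()
    return ""


# Sample text adapted from the working node's output in this post.
sample_adapters = (
    "vmhba2  lsi_mr3  sas.5003048018cffe02  0000:01:00.0  "
    "Avago (LSI) Megaraid SAS Invader Controller"
)
sample_module = "Version: 6.910.18-1vmw.650.0.0.4564106"

drivers = adapter_drivers(sample_adapters)
print(drivers)
print(module_version(sample_module))
```

On the upgraded node the same parse would simply have no vmhba2 entry, since that adapter is not listed there.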
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
okay, so the driver and version of that driver which is currently installed on that Host is not compatible with the HBA, either make and model, or firmware.

that driver, is probably incorporated into the default ESXi build.

There would seem to be a lot of issues with this driver and its later HCL certification, which was dropped after 6.5 U1.

However....

VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib

This driver is available from

update-from-esxi6.5-6.5_update01.zip

which can be found here

https://my.vmware.com/group/vmware/patch#search

Extract the vib, and install as below...

esxcli software vib install -v="/vmfs/volumes/DATASTORENAME/VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib"


that should solve your initial problem, BUT.... you've got an issue because every time you upgrade, you will have to re-apply this driver, until you resolve the Hardware Certification Driver issue with your

Avago (LSI) Megaraid SAS Invader Controller

which you are going to have to discuss with Avago and VMware, why it's been dropped from the HCL now, because it's not on the HCL anymore.
Stan JVirtualization Engineer

Author

Commented:
yes,,,
up in 3-4 previous posting, i posted,,,
I downloaded Update 01 and extracted the contents.
Under vib20, in the lsi_mr3 folder, I see the below:
    VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib 

i will have to copy this over to a datastore and then apply it,,,,
and i agree, it should work and i cannot update past ESXi U1 (unless as you say, I keep re-applying the VIB driver after each update, which makes no sense)

I will contact VMware to start and see why this driver is missing / dropped.
Avago would be next and then let the SM Manufacturer know.

I will post a follow up once i get answers.

Hopefully, this long drawn out posting will help someone else that may have a similar problem!

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
yes,,,
up in 3-4 previous posting, i posted,,,
I downloaded Update 01 and extracted the contents.
Under vib20, in the lsi_mr3 folder, I see the below:
    VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib

missed that, anyway it confirms a solution (maybe!). I provided the command to install, so get it installed.

the only thing that could go wrong is that, because you are on a different BUILD, it goes PSOD (crashes!).

it can happen, because you are effectively rolling your own ESXi custom build, useful for labs and test, but Production!???

You will have to restart after the install. It will state this.
Stan JVirtualization Engineer

Author

Commented:
ok,,,am going to confirm with VMware first...
Stan JVirtualization Engineer

Author

Commented:
Here is where we are.

We opened a ticket to VMware support on 5/27

Provided responses to their question as well as some log data.
I explained that the system hardware is a dual node comprised of the exact same hardware.
One side is running 6.5u1, which has a SAS driver that works.
The other side, was upgraded to 6.5u3 and the driver no longer appears to be working.

i provided them the same info as I posted here, that the SAS driver (6.910.18-1vmw.650.0.0.4564106 ) on the other node works..

they stated for the node that is working,,

"I did go through the attachment and from there I could clearly see that the current device driver version for vmhba2 is listed as 6.910.18-1vmw.650.0.0.4564106  - this version is not supported for the ESXi version that's running. "

I asked  them if i can use the 6.910.18-1vmw.650.0.0.4564106 driver on the node running 6.5u3..
 
they stated,
since that is not part of a supported device driver set, it wouldn't be possible for us to predict its behavior as to when it would work/get detected and when it wouldn't.  If it was working with the earlier builds, we wouldn't be able to comment on it, as owing to its incompatibility - its behavior can't be predicted. 

So, my question was, how is it that the driver made it into 6.5u1 and works?
So VMware supplied the drivers for ESXi 6.5u1 that are not supported?
The response..
Highly unlikely, the device drivers bundled with the vanilla images are most definitely in compliance with the HCL guide - the same cant be said for the Custom ESXi images that would be provided by the vendors.

I did not apply any custom image, so I am not sure why that was brought up and my question was not answered....

The first action they proposed was to use a driver,,VMW-ESX-6.5.0-lsi_mr3-7.708.07.00-13346416.zip
 Stating,, Yes, this driver shall surely work as its listed on our HCL guide as a compliant driver version for the ESXi host.  
 
I tried the driver and it did not work.
So, they are requesting more info.
I don't know what they will propose next..

I will update the posting when they provide some solution that will work,,,,


Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Unsupported hardware always an issue.

I'm afraid it happens, hardware gets dropped.

It rarely happens with Dell, Lenovo, HPE, Fujitsu because they just drop the entire server from the HCL.

BUT with Supermicro builds, which are just motherboards, it happens a lot.
Paul SolovyovskySenior IT Advisor

Commented:
In addition to what Andrew stated, hardware being dropped off the HCL does not mean that it will not work, it just means that it has not been certified and tested with the particular version of vSphere.  In many instances it will work just fine, but VMware support has the option of backing out of full remediation because it has not been tested and certified.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
and in addition to what Paul has written.

You will always have the pain of rolling your own Custom ESXi for your servers.

VMware will not support them, because the items are not on the HCL, which always becomes a risk if you ever get any VMware related issues, PSOD, or other problem.

because VMware will state - not supported and owing to its incompatibility - its behavior can't be predicted.

Have you discussed this issue with Supermicro ?
Stan JVirtualization Engineer

Author

Commented:
SuperMicro vendor went back to them and asked about firmware.

They are still checking with Taiwan, but this is all they have, and confirmed it is the latest version of this firmware for the backplane/controller.
 
Package Ver: 24.21.0-0100
BIOS Ver : 6.36.00.3
FW Ver : 4.680.00-8465

What is interesting is that this firmware has the driver in ESXi 7.0
https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=37806 


Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
Maybe have skipped a Build of ESXi.

Some ESXi 7.0 drivers have been re-architected.

So what works in 6.7 does not work in 7.0 and vice versa, especially around storage.
Stan JVirtualization Engineer

Author

Commented:
VMware is pointing me to load the driver that works on the other host to the host with the failed driver

..go figure.,....something we said to try a month ago,,,but was not sure about PSOD

They also asked for the build sheet of components.

I asked them for the driver and they said.

        These drivers are bundled up with the VMware Image and unfortunately aren't available for manual download directly.  Considering its working for the other host - we would request you to have that specific vib instance from the functioning host moved to the non-functioning host and see if you're able to have that working.

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
again it comes back to their statement...

not supported and owing to its incompatibility - its behaviour can't be predicted.

which is your risk.

But if the hardware is exactly the same, and you've no issues with stability on the other hosts, then that is the answer.
Stan JVirtualization Engineer

Author

Commented:
I can’t get to the other host .
I unzipped the 6.5u1 build to get the VIB folder and went to the lsi_mr3 folder and pulled
   VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib
 
 i will try the below next week,,,,,

esxcli software vib install -v "/vmfs/volumes/datastore/patch/VMW_bootbank_lsi-mr3_6.910.18.00-1vmw.650.0.0.4564106.vib"

Stan JVirtualization Engineer

Author

Commented:
sorry for not updating..  my subscription expired and it was recently renewed.

here is where we are;;
after multiple emails to support with log info, they directed me to pull the same LSI driver for the SAS RAID that is working on the node (A) and install it on node (B).

I downloaded 6.5u1 and extracted the driver, and installed with the vib install command.
rebooted and still cannot see the SAS drive.

VMware verified it is the same driver installed on both nodes.

After several more rounds of email with log data, they said since a reinstall of the driver did not work, it looks to be a hardware problem and not an ESXi issue.  In one email they said it looks like a possible re-install is needed on node (B), but now they are saying it is hardware related.

again, i pointed out the hardware is exactly the same on node (A) and node (B) and the only difference is node (A) is running 6.5U1 and node (B) is running 6.5U3.

since there is no way to move the VMs that work, the following may need to be attempted..
--- execute an ESXi Backup using ESXi vim-cmd on node (B)
--- power off VMs on node (B) - these are on shared NetApp storage
--- install ESXi 6.5 U1 on node (B)   not overwriting the VMFS (Install, preserve)
---  execute an ESXi restore on node (B)  using ESXi vim-cmd

register/add VMs back to node (B) by accessing the .vmx files on shared NetApp storage

however, i am not sure the ESXi config restore will preserve the developers' folders that have been set up in vCenter to house the VMs being registered?

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
To be honest with you I would complete a fresh install and reconfigure, I would not restore a backup config.

Upgrades do go wrong, and hence why we never recommend them.

Fresh install is always better.

I would also make sure you have a backup of the VMs.

vCenter Server Database has all the configuration for folders, not ESXi.
Stan JVirtualization Engineer

Author

Commented:
i was leaning towards the backup config because a complete re-install and reconfig would take very long for this environment.

there are 14 vSwitches, each with multiple port groups that the VMs are assigned to.  then there is a NetApp setup by another team that did the original install.

all of this (vSwitch names, port groups, vlans, IPs) would need to be documented prior to a new install with the possibility of missing something

also, the backup software is not in place yet due to a requirement to switch to another vendor



Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
i was leaning towards the backup config due to a complete re-install and reconfig would be very long for this environment.

there are 14 vSwitch's with multiple port groups in each that the VMs are assigned to them.  then there is a NetApp setup by another team that did the original install.

all of this (vSwitch names, port groups, vlans, IPs) would need to be documented prior to a new install with the possibility of missing something

also, the backup software is not in place yet due a requirement to switch to another vendor

screenshots!

The only issue with restore config is whether it works (sometimes it doesn't), and whether it sets a value which causes this issue.

You could, of course, check you have access to storage before you apply config restore.
Stan JVirtualization Engineer

Author

Commented:

screenshots of the configs?  if so, that is still a lot of re-configuring after the install !

Do you mean check access to the SAS storage that is not currently available and shows invalid VMs?

Paul SolovyovskySenior IT Advisor

Commented:
Depending on licensing you may be able to take a host profile, redeploy host and get your host profile back.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
screenshots of the configs?  if so, that is still a lot of re-configuring after the install !

I would have to disagree, I rebuild these things on a daily basis, from screenshots and memory.

restore config if you must, but you might spend a day trying to restore a config, because restore configs is not a guarantee. (e.g. it fails with odd messages). Oh and also needs to be the same version of ESXi.

Do you mean check access to the SAS storage that is not currently available and shows invalid VMs?

Yes, that way you know it's fixed!
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant

Commented:
so we are actually back to this post, 2 months ago...

https://www.experts-exchange.com/questions/29181312/Upgrade-of-ESXi-6-5-no-longer-can-see-internal-server-RAID-drives.html#a43083710

Re-install (not an upgrade) and select Preserve VMFS (do not overwrite).

Then re-configure from screenshots, register the VMs with the inventory, and you'll be done.

If you do not get the VMFS message, then the installer didn't find your storage controller, and you'll need to check hardware, firmware, or driver.

(Another way to protect the datastore from being overwritten is to disconnect the storage from the storage controller.)
Stan JVirtualization Engineer

Author

Commented:
we do have enterprise plus, but i do not think a host profile is set up.  
this may be an option

Which VMFS message should i not see?

KB article does not mention same version https://kb.vmware.com/s/article/2042141 

I did see the below at a web site saying to use -force..
The version, build number and UUID of an ESXi host on which the configuration is recovered must match the version, build number and UUID of an ESXi host whose backup you are using to restore configuration. Use the -force key in the command to skip the UUID check.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
if you re-install and it does not offer Preserve VMFS, it has not detected the datastore, because there is no valid storage controller driver.

The version, build number and UUID of an ESXi host on which the configuration is recovered must match the version, build number and UUID of an ESXi host whose backup you are using to restore configuration. Use the -force key in the command to skip the UUID check.

Does it not imply that above? Versions must be the same.

A restore to a different build or version is not supported.

Let us know. Hopefully a new install will fix the issue. Once you have re-installed and the datastore is available, you can decide whether to restore the config or add the config manually (I don't think you have to use the GUI!).
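To make the restore rule quoted above concrete, here is a minimal Python sketch of that check: version and build number must match, and only the UUID check can be skipped with force. The `HostIdentity` class, the `restore_blockers` function and all the values are hypothetical, typed in by hand for illustration; this is not a VMware tool or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HostIdentity:
    """Identity of an ESXi host, typed in by hand for this sketch."""
    version: str  # e.g. "6.5 U1"
    build: str    # build number as shown by the host
    uuid: str     # host UUID


def restore_blockers(backup: HostIdentity, target: HostIdentity,
                     force: bool = False) -> List[str]:
    """List the reasons a config restore should not be attempted."""
    blockers = []
    if backup.version != target.version:
        blockers.append(f"version differs: {backup.version} vs {target.version}")
    if backup.build != target.build:
        blockers.append(f"build differs: {backup.build} vs {target.build}")
    # Per the text quoted above, only the UUID check can be skipped with force.
    if backup.uuid != target.uuid and not force:
        blockers.append("UUID differs (can be skipped with the force option)")
    return blockers


# Placeholder values only; these are not real build numbers or UUIDs.
backup = HostIdentity("6.5 U1", "1111111", "uuid-old")
reinstalled = HostIdentity("6.5 U3", "2222222", "uuid-new")

for reason in restore_blockers(backup, reinstalled, force=True):
    print("do not restore:", reason)
```

With the placeholder values, force only removes the UUID blocker; the version and build mismatches remain, which is why a re-install should use the same version and build as the backup if you intend to restore the config.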
Stan JVirtualization Engineer

Author

Commented:
One would figure the KB article should point this out about the same version

I am not clear on the
"Preserve VMFS, it's not detected the datastore, because of no valid storage controller driver"

ESXi is installed on a small internal drive.  I think it is a 52 GB or so.
SAS drive(s) is RAID and that is where the VMs are that are not seen

Do you mean it will scan the hardware and provide a list of devices that ESXi can be installed on.
(like in the attached  image)
If i don't see the SAS drive, then it is a driver issue or a vendor issue?

Are host profiles an option to restore the network vs reconfiguring by hand?

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
One would figure the KB article should point this out about the same version

Not everything is documented in VMware la la land. You can feedback on articles in the VMware KB if you wish.

VMware documentation is at best poor.

Hang on, let's just rewind...

The current storage controller has disks attached which ESXi is installed on, which is the same storage controller, which has the datastore with the VMs which cannot be detected ?

OR, ESXi is installed on disks attached to SATA bus, and storage controller has a RAID configuration with the missing datastore ?
Paul SolovyovskySenior IT Advisor
CERTIFIED EXPERT
Top Expert 2008

Commented:
If you are using NetApp on the backend, you can take a NetApp snapshot just in case. If the VMs are on there, you can re-install and not worry about corrupting any data on the storage.
Stan JVirtualization Engineer

Author

Commented:
we have a NetApp, but i don't know if there is enough space configured for snapshots

 ESXi is installed on the small SATA SSD like this one
https://www.amazon.com/Supermicro-SSD-MS064-PHI-mSATA-half-card/dp/B00O7ZFJWS 

The missing (invalid) VMs are on the datastore which is the internal SAS RAID 
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
Okay, well my comment still stands, when ESXi is installed you will see it scan for devices, if it does not report a VMFS datastore, it cannot detect your hardware. e.g. if it does not display the Preserve VMFS option.

https://filedb.experts-exchange.com/incoming/2020/04_w15/1449051/2020-04-06-15_47_21-JViewer--141.245.png
Stan JVirtualization Engineer

Author

Commented:
so this could be a test?

if i boot a 6.5U1 CD on that ESXi node, it will scan as you show

should i see the internal 60 GB SATA as well as the SAS RAID and the 4 internal SSDs that are also on the node?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
Stan JVirtualization Engineer

Author

Commented:
assuming i use a 6.5u3 ISO,
if i don't see the SAS RAID, then is it a hardware issue or an ESXi 6.5u3 issue?

i could then cancel, and try a ESXi 6.5u1 ISO and see if the SAS is recognized?

I am not sure how a ESXi  load works per se..
Is the install using info on  the CD ISO and running code to scan hardware looking for compatible drives to install on and checking other system internals before the actual install starts?

Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
basically it loads all the drivers present in the installer, and then performs the scanning for devices.

No compatible driver, no device.
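As a toy model of "no compatible driver, no device", the Python sketch below matches each PCI device's vendor/device ID against the IDs that each driver in an installer image claims to support. The `scan` function, the driver names, the onboard SATA address and every ID here are made up for illustration; this is not how the ESXi installer is actually implemented.

```python
from typing import Dict, List, Set, Tuple

PciId = Tuple[str, str]  # (vendor ID, device ID)


def scan(devices: Dict[str, PciId],
         drivers: Dict[str, Set[PciId]]) -> Tuple[Dict[str, str], List[str]]:
    """Return (address -> driver that claims it, addresses with no driver)."""
    claimed: Dict[str, str] = {}
    unclaimed: List[str] = []
    for address, pci_id in devices.items():
        # First driver whose supported-ID list contains this device, if any.
        driver = next((name for name, ids in drivers.items() if pci_id in ids), None)
        if driver is None:
            unclaimed.append(address)
        else:
            claimed[address] = driver
    return claimed, unclaimed


# Made-up IDs and driver names, purely for illustration.
devices = {
    "000:00:1f.2": ("aaaa", "0001"),  # onboard SATA controller
    "000:01:00.0": ("bbbb", "0002"),  # SAS RAID controller
}
older_installer = {"sata_driver": {("aaaa", "0001")}, "raid_driver": {("bbbb", "0002")}}
newer_installer = {"sata_driver": {("aaaa", "0001")}, "raid_driver": {("bbbb", "0099")}}

print(scan(devices, older_installer))
print(scan(devices, newer_installer))
```

In this model the RAID controller ends up unclaimed under the newer installer, so any datastore behind it would never be offered by the scan, while the SATA device is still found.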
Stan JVirtualization Engineer

Author

Commented:
ok,,,so by going from what we now know:

Node A running 6.5u1 can see the SAS
Node B upgraded to 6.5u3 cannot see the SAS

Booting the 6.5u1 ISO in Node B, it should see the SAS since that is working on Node A?
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
Booting the 6.5u1 ISO in Node B, it should see the SAS since that is working on Node A?

The theory holds true, unless it is a hardware fault.

That is what we said before: if you notice the loss of the SAS storage straight after the upgrade, roll back. You couldn't, because you had then applied other updates.

and you may have to stick on 6.5u1.

vSphere 6.5 Support will end on November 15, 2021
Stan JVirtualization Engineer

Author

Commented:
ok,,,

next week,,
i will at least try a boot of a 6.5u1 iso to see if the SAS is seen,,then cancel and try 6.5u3


Stan JVirtualization Engineer

Author

Commented:
update....

I was ready to try my test of booting ESXi 6.5 and scanning the hardware to see if ESXi Node B would see the SAS drive.
 
First, I wanted to check In vCenter, the Hardware.  I was looking at the Hardware under the Configure tab for  Node B.
Under Hardware is -> PCI Devices.  
 Under the devices listed, it shows the ID of the SAS Controller..
     000:01:00.0 Available AVAGO (LSI) MegaRAID SAS Invader Controller.

When I clicked on the ID under the column, it shows –
The device is available for VMs to use 
It also lists the info for the device: Device ID, Vendor, Vendor ID, Bus Location, etc.
But as we know, Node B does not see the SAS.

The strange thing is, on the other Node that can see the SAS, Node A , the Configure tab screen is not showing any detail info of any devices including the SAS Controller
It is blank!
If I click on edit and go through the list of PCI devices, I can see it..
     000:01:00.0   unavailable   AVAGO (LSI) MegaRAID SAS Invader Controller   vmhba2 -  the device is not currently available for VMs to use.
 
If I login directly to ESXI Node B via the web client, and go to PCI Devices, it shows…
     000:01:00.0 Available AVAGO (LSI) MegaRAID SAS Invader Controller and under Passthrough it shows as active
 
I asked the Admin for the other node, Node A, to login to Node A  via the web client and go to PCI Devices and check.
It shows…
     000:01:00.0 Available AVAGO (LSI) MegaRAID SAS Invader Controller  and under Passthrough it shows as disabled
 
 
So we were thinking what if I disable PCI passthrough for the LSI on Node B  to match what Node A shows?
 
I toggled the passthrough setting for the AVAGO (LSI) MegaRAID SAS Invader Controller to disabled and rebooted Node B.
 
Sure enough, the SAS drive is now seen and the VMs are back!!
 
I have no clue why or how this was changed via an ESXi Update.
 
VMware support  has not responded to my findings
 
So, now we have ESXi 6.5u3 with a 6.5u1 LSI MR3 driver.

It is possible that the updated ESXi 6.5u3 LSI driver would have been ok, but who would have known to go check PCI devices and turn off passthrough for the device...
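For anyone comparing a working node against a broken one, the check that found this can be sketched in a few lines of Python. The tables are typed in by hand from what the host client shows under PCI Devices; the `passthrough_differences` function and the table layout are hypothetical, for illustration only, and do not call any VMware API.

```python
from typing import Dict, List, Tuple

# address -> (device name, passthrough state as shown in the host client)
PciTable = Dict[str, Tuple[str, str]]


def passthrough_differences(working: PciTable, broken: PciTable) -> List[str]:
    """Report devices whose passthrough state differs between two hosts."""
    report = []
    # Only compare addresses that are present on both hosts.
    for address in sorted(working.keys() & broken.keys()):
        name, good_state = working[address]
        _, bad_state = broken[address]
        if good_state != bad_state:
            report.append(
                f"{address} {name}: {good_state} on working host, "
                f"{bad_state} on broken host"
            )
    return report


# Values as reported in this thread for Node A (working) and Node B (broken).
node_a = {"000:01:00.0": ("AVAGO (LSI) MegaRAID SAS Invader Controller", "disabled")}
node_b = {"000:01:00.0": ("AVAGO (LSI) MegaRAID SAS Invader Controller", "active")}

for line in passthrough_differences(node_a, node_b):
    print(line)
```

Any storage controller that shows up in this report with passthrough active on the broken host is worth a look before re-installing anything.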
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
This is to be expected if you have configured the device for Passthrough.

PCI HBA should not be configured for passthrough unless you need to present that controller to a VM.

We've never seen an update turn it on and enable it.

Because it effectively removes the device from the host to use as a device to create a datastore.

Strange that VMware Support did not spot this, as they had hands on remote access.

(but to be honest with you, not a surprise!)

You would have found the same, if booting an ISO, it would have seen the HBAs.

Good Catch, you've found the solution.
Stan JVirtualization Engineer

Author

Commented:
It is a first for me... i would have never thought to go look for PCI Passthrough settings for this issue!

I have only used Passthrough to turn on NVIDIA GPU cards on some other servers.

At the minimum, VMware support should have suggested looking at Passthrough even without accessing the systems.

thanks for all of the suggestions...they may come in handy down the road or for someone else with similar issues.
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
VMware Support should have picked it up from the logs!

#crapsupport
Stan JVirtualization Engineer

Author

Commented:
true, but we could not provide a support bundle due to the server being on an air-gapped system

they sent me ESXi commands to run on the host and i had to send them the replies from the commands

still, you figure the output from some of the over 20 commands or more would have shown something
Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
#crapsupport
Stan JVirtualization Engineer

Author

Commented:
i take it you have had some bad experiences with support or at least heard of some horror stories?

i do find that the federal support, which is based in the US, is somewhat better than std support, which this last case was under


Andrew Hancock (VMware vExpert PRO / EE Fellow)VMware and Virtualization Consultant
CERTIFIED EXPERT
Fellow
Expert of the Year 2017

Commented:
I think your statement says it all.

VMware Support is NOT free. It's a paid-for service which VMware promotes.

They have the ability to provide remote support, investigate and look at logs, they are the vendor, and missed this!

That is our opinion.