Solved

VMware ESXi 5.1 - Datastore suddenly missing

Posted on 2015-02-23
Last Modified: 2015-03-04
Hi,
I am running a VMware ESXi server with two RAID controllers, each hosting one datastore (Data Store A and Data Store B).

I had a few virtual machines on Data Store A and a few on Data Store B. One of my clients complained that a VM was not reachable, so I checked the vSphere Client for any obvious issues.

When I connected to the VM running on Data Store B, it showed a yellow ribbon saying the VM config file is not accessible. To troubleshoot, I browsed Data Store B and searched for the VM files, but it was empty.

I did a Data Store Refresh / Rescan, after which Data Store B disappeared from the list. The devices view still showed both LocalAdaptecDisk devices, 1 TB (PrimaryRAID) and 3 TB (SecondaryRAID), but only Data Store A (PrimaryRAID) was visible and Data Store B (SecondaryRAID) was missing. After some time, even the second device was gone.

The VMs on Data Store A are fine, and I can still connect to the server using SSH (WinSCP / PuTTY).

Is my data safe, or have I lost it?
Is there a way to rediscover Data Store B without rebooting the server?

Output of ls /dev/disks/:

mpx.vmhba2:C0:T0:L0                     vml.0000000000766d686261323a303a30
mpx.vmhba2:C0:T0:L0:1                   vml.0000000000766d686261323a303a30:1
mpx.vmhba2:C0:T1:L0                     vml.0000000000766d686261323a313a30
mpx.vmhba2:C0:T1:L0:1                   vml.0000000000766d686261323a313a30:1
mpx.vmhba32:C0:T0:L0                    vml.0000000000766d68626133323a303a30
mpx.vmhba32:C0:T0:L0:1                  vml.0000000000766d68626133323a303a30:1
mpx.vmhba32:C0:T0:L0:5                  vml.0000000000766d68626133323a303a30:5
mpx.vmhba32:C0:T0:L0:6                  vml.0000000000766d68626133323a303a30:6
mpx.vmhba32:C0:T0:L0:7                  vml.0000000000766d68626133323a303a30:7
mpx.vmhba32:C0:T0:L0:8                  vml.0000000000766d68626133323a303a30:8
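
A note on reading this listing: for the names shown here, the hex tail of each vml.* entry appears to be the ASCII text of the matching mpx path (adapter, target, LUN). The sketch below is a minimal Python helper for checking that pairing; decode_vml is a hypothetical name, and the decoding rule is an assumption that only holds for locally generated mpx-style names like these, not for vml identifiers in general.

```python
def decode_vml(name):
    """Decode the hex part of a vml.* device name from the listing above.

    Assumption: the hex digits are zero-padded ASCII, which is true for the
    mpx-style names in this thread but is not a general rule.
    """
    if not name.startswith("vml."):
        raise ValueError("expected a name starting with 'vml.'")
    # Split off an optional ":<partition>" suffix such as ":1".
    ident, _, partition = name[4:].partition(":")
    # Strip the leading zero bytes, then read the rest as ASCII text.
    text = bytes.fromhex(ident).lstrip(b"\x00").decode("ascii")
    return text + (":" + partition if partition else "")


if __name__ == "__main__":
    for entry in (
        "vml.0000000000766d686261323a303a30",
        "vml.0000000000766d686261323a313a30:1",
        "vml.0000000000766d68626133323a303a30:8",
    ):
        print(entry, "->", decode_vml(entry))
```

Read this way, the two vmhba2 entries are targets 0 and 1 on the same adapter, each with a single partition, and vmhba32 is a separate device with several partitions.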

If I click on Add Data store, it shows the disconnected device, but it does not let me remount the missing datastore. If I continue, it may create a new partition and wipe the whole hard disk.

Hence, I cancelled the Add Storage step.

/vmfs/volumes # ls -al
drwxr-xr-x    1 root     root           512 Feb 23 18:42 .
drwxr-xr-x    1 root     root           512 Jan 18 16:04 ..
drwxr-xr-x    1 root     root             8 Jan  1  1970 343f43eb-cebf069a-2e3a-34165eb1baac
drwxr-xr-t    1 root     root          1680 Jun 27  2014 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 50acdf0a-49686666-09f8-6c626daf4bfc
lrwxr-xr-x    1 root     root            35 Feb 23 18:42 PrimaryRAID1 -> 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 c66e35e1-e51874c0-8f69-58451bad876f

As you can see there is no SecondaryRAID listed at all.
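
The same check can be scripted. The sketch below is a minimal Python example that reads the ls -al output pasted above and reports which datastore labels (the symlinks) exist and which UUID directories have no label pointing at them. The function names are hypothetical, and it assumes the listing format shown in this post.

```python
LISTING = """\
drwxr-xr-x    1 root     root           512 Feb 23 18:42 .
drwxr-xr-x    1 root     root           512 Jan 18 16:04 ..
drwxr-xr-x    1 root     root             8 Jan  1  1970 343f43eb-cebf069a-2e3a-34165eb1baac
drwxr-xr-t    1 root     root          1680 Jun 27  2014 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 50acdf0a-49686666-09f8-6c626daf4bfc
lrwxr-xr-x    1 root     root            35 Feb 23 18:42 PrimaryRAID1 -> 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 c66e35e1-e51874c0-8f69-58451bad876f
"""


def datastore_labels(listing):
    """Map each symlink name (datastore label) to the UUID it points at."""
    labels = {}
    for line in listing.splitlines():
        if line.startswith("l") and " -> " in line:
            left, target = line.rsplit(" -> ", 1)
            labels[left.split()[-1]] = target.strip()
    return labels


def unlabeled_volumes(listing):
    """Return UUID directories that no datastore label points at."""
    labeled = set(datastore_labels(listing).values())
    names = [ln.split()[-1] for ln in listing.splitlines() if ln.startswith("d")]
    return [n for n in names if n not in (".", "..") and n not in labeled]


if __name__ == "__main__":
    print("labels:", datastore_labels(LISTING))
    print("unlabeled:", unlabeled_volumes(LISTING))
```

For this listing the only label found is PrimaryRAID1, which matches what the post reports: nothing named SecondaryRAID is mounted under /vmfs/volumes.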

Regards,
Zen
Question by:zen shaw

Expert Comment

by:gheist
ID: 40627529
What happened recently?

Author Comment

by:zen shaw
ID: 40627967
Hi,
The VM was running fine, and I assume the database engineer was running a big query (high read) at that time.

When I spoke to him, he said the server was running short of memory (peaking at 95%) and must have tried to dump memory to disk.

I have not restarted the server yet, as the other RAID controller and the datastore on it are fine and running live applications. I could see the failed disk but not the volume/datastore on it. How could a RAID controller crash make the volume go missing?

Do you want any diagnostic information?

Regards,
Zen

Accepted Solution

by:
gheist earned 500 total points
ID: 40628706
I am afraid you need to power everything off and then on.
If you still have the chance, set the disk queue depth to something reasonable:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008113
That will prevent RAID controllers from locking up and reduce total IO wait for guests.

Author Comment

by:zen shaw
ID: 40629470
Thanks. I have no problem taking the server down; at most it would be down for some time.

But I am worried about the volume / datastore and the VMDKs on it.

I'll reboot it tomorrow and post an update. Meanwhile, if you have any other hints or tips, please share them.

Regards,
Zen

Expert Comment

by:gheist
ID: 40645113
Powering off and unplugging the power cable is a bit stronger than a restart on modern hardware, which has 10 service guard processors running all the time.

Author Comment

by:zen shaw
ID: 40645834
Update: I rebooted the machine. After the reboot, the datastores on Primary RAID 1 and Secondary RAID 1 both reappeared, but after some time Secondary RAID 1 disappeared again.

This makes me conclude that the RAID controller is fine, as the Primary RAID does not disappear. So I assume the problem is with the disks used for Secondary RAID 1.

Now in this situation:
a) How would I recover the .VMDK or the SQL Server data file on it?
b) Since Primary RAID 1 is OK and still running a live server, can I turn the server off for some time and remove the Secondary RAID 1 disks (not hot-swappable)? If so, I am not sure whether the Adaptec RAID controller would let me do that without deleting the RAID configuration, or whether it has a feature to disable RAID detection for some time.

Expert Comment

by:gheist
ID: 40646218
I'd boot an Ubuntu live disk, check the SMART status of the disks, and just pull out the bad one (RAID 1 can handle that).