Solved

VMware ESXi 5.1 - Datastore suddenly missing

Posted on 2015-02-23
Last Modified: 2015-03-04
Hi,
I am running a VMware ESXi server with two RAID controllers, each hosting one datastore (A and B).

I had a few virtual machines on Data Store A and a few on Data Store B. One of my clients complained that a VM was not reachable, so I checked the vSphere Client for any obvious issues.

When I connected to the VM running on Data Store B, it showed a yellow banner saying the VM config file is not accessible. To troubleshoot, I browsed Data Store B and looked for the VM files, but it was completely empty.

I did a Data Store Refresh / Rescan, after which the Data Store disappeared from the list. The devices list still showed both adapters, LocalAdaptecDisk 1 TB (PrimaryRAID) and 3 TB (SecondaryRAID), but only Data Store A (PrimaryRAID) was visible and Data Store B (SecondaryRAID) was missing. After some time, even the second adapter device was gone.

The VMs on Data Store A are fine, and I can still connect to the server using SSH (WinSCP / PuTTY).

Is my data safe, or have I lost it?
Is there a way to rediscover Data Store B without rebooting the server?

Output of ls /dev/disks/:

mpx.vmhba2:C0:T0:L0                     vml.0000000000766d686261323a303a30
mpx.vmhba2:C0:T0:L0:1                   vml.0000000000766d686261323a303a30:1
mpx.vmhba2:C0:T1:L0                     vml.0000000000766d686261323a313a30
mpx.vmhba2:C0:T1:L0:1                   vml.0000000000766d686261323a313a30:1
mpx.vmhba32:C0:T0:L0                    vml.0000000000766d68626133323a303a30
mpx.vmhba32:C0:T0:L0:1                  vml.0000000000766d68626133323a303a30:1
mpx.vmhba32:C0:T0:L0:5                  vml.0000000000766d68626133323a303a30:5
mpx.vmhba32:C0:T0:L0:6                  vml.0000000000766d68626133323a303a30:6
mpx.vmhba32:C0:T0:L0:7                  vml.0000000000766d68626133323a303a30:7
mpx.vmhba32:C0:T0:L0:8                  vml.0000000000766d68626133323a303a30:8
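
The listing above can be summarized mechanically. The following is a minimal sketch (not part of the original thread) that groups the entries by device and decodes the hex tail of each vml alias. The function names are hypothetical, and the decoding rule (strip the leading zeros, then read the rest as hex-encoded ASCII) is only an observation about this particular listing, not a documented format.

```python
# Listing copied from the question (output of: ls /dev/disks/).
LISTING = """\
mpx.vmhba2:C0:T0:L0                     vml.0000000000766d686261323a303a30
mpx.vmhba2:C0:T0:L0:1                   vml.0000000000766d686261323a303a30:1
mpx.vmhba2:C0:T1:L0                     vml.0000000000766d686261323a313a30
mpx.vmhba2:C0:T1:L0:1                   vml.0000000000766d686261323a313a30:1
mpx.vmhba32:C0:T0:L0                    vml.0000000000766d68626133323a303a30
mpx.vmhba32:C0:T0:L0:1                  vml.0000000000766d68626133323a303a30:1
mpx.vmhba32:C0:T0:L0:5                  vml.0000000000766d68626133323a303a30:5
mpx.vmhba32:C0:T0:L0:6                  vml.0000000000766d68626133323a303a30:6
mpx.vmhba32:C0:T0:L0:7                  vml.0000000000766d68626133323a303a30:7
mpx.vmhba32:C0:T0:L0:8                  vml.0000000000766d68626133323a303a30:8
"""


def decode_vml(name):
    """Decode a vml.* alias into text (assumption: zero padding + hex ASCII)."""
    hex_part = name.split(".", 1)[1].split(":")[0].lstrip("0")
    return bytes.fromhex(hex_part).decode("ascii")


def summarize(listing):
    """Map each mpx device to its decoded alias and its partition numbers."""
    devices = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        mpx, vml = line.split()
        fields = mpx.split(":")
        device = ":".join(fields[:4])  # adapter, channel, target, LUN
        entry = devices.setdefault(device, {"alias": None, "partitions": []})
        if len(fields) == 5:
            entry["partitions"].append(int(fields[4]))  # trailing partition number
        else:
            entry["alias"] = decode_vml(vml)
    return devices


if __name__ == "__main__":
    for device, info in summarize(LISTING).items():
        print(device, info["alias"], info["partitions"])
```

Read this way, the listing shows two devices on vmhba2 (targets 0 and 1) with one partition each, plus one device on vmhba32 with partitions 1, 5, 6, 7 and 8. The thread does not say which target is the SecondaryRAID volume, but both vmhba2 devices and their partitions were still present when this listing was taken.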

If I click on Add Data store, it shows the disconnected device, but it does not let me remount the datastore that disappeared. If I continue, it may create a new partition and wipe the whole hard disk.

Hence, I cancelled the Add Storage step.

/vmfs/volumes # ls -al
drwxr-xr-x    1 root     root           512 Feb 23 18:42 .
drwxr-xr-x    1 root     root           512 Jan 18 16:04 ..
drwxr-xr-x    1 root     root             8 Jan  1  1970 343f43eb-cebf069a-2e3a-34165eb1baac
drwxr-xr-t    1 root     root          1680 Jun 27  2014 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 50acdf0a-49686666-09f8-6c626daf4bfc
lrwxr-xr-x    1 root     root            35 Feb 23 18:42 PrimaryRAID1 -> 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 c66e35e1-e51874c0-8f69-58451bad876f

As you can see there is no SecondaryRAID listed at all.
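
The same check can be scripted. This is a minimal sketch (not from the original thread) that parses the ls -al output above and separates the datastore labels, which appear as symlinks, from the UUID directories that have no label pointing at them. The function name is hypothetical, and the parser assumes the nine-column layout shown above.

```python
# Listing copied from the question (output of: ls -al in /vmfs/volumes).
LS_OUTPUT = """\
drwxr-xr-x    1 root     root           512 Feb 23 18:42 .
drwxr-xr-x    1 root     root           512 Jan 18 16:04 ..
drwxr-xr-x    1 root     root             8 Jan  1  1970 343f43eb-cebf069a-2e3a-34165eb1baac
drwxr-xr-t    1 root     root          1680 Jun 27  2014 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 50acdf0a-49686666-09f8-6c626daf4bfc
lrwxr-xr-x    1 root     root            35 Feb 23 18:42 PrimaryRAID1 -> 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 c66e35e1-e51874c0-8f69-58451bad876f
"""


def datastore_labels(ls_output):
    """Return (labels, unlabeled): symlink name -> target, and UUID dirs without a label."""
    labels = {}
    directories = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a full ls -al entry
        name = fields[8]
        if fields[0].startswith("l") and "->" in fields:
            labels[name] = fields[-1]  # symlink: label -> volume UUID
        elif fields[0].startswith("d") and name not in (".", ".."):
            directories.append(name)
    unlabeled = [d for d in directories if d not in labels.values()]
    return labels, unlabeled


if __name__ == "__main__":
    labels, unlabeled = datastore_labels(LS_OUTPUT)
    print("labels:", labels)
    print("directories without a label:", unlabeled)
```

For this listing the only label is PrimaryRAID1, and three UUID directories have no label. The thread does not say what those three directories hold, so the sketch only reports them.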

Regards,
Zen
Question by:zen shaw
7 Comments
 
LVL 62

Expert Comment

by:gheist
ID: 40627529
What happened recently?
 

Author Comment

by:zen shaw
ID: 40627967
Hi,
The VM was running fine, and I assume the database engineer was running a big (read-heavy) query at that time.

When I spoke to him, he said the server was running short of memory (peaking at 95%) and must have tried to dump memory to disk.

I have not restarted the server yet, as the other RAID controller and the datastore on it are fine and running live applications. I could see the failed disk but not the volume/datastore on it. How could the volume go missing because of a RAID controller crash?

Do you want any diagnostic information?

Regards,
Zen
 
LVL 62

Accepted Solution

by:
gheist earned 500 total points
ID: 40628706
I am afraid you need to power everything off and then on again.
If you still have the chance, set the disk queue depth to something reasonable:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008113
That will prevent the RAID controllers from locking up and reduce total I/O wait for guests.

 

Author Comment

by:zen shaw
ID: 40629470
Thanks. I have no problem shutting the server down; at most it would be down for some time.

But I am worried about the volume / datastore and the VMDKs on it.

I'll reboot tomorrow and post an update. Meanwhile, if you have any other hints or tips, please share.

Regards,
Zen
 
LVL 62

Expert Comment

by:gheist
ID: 40645113
Powering OFF with the power cable unplugged is a bit stronger than a restart on modern hardware, which has 10 service guard processors running all the time.
 

Author Comment

by:zen shaw
ID: 40645834
Update: I rebooted the machine. After the reboot, the datastores on Primary RAID 1 and Secondary RAID 1 reappeared, but after some time Secondary RAID 1 disappeared again.

This makes me conclude that the RAID controller is fine, as the Primary RAID does not disappear. So I assume the problem is with the disks used for Secondary RAID 1.

Now in this situation:
a) How would I recover the .VMDK or the SQL Server data file on it?
b) Since Primary RAID 1 is OK and still running a live server, can I turn the server off for some time and remove the Secondary RAID 1 disks (not hot-swappable)? If so, I am not sure whether the Adaptec RAID controller would let me do that without deleting the RAID configuration, or whether it has a feature to disable RAID detection for some time.
 
LVL 62

Expert Comment

by:gheist
ID: 40646218
I'd boot an Ubuntu live disk, check the SMART status of the disks, and just pull out the bad one (RAID 1 can handle that).
