Solved

VMware ESXi 5.1 - Datastore suddenly missing

Posted on 2015-02-23
Last Modified: 2015-03-04
Hi,
I am running a VMware ESXi server with two RAID controllers, each hosting one datastore (Data Store A and Data Store B).

I had a few virtual machines on Data Store A and a few on Data Store B. One of my clients complained that a VM was unreachable, so I checked the vSphere Client for any obvious issues.

When I connected to the VM running on Data Store B, it showed a yellow ribbon saying the VM config file was not accessible. To troubleshoot, I browsed Data Store B and looked for the VM files, but it was empty.

I did a datastore Refresh / Rescan, after which Data Store B disappeared from the list. The device list still showed both adapters, LocalAdaptecDisk 1 TB (PrimaryRAID) and 3 TB (SecondaryRAID), but only Data Store A (Primary RAID) was visible and Data Store B (Secondary RAID) was missing. After some time, even the second adapter device was gone.

The VMs on Data Store A are fine, and I can still connect to the server over SSH (WinSCP / PuTTY).

Is my data safe, or have I lost it?
Is there a way to rediscover Data Store B without rebooting the server?

Output of ls /dev/disks/:

mpx.vmhba2:C0:T0:L0                     vml.0000000000766d686261323a303a30
mpx.vmhba2:C0:T0:L0:1                   vml.0000000000766d686261323a303a30:1
mpx.vmhba2:C0:T1:L0                     vml.0000000000766d686261323a313a30
mpx.vmhba2:C0:T1:L0:1                   vml.0000000000766d686261323a313a30:1
mpx.vmhba32:C0:T0:L0                    vml.0000000000766d68626133323a303a30
mpx.vmhba32:C0:T0:L0:1                  vml.0000000000766d68626133323a303a30:1
mpx.vmhba32:C0:T0:L0:5                  vml.0000000000766d68626133323a303a30:5
mpx.vmhba32:C0:T0:L0:6                  vml.0000000000766d68626133323a303a30:6
mpx.vmhba32:C0:T0:L0:7                  vml.0000000000766d68626133323a303a30:7
mpx.vmhba32:C0:T0:L0:8                  vml.0000000000766d68626133323a303a30:8
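
A minimal sketch for reading the listing above, assuming (from this listing only, not from any documented format guarantee) that each vml.* name is a hex-encoded ASCII string padded with leading zero bytes. The helper name decode_vml is hypothetical. It shows that the vml.* entries mirror the mpx.* runtime names, and that at the time of the listing two targets (T0 and T1) were still present on vmhba2, each with a single partition. The thread does not say which target holds which datastore.

```python
def decode_vml(name):
    """Decode the hex part of a vml.* device name.

    Returns (decoded_ascii_name, partition), where partition is the text
    after the ':' suffix, or None for a whole-disk entry.
    Assumption: the identifier is hex-encoded ASCII with leading zero bytes.
    """
    ident = name[len("vml."):] if name.startswith("vml.") else name
    hex_part, _, partition = ident.partition(":")
    raw = bytes.fromhex(hex_part).lstrip(b"\x00")
    return raw.decode("ascii"), partition or None


# Three names copied from the listing in the question.
names = [
    "vml.0000000000766d686261323a303a30",
    "vml.0000000000766d686261323a313a30:1",
    "vml.0000000000766d68626133323a303a30:8",
]

for n in names:
    print(n, "->", decode_vml(n))
```

The first name decodes to vmhba2:0:0 and the second to vmhba2:1:0 (partition 1), so the middle number tracks the target (T0 / T1) shown in the mpx.* names.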

If I click Add Datastore, it shows the disconnected device, but it does not let me remount the datastore that disappeared. If I continue, it may create a new partition and wipe the whole disk.

Hence, I cancelled the Add Storage step.

/vmfs/volumes # ls -al
drwxr-xr-x    1 root     root           512 Feb 23 18:42 .
drwxr-xr-x    1 root     root           512 Jan 18 16:04 ..
drwxr-xr-x    1 root     root             8 Jan  1  1970 343f43eb-cebf069a-2e3a-34165eb1baac
drwxr-xr-t    1 root     root          1680 Jun 27  2014 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 50acdf0a-49686666-09f8-6c626daf4bfc
lrwxr-xr-x    1 root     root            35 Feb 23 18:42 PrimaryRAID1 -> 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 c66e35e1-e51874c0-8f69-58451bad876f

As you can see there is no SecondaryRAID listed at all.
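
A minimal sketch, assuming the ls -al output above is pasted into a string, that separates labeled datastores (symlinks) from bare UUID directories. The helper name summarize is hypothetical, and labels containing spaces are not handled. It confirms that PrimaryRAID1 is the only label present; the thread does not identify what the three unlabeled UUID entries are.

```python
LISTING = """\
drwxr-xr-x    1 root     root             8 Jan  1  1970 343f43eb-cebf069a-2e3a-34165eb1baac
drwxr-xr-t    1 root     root          1680 Jun 27  2014 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 50acdf0a-49686666-09f8-6c626daf4bfc
lrwxr-xr-x    1 root     root            35 Feb 23 18:42 PrimaryRAID1 -> 4dfa32f8-d53e6d8e-541d-001b215e0514
drwxr-xr-x    1 root     root             8 Jan  1  1970 c66e35e1-e51874c0-8f69-58451bad876f
"""


def summarize(listing):
    """Return (labels, unlabeled) from an `ls -al` listing of /vmfs/volumes.

    labels maps each symlink name to the UUID directory it points at;
    unlabeled lists UUID directories that no symlink points at.
    """
    labels = {}
    volumes = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        if line.startswith("l") and "->" in fields:
            arrow = fields.index("->")
            labels[fields[arrow - 1]] = fields[arrow + 1]
        elif line.startswith("d") and fields[-1] not in (".", ".."):
            volumes.append(fields[-1])
    targets = set(labels.values())
    unlabeled = [v for v in volumes if v not in targets]
    return labels, unlabeled


labels, unlabeled = summarize(LISTING)
print("labeled:", labels)
print("unlabeled:", unlabeled)
```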

Regards,
Zen
Question by:zen shaw
7 Comments
 
LVL 62

Expert Comment

by:gheist
ID: 40627529
What happened recently?
 

Author Comment

by:zen shaw
ID: 40627967
Hi,
The VM was running fine, and I assume the database engineer was running a big query (high read load) at that time.

When I spoke to him, he said the server was running short of memory (peaking at 95%) and must have tried to dump memory to disk.

I have not restarted the server yet, as the other RAID controller and the datastore on it are fine and running live applications. I can see the failed disk but not the volume/datastore on it. How could a RAID controller crash make the volume go missing?

Do you want any diagnostic information?

Regards,
Zen
 
LVL 62

Accepted Solution

by:
gheist earned 2000 total points
ID: 40628706
I am afraid you need to power everything off and then back on.
If you still have the chance, set the disk queue depth to something reasonable:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008113
That will prevent the RAID controllers from locking up and reduce total I/O wait for guests.

 

Author Comment

by:zen shaw
ID: 40629470
Thanks. I have no issue with shutting the server down; at most it would be down for some time.

But I am worried about the volume/datastore and the VMDKs on it.

I'll reboot tomorrow and post an update. Meanwhile, if you have any other hints or tips, please help.

Regards,
Zen
 
LVL 62

Expert Comment

by:gheist
ID: 40645113
A full power-off with the power cable unplugged is a stronger reset than a restart on modern hardware, which has some 10 service processors running all the time.
 

Author Comment

by:zen shaw
ID: 40645834
Update: I rebooted the machine. After the reboot, the datastores on Primary RAID 1 and Secondary RAID 1 reappeared, but after some time Secondary RAID 1 disappeared again.

This leads me to conclude that the RAID controller is fine, as the Primary RAID does not disappear. So I assume the problem is with the disks used for Secondary RAID 1.

Now, in this situation:
a) How would I recover the .VMDK or the SQL Server data file on it?
b) Since Primary RAID 1 is OK and still running a live server, can I turn the server off for some time and remove the Secondary RAID 1 disks (they are not hot-swappable)? If so, I am not sure whether the Adaptec RAID controller would let me do that without deleting the RAID configuration, or whether it has a feature to disable RAID detection for some time.
 
LVL 62

Expert Comment

by:gheist
ID: 40646218
I'd boot an Ubuntu live disk, check the SMART status of the disks, and just pull out the bad one (RAID 1 can handle that).
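
A rough sketch of the SMART check suggested above, assuming the smartmontools package has been installed in the live session so that smartctl is on the PATH, and assuming a hypothetical device path such as /dev/sda. Disks behind a hardware RAID controller may not be visible to smartctl without a controller-specific device type option, so check the smartmontools documentation for the Adaptec case. The helper names parse_health and smart_health are illustrative.

```python
import shutil
import subprocess


def parse_health(output):
    """Extract the overall health verdict from `smartctl -H` output.

    Handles the ATA-style line ("... test result: PASSED") and the
    SCSI-style line ("SMART Health Status: OK"). Returns None if neither
    line is present.
    """
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.split(":", 1)[1].strip()
        if line.startswith("SMART Health Status"):
            return line.split(":", 1)[1].strip()
    return None


def smart_health(device):
    """Run `smartctl -H` on a device; returns None if smartctl is missing."""
    if shutil.which("smartctl") is None:
        return None
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
    )
    return parse_health(result.stdout)


if __name__ == "__main__":
    # /dev/sda is an assumed path; list the real devices first (e.g. lsblk).
    print(smart_health("/dev/sda"))
```

A verdict other than PASSED or OK, or a missing verdict, is a reason to look at the full attribute report (smartctl -a) before deciding which disk to pull.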

Question has a verified solution.