ESXi 4.0 "Lost access for volume due to connectivity issues"

Dear Experts,

I am at my wits' end with a problem on an IBM System x3650 server that I have been trying, so far unsuccessfully, to diagnose. I am running ESXi 4.0 on the machine and am constantly getting "Lost access for volume due to connectivity issues" errors in the event log. I get these errors whenever there are sustained writes to any of the datastores. I notice it primarily when I back up the domain controller (an SBS2003 machine on datastore1) with StorageCraft ShadowProtect, which is installed on that machine. When the errors below appear, I also see disk errors in the event log of the SBS2003 server at the same time, and almost every time the error "Timeout (30000 milliseconds) waiting for a transaction response from the NtFrs service". I am working with IBM and VMware engineers, and both have examined the hardware and ESXi logs, but so far without results. Yesterday we replaced the system board, which has the RAID controller on it (IBM/Adaptec 8k controller), but it did not help.

Any feedback would be greatly appreciated!!


Attachment: ESXI-Error.GIF
Asked by JohnnyD74
 
JohnnyD74 (Author) commented:
I found out it was a problem with the array itself. I moved SBS2003 from datastore1 to datastore2 and have had no problems since. I have not determined the exact cause, but with ESXi now alone on datastore1, things are fine.
 
Danny McDaniel (Clinical Systems Analyst) commented:
I am guessing you have local storage.  Did you check esxtop to see if there was high latency when the backups ran?  I'm also guessing that you confirmed that the RAID card has a battery backed write cache and it's enabled and working, right?  Are all of the disks in the RAID the same hardware, too?  Not sure it makes a difference but trying to think of anything that would cause it to run sub-optimally under load.
 
JohnnyD74 (Author) commented:
Thanks, Dnam66,

Yes, it is all local storage. Yes, the RAID controller is in write-back mode (cached writes), and all drives are set to write-through. Yes, the disks within each RAID volume are exactly the same, but I do have three RAID volumes, each with different disks:
datastore1: two 250 GB SATA disks in RAID 1
datastore2: three 300 GB 15,000 RPM SAS disks in RAID 5
datastore3: one 500 GB SATA disk (simple volume attached to the SBS2003 server on datastore1)
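As a rough comparison of these three volumes, the sketch below applies the common rule-of-thumb formula: functional IOPS = raw IOPS × read fraction + raw IOPS × write fraction ÷ RAID write penalty. The per-disk figures (about 80 IOPS for a 7,200 RPM SATA disk, about 180 for a 15,000 RPM SAS disk), the assumption that the SATA disks spin at 7,200 RPM, and the write penalties (1 for a single disk, 2 for RAID 1, 4 for RAID 5) are generic assumptions, not measurements from this server.

```python
from dataclasses import dataclass

# ASSUMED rule-of-thumb random IOPS per disk; not measured on this server.
PER_DISK_IOPS = {"sata_7200": 80, "sas_15000": 180}

# Back-end disk I/Os generated by one front-end write, per RAID level.
WRITE_PENALTY = {"single": 1, "raid1": 2, "raid5": 4}


@dataclass
class Datastore:
    name: str
    disks: int
    disk_type: str
    raid: str


def functional_iops(ds: Datastore, write_fraction: float) -> float:
    """Estimate front-end IOPS for a given write fraction (0.0 to 1.0)."""
    raw = ds.disks * PER_DISK_IOPS[ds.disk_type]
    read_fraction = 1.0 - write_fraction
    # Reads cost one back-end I/O; writes cost WRITE_PENALTY back-end I/Os.
    return raw * read_fraction + raw * write_fraction / WRITE_PENALTY[ds.raid]


datastores = [
    Datastore("datastore1", 2, "sata_7200", "raid1"),
    Datastore("datastore2", 3, "sas_15000", "raid5"),
    Datastore("datastore3", 1, "sata_7200", "single"),
]

for ds in datastores:
    print(f"{ds.name}: ~{functional_iops(ds, 1.0):.0f} write IOPS, "
          f"~{functional_iops(ds, 0.0):.0f} read IOPS")
```

These are ceilings for small random I/O. A sequential backup stream and the controller's write-back cache can do considerably better, so the numbers are only useful as a relative comparison between the volumes.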

 
Danny McDaniel (Clinical Systems Analyst) commented:
I thought SATA disks were only supported as installation drives for ESX/ESXi. Is that the volume that's reporting issues?
 
JohnnyD74 (Author) commented:
Well, it seems to start there mostly, but while the backup is running, the other datastores throw a few disconnection errors too. I talked with a VMware engineer who told me it is fine to put both ESXi and VMs on the same disk array, but maybe he is incorrect. Tonight I tried moving SBS2003 from datastore1 to datastore3, and the problems went away: no errors. It seems possible that when the SBS2003 server runs on datastore1, it somehow affects the ESXi hypervisor on that same disk array.
 
Danny McDaniel (Clinical Systems Analyst) commented:
I guess the SATA requirement is a figment of my imagination, or it might have been an ESX 3.5 requirement. The main issue is that SATA disks simply can't sustain the IOPS that SCSI or SAS disks can, so maybe that is the core issue. Another contributing factor would be a snapshot on the disk(s).
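The IOPS gap between SATA and SAS disks mostly follows from mechanics: a small random I/O costs roughly one average seek plus half a platter rotation. A minimal sketch of that estimate, assuming typical average seek times of 8.5 ms for a 7,200 RPM SATA disk and 3.5 ms for a 15,000 RPM SAS disk (these seek times are assumptions; check the actual drive datasheets):

```python
def rotational_latency_ms(rpm: float) -> float:
    # On average the head waits half a revolution for the target sector.
    return 0.5 * 60_000.0 / rpm


def estimated_iops(rpm: float, avg_seek_ms: float) -> float:
    # One random I/O costs about one average seek plus half a rotation;
    # transfer time is ignored because it is small for small random I/Os.
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


# ASSUMED typical average seek times, not values from these specific drives.
print(f"7,200 RPM SATA: ~{estimated_iops(7200, 8.5):.0f} IOPS")
print(f"15,000 RPM SAS: ~{estimated_iops(15000, 3.5):.0f} IOPS")
```

Under these assumptions a 15,000 RPM SAS disk handles a bit more than twice the random I/O of a 7,200 RPM SATA disk, which is consistent with the suggestion that the SATA-backed datastores saturate first under a sustained backup load.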
Question has a verified solution.
