Brandon Bazemore asked:

Connectivity "blips" on a Server 2012 R2 Terminal Server

Hi there!  We've been troubleshooting this issue for months without much success, so wanted to put it out there to people smarter than us. ;)  

Dell c6100 xs23  (4 nodes)
The nodes host one file server, whose share holds the users' redirected folders, and two terminal server session hosts.

Users will notice (multiple times a day) that:
-Desktop will flicker (icons go away and back quickly)
-File shares will have to be refreshed to see updates (so if another user saves a file, it won't just "appear" for the rest of the users viewing that folder)
-Chrome (which is using redirected data folders) will crash.

The only logged issue we see is delayed write failures.  That, combined with the symptoms above, suggests to us a loss of connectivity between the session hosts and the file server (simple Windows file server - no clustering, DFS or otherwise).  Easy, right?

Here is what we've tried:
-turning off VMQ on the base NIC, the Hyper-V switch, and the host switch
-turning off SR-IOV on all
-running both the file server and one of the session hosts on the same physical node while running the other session host on a separate node.  No change here, which eliminated most of our network suspicions, as those two should literally be communicating only over the Hyper-V vSwitch.
-disabling LACP and using a single connection
-upgrading firmware on the Broadcom NICs (there was a problem on previous firmware, but it was a FULL disconnect requiring reboot).
-Looked at KB 2842111 - we don't see any massive number of handles, so didn't apply the fix.
-Looked at KB 2878182 - we don't see any non-responsive threads, so didn't apply.
-The actual event ID of the delayed write error is:
EventID 50 / Source: MUP or mrxsmb  (some of both)
  " {Delayed Write Failed} Windows was unable to save all the data for the file \;H:00000000387e736f\data\Users\iamauserexample .. nts\Chrome_Settings\CrashpadMetrics-active.pma. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere."
-no corresponding errors are seen on the file server that happen at the same time
-no corresponding errors on the hyper-v host
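
When many of these Event ID 50 entries pile up, it can help to see which files they name. The following is a minimal Python sketch, not a definitive tool: it assumes the event messages have already been exported as plain strings, and it relies only on the message wording quoted above. The function names `extract_path` and `count_by_filename` are made up for illustration.

```python
import re
from collections import Counter

# Matches the path between "for the file" and ". The data has been lost",
# following the wording of the Event ID 50 message quoted above.
PATH_RE = re.compile(
    r"for the file (?P<path>.+?)\. The data has been lost", re.DOTALL
)


def extract_path(message):
    """Return the file path named in a delayed-write message, or None."""
    match = PATH_RE.search(message)
    return match.group("path").strip() if match else None


def count_by_filename(messages):
    """Count delayed-write failures per file name (last path component)."""
    counts = Counter()
    for message in messages:
        path = extract_path(message)
        if path:
            # The path uses backslashes, so split on the last one.
            counts[path.rsplit("\\", 1)[-1]] += 1
    return counts
```

If most hits land on files inside the redirected profile folders (as with the Chrome example above), that points at the redirected-folder path rather than at general file share traffic.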

Interestingly, they previously had a Server 2008 instance which did not have the same issue. Combined with all of the above, this leads us to believe it may be an issue with Server 2012 R2.
Tags: Dell, Windows Server 2012

Last comment: Brandon Bazemore

8/22/2022 - Mon
Mal Osborne

Brandon Bazemore

Hey!  You may be on to something here - in two of the three examples I checked, the error lines up with GP "periodic processing".  I was under the impression that redirected folders weren't subject to that - but I will change the processing time and retest.
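
The timestamp comparison described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two timestamp lists are assumed to have been exported from the event logs by some other means, and the function name `correlate` and the two-minute window are arbitrary choices, not anything from the thread.

```python
from datetime import timedelta


def correlate(error_times, refresh_times, window=timedelta(minutes=2)):
    """Pair each error with its nearest policy refresh, if close enough.

    error_times and refresh_times are lists of datetime objects.
    Returns a list of (error_time, refresh_time) tuples for errors that
    occurred within `window` of a refresh.
    """
    matches = []
    for err in error_times:
        # Nearest refresh by absolute time difference; None if the list is empty.
        nearest = min(refresh_times, key=lambda r: abs(r - err), default=None)
        if nearest is not None and abs(nearest - err) <= window:
            matches.append((err, nearest))
    return matches
```

Comparing `len(correlate(errors, refreshes))` against `len(errors)` gives the same kind of "two out of three" figure as above, and widening or narrowing `window` shows how sensitive the match is.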
Brandon Bazemore

So far so good on this - I ended up setting the Group Policy refresh setting to not refresh while users are logged in.  I also tested by forcing a GPO update, and all of the same symptoms happened (Chrome crash, explorer.exe close, delayed write logged).  Pretty good sign there - if everything works in production tomorrow I'll report back.  Thanks for your quick response!
Brandon Bazemore

Thanks so much for your help - that was the fix!