DI

The Remote Desktop Gateway service terminated unexpectedly

Windows Server 2016 RDS environment with two Gateways.

The Remote Desktop Gateway service terminates from time to time. All users on the specific Gateway are disconnected as a result.
Happened:
06.03.2018 - One time
07.03.2018 - One time
08.03.2018 - Three times
14.03.2018 - Two times


Log Name: System
Event ID: 7031
The Remote Desktop Gateway service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 300000 milliseconds: Restart the service.

Log Name: Microsoft-Windows-TerminalServices-Gateway/Admin
Event ID: 700
The following exception code "3221225477" occured in the RD Gateway server. The RD Gateway will be restarted. No user action is required.

Log Name: Application
Event ID: 1000
Faulting application name: svchost.exe_TSGateway, version: 10.0.14393.0, time stamp: 0x57899b1c
Faulting module name: aaedge.dll, version: 10.0.14393.1532, time stamp: 0x5965ac53
Exception code: 0xc0000005
Fault offset: 0x000000000006e960
Faulting process id: 0x30cc
Faulting application start time: 0x01d3bba22ce0512a
Faulting application path: C:\Windows\system32\svchost.exe
Faulting module path: c:\windows\system32\aaedge.dll
Report Id: 9d62d65b-8584-42ff-bdb5-adca0fe9768b
Faulting package full name:
Faulting package-relative application ID:

Any idea how to fix this?
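
A side note on the logs above: Event 700 prints the exception code in decimal, while Event 1000 prints it in hex. The small Python sketch below (the helper name is made up for illustration) shows that both refer to the same code, 0xC0000005, which is the Windows status code for an access violation.

```python
def to_hex_code(decimal_code: str) -> str:
    """Convert a decimal exception code (as shown in Event 700) to 0x-prefixed hex."""
    return f"0x{int(decimal_code):08x}"


event_700_code = "3221225477"   # from Microsoft-Windows-TerminalServices-Gateway/Admin
event_1000_code = "0xc0000005"  # from the Application log

converted = to_hex_code(event_700_code)
print(converted)
print("same code:", converted == event_1000_code.lower())
```

So the two log entries describe one and the same crash rather than two separate faults.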
McKnife

1st: your server is not up to date. The version of aaedge.dll should be 10.0.14393.2097, if I am not mistaken. See if updating it improves anything.
2nd: the next step would be to revert to a backup from when everything was working. If you don't have that, do an in-place upgrade installation using the Server 2016 ISO to repair the internals. That normally gets rid of many problems caused by corrupt internals.
DI

ASKER

Hi McKnife,

Thanks for your quick reply.

1. Servers were patched with latest updates on Thursday 15.03.2018. We haven't seen the crash after updating, but it does not say in Microsoft's change log that they fixed this specific issue. I am not convinced that the latest updates fixed this problem. Remains to be seen.

2. This issue has occurred from time to time since last year, so restoring from backup is not a viable option. The first occurrence I can see in the event logs was on 29.12.2017.
Is this a VMware VM? If so, I've seen this issue with earlier versions of VMware tools due to the VMXNET3 drivers not working well in 2016. Upgrade VMware tools to the latest version available on the VMware site (10.2 I believe now) and see if that helps out.
1. Let's see. It just caught my eye that the server was several versions behind.
2. Then, if it happens again, consider the in-place upgrade. Applications, files and program settings are kept, while all system internals are exchanged for fresh components with default settings.
DI

ASKER

Sorry, forgot to say. It is running on a Hyper-V (2012 R2) host.
DI

ASKER

You are correct, the text with the version number in my first post was old. The latest version when the error occurred was 10.0.14393.1532.

The current version on both gateways is now 10.0.14393.2097, the same as you mentioned.

A collection can have only one gateway. That gateway, and broker, can be made highly available.

There can be several collections passing through that one gateway. Those collections can have user access delimited by security group.

The IIS pages need to have the DefaultTSGateway setting configured to point to the one gateway.

All need to be in the same AD.

EDIT: One can set up another gateway, but the collections set up to work with it can only pass through it, not through another gateway on the network.
DI

ASKER

Gateway and broker are already highly available (two servers).
Then my statement applies to them.

Why are there two gateways?
DI

ASKER

The same error occurred 20 minutes ago.

Faulting application name: svchost.exe_TSGateway, version: 10.0.14393.0, time stamp: 0x57899b1c
Faulting module name: aaedge.dll, version: 10.0.14393.2097, time stamp: 0x5a820a8e
Exception code: 0xc0000005
Fault offset: 0x000000000006e960
Faulting process id: 0x157c
Faulting application start time: 0x01d3bc99b7c19998
Faulting application path: C:\Windows\system32\svchost.exe
Faulting module path: c:\windows\system32\aaedge.dll
Report Id: 05d8ad6f-4813-4533-b789-ff18664e5e51
Faulting package full name:
Faulting package-relative application ID:
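
For anyone trying to line these reports up with a timeline: the "Faulting application start time" value looks like a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). That interpretation is an assumption on my part, not something stated in the log itself, and the helper name below is made up. A minimal Python sketch for decoding it:

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(hex_value: str) -> datetime:
    """Decode a hex FILETIME string such as '0x01d3bc99b7c19998' to a UTC datetime."""
    ticks = int(hex_value, 16)
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Start times of the crashed svchost.exe instances, copied from the two Event 1000 reports.
for start_time in ("0x01d3bba22ce0512a", "0x01d3bc99b7c19998"):
    print(start_time, "->", filetime_to_utc(start_time).isoformat())
```

Note that this is the time the service process started, not the time it crashed, so it tells you how long the gateway had been running before the fault.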
Have you done the in-place upgrade by now? It does not hurt and very often fixes internal problems.
DI

ASKER

We built a completely new test environment: one Gateway/Connection Broker and one session host. The same issue occurs.
If no third-party software that might interfere is involved in your test, I am afraid you will have to ask Microsoft support for help.
We have also been facing this same issue for the past couple of days. We can't find any resources on it and are going through all the logs to figure it out...

Event 700

The following exception code "3221225477" occured in the RD Gateway server. The RD Gateway will be restarted. No user action is required.

Event 1000 -  Application Error

Faulting application name: svchost.exe_TSGateway, version: 10.0.14393.0, time stamp: 0x57899b1c
Faulting module name: aaedge.dll, version: 10.0.14393.2097, time stamp: 0x5a820a8e
Exception code: 0xc0000005
Fault offset: 0x000000000006e960
Faulting process id: 0x338
Faulting application start time: 0x01d3cb66dfec7049
Faulting application path: C:\Windows\system32\svchost.exe
Faulting module path: c:\windows\system32\aaedge.dll
Report Id: 6452344d-ea55-4f56-8024-d4af82862b2a
Faulting package full name:
Faulting package-relative application ID:
RD Gateway servers: Windows Server 2016

RD Virtualization Host: 2016

Workstation: Windows 10 2016 Enterprise
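
To check whether a crash on your own gateway is the same one reported in this thread, it may help to compare the faulting module, exception code and fault offset rather than eyeballing the whole report. Below is a rough Python sketch; the function names and the shortened sample texts are mine, and the field layout is assumed to match the Event 1000 output pasted above.

```python
import re

# Regexes for the Event 1000 fields that identify a crash.
PATTERNS = {
    "module": r"Faulting module name: ([^,\s]+)",
    "module_version": r"Faulting module name: [^,]+, version: ([\d.]+)",
    "exception_code": r"Exception code: (0x[0-9a-fA-F]+)",
    "fault_offset": r"Fault offset: (0x[0-9a-fA-F]+)",
}


def crash_signature(event_text: str) -> dict:
    """Extract the identifying fields from the text of an Event 1000 report."""
    signature = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, event_text)
        signature[name] = match.group(1).lower() if match else None
    return signature


def same_crash(a: dict, b: dict) -> bool:
    """True if module, exception code and fault offset all match (version is ignored)."""
    keys = ("module", "exception_code", "fault_offset")
    return all(a[k] is not None and a[k] == b[k] for k in keys)


# Shortened copies of two reports from this thread (before and after patching).
BEFORE_PATCH = """Faulting module name: aaedge.dll, version: 10.0.14393.1532, time stamp: 0x5965ac53
Exception code: 0xc0000005
Fault offset: 0x000000000006e960"""

AFTER_PATCH = """Faulting module name: aaedge.dll, version: 10.0.14393.2097, time stamp: 0x5a820a8e
Exception code: 0xc0000005
Fault offset: 0x000000000006e960"""

before = crash_signature(BEFORE_PATCH)
after = crash_signature(AFTER_PATCH)
print(before["module_version"], "vs", after["module_version"])
print("same crash:", same_crash(before, after))
```

The module version is deliberately left out of the comparison, since the reports in this thread show the same fault offset on both 10.0.14393.1532 and 10.0.14393.2097.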
Please update the servers in question: https://support.microsoft.com/en-us/help/4096309 (Server 2016 March 29 latest).
Thank you for your help. I have updated my server and will report back tomorrow on how it's going.
DI

ASKER

We already updated our test environment with the latest patches on the 3rd of April. The issue still occurred on the 4th of April.

The file versions mentioned in event 1000 are not changed by the latest updates.

I tried contacting Microsoft support, but so far they have not been able to help solve this problem.
Same here, we applied the suggested update on Thursday and it happened twice on Friday and once today.
Still no luck. I have a case open with Microsoft, but it is not really moving fast. We did collect some dump files that they are analyzing. The dump shows this error...

In svchost1.exe.5468.dmp the assembly instruction at aaedge!CAAHttpServerTransport::WebSocketSendLoop+48c in c:\Windows\System32\aaedge.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to write to memory location 0x000000a0 on thread 32
DI

ASKER

I tried enabling the WebSockets feature in IIS as a test yesterday, but we still have the same problem. The error occurred two times today.
Update: I have been working with Microsoft for going on a week now... They are aware of the issue and say it is an ongoing issue. This morning they collected RDS Traces and now I am waiting for an engineer to look over the results and get back to me in 1-2 days.
DI

ASKER

Thanks for the update, Thomas. Let's hope you have better luck with MS support. They closed our case even though the problem still occurs.
I am having the same issue...

aaedge.dll version is 10.0.14393.2097

Do those of you who have a ticket open have any news from Microsoft?
I currently have a ticket open. They have collected event logs, crash dumps, and RDS traces, and I am just waiting for the engineers to analyze them. I have had the case open for about three weeks waiting for an answer. All they have told me is that it is an ongoing issue for all users and they are looking into it.
We have gone back and forth over and over with Microsoft. They did put our server in testing mode and applied a test patch that didn't work. But I believe I found the problem and a temporary fix. Our Wyse thin clients were on firmware 8.5 and I rolled them back to 8.4 last Friday. So far this week I have not had a single crash, and I also moved four more users over. It's not a permanent solution of course, but hopefully we can stay on 8.4 until they work it out.
Oh man, that's painful. :S

I'm glad to hear that the source may finally be pinpointed.
I opened a case with Dell Wyse and received the following in an email today...

"Dear Thomas,
 In order to inform that we received an update from our senior engineer that Gateway service crashing In RDS server 2016 is a known issue in 8.5_009 version and it has already been reported to our next level engineer team ."

Looks like they have known about it, and I can confirm that rolling back to 8.4 has temporarily resolved the issue.
If anyone is having this same issue with Windows Server 2016 and Dell Wyse thin clients on firmware 8.5, please let me know. Dell is aware of the issue and is trying to collect logs from users who are facing it.
DI

ASKER

As previously mentioned, we have the same problem, and we also have a case open with Dell.

Another customer over at the technicalhelp.de forum has the same problem as well.
I also have the same problem, and open tickets with MS and Dell. The issue is related to the websockets protocol. Dell provided a demo build of WTOS that allows the websockets protocol to be disabled. Without websockets, which is how 8.4xxx and prior versions operate, the RDGW service does not crash. As of today, MS engineering is still 'reviewing debug logs.' Dell's solution is to build an official version that simply allows the end user to disable websockets - not much of a fix if you ask me. However, I believe the actual crashing is something that MS should and can fix, since the RDGW app itself is not handling exceptions correctly.
Microsoft has replied to my case that the bug has been identified and that a fix will be available at the end of August 2018. They did not provide any details, although they did say that they could provide me with the patch KB number when available. If they actually do, I will update this post.
Hello,

I'm experiencing this issue as well, with Windows Server 2016 and Dell Wyse thin clients on firmware 8.5.
aaedge.dll is up to date!

@Leigh Warner please let us know that KB number once you get it.
The easiest thing to do is roll back to 8.4. Wyse is testing a patch, but I am not sure when it will be released publicly. Microsoft tested a patch, but it failed. Microsoft and Wyse both say that the final fix is planned to be released in Windows Server 2019.
From Dell Wyse
Hi Thomas,

Good day. I hope you are doing well. I am Goutam Seshadri from Dell Wyse escalation support team.

I was assisting Naresh on this escalation to get the demo code from our engineering team. Glad to hear about the resolution through demo code. I verified with the engineering was informed that the official release with the ini parameter is scheduled for next quarter. You shall wait for it or continue to deploy the demo code already given, provided please self-certify the code testing on 2-3 clients to ensure it meets your requirement and no further issues are seen. Once official firmware is released, you shall proceed with the upgrade.

FYI, the demo code is not supported by our standard support team. You shall need to reply to this email for any issues seen in demo code.

Regards,
Goutam Seshadri
Enterprise Tech Support Advisor

From Microsoft
Hello Thomas,

Hope you are doing fine.

I am the senior engineer with Microsoft.
I would like to let you know that the issue which you are facing with RD Gateway service crashing will be fixed in 2019 update.
Hence as per the scope of the ticket we will have to keep the ticket on hold for now till the fix is released.

With Best Regards,
Niyoti Pathak       
Support Engineer
Windows Platforms User Experience
Broad Commercial Team
Customer Service & Support
Microsoft has replied to my case that the fix is available in patch KB4343884 released Aug 30. This is for Windows Server 2016 build 1607.

After extensive testing, I have found that the GW service crash and subsequent disconnects no longer occur.  However, we seem to have a new problem; after some time, the client (in our case Wyse 3030LT thin clients) appears to hang, with a static image on the screen. What's happening is that the video stream has stopped - you can do a RD Shadow session and see that the session is still actually working.

It is interesting to note that this fix is only posted for build 1607, so I'm wondering if the newer Server 2016 builds have this behavior.  Time for more testing!
Newer builds are Semi-Annual Channel (SAC) and are Server Core only. There's no RDS in there.

The next iteration for RDS will be Windows Server 2019.
DI

ASKER

I just installed KB4343884 on my test-environment today.

Will start testing it now!
DI

ASKER

No crash yet!
It is too early to say with just one Wyse client connected to my test environment, but so far it looks promising.

"Addresses an issue that causes users to disconnect from a remote session when the Remote Desktop Gateway service stops working."

Ref: https://support.microsoft.com/en-us/help/4343884/windows-10-update-kb4343884
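
If you want to script a check for the patch across several gateways, here is a rough Python sketch. It assumes PowerShell's Get-HotFix cmdlet is available on the server and that the update shows up under its own KB number. A later cumulative update may supersede it, so "not listed" does not necessarily mean the fix is missing. The function names are made up for illustration.

```python
import subprocess


def hotfix_listed(hotfix_ids: str, kb_id: str = "KB4343884") -> bool:
    """Return True if kb_id appears as its own entry in a newline-separated list of hotfix IDs."""
    installed = {line.strip().upper() for line in hotfix_ids.splitlines() if line.strip()}
    return kb_id.upper() in installed


def query_installed_hotfixes() -> str:
    """Ask PowerShell for the installed hotfix IDs (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# On the gateway itself you would run:
#   print(hotfix_listed(query_installed_hotfixes()))
# Here the parsing is shown on a hypothetical sample list instead:
sample_output = "KB4093137\nKB4343884\n"
print(hotfix_listed(sample_output))
```

The parsing is kept separate from the PowerShell call so the check can also be run against output collected from other servers.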