Jim Semmelroth

Mail flow from Edge Server to Mailbox Server Hangs

Hello Experts

Within about 24 hours after a reboot of my edge server, mail stops flowing from the edge server to the mailbox server.

I have one edge server and one mailbox server.  The mailbox server is Exchange 2016 on Server 2012 R2 and the edge server is Exchange 2016 on Server 2016.  The edge server is in a public-IP-addressed DMZ.  This edge server is a new deployment.  The old edge server was on the same subnet as the mailbox server and did not have this problem.

This has been going on for a week.  I restore mail flow by rebooting the edge server with Windows Firewall turned off, and I turn the firewall back on immediately after the reboot.  Mail then flows to the mailbox server for about a day.  When I reboot the edge server with Windows Firewall still on, mail does not start flowing.

The Application event log on the edge server shows events 1022, 12025, and 8019 after a reboot, regardless of whether the firewall was up or down.

Nslookup on the edge server successfully resolves both the Active Directory DNS server and the mailbox server.

Telnet port 25 from edge to mailbox, and mailbox to edge, is successful.

A port query tool on the mailbox server indicates port 50636 on edge is listening.

Test-EdgeSynchronization is always "Normal".

The edge firewall has an inbound rule allowing traffic on all ports from the mailbox server.

I believe this all shows that DNS and edge synchronization are working correctly.
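
The name-resolution and port checks listed above can also be repeated from a script, which makes it easier to re-run them while mail is stalled rather than only right after a reboot. The following is a minimal sketch using only the Python standard library; the host names in the commented usage lines are placeholders, not the real server names. A successful connect only shows that the TCP handshake completed at that moment.

```python
import socket


def can_resolve(host: str) -> bool:
    """Return True if the local resolver returns at least one address for host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example usage (placeholder names, adjust to your environment):
#   From the edge server:     port_open("mailbox.example.local", 25)
#   From the mailbox server:  port_open("edge.example.com", 25)
#                             port_open("edge.example.com", 50636)
```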

As time goes by, after a reboot of edge, the connections indicated in Event ID 8019 will start incrementing.  Right after the reboot Event ID 8019 shows the following:

"Creating extra connection for idle queue: 3 with queue type: SmartHostConnectorDelivery and next hop domain: mailbox.blah.blah. Current number of connections is: 1"

As time goes by the "number of connections is:" will increment up to 19 and then will go no higher.  I don't know yet whether mail stops flowing when the connections start incrementing or when they reach 19.

I figure this has to be an edge firewall issue, but I have been unable to nail it down. The above confirms that the ports indicated as necessary at the following link are available. Thanks.

https://docs.microsoft.com/en-us/exchange/plan-and-deploy/deployment-ref/network-ports?view=exchserver-2019
Tony J

Have you checked through the Edge Transport logs?

%ExchangeInstallPath%TransportRoles\Logs\Edge

There are various folders in there with logs in them, but I'd probably start with the transport service and SMTP send connector logs, here: %ExchangeInstallPath%TransportRoles\Logs\Edge\PipelineTracking and here: %ExchangeInstallPath%TransportRoles\Logs\Edge\ProtocolLog\SmtpSend
Jim Semmelroth

ASKER

Tony, thanks for the tip.  I have enabled both logs on the edge server and am waiting for messages to stop flowing.  I will report back once I see something interesting in the logs.

Jim
Tony,

In the SmtpSend log, I get the following pattern repeated over and over until I start up mail flow again.  I have found I can get mail flowing from edge to mailbox simply by turning off the firewall for a while and then back on.  I don't have to reboot.  Mail will then continue to flow with the firewall up, until such time as it doesn't anymore.

2018-11-13T18:11:48.090Z,EdgeSync - Inbound to Default-First-Site-Name,08D64976D0EC3720,0,,192.168.555.555:25,*,SendRoutingHeaders,Set Session Permissions
2018-11-13T18:11:48.090Z,EdgeSync - Inbound to Default-First-Site-Name,08D64976D0EC3720,1,,192.168.555.555:25,*,,attempting to connect
2018-11-13T18:11:48.090Z,EdgeSync - Inbound to Default-First-Site-Name,08D64976D0EC3720,2,216.55.55.55:1029,192.168.555.555:25,+,,

I was able to send an email in as the designated sender for a pipeline trace while mail was not flowing from edge to mailbox, but none of those logs look different from a set captured while mail flow is working.  There are a lot of those logs.  Are some supposed to be more interesting than others?

Does the above log snippet suggest what I should look at next?

Jim
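
For anyone reading a log like the one above: Exchange SMTP protocol logs are comma-separated, and the usual field order is date-time, connector-id, session-id, sequence-number, local-endpoint, remote-endpoint, event, data, context. In the event column, `+` marks a connect, `-` a disconnect, `>` data sent, `<` data received and `*` an informational entry. The sketch below is a rough illustration, not a supported tool; it groups entries by session and lists sessions that logged a connect but never sent or received anything, which is the pattern in the three lines quoted above. The sample text is copied from this thread and the function names are made up for the example.

```python
import csv

# Field order as written in the "#Fields:" header of Exchange protocol logs.
FIELDS = [
    "date-time", "connector-id", "session-id", "sequence-number",
    "local-endpoint", "remote-endpoint", "event", "data", "context",
]

# The three lines quoted in the post above (addresses already obfuscated there).
SAMPLE = """\
2018-11-13T18:11:48.090Z,EdgeSync - Inbound to Default-First-Site-Name,08D64976D0EC3720,0,,192.168.555.555:25,*,SendRoutingHeaders,Set Session Permissions
2018-11-13T18:11:48.090Z,EdgeSync - Inbound to Default-First-Site-Name,08D64976D0EC3720,1,,192.168.555.555:25,*,,attempting to connect
2018-11-13T18:11:48.090Z,EdgeSync - Inbound to Default-First-Site-Name,08D64976D0EC3720,2,216.55.55.55:1029,192.168.555.555:25,+,,
"""


def parse_protocol_log(lines):
    """Turn protocol log lines into dicts, skipping '#' header lines."""
    data_lines = (ln for ln in lines if ln and not ln.startswith("#"))
    rows = []
    for record in csv.reader(data_lines):
        if len(record) == len(FIELDS):
            rows.append(dict(zip(FIELDS, record)))
    return rows


def sessions_without_traffic(rows):
    """Return session ids that logged a connect ('+') but no '>' or '<' entry."""
    events = {}
    for row in rows:
        events.setdefault(row["session-id"], set()).add(row["event"])
    return [sid for sid, seen in events.items()
            if "+" in seen and not seen & {">", "<"}]


stalled = sessions_without_traffic(parse_protocol_log(SAMPLE.splitlines()))
print(stalled)
```

A session that reaches `+` and then goes quiet has opened a connection at the Exchange level but never logged the remote server's SMTP banner, which is one reason to look below Exchange (firewall, routing) rather than at the connector configuration.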
ASKER CERTIFIED SOLUTION
Tony J

Tony,

That log has provided a wealth of clues.  It has allowed me to correct an unrelated mistake of which I was unaware, and reminded me to block RDP from the outside.

It has also suggested that the source of my pain is the result of a lack of coordination between my firewall configuration and my routing configuration between the inside LAN and the DMZ.  I am testing this now.  I won't be confident about this fix until I let it run for a day or so.

I'm a newbie here but I presume once I know I have the solution, I should detail it for others reading this thread.  Yes?  Then I'll also mark it as the solution.

Thanks for the tip about the firewall log.

Jim
Hi Jim

Sorry for the delay replying - I did read your last comment but was up to my ears and this is the first chance I've had.

Yes please - if you can, come back and share the cause and the resolution that worked for you. It's especially important when the issue is as esoteric as the one you are seeing because, as you've found yourself, there's no real information out there.

If you feel your solution is worth marking as the valid solution, that is absolutely fine. If you think someone (me, in this case, of course) helped to guide you down the right path, you can split the points accordingly.

To be honest, I would have suggested the firewall log sooner but I wanted to be sure nothing was showing up in the transport logs first.
Hi Tony

The source of all my joy regarding this incident was learning of the existence of the Windows Firewall log and how to turn it on.  So that is where the points go.

The bottom line is the log revealed to me that the traffic moving between my edge and mailbox servers did not follow the route for which I had opened firewall ports.  Learning that, I was able to put a persistent route on my edge server which seems to have solved the issue.

Thanks Tony.

Jim
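
For readers who want to try the same approach, here is a rough sketch of how a Windows Firewall log can be summarised. Logging of dropped packets can be switched on with `netsh advfirewall set allprofiles logging droppedconnections enable`, and the log is normally written to %systemroot%\system32\LogFiles\Firewall\pfirewall.log. The log is space-separated with a `#Fields:` header naming the columns. The sample entries and addresses below are invented for illustration (documentation address ranges) and are not taken from the servers in this thread; the function names are hypothetical.

```python
from collections import Counter

# Invented sample in pfirewall.log layout: the edge (198.51.100.20) connects
# out to one address, and packets coming back from another address are dropped.
SAMPLE_LOG = """\
#Version: 1.5
#Software: Microsoft Windows Firewall
#Time Format: Local
#Fields: date time action protocol src-ip dst-ip src-port dst-port size tcpflags tcpsyn tcpack tcpwin icmptype icmpcode info path

2018-11-13 18:11:48 ALLOW TCP 198.51.100.20 192.0.2.10 1029 25 0 - 0 0 0 - - - SEND
2018-11-13 18:11:48 DROP TCP 192.0.2.77 198.51.100.20 25 1029 52 SA 1 2 8192 - - - RECEIVE
2018-11-13 18:11:51 DROP TCP 192.0.2.77 198.51.100.20 25 1029 52 SA 1 2 8192 - - - RECEIVE
"""


def parse_firewall_log(text):
    """Yield one dict per log entry, using the column names from '#Fields:'."""
    fields = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif not line or line.startswith("#"):
            continue
        elif fields:
            parts = line.split()
            if len(parts) == len(fields):
                yield dict(zip(fields, parts))


def dropped_flows(entries):
    """Count DROP entries per (source IP, destination IP, destination port)."""
    counts = Counter()
    for entry in entries:
        if entry["action"] == "DROP":
            counts[(entry["src-ip"], entry["dst-ip"], entry["dst-port"])] += 1
    return counts


for flow, n in dropped_flows(parse_firewall_log(SAMPLE_LOG)).items():
    print(flow, n)
```

If the dropped flows show the other server's traffic arriving from or leaving to an address the firewall rules do not cover, the routing and the rules disagree. On Windows a persistent route is generally added with `route -p add <destination> mask <netmask> <gateway>`; the values depend entirely on the network in question.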
Brilliant news! Glad you managed to work it out - I almost mentioned the routing log but made the presumption that as mail was flowing for a while, the routing must be ok: so that's something I've learned to look out for as well!

Glad I could help to point you in the right direction but well done on tracking it down. Excellent work.

Thank you for the points.

By the way - welcome to EE :)