Exchange 2013 mail flow broken

We have an Exchange 2013 VM in Hyper-V running CU3.  We took a checkpoint and attempted to install CU9, which failed.  We then applied the checkpoint, expecting everything to return to its previous state; however, no email can be sent or received.  Users can connect to their mailboxes, but a message sent from Outlook shows in Sent Items without actually being delivered, and a message sent from OWA sticks in Drafts.

If we telnet to the server's external address on port 25 the connection is made but there is no response from the server; it's the same when connecting from another machine on the internal network (e.g. the DC).  Connecting from the server itself locally by telnet gives the correct, expected response.
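
The telnet check described above can be scripted so it is easy to repeat from several machines. This is only a minimal sketch: the function name `read_smtp_banner` and the timeout values are illustrative assumptions, not anything from the original setup. It opens a TCP connection to port 25 and waits for the greeting line that a healthy SMTP server sends first (normally starting with `220`).

```python
import socket


def read_smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the SMTP greeting line.

    Returns None when the TCP connection succeeds but the server sends
    nothing before the timeout (the symptom described above). Connection
    failures (refused, unreachable) raise OSError as usual.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(1024)  # a healthy server speaks first
        except socket.timeout:
            return None  # connected, but no greeting arrived
    # An empty read (peer closed without greeting) is also reported as None.
    return data.decode("ascii", errors="replace").strip() or None


if __name__ == "__main__":
    # "mail.example.test" is a placeholder; substitute the server's address.
    try:
        print(read_smtp_banner("mail.example.test", 25, timeout=5.0))
    except OSError as exc:
        print("connection failed:", exc)
```

Running it once on the Exchange server itself and once from another host (such as the DC) gives a side-by-side comparison: a banner locally and `None` remotely matches the behaviour reported here.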

We have uninstalled F-Secure for Exchange (AV), disabled the Malware Agent filter, and disabled the Windows Firewall, but none of this changed anything.  We also deleted and recreated the Default Frontend receive connector, again with no change.

Nothing shows in the mail queue.  Test-SMTPConnectivity reports "success" for all connectors.  Test-Mailflow reports *FAILURE*.

How can we restore normal mail flow?  Thanks in advance for any assistance.
Asked by: David Haycox, Consultant Engineer

Simon Butler (Sembee), Consultant, commented:
Snapshots of Exchange are not a supported recovery method. You shouldn't use them at all.

I am pretty sure that the problem is that you have attempted to install CU9 and then tried to go back. Exchange is now confused because the domain is expecting CU9 to be there.

Why did CU9 fail to install? Did it give a reason?

You need to get the server back to CU9, as that is what the domain is expecting.

If you were my client I would be advising a new installation of Exchange 2013 CU9, then moving all the mailboxes to the new server. The rollback with a snapshot will have caused damage because the domain is expecting one thing and you have another. Exchange is not a standalone product that can be treated like a regular application. It is heavily integrated into AD, and the installation of updates makes changes to AD which are not reversible, or cannot be reversed by the rollback.

You can try reinstalling CU9 - but that may cause further damage or fail because the domain will think that CU9 is already installed. The other option is to call Microsoft and see if they can assist. However they may tell you that you aren't supported because you used a snapshot.

Just restoring mail flow is not going to fix the problems you have.

Simon.

David Haycox, Consultant Engineer (author), commented:
Wow, fair enough.

CU9 failed on step 10 of 18, with a page full of error messages.  We didn't take a note of them, but it was transport-related.

A new installation of Exchange is possible, so we'll do that if needs be.  In the meantime we can restore both the DC and the Exchange VMs to a previous state (we stopped incoming email from outside before starting work, and the DC is only used for this Exchange server).

If this doesn't work, is it possible to restore the mail flow in the interim - giving us time to deploy a new Exchange server without user impact?

Thanks
Simon Butler (Sembee), Consultant, commented:
AD is a living product.

Restoring the domain controllers and Exchange is just going to continue to cause you problems.
Build a new machine with Windows 2012 R2 if you have the licences, and then use the CU9 media to install the new server. It shouldn't take more than two hours to get to that stage. You can then start moving mailboxes.

CU3 hasn't been supported for over a year - MS only support the latest and previous CU. If you had transport errors during the installation of CU9, that is probably why it isn't working now.

Simon.
David Haycox, Consultant Engineer (author), commented:
Fair comments, I'll start working on a new server now.

Surely, though, if we restore both machines from backups that were taken minutes apart (using Altaro, which is Exchange- and AD-aware), then everything should be back as it was, which was working?

Thanks again

David
Simon Butler (Sembee), Consultant, commented:
I don't know if the restore would work, because it is simply a situation I would never find myself in.

If a client asked after they had done so, then they would get the same advice that I provided to you above.

Simon.
David Haycox, Consultant Engineer (author), commented:
Ok, thanks.  So if you were applying e.g. CU9 to a working single server, how would you recover if it failed?
David Haycox, Consultant Engineer (author), commented:
Sorry, let me reword that question as you've already answered it (set up a new Exchange server).  What steps would you take before a CU install to mitigate against a possible failure?  Just regular Exchange-aware backups?
Simon Butler (Sembee), Consultant, commented:
It depends on what the failure is.
The logs will be key, as they will say what the problem is. If I can resolve it, then I will, then resume the update installation. It will pick up from where it left off if possible. Rarely do I have to rebuild the server. The only reason you are having to do so is because you rolled back.
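
As a rough illustration of pulling the failure out of the setup log, the sketch below scans a log file for error lines and keeps a little preceding context. The assumptions here are mine, not the poster's: that the log is the one Exchange setup typically writes to `C:\ExchangeSetupLogs\ExchangeSetup.log`, and that failures are tagged with the text `[ERROR]`. The function name `find_setup_errors` is hypothetical, and the marker is a parameter so it can be changed if the log looks different.

```python
from pathlib import Path


def find_setup_errors(log_path, marker="[ERROR]", context=2):
    """Return (line_number, lines) pairs for each line containing marker.

    `lines` holds up to `context` preceding lines plus the matching line,
    so the step that was running when the error occurred is visible.
    The default marker is an assumption about the log format.
    """
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            hits.append((index + 1, lines[start:index + 1]))
    return hits


if __name__ == "__main__":
    # Assumed default location of the setup log; adjust as needed.
    log_file = Path(r"C:\ExchangeSetupLogs\ExchangeSetup.log")
    if log_file.exists():
        for number, block in find_setup_errors(log_file):
            print(f"--- line {number} ---")
            print("\n".join(block))
```

Saving the output before attempting any recovery means the error text is still available later, which would have helped in a case like this one where the messages were not noted down.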

As for preparation, about the only thing you can do is an Exchange-aware backup of the database. Transport errors are often caused by third-party tools such as antivirus or antispam products, so those should be disabled or have the correct exclusions in place.

Simon.
David Haycox, Consultant Engineer (author), commented:
What we think happened is that F-Secure (specifically the transport agents) caused the failure of the CU9 install.  So after having rolled back to the checkpoint taken beforehand, we resolved the problem by:

1. Uninstalling all F-Secure products (probably just the Exchange and Anti-Spam would have done it though).
2. Reinstalling CU9.
3. Reinstalling F-Secure.

While we didn't see any evidence of problems having been caused by snapshots / checkpoints on this occasion, we take the advice on board and will avoid using these in future (to be honest, this was the first time we had used them and only because we wanted to be able to roll back in the event of failure of the CU install).  We will shortly be deploying a DAG which should mitigate against this in the future, all being well (and for good measure will be moving all mailboxes away from this server).

Thanks for the advice.