DAG member shows status of Failed Passive for all DB copies

I have an on-premises Exchange 2016 DAG. One of the members was healthy and held the active copy of a couple of databases. Patches needed to be installed, so the active copies were moved to other members. Since then I have been unable to make the member active again. Every copy on it shows a status of Passive Failed. The error generated is:-

The Microsoft Exchange Replication service encountered an unexpected error in log replay for database 'DB01-AU\LFGEXSYD1'. Error MapiExceptionNetworkError: Unable to mount database. (hr=0x80004005, ec=2423)Diagnostic context:    Lid: 65256      Lid: 10722   StoreEc: 0x977         Lid: 1494    ---- Remote Context Beg ----    Lid: 59596   dwParam: 0x4BFCBE5  Msg: SM01    Lid: 59596   dwParam: 0x4BFCBE5  Msg: SM02    Lid: 59596   dwParam: 0x4BFCBE5  Msg: SM03    Lid: 59596   dwParam: 0x4BFCBE5  Msg: SM04    Lid: 59596   dwParam: 0x4BFCBE5  Msg: SM05    Lid: 39576   StoreEc: 0x977         Lid: 59596   dwParam: 0x4BFCBE5  Msg: SM06    Lid: 35200   dwParam: 0x1908    Lid: 59768   StoreEc: 0x977         Lid: 59596   dwParam: 0x4C00BBD  Msg: SM07    Lid: 59596   dwParam: 0x4C00BBD  Msg: SM08    Lid: 59596   dwParam: 0x4C00BBD  Msg: SM12    Lid: 35200   dwParam: 0x1908    Lid: 1750    ---- Remote Context End ----    Lid: 1047    StoreEc: 0x977    

In Event Viewer I see event ID 126 regularly:-

At '24/08/2018 2:16:07 AM' the Exchange store database 'DB01-AU' copy on this server encountered an error that caused the database to be dismounted. For more detail about the failure, consult the Event log on the server for other "ExchangeStoreDb" or "msexchangerepl" events. A successful failover restored service.

I was also getting errors about the FastSearch service but rebuilding the indexes from scratch seems to have fixed that.

I have done a fair bit of googling and tried various things such as:-
Rolling back the patches on the server
Suspending the copy and then using Update from the active copy.
Deleting the copy completely and starting from scratch
Removing the member from the DAG and re-adding.

So I'm out of ideas at this point other than a server recovery build.

Any suggestions would be greatly appreciated.
smcauley asked:

Amit, IT Architect, commented:
Can you answer a few more questions?

Are these servers located at the same site or at different sites (for example, Active/Passive)?
Which patches did you install, OS or Exchange? Please provide the KB numbers.
How many NICs do you have on each server?
Do you have a separate NIC for replication?
Are these servers VMs? If so, are the VM tools up to date?

Michael B. Smith, Exchange & Active Directory Expert, commented:
So this is the significant part: "Error MapiExceptionNetworkError: Unable to mount database. (hr=0x80004005, ec=2423)". 0x80004005 just means "call failed". But ec=2423 means that the two servers can't talk to each other.

Exchange servers need to be able to reach each other by both long name (FQDN, e.g., server1.contoso.com) and short name (NetBIOS, e.g., server1). That's something to check.
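
A minimal sketch of that name check, to be run from each DAG member against the other. The two names below are placeholders (not taken from this thread), and a passing result only shows that DNS resolution and ICMP work, not that every port Exchange needs is open.

```powershell
# Hypothetical names: replace with the other DAG member's short name and FQDN.
$peers = 'server1', 'server1.contoso.com'

foreach ($name in $peers) {
    # Resolve-DnsName ships in the DnsClient module (Windows Server 2012 and later).
    $dns = Resolve-DnsName -Name $name -ErrorAction SilentlyContinue

    # Test-Connection -Quiet returns $true if any echo request gets a reply.
    $ping = Test-Connection -ComputerName $name -Count 2 -Quiet

    [pscustomobject]@{
        Name     = $name
        Resolves = [bool]$dns
        Pings    = $ping
    }
}
```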

Also verify that the network adapters are properly configured. That's something else.

You write that you removed the server from the DAG and re-added it, is that correct? If so, that does point to something network-based. Are you using IP-less or IP-ed DAGs? If you are using IPs, then you can verify the cluster in Failover Cluster Manager (FCM) and see if that points you to something.

And don't forget the event log. :-)

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
Did you perhaps install .NET Framework 4.7.2 on your server?

smcauley (author) commented:
Sure...

Same site. Active/Passive. Patches were OS and the following:-

KB890830
KB4345592
KB4343898

One NIC per server, no separate replication network.
VM Servers, VM tools up to date.

smcauley (author) commented:
Hi Edward,

I did install 4.7.2 as a pre-req when I deployed my Exchange 2016 servers a few months ago. Is that significant?

Amit, IT Architect, commented:
Run this and share the result:
(Get-HotFix | sort installedon)[-1]

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
It is not supported at all on any version of Exchange.

smcauley (author) commented:
Hi Michael,

NetBIOS and FQDN both work fine each way. It is an IP-less DAG and there are no issues communicating with members in other sites. I did remove this member from the DAG and re-added it successfully. I'm not seeing much more in the logs other than what I mentioned originally. All very strange.

Amit, IT Architect, commented:
Can you run a cluster validation test? That can give you more detail about what is going on.

smcauley (author) commented:
@Edward Ouch... Just reading that now. Is that the issue?

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
It can cause issues. You might want to look at rolling back or building a new DAG.

Michael B. Smith, Exchange & Active Directory Expert, commented:
I have not heard of this particular problem with 4.7.2, but it's certainly possible.

This blog tells you how to remove a particular version of the .NET Framework and replace it. To be clear, your destination version should be 4.7.1.

https://blogs.technet.microsoft.com/exchange/2017/06/13/net-framework-4-7-and-exchange-server/
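
Before and after a rollback it helps to confirm which .NET Framework 4.x version is actually installed, since Roles and Features only lists the base feature. The sketch below reads the `Release` value from the registry and maps it to a version label. The function name `Get-DotNetVersionLabel` is made up for this example, and the thresholds are the minimum `Release` values Microsoft documents for each version. Which versions are supported depends on the Exchange CU, so check the Exchange supportability matrix rather than relying on this mapping alone.

```powershell
# Map the .NET Framework 4.x "Release" registry value to a version label.
# Get-DotNetVersionLabel is a hypothetical helper name used only in this sketch.
function Get-DotNetVersionLabel {
    param([Parameter(Mandatory)][int]$Release)

    if     ($Release -ge 461808) { '4.7.2 or later' }
    elseif ($Release -ge 461308) { '4.7.1' }
    elseif ($Release -ge 460798) { '4.7' }
    elseif ($Release -ge 394802) { '4.6.2' }
    else                         { 'older than 4.6.2' }
}

# Read the installed release number on this server (the key exists only on Windows).
$key = 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full'
if (Test-Path $key) {
    $release = (Get-ItemProperty -Path $key -Name Release).Release
    '{0} (Release {1})' -f (Get-DotNetVersionLabel -Release $release), $release
}
```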

Michael B. Smith, Exchange & Active Directory Expert, commented:
Also, both Amit and I have recommended a cluster validation/verification test. It could be very useful.
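
One way to run those checks from PowerShell, as a rough sketch: `Test-Cluster` comes from the FailoverClusters module, while `Test-ReplicationHealth` and `Get-MailboxDatabaseCopyStatus` need the Exchange Management Shell. The server name `EXCH02` is a placeholder for the failing member, not a name from this thread.

```powershell
# Placeholder: the DAG member whose copies are Passive Failed.
$server = 'EXCH02'

# Cluster validation against the local cluster (FailoverClusters module).
# It writes an HTML report and returns the report file.
Import-Module FailoverClusters
Test-Cluster

# Exchange-level replication checks for the failing member.
Test-ReplicationHealth -Identity $server |
    Format-Table Server, Check, Result, Error -AutoSize

# Status of every database copy hosted on that member.
Get-MailboxDatabaseCopyStatus -Server $server |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```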

smcauley (author) commented:
Amit, the only hotfix reported is KB3138962. No issues on the cluster checks...

smcauley (author) commented:
Thanks Michael, attempting to roll back now.

smcauley (author) commented:
I can't find the update KB number listed in the article. I'm assuming this is because I installed 4.7.2 using the offline installer. If I look at Roles and Features, however, I only see 4.5 listed. So I'm unsure how to roll back at this stage.

Michael B. Smith, Exchange & Active Directory Expert, commented:
KB4054530 is the update. The URL to download the offline installer is below:

https://support.microsoft.com/en-us/help/4054530/microsoft-net-framework-4-7-2-offline-installer-for-windows

Amit, IT Architect, commented:
If you are not sure, I advise you to open a case with Microsoft.

smcauley (author) commented:
Thanks Michael, from the article it looks like the update to uninstall is KB4054566. Just uninstalling now.

smcauley (author) commented:
Hi Michael, I'm now running 4.6.2. Is it safe to download the offline installer for 4.7.1 and install?

Michael B. Smith, Exchange & Active Directory Expert, commented:
Yes.

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
4.7.1 is fine to install. Is your DAG operational now after the downgrade?

smcauley (author) commented:
Sorry, I've been travelling today so the testing has been a little sporadic. The server is now running 4.7.1 but I'm still seeing the same issue. For good governance I deleted all copies again and cleaned up the database and log drives on the failing server and restarted it. The strange thing is I keep seeing the one database called DB01 recreated after reboot on the server. It's a folder with a GUID which then contains sub folders, indexmeta, journal and ms. No edb file as such but I can't work out where it is picking that up from.

So I'm currently adding a copy of another database which doesn't show anywhere on that server. The seeding operation is running at the moment, I'll report back when it's complete.

smcauley (author) commented:
Ok, so that didn't make a difference either, still the same behaviour. The only thing I'm seeing consistently in the logs are EventID 1026 followed by 1000.

Application: Microsoft.Exchange.Store.Worker.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.NullReferenceException
   at Microsoft.Exchange.Diagnostics.ExTraceInternal.Trace[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Int32, Microsoft.Exchange.Diagnostics.TraceType, System.Guid, Int32, Int64, System.String, System.__Canon, System.__Canon)
   at Microsoft.Exchange.Diagnostics.SystemTrace.ReportFailure(System.Exception)
   at Microsoft.Exchange.Diagnostics.SystemTrace.SafeInitialize(System.Reflection.Assembly)
   at Microsoft.Exchange.Diagnostics.SystemNetLogging..cctor()

Exception Info: System.TypeInitializationException
   at Microsoft.Exchange.Diagnostics.SystemTraceControl.Update(System.Collections.Generic.Dictionary`2<System.Guid,System.Collections.BitArray>, System.Collections.BitArray, Boolean)
   at Microsoft.Exchange.Diagnostics.ExTraceConfiguration.UpdateTrace(Microsoft.Exchange.Diagnostics.ConfigurationDocument)
   at Microsoft.Exchange.Diagnostics.ExTraceConfiguration.TraceConfigUpdate()
   at Microsoft.Exchange.Diagnostics.ExTraceConfiguration..ctor()
   at Microsoft.Exchange.Diagnostics.ExTraceConfiguration..cctor()

Exception Info: System.TypeInitializationException
   at Microsoft.Exchange.Diagnostics.BaseTrace..ctor(System.Guid, Int32)
   at Microsoft.Exchange.Diagnostics.Components.ManagedStore.Service.ExTraceGlobals.get_StartupShutdownTracer()
   at Microsoft.Exchange.Server.Storage.Worker.WorkerProcess.Main(System.String[])

Event ID 1000:
Faulting application name: Microsoft.Exchange.Store.Worker.exe, version: 15.1.1531.3, time stamp: 0x5b08b04b
Faulting module name: KERNELBASE.dll, version: 6.3.9600.18938, time stamp: 0x5a7ddf0a
Exception code: 0xe0434352
Fault offset: 0x0000000000008eac
Faulting process id: 0x2534
Faulting application start time: 0x01d43b8736a7707f
Faulting application path: C:\Program Files\Microsoft\Exchange Server\V15\bin\Microsoft.Exchange.Store.Worker.exe
Faulting module path: C:\Windows\system32\KERNELBASE.dll
Report Id: 7d9ad099-a77a-11e8-80c5-0050568ce7b5
Faulting package full name:
Faulting package-relative application ID:

Does anyone else have any suggestions or do I need to raise a case with Microsoft?

Thanks.

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
If you reseed the database copy with the delete-existing-files switch, does it actually create the store on the other side? Can you confirm that Windows Firewall or AV is not blocking it?

Michael B. Smith, Exchange & Active Directory Expert, commented:
The "folder with a GUID" is your search index.

Store crashes are pretty rare these days. What CU are you on?

smcauley (author) commented:
Sorry Edward, just to clarify you want me to try Update-MailboxDatabaseCopy DB\ServerName -DeleteExistingFiles?

When I have added the copy from the GUI I do see the database and log files replicated across to the other side so it doesn't appear to be blocked, it just won't mount and remains Passive Failed.

Firewall is turned off on both servers and there is no AV running.

smcauley (author) commented:
Michael, running CU10. No content index state errors at all, just Passive Failed consistently.

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
Yes, that is correct. Try reseeding the database.
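
For reference, a reseed with that switch usually looks something like the sketch below, run from the Exchange Management Shell. `DB01\EXCH02` is a placeholder for the database and the server hosting the failed passive copy. Note that `-DeleteExistingFiles` removes the existing database and log files on the target before seeding.

```powershell
# Placeholder identity: <database name>\<server hosting the failed passive copy>.
$copy = 'DB01\EXCH02'

# A copy must be suspended before it can be seeded.
Suspend-MailboxDatabaseCopy -Identity $copy -SuspendComment 'Reseed' -Confirm:$false

# Delete the existing files on the target and seed again from the active copy.
Update-MailboxDatabaseCopy -Identity $copy -DeleteExistingFiles

# Replication resumes automatically after seeding unless -ManualResume was used.
Get-MailboxDatabaseCopyStatus -Identity $copy |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
```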

smcauley (author) commented:
Same issue. Content Index State Healthy and Passive Failed.

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
How is the networking between the two? Does ping work on the DAG name, and can you ping each server from each side? When you created your DAG object, did you grant both computer accounts full access to it in AD?

smcauley (author) commented:
No issue with networking between them. Ping via NetBIOS name and FQDN both work fine in both directions. It's an IP-less DAG, so there is no DAG name to ping.

Not sure I follow on the AD account. Are you expecting the DAG to be there as a computer object? If so, it isn't.

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
Okay, great. And the CU version? Are you on the latest? Are these physical machines or virtual? If virtual, have you tried migrating to another host?

smcauley (author) commented:
CU10 on VMware. They were on different hosts, so I put them on the same one and rebooted. Same issue though.

Michael B. Smith, Exchange & Active Directory Expert, commented:
Is .NET Framework 4.7.2 on the healthy node?

smcauley (author) commented:
Hi Michael, I have 5 other healthy nodes all running 4.7.2. I'm not sure this is the issue, so I would be cautious about changing those right now and risking further problems.

Michael B. Smith, Exchange & Active Directory Expert, commented:
It's not supported and it's known to cause networking issues. There are also issues known with the July Windows Server updates (https://blogs.technet.microsoft.com/exchange/2018/07/16/issue-with-july-updates-for-windows-on-an-exchange-server/).

I respect your concern.

I'm afraid the only other suggestion I have is to open a MSFT case. I can't speak for the other contributors.

Edward van Biljon, Messaging and Collaboration Technical Lead (Exchange MVP), commented:
If you log a support call with Microsoft, they are going to tell you that you are not on a supported version of .NET. I would strongly advise that you downgrade all nodes to 4.7.1 and then look at your DAG. It might seem to be working, but .NET 4.7.2 will not be supported on any version of Exchange, ever, just like 4.7.

smcauley (author) commented:
Yes, I agree that's what I'm going to have to do. I suspect I will still have the same issue and will need to raise a case with MS anyway. As you said though, they will just tell me I'm on an unsupported .NET version.

I'll start doing that and report back.

smcauley (author) commented:
I have downgraded all nodes to 4.7.1 and removed all copies from the node causing the issue. I then removed it from the DAG, re-added it, and tried creating a new DB just on that node. When I try to mount the new DB I see the same error.

Error: Database action failed with transient error. Error: A transient error occurred during a database operation. Error: MapiExceptionNetworkError: Unable to mount database. (hr=0x80004005, ec=2423) Diagnostic context: Lid: 65256 Lid: 10722 StoreEc: 0x977 Lid: 1494 ---- Remote Context Beg ---- Lid: 59596 dwParam: 0x2E8731 Msg: SM01 Lid: 59596 dwParam: 0x2E8731 Msg: SM02 Lid: 59596 dwParam: 0x2E8731 Msg: SM03 Lid: 59596 dwParam: 0x2E8731 Msg: SM04 Lid: 59596 dwParam: 0x2E8741 Msg: SM05 Lid: 39576 StoreEc: 0x977 Lid: 59596 dwParam: 0x2E8741 Msg: SM06 Lid: 35200 dwParam: 0x429C Lid: 59768 StoreEc: 0x977 Lid: 59596 dwParam: 0x2ECB9D Msg: SM07 Lid: 59596 dwParam: 0x2ECB9D Msg: SM08 Lid: 59596 dwParam: 0x2ECB9D Msg: SM12 Lid: 35200 dwParam: 0x429C Lid: 1750 ---- Remote Context End ---- Lid: 1047 StoreEc: 0x977 [Database: test, Server

Any further suggestions or do I need to call Microsoft?

MAS (MVE), EE Solution Guide - Technical Dept Head, commented:

smcauley (author) commented:
Yes, I was made aware of that earlier in the thread. That's why I'm on 4.7.1, not 4.7 or 4.7.2.

Amit, IT Architect, commented:
An MS case is the better option. In my experience, if you have a .NET issue, you might end up rebuilding your server.

smcauley (author) commented:
Ok, I have a resolution. Logged a case with MS Support and after a few hours of troubleshooting they replaced all the Store files in the Bin folder with fresh copies from my CU10 installation folder:-

Microsoft.Exchange.Store.Service.exe
Microsoft.Exchange.Store.Worker.exe
Microsoft.Exchange.Store.ObjectsService.exe
Microsoft.Exchange.Store.Service.exe.config
Microsoft.Exchange.Store.Worker.exe.config

After that I was able to mount databases without an issue. Thanks everyone for your comments.
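
A quick way to spot this kind of damage before replacing anything is to compare file hashes of the Store binaries against known-good copies. This is only a sketch: `Compare-StoreBinary` is a name invented for this example, and the reference folder is an assumption (for instance, the same files copied from a healthy DAG member on the same CU). The `.config` files may legitimately differ between servers, so treat a difference there as a prompt to look closer rather than proof of corruption.

```powershell
# Compare the Store files in the Exchange Bin folder against reference copies.
# Compare-StoreBinary is a hypothetical helper; the file list comes from the
# files Microsoft Support replaced in this thread.
function Compare-StoreBinary {
    param(
        [Parameter(Mandatory)][string]$BinDir,
        [Parameter(Mandatory)][string]$ReferenceDir,
        [string[]]$FileName = @(
            'Microsoft.Exchange.Store.Service.exe',
            'Microsoft.Exchange.Store.Worker.exe',
            'Microsoft.Exchange.Store.ObjectsService.exe',
            'Microsoft.Exchange.Store.Service.exe.config',
            'Microsoft.Exchange.Store.Worker.exe.config'
        )
    )

    foreach ($name in $FileName) {
        $live = Join-Path $BinDir $name
        $ref  = Join-Path $ReferenceDir $name

        # Get-FileHash requires PowerShell 4.0 or later.
        $state = if (-not (Test-Path $live) -or -not (Test-Path $ref)) {
            'Missing'
        } elseif ((Get-FileHash $live -Algorithm SHA256).Hash -eq
                  (Get-FileHash $ref -Algorithm SHA256).Hash) {
            'Match'
        } else {
            'Different'
        }

        [pscustomobject]@{ File = $name; State = $state }
    }
}
```

Example call, with a made-up reference path: `Compare-StoreBinary -BinDir 'C:\Program Files\Microsoft\Exchange Server\V15\bin' -ReferenceDir '\\EXCH01\c$\Program Files\Microsoft\Exchange Server\V15\bin'`.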

Michael B. Smith, Exchange & Active Directory Expert, commented:
I don't want to belabor this - but you had corrupted files?

How in the HELL did the files get corrupt?

I seriously recommend you validate your backup procedures!

smcauley (author) commented:
They said some form of corruption but basically they didn't know. Not much point in continuing to try and find out why when the issue is resolved.

I back up the passive copies of my mail databases. This server was not one of those. My backups are fine and are tested very regularly.