DFSR guru question - crash recovery
Posted on 2013-01-22
The other day we experienced a network outage that caused our DFSR infrastructure (Windows Server 2008 R2) to crash hard. After we brought the systems back online, DFS Replication did not resume for hours, and no matter what we did we could not find any problems in the logs. We waited for roughly 8 hours. Replication and health checks passed and there were no DFSR-specific events in the logs, yet replication would not continue. Overnight, as if by magic, it started again. This led us to conclude that during the long pause the service was comparing every file with its replication partner to determine whether any changes needed to be replicated. We assume it had to start from scratch because of the crash, since it no longer knew where it had left off.
My question: does this sound like a correct assessment of the cause of the replication delay, and do you know of a way to expose logging or event messages that would warn us in the future that this "start from the beginning" comparison process is in progress?