Asked by lltc78 (Australia)

Domain Replication failing - secure channel problem

Hi guys,

I have an environment which consists of 10 writable domain controllers and approx 60 RODCs deployed in remote sites.

Just recently we found a problem where it looks like the secure channel has been compromised or has become corrupt, and DC replication appears to be failing. DC shares are not viewable, and Kerberos errors are being logged on all DCs (mainly Event IDs 3 and 4).

This was first identified when DFS stopped working. I found that I could not browse the namespace server DCs when I ran 'net view \\servername'.

A reboot did not resolve this. However, after seeing the Schannel event logs, I stopped the KDC service, ran "klist purge", and then ran "netdom resetpwd /server:server2 /userd:domain.com\administrator /passwordd:password". After that, these servers became available for shares and DFS started working again.
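For anyone following along, the per-DC sequence described above can be written out as a small helper. This is only a sketch: the function name is hypothetical, it builds the commands rather than running them, and "server2", "domain.com" and "administrator" are the placeholders from the post. Two things are additions, not part of the original steps: "/passwordd:*" makes netdom prompt for the password instead of putting it on the command line, and a final "net start kdc" is included because a DC cannot issue Kerberos tickets while KDC is stopped.

```python
# Hypothetical helper: builds (but does not run) the command sequence the
# post describes for repairing one DC's secure channel.
from typing import List


def reset_commands(target_dc: str, domain: str, admin_user: str) -> List[str]:
    """Return the commands to run, in order, on the DC being repaired.

    target_dc  -- a healthy writable DC to reset the password against
                  (the post used "server2"; any real name is an assumption)
    domain     -- the AD domain, e.g. "domain.com"
    admin_user -- an account allowed to reset the machine password
    """
    return [
        "net stop kdc",   # stop the Kerberos Key Distribution Center
        "klist purge",    # discard cached Kerberos tickets
        # "/passwordd:*" makes netdom prompt instead of exposing the
        # password on the command line
        f"netdom resetpwd /server:{target_dc} "
        f"/userd:{domain}\\{admin_user} /passwordd:*",
        "net start kdc",  # KDC must be running again to issue tickets
    ]


if __name__ == "__main__":
    for cmd in reset_commands("server2", "domain.com", "administrator"):
        print(cmd)
```

Keeping the commands as plain strings makes it easy to review the exact sequence before pasting it into a console on the affected DC.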

However, the problem seems to have escalated. After further digging into the environment, I found that all DCs have the same problem: 'net view \\servername' fails against every one of them.

When I run 'nltest /server:hostname /sc_verify:domain.com' against them, each one returns 'I_NetLogonControl failed: Status = 5 0x5 ERROR_ACCESS_DENIED'.
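With 60+ RODCs, checking this output by eye gets tedious. The sketch below runs the same nltest query per host and looks for the exact failure line quoted above. The function names are hypothetical, "check_host" only works on a Windows machine with nltest installed, and a None result only means that this particular failure line was absent; it is not proof that the secure channel is healthy.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the failure line quoted in the post, e.g.
# "I_NetLogonControl failed: Status = 5 0x5 ERROR_ACCESS_DENIED"
_FAILURE = re.compile(
    r"I_NetLogonControl failed: Status = (\d+) 0x[0-9a-fA-F]+ (\w+)"
)


def parse_sc_verify(output: str) -> Optional[Tuple[int, str]]:
    """Return (status, name) if the output contains that failure line, else None."""
    match = _FAILURE.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def check_host(host: str, domain: str) -> Optional[Tuple[int, str]]:
    """Run nltest against one DC remotely (Windows only) and parse its output."""
    result = subprocess.run(
        ["nltest", f"/server:{host}", f"/sc_verify:{domain}"],
        capture_output=True,
        text=True,
    )
    # Inspect both streams, since we do not assume where nltest writes errors
    return parse_sc_verify(result.stdout + result.stderr)
```

Usage would be a loop such as "for host in rodcs: print(host, check_host(host, 'domain.com'))", where "rodcs" is a list of RODC names you supply.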

I have fixed this on all the writable DCs by running the netdom resetpwd command, but the error persists on all of the RODCs.

What do you recommend? There are 60+ servers that are having this issue.
Surely there is a better way than logging on to each one and running this manually? And I don't really want to dcpromo them all.

It seems that the KDC service should be stopped on ALL DCs before running netdom resetpwd. Is that correct? If so, how do I do that on every DC without logging on to all of them at the same time?
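On the question of touching every DC without logging on to each one: some of these steps can be issued from a single admin workstation, because sc.exe accepts a remote server in the form \\ServerName and nltest takes /server:. The sketch below only generates those command lines; the function names and the "rodcNN" host names are hypothetical. It does not settle whether KDC really must be stopped on all DCs, and it does not cover netdom resetpwd itself, which acts on the machine it runs on and so would still need some remote-execution mechanism that is not assumed here.

```python
from typing import List, Sequence


def stop_kdc_commands(hosts: Sequence[str]) -> List[str]:
    # sc.exe takes the remote server as \\ServerName before the command
    return [f"sc \\\\{h} stop kdc" for h in hosts]


def start_kdc_commands(hosts: Sequence[str]) -> List[str]:
    # Restart KDC once the password resets are done
    return [f"sc \\\\{h} start kdc" for h in hosts]


def verify_commands(hosts: Sequence[str], domain: str) -> List[str]:
    # nltest queries the named server remotely, so no interactive logon is needed
    return [f"nltest /server:{h} /sc_verify:{domain}" for h in hosts]


if __name__ == "__main__":
    # Hypothetical RODC names; in practice read them from a file or an AD query
    rodcs = [f"rodc{n:02d}" for n in range(1, 4)]
    for cmd in stop_kdc_commands(rodcs) + verify_commands(rodcs, "domain.com"):
        print(cmd)
```

Generating the lists separately keeps the stop, reset, start and verify phases distinct, so each phase can be reviewed (or run in a batch file) before moving to the next.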

This is disastrous, and I'm hoping someone knows of a better way to fix the environment.

The root cause will be investigated later and is irrelevant at the moment. I just want this fixed before digging into that.

Thanks guys
SOLUTION
pwindell (United States)

[Solution text hidden: available to Experts Exchange members only.]
The last time I saw anything similar to what you describe was when clocks got over an hour out of sync due to DST problems: some machines jumped 2 hours ahead instead of 1.
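The reason a DST slip like that breaks things is Kerberos clock-skew tolerance: by default, Active Directory's Kerberos policy allows a maximum of 5 minutes of difference between client and DC clocks, so an hour-long jump is far outside it. The sketch below just expresses that comparison; the function name is hypothetical, and it assumes both timestamps are naive datetimes expressed in the same time zone (ideally UTC).

```python
from datetime import datetime, timedelta

# Default "Maximum tolerance for computer clock synchronization" in
# Active Directory's Kerberos policy
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(a: datetime, b: datetime,
                         tolerance: timedelta = MAX_SKEW) -> bool:
    """True if two clocks (same time zone assumed) differ by at most the tolerance."""
    return abs(a - b) <= tolerance


if __name__ == "__main__":
    dc_time = datetime(2020, 1, 1, 12, 0)
    # A machine that jumped an extra hour for DST falls outside the window
    print(within_kerberos_skew(dc_time, dc_time + timedelta(hours=1)))
```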
lltc78 (ASKER)

I appreciate your comment, pwindell. MS Support has not been very helpful so far, which is why I posted here to see if others have any ideas.

Fixing the incident is the priority; problem management always comes second.
This is the way it should always work: get systems operating, then find the permanent solution.

I wish it was a time issue. I have looked at that and it isn't the case.

I thought I mentioned it briefly in my original post, but it seems I didn't and it's a key piece of information regarding the root cause. This occurred after a penetration test had commenced.
ASKER CERTIFIED SOLUTION

[Solution text hidden: available to Experts Exchange members only.]
lltc78 (ASKER)

The solution was what I described in my original post; I was just hoping for an alternative. Microsoft Support confirmed this is the only way.
dazzlinz

lltc78,

A question: I am also seeing secure channels break on a handful of our systems. Based on the Windows event logs, I suspect our IT Security team's vulnerability scanning is the cause. Can you elaborate on your comment that "this occurred after a penetration test had commenced"?

The vendor running the penetration test states that they don't do anything intrusive that could "break" anything, but I would love to hear more from you on this.

We have yet to verify the scan is the issue, or come up with a resolution.