Sam_Rendell
asked on
Computers can't see their AD accounts and won't allow domain logon. Computer accounts are present and not disabled.
Hi everyone,
This has been going on for some time, originally just with user machines. After quite a bit of searching I never found the cause of the problem and was just rejoining the machines to the domain. I tend to find that once it has happened to a machine, it is more likely to happen again.
Recently this has become a serious problem, as servers have started to do the same. Domain controllers remain unaffected; however, when the Exchange server refuses to see its computer account it is a serious pain!
I have tried removing the machine from the domain, ensuring that the computer accounts are completely gone, then remaking the account and rejoining the domain. I can't really change the server names as this would cause too much trouble.
So far it seems to be happening at random: sometimes the same machine will go twice in a week, sometimes they will all stay happy for a month.
However, when it does happen it will often be several at once.
I have one server at the moment experiencing the problem, which I have deliberately left that way to try to troubleshoot the issue. The server itself logged no errors just before it developed the problem. The DCs also show no relevant errors. The server is fully updated and antivirused, as are the other servers that show the same problem.
Any suggestions other than re-add to the domain are welcomed as that's all I've found so far.
S.
ASKER
The server in question did have an external DNS server registered, but only as a backup, and it was resolving correctly. I have removed the external server completely and rebooted, but this has not solved the issue.
S.
ASKER
Just wanted to add that I have checked all the other servers and none of them have external DNS servers registered.
S.
Check if your SRV resource entries are correctly registered on your DCs:
How to verify that SRV DNS records have been created for a domain controller
http://support.microsoft.com/?kbid=816587
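If you would rather script the check than click through the DNS console, here is a rough Python sketch. It only builds the well-known Netlogon SRV record names for a domain (and optionally a site) and hands them to `nslookup`. The domain name, site name, DNS server address and the function names are all placeholders I made up for illustration, so substitute your own.

```python
import subprocess
from typing import List, Optional


def expected_srv_names(domain: str, site: Optional[str] = None) -> List[str]:
    """Return the core SRV record names a DC normally registers for a domain."""
    names = [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]
    if site:
        # Site-specific records, one pair per AD site
        names += [
            f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
            f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
        ]
    return names


def nslookup_command(name: str, dns_server: str) -> List[str]:
    """Build the nslookup command line that queries one SRV record."""
    return ["nslookup", "-type=SRV", name, dns_server]


def print_srv_lookups(domain: str, dns_server: str, site: Optional[str] = None) -> None:
    """Run nslookup for each expected record and print the raw output."""
    for name in expected_srv_names(domain, site):
        result = subprocess.run(
            nslookup_command(name, dns_server),
            capture_output=True,
            text=True,
        )
        print(f"--- {name} ---")
        print(result.stdout)
```

Calling `print_srv_lookups("corp.example.com", "10.0.0.10", site="London")` against each site's DNS server in turn should show whether any site is missing its records. The output is left unparsed on purpose, since nslookup's format varies between Windows versions.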
ASKER
All DCs have _kerberos and _ldap entries for their site.
I'll provide a little more background; maybe it will help.
Our WAN presently contains 18 sites, each with its own DC and DNS server. Each site has a site container in ADS&S with the appropriate subnet applied to it. Each site also has a folder in AD containing the correct SRV records.
S.
A common cause of this is computers with duplicate SIDs (often the result of cloning a machine without using SYSPREP). As a result, computers, often with different names, try to identify themselves to Active Directory with the same SID. AD naturally gets totally confused and authentication issues occur with the computer account, which in turn stops users logging in.
The solution is to remove the computer from the domain and run NewSID http://technet.microsoft.com/en-us/sysinternals/bb897418.aspx to generate a new and unique SID, then rejoin the computer to the domain. This will prevent the issue recurring on that computer.
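Before touching every machine, it may be worth confirming which ones actually share a SID. The sketch below assumes you have already collected a hostname-to-machine-SID mapping by some means (for example by running Sysinternals PsGetSid on each box); the inventory format and the function name are hypothetical, not taken from any tool.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_sids(inventory: Dict[str, str]) -> Dict[str, List[str]]:
    """Group hostnames by machine SID and keep only SIDs seen on more than one host."""
    hosts_by_sid = defaultdict(list)
    for host, sid in inventory.items():
        # Normalise case and stray whitespace so copies of the same SID compare equal
        hosts_by_sid[sid.strip().upper()].append(host)
    return {
        sid: sorted(hosts)
        for sid, hosts in hosts_by_sid.items()
        if len(hosts) > 1
    }
```

Any SID that comes back with more than one hostname points at a cloned image, and those are the machines to prioritise.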
ASKER
I could give that a try but none of the servers have been cloned, they were all built from scratch.
S.
Have any workstations been cloned? That is normally the issue.
ASKER
Yes, workstations are cloned without sysprep, but they rarely suffer the problem and it's not that concerning. It's the servers I'm worried about.
S.
What service pack are you running on the Server?
__________________________
SP1 issues can cause intermittent communications:
There is another issue with SP1 for Server 2003. Are you running Service Pack 1 on your server? I have seen intermittent DNS caused by NIC flooding due to a discrepancy in SP1. You may or may not also see intermittent DHCP and event ID 333 with this issue.
http://support.microsoft.com/default.aspx?scid=kb;en-us;898060
__________________________
SP2 issues are networking-related:
I haven't really read these in great detail.
http://www.lan-2-wan.com/2003-SP2.htm
__________________________
What are the make and model of the switches and router? And are they managed?
>>>This really sounds like a portfast issue.
https://www.experts-exchange.com/questions/23147843/Windows-Server-2003-dropping-clients.html
A little explanation of spanning tree and portfast:
http://itt.theintegrity.net/pmwiki.php?n=ITT.Spanning-TreeAndPortfast
(NOTE: Portfast is necessary for XP clients. XP clients will time out otherwise.)
An event error usually associated with a spanning tree portfast problem:
Event ID 5719, spanning tree portfast:
http://support.microsoft.com/kb/247922
__________________________
Also it could be the mode of operation. Cisco products have a quirk to them: both ends of a link have to be in the same mode. If a switch is configured to "auto negotiate" while the router is forced to 100 Mb full duplex, you will see intermittent comms between them, because the auto-negotiating end gets no negotiation from the forced end and falls back to half duplex. I know it doesn't seem right that they don't work.
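To make the mode-of-operation point concrete, here is a minimal Python sketch of how each end of a link settles on a duplex mode. It assumes standard Ethernet auto-negotiation behaviour (an "auto" port facing a forced peer cannot negotiate and falls back to half duplex) and that both ends support full duplex. The function names are made up for illustration.

```python
def effective_duplex(local: str, peer: str) -> str:
    """Duplex the local port ends up using.

    Each setting is "auto", "full" or "half".
    """
    if local != "auto":
        # A forced port uses exactly what it was configured with
        return local
    if peer == "auto":
        # Both ends negotiate; assume both support full duplex
        return "full"
    # Peer is forced, so there is nothing to negotiate with:
    # the auto port falls back to half duplex
    return "half"


def is_duplex_mismatch(port_a: str, port_b: str) -> bool:
    """True when the two ends of the link settle on different duplex modes."""
    return effective_duplex(port_a, port_b) != effective_duplex(port_b, port_a)
```

So `is_duplex_mismatch("auto", "full")` is the bad combination, while auto/auto and full/full are both fine. A mismatch like that tends to show up as late collisions and CRC errors on the port counters rather than anything in Event Viewer.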
__________________________
What I know is that SP1, portfast, and the Cisco mode of operation are usually hard to see in Event Viewer or dcdiag reports, so they are not easy to track down. I know little of the SP2 discrepancies; I need to read up on those myself.
ASKER
Thanks, lots to check there. I'll get on it. I should point out that the problem never goes away on its own: once I get the error, the server will not allow domain accounts to log on again until I remove it from the domain and rejoin it. So my first thought is that it's not a comms issue, but I'll read the above anyway.
S.
ASKER
OK, from the above I can say:
Servers are running SP2, except one Win2k SP4 server which also has the problem. Sorry, I forgot this one earlier.
None of the SP2 networking errors really match our problem. None of the error codes listed were found.
Our backbone switch is managed; it's a 3Com 4500G which is presently at default settings. I'm not sure, obviously, but I would have thought that if this was the problem, rejoining the machine to the domain wouldn't fix it?
I did find 5719 on one server a couple of times, but not around the time it lost the domain.
S.
>> Yes work stations are cloned without sysprep but they rarely suffer the problem and it's not that concerning <<
This will cause authentication errors on the workstations. It needs to be rectified, as it will get progressively worse.
ASKER
I will pass the NewSID link on to the workstation man, thanks, but it's not the objective here.
S.
ASKER CERTIFIED SOLUTION
ASKER
I've taken a look at this, under properties>computer name, the name is normal. It is also normal in AD, however in the Security window for files and folders it comes up with a $. Is this what you meant or is that normal?
If this is the problem should it fix a machine without it being removed from and readded to the domain?
S.
ASKER
I made the registry change on the server that was experiencing the problem and restarted it. This didn't solve the problem.
S.
If you added this via GPO, you may have to do a GPUPDATE to pass this down to the clients.
ASKER
I changed the registry directly on the server showing the problem but there was no change. I have applied the GP to the rest anyway just in case.
S.
You know, I reread the whole post and think KCTS has it absolutely correct. I think you should follow his advice.
ASKER
Could someone please go into more detail as to how cloned workstations can cause non-cloned servers to do this?
S.
Cloned machines have the same SID (Security Identifier), as you know. Running sysprep gives each machine a unique ID. Same with NewSID.
Side note: NewSID, from what I have heard, is not the recommended practice. Sysprep is.
When a computer logs on with a unique SID, it contacts the server and requests access. The server replies by asking what security identification it has. The client responds with its security identification and some other encrypted forms of ID. Kerberos then verifies that and sends out an access ticket. That access ticket is used to identify which networking and file services you are able to access. It does this by matching it against the ACL (Access Control List).
A little more info on the ACL, where it says:
http://en.wikipedia.org/wiki/Access_control_list
"In networking, ACL refers to a list of rules detailing service ports or (network) daemon names that are available on a host or other layer 3 device, each with a list of hosts and/or networks permitted to use the service. Both individual servers as well as routers can have network ACLs. Access control lists can generally be configured to control both inbound and outbound traffic, and in this context they are similar to firewalls."
You are not able to join the domain because of a Kerberos violation. Kerberos sees two computers with the same SID trying to get an access ticket. Kerberos wants a unique identifier for each node on the network, so access controls can be sent to that computer.
How does this affect your servers? Your servers cannot have two identical computer SIDs within Active Directory, or they will conflict and Kerberos will not know which computer is which.
There isn't much written on the roots of Kerberos and the chronology of a Kerberos or ACL transaction, because that information could be used for malicious attacks.
Regardless of all of that >> KCTS provided you with the right answer. A cloned machine that doesn't have its own unique SID will cause problems with the roots of your domain's authentication and access processes.
Can I just add that although NewSID is not "recommended", it is tried and tested. It is a useful safety net which can be used to non-destructively change the SID on a machine when the recommended sysprep procedures have not been followed and duplicate SID issues arise as a result. The link I gave details this and provides other information as well as a link to the download.
@ KCTS: good note!
I don't think I used proper etiquette. I was just trying to add another possibility: that NTLM hash and Kerberos don't talk with each other unless configured to do so. When I reread the post, I realized you had it right.
Since I think you have the answer correct, I would like to take a back seat and watch you in action. Sorry for butting in.
@SAM & KCTS: I hope my comments helped.
ASKER
Thanks for sticking with me on this one, guys. It's not that I don't believe you; it's just that I like to make sure I understand a problem rather than just blindly fixing it.
Are you saying my servers have ended up with more than one SID somehow? Or is it the cloned workstations that are stopping the servers authenticating?
I can obviously go through the process of removing, NewSID'ing and re-adding the servers, but we have hundreds and hundreds of workstations spread across the country. I'm going to have a hard time getting the resource signed off to have someone go round and do it to all of them. I know that's my problem, but I want to be absolutely sure how to explain this to the boss.
Cheers guys, your expertise is greatly appreciated.
S.
ASKER
If anyone can answer the above question I'd really appreciate it, as this is going nowhere atm.
S.
ASKER
As I suspected, I had no joy getting the go-ahead to rejoin all the company machines to the domain. However, 2 months after implementing this suggestion, the problem has not recurred.
Thanks to all for your help.
S.
In short: make sure that *all* domain members (including the DCs!) use *only* your DCs as DNS servers; for external lookups, configure forwarders on your DCs to point to your ISP's DNS servers. The Forwarders section is the *only* place in your network where an external DNS server may appear.
Check these articles for details:
10 DNS Errors That Will Kill Your Network
http://redmondmag.com/features/article.asp?EditorialsID=413
Frequently asked questions about Windows 2000 DNS and Windows Server 2003 DNS
http://support.microsoft.com/?kbid=291382
Best practices for DNS client settings in Windows 2000 Server and in Windows Server 2003
http://support.microsoft.com/?kbid=825036
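The rule at the top of this post (domain members use only your DCs for DNS) is easy to spot-check once you have each machine's DNS server list. The Python sketch below assumes you have gathered those lists yourself, for example from `ipconfig /all` output; the data layout, addresses and function name are hypothetical.

```python
from typing import Dict, Iterable, List


def external_dns_entries(
    client_dns: Dict[str, List[str]],
    dc_addresses: Iterable[str],
) -> Dict[str, List[str]]:
    """Return, per host, any configured DNS servers that are not one of the DCs."""
    allowed = set(dc_addresses)
    offenders: Dict[str, List[str]] = {}
    for host, servers in client_dns.items():
        bad = [server for server in servers if server not in allowed]
        if bad:
            offenders[host] = bad
    return offenders
```

Any host that appears in the result still has a non-DC DNS server configured, even if only as a secondary, and should be changed to point at the DCs only.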