Dwight Crane

Half network can't authenticate/join Network

Win Srv 2003

I have a SERIOUS issue I can't track down.

Today half the people in the company couldn't join the network. It seems to be mostly the people who shut down their PCs over the weekend who can't sign in, but there are a few who did turn off their PCs and still signed in.

I initially tracked it to Certificate Services on my PDC. The service was crashing, and after hours of troubleshooting it turned out I had to reinstall Certificate Services. Once it was reinstalled, dcdiag came up clean and the errors in Event Viewer stopped appearing. After I rebooted, I ran dcdiag one more time and the SystemLog test failed. The PDC is also not replicating to the 2nd DC. I've seen errors saying DNS can't resolve, and something about the KDC.

KDC error:
The currently selected KDC certificate was once valid, but now is invalid and no suitable replacement was found.  Smartcard logon may not function correctly if this problem is not remedied.  Have the system administrator check on the state of the domain's public key infrastructure.  The chain status is in the error data.

I was getting this error before reinstalling Certificate Services on the PDC (not after):
 Automatic certificate enrollment for local system failed to renew one Domain Controller certificate (0x800706ba).  The RPC server is unavailable
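
The hex code in that enrollment error can be split into its parts to see where it comes from. Below is a minimal Python sketch, assuming the standard HRESULT bit layout and mirroring the HRESULT_FACILITY macro from winerror.h; the function name decode_hresult is made up for illustration.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility, and code."""
    value = hresult & 0xFFFFFFFF              # treat the input as unsigned 32-bit
    return {
        "failure": bool(value >> 31),         # top bit set means failure
        "facility": (value >> 16) & 0x1FFF,   # same mask as HRESULT_FACILITY
        "code": value & 0xFFFF,               # low 16 bits hold the error code
    }


fields = decode_hresult(0x800706BA)
print(fields)
```

For 0x800706ba this gives facility 7 (Win32) and code 1722, which is the Win32 error for "The RPC server is unavailable", so the message text and the code agree.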

The PDC also controls the printers, and no print jobs from anyone are going through.

***This is the warning I get on the PDC regarding replication to the other DC:

This server is the owner of the following FSMO role, but does not consider it valid. For the partition which contains the FSMO, this server has not replicated successfully with any of its partners since this server has been restarted. Replication errors are preventing validation of this role.
 
Operations which require contacting a FSMO operation master will fail until this condition is corrected.
 
FSMO Role: CN=Schema,CN=Configuration,DC=DOMAINNAME,DC=WP
***

When running DCDIAG, these errors occur:
[Replications Check,GRANT] A recent replication attempt failed:
   From DC2 to GRANT
   Naming Context: DC=DomainDnsZones,DC=DOMAINNAME,DC=WP
   The replication generated an error (1908):
   Could not find the domain controller for this domain.
   The failure occurred at 2009-04-13 18:56:25.
   The last success occurred at 2009-04-13 18:50:52.
   1 failures have occurred since the last success.
   Kerberos Error.
   A KDC was not found to authenticate the call.
   Check that sufficient domain controllers are available.

 Starting test: kccevent
    An Warning Event occured.  EventID: 0x80250828
       Time Generated: 04/13/2009   18:56:02
       (Event String could not be retrieved)
    ......................... GRANT failed test kccevent
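
With long dcdiag runs it can help to pull out just the tests that failed. Here is a small Python sketch that does this, assuming the summary lines always look like the "failed test" lines pasted in this question; the function name failed_tests and the regular expression are illustrative, not part of dcdiag.

```python
import re

# Matches summary lines such as:
#   ......................... GRANT failed test kccevent
FAILED_TEST = re.compile(r"\.{5,}\s+(\S+)\s+failed test\s+(\S+)")


def failed_tests(dcdiag_output: str) -> list:
    """Return (server, test) pairs for every failed test in dcdiag output."""
    return FAILED_TEST.findall(dcdiag_output)


# Sample built from the two summary lines quoted in this question.
sample = """\
 Starting test: kccevent
    ......................... GRANT failed test kccevent
   Starting test: Connectivity
      ......................... GRANT failed test Connectivity
"""
print(failed_tests(sample))
```

Saving the dcdiag output to a text file and feeding its contents to failed_tests gives a short list to work through, one test at a time.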



Running dcdiag /test:dns  results in:
Testing server: Default-First-Site-Name\GRANT
   Starting test: Connectivity
      The host 21ce9160-7378-4a3c-b3d1-b0713fdd3391._msdcs.DOMAINNAME.WP could not be resolved to an
      IP address.  Check the DNS server, DHCP, server name, etc
      Although the Guid DNS name (21ce9160-7378-4a3c-b3d1-b0713fdd3391._msdcs.DOMAINNAME.WP) couldn't be
      resolved, the server name (grant.DOMAINNAME.WP) resolved to the IP address (10.10.1.125) and was
      pingable.  Check that the IP address is registered correctly with the DNS server.
      ......................... GRANT failed test Connectivity
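
The Connectivity test fails on the GUID-based name under _msdcs, not on the server's host name. The Python sketch below just builds the two DNS names worth checking by hand. It assumes a single-domain forest (so _msdcs sits under DOMAINNAME.WP), and the function name dc_locator_names is hypothetical.

```python
def dc_locator_names(dsa_guid: str, dns_domain: str) -> dict:
    """Build the DNS names that replication partners and clients look up."""
    domain = dns_domain.rstrip(".")
    return {
        # CNAME that replication partners use to find this DC
        "guid_cname": f"{dsa_guid}._msdcs.{domain}",
        # SRV record that clients query to locate any DC for the domain
        "ldap_srv": f"_ldap._tcp.dc._msdcs.{domain}",
    }


names = dc_locator_names("21ce9160-7378-4a3c-b3d1-b0713fdd3391", "DOMAINNAME.WP")
for label, name in names.items():
    print(label, name)
```

Each name can then be queried against the DC's DNS server, for example with nslookup -type=CNAME for the first and nslookup -type=SRV for the second. If either is missing, the DC has not registered its records.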

In AD Sites and Services, if I right-click on DC2 and check topology, I get the error "The RPC server is unavailable."


I can ping the 2nd DC by name.

I AM AT A POINT WHERE REMOVING THE SECOND DC IS AN OPTION, BUT I'M NOT SURE HOW! I can't have another day of half the company not being able to work. I don't understand why some PCs won't even find the domain.


What's odd is that I have a laptop that works; however, if I plug it into another working port, it no longer sees the network. When I plug it back into the old port, it starts working again.

Odd thing #2: I installed all the patches from MS. However, .NET Framework 3.5 SP1 will not install; halfway through it comes back with the error "Extraction Failed: File is Corrupt". But it's not corrupt: I can take the exact same file and run it on other machines. I even tried redownloading. Every time, same thing.
Dwight Crane

ASKER

I have since removed the 2nd DC via the Manage Server wizard.

After removing the 2nd DC, I get these errors on the PDC:

Certificate Services could not process request 16 due to an error: The revocation function was unable to check revocation because the revocation server was offline. 0x80092013 (-2146885613).  The request was for CN=Dyno1Aux.Sturman.WoodlandPark.  Additional information: Error Verifying Request Signature or Signing Certificate
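
The two numbers in that message, 0x80092013 and -2146885613, are the same 32-bit value shown unsigned in hex and signed in decimal. A short Python sketch of the conversion follows; the helper name to_signed32 is made up for illustration.

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed value is the unsigned value minus 2**32.
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(0x80092013), to_signed32(0x80092013))
```

This is handy when one log shows the hex form and another shows the negative decimal form, since searching for either one finds the same error.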
Now going on hour #12 straight.. I'm tired..

If I remove a machine from the domain, I cannot add it back. It says no domain controller found .. no DNS..  WTF!!!! I am getting 0 errors in any log on the PDC, and half the machines on the network are still working..
How did you set up the DNS?
Where are the FSMO roles hosted?
Are you able to do a forward- or reverse-lookup for any host from your network and for your DC(s)?
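
The forward and reverse lookups asked about above can be scripted so the same check runs on a working and a non-working machine. This is a rough Python sketch using only the standard socket module; it goes through whatever resolver the machine is configured with, and the function names are illustrative.

```python
import socket


def forward_lookup(host: str):
    """Resolve a host name to an IPv4 address, or return None on failure."""
    try:
        return socket.gethostbyname(host)
    except OSError:          # covers socket.gaierror and socket.herror
        return None


def reverse_lookup(ip: str):
    """Resolve an IP address back to a host name, or return None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def check(host: str, ip: str) -> dict:
    """Run both lookups and return the results side by side."""
    return {"forward": forward_lookup(host), "reverse": reverse_lookup(ip)}
```

For the DC in this question that would be check("grant.DOMAINNAME.WP", "10.10.1.125"). A None on a failing PC where a working PC gets an answer points at DNS or the network path from that PC, not at the DC itself.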
This was an inherited network from a predecessor. I assume the DNS was created at the time of the domain. FSMO roles are all hosted on the PDC.. I now only have the one DC.  Some machines are able to lookup.. others are not. There really is no significant difference between the ones that can and can't. I'm thinking it has something to do with certificates and ones that have timed out.  

Since I'm using DHCP and all computers get the same info, I'm totally baffled as to why some work and some don't. Although, NONE of them can print. (The print server is on the PDC.)
ASKER CERTIFIED SOLUTION
Dwight Crane
Paranormastic
Run these, in order:
certutil -dcinfo deletebad
certutil -pulse
gpupdate /force
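
If these need to be rerun a few times, the sequence can be wrapped so it stops at the first command that fails. A hedged Python sketch follows: it assumes each tool returns a non-zero exit code on failure, and run_in_order and its runner parameter are hypothetical helpers, not part of certutil or gpupdate.

```python
import subprocess

# The three commands from this post, in the order given.
COMMANDS = [
    ["certutil", "-dcinfo", "deletebad"],
    ["certutil", "-pulse"],
    ["gpupdate", "/force"],
]


def run_in_order(commands, runner=subprocess.run):
    """Run each command in sequence; return the first that fails, else None."""
    for command in commands:
        result = runner(command)
        if result.returncode != 0:   # assumption: non-zero means failure
            return command
    return None
```

On the DC, from an administrative prompt, failed = run_in_order(COMMANDS) returns None when all three ran cleanly, or the command that stopped the sequence.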
OK.. did that.. it said a reboot was required for some of the policies, so I rebooted. I see no difference. Is there a particular policy setting that might affect this? I reviewed the policies (4)... nothing jumps out at me.

Interesting note... 4 out of 15 printers still work. The 11 that don't all show an offline status (no, they are not really offline). If people had one of the 4 printers already added, they are able to print. However, I cannot add one of the working printers for anyone who doesn't already have it. It comes back asking for credentials. When I enter my username/password, it says there are existing credentials and asks if I want to replace them. However, when I try to replace them, it says the existing credentials cannot be overwritten.
UNBELIEVABLE.. I have found the problem.. it was twofold.. There were the issues listed above, plus a switch going bad. Thanks, everyone!
Is there a particular policy setting that might affect this?
-- No, it is autoenrollment, which is advertised through AD; running these commands is one way to pull it.

Glad you got it all straightened out!