I will offer Points for even good suggestions, if not fixes.. (Server 2k8 x64 Active Directory and possibly Desktop Authentication issues)

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

ASKER

Knife,
It seems to happen on machines sporadically. There are currently about 8 machines with this issue as of today, but yesterday there were 9, so it appears that one corrected itself at logon today. The error logs on the domain controller show nothing out of the ordinary, as in nothing that would make me think the issue is on the DC side. The DC errors that are shown are..

Failed extract of third-party root list from auto update cab at: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: A required certificate is not within its validity period when verifying against the current system clock or the timestamp in the signed file.

Event ID 11
Source CAPI2

--and--

System

Driver xxx required for printer xxx is unknown contact the administrator to install the driver before you log in again..

Event ID 1111
Source Terminal Service Printers

Both of these I assumed were null errors..
The Application one I assumed was due to the server not getting updated information from Windows Update, since the server is segregated, and the System one comes from people (admins) logging into the server who have the printer option checked in RDS.

The systems (clients) never show "no logon server" at logon. They log in fine; it only shows up in the information of the system, or if you query the logon server from the client side. Which is confusing, since the policy, again, is set to not let the clients log into the desktop without the DC being available. But at no time does the desktop fail to log into the domain. The policy is set, I can see it on the desktop, but again.. it's got me baffled.
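For anyone wanting to check this from the client side, here is a minimal sketch of reading the logon server in Python. It relies on the `LOGONSERVER` environment variable that Windows sets for the session (normally in the form `\\SERVERNAME`). The function name and the DC name `DC01` are placeholders I made up, not anything from this environment.

```python
import os


def check_logon_server(value, expected_dc):
    """Classify a LOGONSERVER value against the DC name we expect.

    Returns "missing" if nothing is set, "ok" if it names the expected DC,
    and "unexpected" for anything else (for example the local machine name).
    """
    if not value:
        return "missing"
    # Windows reports the value with leading backslashes, e.g. \\DC01
    name = value.lstrip("\\").lower()
    if name == expected_dc.lower():
        return "ok"
    return "unexpected"


if __name__ == "__main__":
    # "DC01" is a hypothetical DC name; substitute the real one.
    current = os.environ.get("LOGONSERVER", "")
    print(current or "(not set)", "->", check_logon_server(current, "DC01"))
```

Running this at logon on one of the problem machines and on a healthy one would at least give a record of which days the value is missing.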

My desktop, which does not have the issue, is on the same switch, has the same software, plus additional administrator tools, and logs into the same domain. The only difference between my desktop and the others that I can tell is that I log into it with an administrative account. But there are many other machines here without the issue that are logged into with a user account, so I can't seem to figure out the difference.. Same policy and all.. I added myself to the policy when I noticed the error occurring, figuring that if it were policy based I would eventually have the same issue.. Just luck I guess.. LOL

Do you have NTP set up as a GPO? Are all the end devices on the correct time?

ASKER

Greg,
No, it is physical. It was a leftover after we migrated all the main servers to a Hyper-V based virtual server farm. The "Overkill" server was just an older Dell server we had to use as the DC, which was upgraded from 2k3 about a year ago.. I can't tell if the issue started around then or not, since I was not working here during that time.. It is possible that something is jacked from that time..

ASKER

Greg,
No and yes.. There is no internal NTP source set through group policy.. Although I have contemplated testing that out to see if it would help.. But the clocks all look to be the correct time.. They are pretty much spot on too.. Like a difference of about .5 seconds when the minute changes between the desktops and the server.. The most I have seen is on one of the old.. I mean REALLY old machines, where the time can get out of sync by as much as 20 seconds.. but the threshold should be set for 30 seconds, and ironically that machine doesn't have the issue.
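As a rough way to put numbers on this, the sketch below flags any machine whose clock offset from the DC exceeds a tolerance. The 30 second tolerance and the 0.5 and 20 second offsets are the figures mentioned above; the host names and the third entry are made up for illustration. For comparison, the Kerberos default for maximum clock skew is 5 minutes, so offsets this small would not normally break authentication on their own.

```python
def skewed_hosts(offsets_seconds, tolerance_seconds=30.0):
    """Return the hosts whose clock offset from the DC exceeds the tolerance.

    offsets_seconds maps host name -> measured offset in seconds
    (positive or negative). The result is sorted by host name.
    """
    return sorted(
        host
        for host, offset in offsets_seconds.items()
        if abs(offset) > tolerance_seconds
    )


if __name__ == "__main__":
    # Hypothetical measurements; only the 0.5 s and 20 s values come from the thread.
    measured = {"desk-01": 0.5, "old-pc": 20.0, "desk-07": -45.0}
    print(skewed_hosts(measured))
```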

Was the DC a fresh install of 2008R2 then DCPromo'd into the domain?

ASKER

Greg,
No, it was an "upgrade", which as far as I can tell was originally NT4 Directory Services, migrated to 2000 Advanced Server AD, migrated to 2003 Standard AD, migrated to 2k3 R2 x64 AD, migrated to 2008 x64 Standard AD... Which is where it currently sits..

Do you have other DCs running on the network?

SOLUTION

hecgomrec

ASKER

Greg,
No, just a single domain controller, for roughly 25 end users.. But they log in at varying times.. typically from 7-9AM, with each dept being about 5-10 people..

Hec,
Power management is disabled through policy, in that there is no hibernation or sleep, and the NIC does not turn off, nor does it allow power saving.. The NIC is set to auto-negotiate but the switch ports are set to full duplex 1Gb. All desktops have 1Gb full or 10/100/1000 auto.

Do you have a Hyper-V environment available?

I would start up a new 2012R2 instance, DCPromo it, and transfer all roles to it. Then demote the DC that is in place now and disjoin it from the domain. You will also need to adjust the DHCP service to reflect the new DC's DNS.

Tear down the current server: get rid of the RAID 6 and make it RAID 10. Make the current DC box a Hyper-V server. Start up a new 2012R2 VM and DCPromo it so you have a minimum of two DCs in place (this is best practice).

You can use the extra capacity to run a network monitor.

SOLUTION

McKnife

ASKER

Knife,
Yes, I actually paid Microsoft to connect remotely to it and figure it out.. That ticket was opened about 60 days ago, and I have yet to get any kind of usable solution from them. They can't figure it out either.. LOL..

I did google the hell out of it, thinking I would find something..
It's not constant.. One day you will get no logon server, but if you log off and back on, after clicking switch user and entering domain\username and password, it will show the logon server as the correct server for about 7-10 days, then it is gone again.. The other odd thing is that if you do the same thing, but log out of the user and try to log in as domain\administrator, it will fail with the error "no logon servers available".. Which made me think.. "Cached profile". But when I deleted the cache and tried again, it worked for the user and not the admin, so I pulled the machine, moved it to another network port and tried again.. still the same error.. but then the next day, both accounts worked fine on that system.. No updates, no physical changes anywhere...

Greg,
I actually have another system with the DC cloned on a different VLAN, with a few test VM desktops connected to it.. I can't replicate it on the other side.. which is weird since everything is identical.. except the OS is on a 2k8R2 Hyper-V instead of physical.. and the switches are ancient Ciscos.

That made me think..
I am pretty sure that on the test side I changed the DNS Domain Name in the DHCP settings..
Which makes me wonder if maybe this is a DNS/DHCP issue..?

Does anyone on here who knows the Scope Options know what the 015 scope option will do if it is not correct, or is set to a secondary DNS name rather than the current DNS name of the domain?

For instance, if the domain was originally test1.com
and you change it to Test2.local
but do not update 015 in the scope options of the DHCP list, will that cause issues?
Does anyone have a good link to what that 015 option does, or what it controls?
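For context, option 015 is the DNS domain name that the DHCP server hands to clients, which Windows clients typically pick up as the connection-specific DNS suffix. The sketch below only shows how to compare the two names from the example (test1.com vs Test2.local) while ignoring case and a trailing dot. It says nothing about whether a mismatch actually explains the missing logon server; the function name is made up.

```python
def suffix_matches_domain(option_15, ad_domain):
    """Return True if the DHCP option 015 value names the same DNS domain
    as the Active Directory domain.

    DNS names are case-insensitive and may carry a trailing dot, so both
    values are normalised before comparing.
    """
    def normalise(name):
        return name.strip().rstrip(".").lower()

    return normalise(option_15) == normalise(ad_domain)


if __name__ == "__main__":
    # Example names taken from the question above.
    print(suffix_matches_domain("test1.com", "Test2.local"))
    print(suffix_matches_domain("TEST2.local.", "test2.local"))
```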

Cloning DCs is never a good thing....

Scope option 15 should have your Active Directory domain name in it.

The DNS option should point to your two Active Directory servers. This could also be an issue with desktops not finding the logon server: DNS must be the Active Directory servers.

Make sure you do not have two DHCP servers on your network (unless it's 2012R2 DHCP failover). This can cause logon issues too.
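One way to confirm that a client's DNS can actually find a domain controller is to query the SRV records that AD registers in DNS. The sketch below just builds the `nslookup` command lines for two of the standard records (`_ldap._tcp.dc._msdcs.<domain>` and `_kerberos._tcp.<domain>`); it does not run them. The function name is made up, and `Test2.local` is the example domain from the question, not necessarily the real one.

```python
def dc_locator_queries(ad_domain):
    """Build nslookup commands for the SRV records clients use to find a DC.

    Only the command strings are returned; run them by hand on a problem
    machine and on a healthy one and compare the answers.
    """
    domain = ad_domain.strip().rstrip(".").lower()
    record_names = [
        f"_ldap._tcp.dc._msdcs.{domain}",  # domain controllers for the domain
        f"_kerberos._tcp.{domain}",        # Kerberos KDCs for the domain
    ]
    return [f"nslookup -type=SRV {name}" for name in record_names]


if __name__ == "__main__":
    for command in dc_locator_queries("Test2.local"):
        print(command)
```

If a problem machine gets no answer for these while a healthy machine does, that points at the client's DNS settings rather than at the DC itself.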

ASKER

Greg, Sorry for the delay in getting back..
The DNS is on the current AD server. While there are two servers, there is only one in each VLAN, so only one on each side is visible. There aren't any holes punched through on either VLAN, so there shouldn't be any DHCP duplication. Even if that were the case, they are on two completely different subnets, so I don't think that would be the issue.
The 015 is set to the old DNS name, which is not the same as the new one, so I am curious if this could be the culprit. I don't know enough about the Scope Options, or what the thinking was behind leaving the old DNS name in that option, but I have a feeling that changing it to the correct DNS name would be the solution.. Thoughts?

Any idea what changing that could jack up in the current environment?
Also, I found that they once had WINS in the mix, which is long gone, and I removed the traces of that on Saturday.

Thanks so far for the Q/A's

Have you done an overnight ping test from one or more of the problem machines to the DC?

Have you started up a new VM to replace this DC?

Is the 'cloned' DC completely separated from the production network?
(The production DC isn't trying to replicate to it, or vice versa?)

Also try a Message Analyzer capture on one of the problem machines. This app can break down IP traffic by conversation with the DC.

ASKER

Ran a ping last night, along with Wireshark, and never saw a single issue, no dropped packets, no problems at all.. Also ran a separate Wireshark capture from one of the affected machines, again no issues, never a dropped packet or error.

I don't plan on migrating this to a new VM. If I can't get this to work, I am going to buy a new physical server and start from scratch, as I would rather dump the time into getting this to work correctly than migrate a problem from one piece of hardware to another.

The cloned server is in no way connected to the live network.
There is no replication set up.. outside of Veeam to send data to our offsite data center, which is configured through VPN, but it is only a one way trust.. and the DC isn't set up to know about it.. it's segregated..

ASKER CERTIFIED SOLUTION

ASKER

I actually figured this out about 30 min ago.. Turns out it is not an issue on the DC side at all. The issue appears to be with the switch that these machines are plugged into. There are 2 Juniper gigabit switches and an old 10Mb switch that I was told no one was connected to, but it seems that the machines with this issue are actually connected to this switch. I have since moved them to the other switch, had the users log into the machines, and the issues are gone.. Anyone know a good home for a 20 year old Cisco switch? lol
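For anyone hitting something similar, a quick inventory of negotiated link speeds would have pointed at the odd switch sooner. The sketch below assumes you have already collected each machine's link speed in Mbps by whatever means you prefer; the host names and values are made up for illustration.

```python
def slow_links(link_speeds_mbps, expected_mbps=1000):
    """Return the hosts whose negotiated link speed is below what is expected.

    link_speeds_mbps maps host name -> negotiated speed in Mbps.
    """
    return {
        host: speed
        for host, speed in link_speeds_mbps.items()
        if speed < expected_mbps
    }


if __name__ == "__main__":
    # Hypothetical inventory: two machines sitting on a 10Mb switch.
    inventory = {"desk-01": 1000, "desk-02": 10, "desk-03": 1000, "desk-04": 10}
    print(slow_links(inventory))
```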

Glad you found the network issue!

That's why I kept going back to Wireshark to troubleshoot!

Enjoy your win!