Solved

XP Winlogon/SFC boot errors following Server 2008 R2 upgrade

Posted on 2011-09-02
Last Modified: 2013-11-22
A few weeks after we installed Server 2008 R2 on an existing site, transferred AD and removed the old server, virtually all of the PCs on that site (all Windows XP Pro SP3) started getting serious boot errors.   (Not all at once; it started gradually.)

The exact text of the message is:

"winlogon.exe - Unable to Locate Component:  This application has failed to start because sfc_os.dll was not found.   Re-installing the application may fix this problem. [OK]"

The problem is occurring consistently on most systems whenever they are rebooted.  Rebooting with F8 and selecting last known good allows the boot to proceed -- in fact, the users at this site have gotten used to doing it themselves -- but the problem often re-occurs on next boot.  Doing a system state rollback (system restore) lasts longer but is apparently not permanent.   There is one system on the site which is not a domain member:  it alone appears to remain unaffected.

We have run MD5 checks on both winlogon.exe and sfc_os.dll on systems both before and after the error occurs (in the latter case, with immediate power-down and attachment as a secondary drive).  They show clearly that these files are in fact both present and unmodified when the error occurs.   That indicates the error message must be erroneous, or that something is actively interfering with the function of one or both of them.
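
For anyone wanting to repeat the before/after comparison, here is a minimal Python sketch of hashing a file with MD5 and comparing two sets of digests. It is an illustration only, not the tool actually used on this site; the function names are hypothetical.

```python
import hashlib


def md5_of_stream(stream, chunk_size=65536):
    """Return the hex MD5 digest of a binary file-like object."""
    digest = hashlib.md5()
    # Read in chunks so large files do not have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def changed_files(before, after):
    """Compare two {name: digest} dicts.

    Returns the names whose digest is missing or different in `after`.
    """
    return sorted(name for name, digest in before.items()
                  if after.get(name) != digest)
```

Hash winlogon.exe and sfc_os.dll on a healthy boot, hash them again from the drive mounted as a secondary volume after a failure, and pass both sets of digests to changed_files. An empty list means the files match.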

We have rebooted a number of these systems with last known good and run SFC /SCANNOW.  It does not resolve the issue.   Two of the systems have been completely reinstalled, and the issue has recurred on them as well.

The whole site has been swept thoroughly for viruses and malware, using multiple tools (superantispyware, combofix, malwarebytes, ESET NOD32, tdsskiller, gmer, catchme, rootkit revealer).   In some cases, we removed the drive and scanned it as an attached drive using rootkit detection software.   Although some malware was found on some of the systems, it has been removed, none of it is known to be associated with this behavior, and a number of the systems experiencing this behavior have never had malware detected on them.

We are still looking for a way to detect in system event logs whether this has occurred.  We are not normally on site, and as a result do not have perfect information about every time the problem occurs. Although we have asked for reports when it happens, we probably only get about a third, typically a day later.

Some questions:
1. Is it conceivable that group policy might have this effect?
2. How can we determine what this message is actually complaining about, since it is demonstrably incorrect?
3. Would boot logging be useful here?
4. How can we confirm whether this is caused by a virus?  I'm thinking monitoring network activity might be a good indicator, if I knew what to look for.  Online descriptions of the Virut virus may be indicative.

We could "nuke and pave" the entire site, but until we know for sure what is causing this behavior, it could just be a waste of time.

/kenw
Question by:wallewek
18 Comments
 
LVL 1

Assisted Solution

by:leebrumbaugh
leebrumbaugh earned 50 total points
ID: 36476523
That sounds to me like a GPO is doing something funny.  Are you pushing files/applications through GPOs?  What I'm thinking is it might be trying to install something that it can no longer find.  I don't know exactly what that dll is for though.
You say you've rebuilt the boxes, and it doesn't happen until you connect them to the domain, correct?
Are you using a prebuilt Windows image or installing from scratch?
What if you build a machine and don't put it in your default Computers group, do you still get that error?
There is a way to turn on verbose boot messages in AD; you might want to try that.
 
LVL 66

Assisted Solution

by:johnb6767
johnb6767 earned 100 total points
ID: 36476977
I would look at your paths, to make sure they have the Windows, Windows\System32 and Windows\System32\Wbem directories listed.....
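
A rough Python sketch of that check, for illustration: it takes a PATH string and reports which of the three directories are missing. The function names are hypothetical, and the default system root of C:\WINDOWS is an assumption (adjust it if Windows is installed elsewhere).

```python
# Suffixes, relative to the system root, that the PATH is expected to contain.
REQUIRED_SUFFIXES = ["", r"\System32", r"\System32\Wbem"]


def normalize(entry, system_root):
    """Expand a leading %SystemRoot% and normalize case and trailing slash."""
    entry = entry.strip().strip('"')
    token = "%systemroot%"
    if entry.lower().startswith(token):
        entry = system_root + entry[len(token):]
    return entry.rstrip("\\").lower()


def missing_system_dirs(path_value, system_root=r"C:\WINDOWS"):
    """Return the required system directories not present in `path_value`."""
    present = {normalize(e, system_root)
               for e in path_value.split(";") if e.strip()}
    required = [system_root + suffix for suffix in REQUIRED_SUFFIXES]
    return [d for d in required if d.lower() not in present]
```

An empty result means all three directories are listed. Note this only inspects the string it is given; it does not tell you whether Windows can actually read a PATH that has grown too long.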
 
LVL 66

Expert Comment

by:johnb6767
ID: 36476979
And youncoukdmalsomuse evtcomb.exe from the Account Lockout Tools to scan the domain for specific log entries....
 
LVL 66

Expert Comment

by:johnb6767
ID: 36476982
I hate this spell checker on the iPads... Disregard the gibberish above.

That was supposed to say: you could also use evtcomb.exe from the Account Lockout Tools to scan the domain for specific log entries.

Details here:

http://support.microsoft.com/kb/824209
 
LVL 6

Accepted Solution

by:joeyfaz
joeyfaz earned 100 total points
ID: 36478667
Most definitely a GPO issue. Disable the GPOs, run a GPUPDATE on the workstations, and then reboot them. The issue should go away. Once it does, just start with a fresh GPO.
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36479557
You're talking like this is a remote site.  Are you using MSP software?  
 

Author Comment

by:wallewek
ID: 36481420
Yes, this is a remote site, and we are using Kaseya to manage the site.   It's nearly an hour away if we need to go on site.   We are putting together a list of things for someone to do on a site visit, though.

One challenge is that the Winlogon/SFC_OS error is virtually invisible to us:  when it occurs, we have no remote access, and it leaves no trace that we have found so far in the event logs.

My initial inclination was that this could not be a GPO issue, but I like joeyfaz's prescription for testing.  I've done some digging, and there's enough evidence to merit more.

More responses:
- We are not pushing files or applications via GP.
- We have one system which is not a domain member, which has not experienced the problem so far.   But it's only one system, which is different in a number of other ways as well (e.g. not running the same software).   Like many other aspects of this issue, it's more of an indication than solid proof of anything.
- Rebuilt boxes are full scratch builds (albeit without FDISK/MBR and power-down RAM wipes, TTBOMK).
- I'm not aware of a way to "turn on verbose boot messages in AD".  Are you perhaps referring to the way of doing it in safe mode boot, or BOOT.INI?

/kenw
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36482086
It sounds to me like a problem with Kaseya.  The first place I would be looking, since this only seems to affect this site, is what updates and modifications Kaseya is scheduled to make.  If you have a custom script set to run, for example, as part of doing automated updates, that could easily account for the behaviour you're describing.

You said in your original message that a system rollback 'lasts longer', but it shouldn't if it was a GPO being applied.  When the workstation reboots, one of the first things it does is look for updated domain group policies and apply them.  Therefore the only other logical conclusion is that there's something else managing the systems that operates a) automatically, b) with administrative access, and c) on a schedule.

 

Author Comment

by:wallewek
ID: 36499917
Good call on the GPOs -- it turns out we needed to do some work there -- but that wasn't it.  

It appears that the issue was the system PATH environment variable.  johnb6767, if this stands up, you'll be getting the points on this one.  I have _never_ seen something like this before, and I would love to know what did it.

Get this:  we're finding system PATH entries in the registry (at HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment) that are over 32000 bytes long!   When they get that long, a SET command won't display them completely, and if you look in system properties -> environment variables, it won't even display the path at all.   It shows the first couple of system environment variables and stops.
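
To spot an oversized PATH without relying on SET or the system properties dialog, something like the following Python sketch could be run on a workstation. It is only a sketch: the 32767-character ceiling is the commonly documented maximum for a Windows environment variable and should be treated as an assumption, the 2048-character warning level is an arbitrary value picked for illustration, and the function names are hypothetical.

```python
# Commonly documented upper bound for a Windows environment variable,
# in characters. Treat as an assumption and confirm for your OS version.
MAX_ENV_CHARS = 32767
# Hypothetical warning level, chosen only for illustration.
WARN_CHARS = 2048

ENV_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"


def read_system_path():
    """Read the raw system PATH value from the registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch loads anywhere
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ENV_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "Path")
    return value


def path_report(path_value):
    """Summarize a PATH string: length, entry count, and a rough status."""
    entries = [e for e in path_value.split(";") if e.strip()]
    length = len(path_value)
    if length >= MAX_ENV_CHARS:
        status = "over limit"
    elif length >= WARN_CHARS:
        status = "suspiciously long"
    else:
        status = "ok"
    return {"length": length, "entries": len(entries), "status": status}
```

On an affected machine, path_report(read_system_path()) should flag the value immediately, which is more than the GUI did for us.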

The actual contents of the variable are maybe a hundred or so bytes that look OK, and then many, many repetitions of what would actually be legitimate registry entries, nothing remarkable.
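
Since the bloat is just the same entries repeated, cleaning it up amounts to keeping the first occurrence of each one. A minimal Python sketch of that idea follows; the function name is hypothetical, and it treats entries as equal if they differ only in case or a trailing backslash, which is an assumption about what counts as a duplicate.

```python
def dedupe_path(path_value):
    """Drop repeated PATH entries, keeping the first occurrence of each.

    Order is preserved, and the first spelling of each entry is kept.
    """
    seen = set()
    kept = []
    for entry in path_value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty segments such as ";;"
        key = entry.rstrip("\\").lower()
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return ";".join(kept)
```

This only builds the cleaned string. Writing it back to the registry, and removing whatever keeps appending to it, are separate steps.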

I don't think it's possible for a logon script or an AUTOEXEC.BAT to make the PATH variable this long.  If a GPO did it, damned if I know how.  Some app, possibly...?   I'll definitely be watching this site closely in the future.

Does anybody here have any idea what could have done this?

/kenw
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36506272
Lots of programs will add themselves to the PATH statement by appending the path to include their executables.  There is a limitation on the length of the path statement though, so frequently programs that add more to the end of the path statement end up not working as expected.  Because of that some programs will append to the BEGINNING of the path statement (such as Windows Live! products), pushing the system required elements to the middle or the end.

That's why I was asking if you were using MSP software.  I've used Kaseya (and got away from it) and LabTech, and in each instance I've noticed the odd occasion where the agent has 'tweaked' the operating system.  For example, I've set up most of the networks I manage to use GPOs to install the LabTech agent when the user logs in if the agent isn't already installed on the system.  An automated install like that, if scripted incorrectly, could actually do what you're experiencing, where the PATH statement is being appended each time the user(s) log in rather than checking first to see if the PATH statement needs to be appended.

Of course I don't mean to say that Kaseya is definitely the culprit, just a really good place to start looking, because Kaseya logs all the activities it performs on systems where it has an agent.
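
To illustrate the "check first, then append" point, here is a minimal Python sketch of an idempotent PATH append. It is not taken from any agent or install script; the function names and the example directory in the usage note are hypothetical.

```python
def _key(entry):
    """Comparison key: ignore case, surrounding spaces, trailing backslash."""
    return entry.strip().rstrip("\\").lower()


def append_if_missing(path_value, new_entry):
    """Append `new_entry` to a PATH string only if it is not already there."""
    existing = {_key(e) for e in path_value.split(";") if e.strip()}
    if _key(new_entry) in existing:
        return path_value  # already present: leave the PATH untouched
    if not path_value or path_value.endswith(";"):
        return path_value + new_entry
    return path_value + ";" + new_entry
```

Calling append_if_missing repeatedly with the same directory (say, a hypothetical C:\Agent\bin) leaves the PATH unchanged after the first call, whereas a blind append grows it by one entry on every logon.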
 
LVL 66

Expert Comment

by:johnb6767
ID: 36507656
Can you fix the path and reboot the machine, in an OU that doesn't have any policies applied?
 

Author Comment

by:wallewek
ID: 36512564
OK, we found the culprit. It was a group policy preferences entry, with a name indicating that it was a security setting, created on April 1, 2009, which tried to append a series of entries to the system path environment variable of XP systems.  At the time it was created, it apparently had no effect (and was probably forgotten about), and didn't do anything until the Server 2008 GP extensions were eventually pushed out to those PCs recently when the new server was installed.

So it was both a PATH and a GPO issue.  You guys were a great help!  How the heck do I award points for this??

/kenw
 
LVL 6

Expert Comment

by:joeyfaz
ID: 36513840
Split them up if you wish, I don't mind.
 

Author Closing Comment

by:wallewek
ID: 36514345
It was a complex issue; no single response provided the whole answer, but the responses did provide key approaches for resolution.
 
LVL 66

Expert Comment

by:johnb6767
ID: 36514527
Glad you're fixed....
 

Author Comment

by:wallewek
ID: 36514563
Yes, now if I could figure out why it took two years for the GPO to start causing trouble.   Yes, we upgraded/migrated from Server 2008 to Server 2008 R2.   So what?

[You know, I'm getting really tired of having this EE user interface let me post a comment, and then throwing it away and making me retype the whole damn thing because it says I have to be logged in to post a comment.  And I keep telling it to remember me!]
 
LVL 1

Expert Comment

by:leebrumbaugh
ID: 36514588
I had an interesting thing like that happen to me recently as well, where I had a bad script.  It'd probably been quickly dying for years, but a recent MS update somehow got it partly working so it would hang until it timed out.  A pain in the butt for sure.
 

Author Comment

by:wallewek
ID: 36514626
Yeah, I figure something like that.  My money is on something to do with the XP Group Policy extensions for Server 2008.   This had to do with a Group Policy Preferences policy.   Maybe there was an update for R2.  Might have something to do with XP SP3 too, but I don't think that's recent here.
