Solved

XP Winlogon/SFC boot errors following Server 2008 R2 upgrade

Posted on 2011-09-02
Last Modified: 2013-11-22
A few weeks after we installed Server 2008 R2 on an existing site, transferred AD and removed the old server, virtually all of the PCs on that site (all Windows XP Pro SP3) started getting serious boot errors.   (Not all at once; it started gradually.)

The exact text of the message is:

"winlogon.exe - Unable to Locate Component:  This application has failed to start because sfc_os.dll was not found.   Re-installing the application may fix this problem. [OK]"

The problem is occurring consistently on most systems whenever they are rebooted.  Rebooting with F8 and selecting last known good allows the boot to proceed -- in fact, the users at this site have gotten used to doing it themselves -- but the problem often re-occurs on next boot.  Doing a system state rollback (system restore) lasts longer but is apparently not permanent.   There is one system on the site which is not a domain member:  it alone appears to remain unaffected.

We have run MD5 checks on both winlogon.exe and sfc_os.dll on systems both before and after the error occurs (in the latter case, with immediate power-down and attachment as a secondary drive).  They show clearly that these files are in fact both present and unmodified when the error occurs.   That indicates the error message must be erroneous, or that something is actively interfering with the function of one or both of them.
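
A minimal sketch of that kind of before/after hash comparison, assuming Python is available on the machine doing the analysis. The function names and the baseline dictionary are illustrative only, not the tooling actually used on this site, and the baseline digests would have to come from a known-good system.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_baseline(baseline, root):
    """Compare files under `root` against known-good digests.

    `baseline` maps a relative file name to its expected MD5 hex digest.
    Each result is True (matches), False (differs) or None (file missing).
    """
    results = {}
    for relative_name, expected in baseline.items():
        candidate = Path(root) / relative_name
        if not candidate.is_file():
            results[relative_name] = None
        else:
            results[relative_name] = md5_of_file(candidate) == expected.lower()
    return results
```

With the suspect drive attached as a secondary disk, `root` would point at its system32 directory and `baseline` would hold the digests recorded before the error appeared. A result of True for both files is consistent with the observation above that the files themselves are intact.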

SFC /SCANNOW has been run on a number of these systems after rebooting with last known good.  It does not resolve the issue.   Two of the systems have been completely reinstalled, and the issue has recurred on them as well.

The whole site has been swept thoroughly for viruses and malware, using multiple tools (SUPERAntiSpyware, ComboFix, Malwarebytes, ESET NOD32, TDSSKiller, GMER, catchme, RootkitRevealer).   In some cases, drives were removed and scanned as attached drives using rootkit detection software.   Although some malware was found on some of the systems, it has been removed, none of it is known to be associated with this behavior, and a number of the systems experiencing this behavior have never had malware detected on them.

We are still looking for a way to detect in system event logs whether this has occurred.  We are not normally on site, and as a result do not have perfect information about every time the problem occurs. Although we have asked for reports when it happens, we probably only get about a third, typically a day later.

Some questions:
1. Is it conceivable that group policy might have this effect?
2. How can we determine what this message is actually complaining about, since it is demonstrably incorrect?
3. Would boot logging be useful here?
4. How can we confirm whether this is caused by a virus?  I'm thinking monitoring network activity might be a good indicator, if I knew what to look for.  Online descriptions of the Virut virus may be indicative.

We could "nuke and pave" the entire site, but until we know for sure what is causing this behavior, it could just be a waste of time.

/kenw Screen shot (via camera)
Question by:wallewek
 
LVL 1

Assisted Solution

by:leebrumbaugh
leebrumbaugh earned 50 total points
ID: 36476523
That sounds like GPO is doing something funny to me.  Are you pushing files/applications through GPO's?  What I'm thinking is it might be trying to install something that it can no longer find.  I don't know exactly what that dll is for though.
You say you've rebuilt the boxes, and it doesn't happen until you connect them to the domain, correct?
Are you using a prebuilt Windows image or installing from scratch?
What if you build a machine and don't put it in your default Computers group, do you still get that error?
There is a way to turn on verbose boot messages in AD, you might want to try that.
0
 
LVL 66

Assisted Solution

by:johnb6767
johnb6767 earned 100 total points
ID: 36476977
I would look at your paths, to make sure they have the Windows, Windows\System32 and Windows\System32\Wbem directories listed.....
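
A rough sketch of that check, assuming the PATH value has already been read into a string (for example from the output of a SET command). The `C:\WINDOWS` default, the `REQUIRED` list and the function names are assumptions for illustration; only the three directories themselves come from the suggestion above.

```python
import ntpath
import re

# The three directories named above, written with %SystemRoot%.
REQUIRED = (
    r"%SystemRoot%",
    r"%SystemRoot%\System32",
    r"%SystemRoot%\System32\Wbem",
)


def normalize(entry, system_root):
    """Expand %SystemRoot%, then normalise case and slashes for comparison."""
    cleaned = entry.strip().strip('"')
    # A function is used as the replacement so backslashes are taken literally.
    expanded = re.sub("%systemroot%", lambda m: system_root, cleaned,
                      flags=re.IGNORECASE)
    return ntpath.normcase(ntpath.normpath(expanded))


def missing_system_dirs(path_value, system_root=r"C:\WINDOWS"):
    """Return the required entries that do not appear in the PATH string."""
    present = {normalize(e, system_root)
               for e in path_value.split(";") if e.strip()}
    return [d for d in REQUIRED if normalize(d, system_root) not in present]
```

An empty list means all three directories are listed. Note that this only inspects the entries it is given: if the variable is so long that the tool used to read it truncates the value, the result reflects the truncated text.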
0
 
LVL 66

Expert Comment

by:johnb6767
ID: 36476979
And youncoukdmalsomuse evtcomb.exe from the Account Lockout Tools to scan the domain for specific log entries....
0
 
LVL 66

Expert Comment

by:johnb6767
ID: 36476982
I hate this spell checker on the iPads... Disregard gibberish above....

That was supposed to say....

You could also use....

Details here....

http://support.microsoft.com/kb/824209
0
 
LVL 6

Accepted Solution

by:joeyfaz
joeyfaz earned 100 total points
ID: 36478667
Most definitely a GPO issue. Disable the GPOs, run a GPUPDATE on the workstations, and then reboot them. The issue should go away. Once it does, just start with a fresh GPO.
0
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36479557
You're talking like this is a remote site.  Are you using MSP software?  
0
 

Author Comment

by:wallewek
ID: 36481420
Yes, this is a remote site, and we are using Kaseya to manage the site.   It's nearly an hour away if we need to go on site.   We are putting together a list of things for someone to do on a site visit, though.

One challenge is that the Winlogon/SFC_OS error is virtually invisible to us:  when it occurs, we have no remote access, and it leaves no trace that we have found so far in the event logs.

My initial inclination was that this could not be a GPO issue, but I like joeyfaz's prescription for testing.  I've done some digging, and there's enough evidence to merit more.

More responses:
- We are not pushing files or applications via GP.
- We have one system which is not a domain member, which has not experienced the problem so far.   But it's only one system, which is different in a number of other ways as well (e.g. not running the same software).   Like many other aspects of this issue, it's more of an indication than solid proof of anything.
- Rebuilt boxes are full scratch builds (albeit without FDISK/MBR and power-down RAM wipes, TTBOMK).
- I'm not aware of a way to "turn on verbose boot messages in AD".  Are you perhaps referring to the way of doing it in safe mode boot, or BOOT.INI?

/kenw
0
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36482086
It sounds to me like a problem with Kaseya.  The first place I would be looking, since this only seems to affect this site, is what updates and modifications Kaseya is scheduled to make.  If you have a custom script set to run, for example, as part of doing automated updates, that could easily account for the behaviour you're describing.

You said in your original message that a system rollback 'lasts longer', but it shouldn't if it was a GPO being applied.  When the workstation reboots, one of the first things it does is look for updated domain group policies and apply them.  Therefore the only other logical conclusion is that there's something else managing the systems that operates a) automatically, b) with administrative access, and c) on a schedule.

0
 

Author Comment

by:wallewek
ID: 36499917
Good call on the GPOs -- it turns out we needed to do some work there -- but that wasn't it.  

It appears that the issue was the system PATH environment variable.  johnb6767, if this stands up, you'll be getting the points on this one.  I have _never_ seen something like this before, and I would love to know what did it.

Get this:  we're finding system PATH entries in the registry (at HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment) that are over 32000 bytes long!   When they get that long, a SET command won't display them completely, and if you look in system properties -> environment variables, it won't even display the path at all.   It shows the first couple of system environment variables and stops.

The actual contents of the variable are maybe a hundred or so bytes that look OK, and then many, many repetitions of what would actually be legitimate registry entries, nothing remarkable.
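
A small sketch of how a PATH value like that could be measured and collapsed once it has been exported to a string (for instance from a registry export). The function name, the result keys and the sample directories in the usage note are hypothetical; this is one way to summarise the repetition, not the method actually used here.

```python
from collections import Counter


def summarize_path(path_value):
    """Report length and repeated entries, and build a de-duplicated PATH.

    Comparison is case-insensitive; the first spelling of each entry is kept
    and the original order is preserved.
    """
    entries = [e.strip() for e in path_value.split(";") if e.strip()]
    counts = Counter(e.lower() for e in entries)

    unique, seen = [], set()
    for entry in entries:
        key = entry.lower()
        if key not in seen:
            seen.add(key)
            unique.append(entry)

    return {
        "length": len(path_value),            # characters, not bytes
        "entries": len(entries),
        "repeated": {k: n for k, n in counts.items() if n > 1},
        "deduplicated": ";".join(unique),
    }
```

For a value made of a short, sane prefix followed by the same few directories appended over and over (say `C:\Tools\bin;C:\Tools\lib` repeated a thousand times), `repeated` shows how many times each one occurs and `deduplicated` gives a candidate value to review before writing anything back.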

I don't think it's possible for a logon script or an AUTOEXEC.BAT to make the PATH variable this long.  If a GPO did it, damned if I know how.  Some app, possibly...?   I'll definitely be watching this site closely in the future.

Does anybody here have any idea what could have done this?

/kenw
0
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36506272
Lots of programs will add themselves to the PATH statement by appending the path to include their executables.  There is a limitation on the length of the path statement though, so frequently programs that add more to the end of the path statement end up not working as expected.  Because of that, some programs will append to the BEGINNING of the path statement (such as Windows Live! products), pushing the system-required elements to the middle or the end.

That's why I was asking if you were using MSP software.  I've used Kaseya (and got away from it) and LabTech, and in each instance I've noticed the odd occasion where the agent has 'tweaked' the operating system.  For example, I've set up most of the networks I manage to use GPOs to install the LabTech agent when the user logs in if the agent isn't already installed on the system.  An automated install like that, if scripted incorrectly, could actually do what you're experiencing, where the PATH statement is being appended each time the user(s) log in rather than checking first to see if the PATH statement needs to be appended.

Of course I don't mean to say that Kaseya is definitely the culprit, just a really good place to start looking, because Kaseya logs all the activities it performs on systems where it has an agent.
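
To illustrate the difference between checking first and appending blindly, here is a minimal sketch. Both function names and the `C:\Agent\bin` directory in the usage note are made up for the example; this is not how Kaseya, LabTech or any particular installer actually behaves.

```python
def _key(entry):
    """Normalise an entry for comparison: trim, drop trailing backslash, lowercase."""
    return entry.strip().rstrip("\\").lower()


def append_if_missing(path_value, new_entry):
    """Append new_entry only when it is not already listed (idempotent)."""
    existing = {_key(e) for e in path_value.split(";") if e.strip()}
    if _key(new_entry) in existing:
        return path_value
    if not path_value.strip(";"):
        return new_entry.strip()
    return path_value.rstrip(";") + ";" + new_entry.strip()


def blind_append(path_value, new_entry):
    """The incorrect pattern: grows the PATH on every run."""
    return path_value + ";" + new_entry
```

Running `append_if_missing` at every logon leaves the PATH unchanged after the first run, while `blind_append` adds another copy of `C:\Agent\bin` each time, which is the kind of growth that would eventually produce a very long PATH.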
0
 
LVL 66

Expert Comment

by:johnb6767
ID: 36507656
Can you fix the path and reboot the machine, in an OU that doesn't have any policies applied?
0
 

Author Comment

by:wallewek
ID: 36512564
OK, we found the culprit. It was a group policy preferences entry, with a name indicating that it was a security setting, created on April 1, 2009, which tried to append a series of entries to the system path environment variable of XP systems.  At the time it was created, it apparently had no effect (and was probably forgotten about), and didn’t do anything until the Server 2008 GP extensions were eventually pushed out to those PCs recently when the new server was installed.

So it was both a PATH and a GPO issue.  You guys were a great help!  How the heck do I award points for this??

/kenw
0
 
LVL 6

Expert Comment

by:joeyfaz
ID: 36513840
Split them up if you wish, I don't mind.
0
 

Author Closing Comment

by:wallewek
ID: 36514345
It was a complex issue. No single response provided the whole answer, but the responses did provide key approaches for resolution.
0
 
LVL 66

Expert Comment

by:johnb6767
ID: 36514527
Glad you're fixed....
0
 

Author Comment

by:wallewek
ID: 36514563
Yes, now if I could figure out why it took two years for the GPO to start causing trouble.   Yes, we upgraded/migrated from Server 2008 to Server 2008 R2.   So what?

[You know, I'm getting really tired of having this EE user interface let me post a comment, and then throwing it away and making me retype the whole damn thing because it says I have to be logged in to post a comment.  And I keep telling it to remember me!]
0
 
LVL 1

Expert Comment

by:leebrumbaugh
ID: 36514588
I had an interesting thing like that happen to me recently as well, where I had a bad script.  It'd probably been quickly dying for years, but a recent MS update somehow got it partly working, so it would hang until it timed out.  A pain in the butt for sure.
0
 

Author Comment

by:wallewek
ID: 36514626
Yeah, I figure something like that.  My money is on something to do with the XP Group Policy extensions for Server 2008.   This had to do with a Group Policy Preferences policy.   Maybe there was an update for R2.  Might have something to do with XP SP3 too, but I don't think that's recent here.
0
