Solved

XP Winlogon/SFC boot errors following Server 2008 R2 upgrade

Posted on 2011-09-02
Last Modified: 2013-11-22
A few weeks after we installed Server 2008 R2 on an existing site, transferred AD and removed the old server, virtually all of the PCs on that site (all Windows XP Pro SP3) started getting serious boot errors.   (Not all at once; it started gradually.)

The exact text of the message is:

"winlogon.exe - Unable to Locate Component:  This application has failed to start because sfc_os.dll was not found.   Re-installing the application may fix this problem. [OK]"

The problem is occurring consistently on most systems whenever they are rebooted.  Rebooting with F8 and selecting last known good allows the boot to proceed -- in fact, the users at this site have gotten used to doing it themselves -- but the problem often re-occurs on next boot.  Doing a system state rollback (system restore) lasts longer but is apparently not permanent.   There is one system on the site which is not a domain member:  it alone appears to remain unaffected.

We have run MD5 checks on both winlogon.exe and sfc_os.dll on systems both before and after the error occurs (in the latter case, with immediate power-down and attachment as a secondary drive).  They show clearly that these files are in fact both present and unmodified when the error occurs.   That indicates the error message must be erroneous, or that something is actively interfering with the function of one or both of them.
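For anyone wanting to repeat that kind of check, here is a minimal sketch in Python (not the tool we actually used). The function names `md5_of_file` and `files_match` are made up for illustration; it simply hashes a file from the suspect drive and a known-good copy and compares the digests.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """True if both files have the same MD5 digest (hypothetical helper)."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

You would call `files_match` with the sfc_os.dll from the attached secondary drive and a copy from a known-good XP SP3 system at the same patch level. A match only shows the file on disk is unchanged; it says nothing about whether the loader can find it at boot.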

We have rebooted a number of these systems with last known good and run SFC /SCANNOW.  It does not resolve the issue.   Two of the systems have been completely reinstalled, and the issue has recurred on them as well.

The whole site has been swept thoroughly for viruses and malware, using multiple tools (superantispyware, combofix, malwarebytes, ESET NOD32, tdsskiller, gmer, catchme, rootkit revealer).   In some cases, drives were removed and scanned as attached drives using rootkit detection software.   Although some malware was found on some of the systems, it has been removed, none of it is known to be associated with this behavior, and a number of the systems experiencing this behavior have never had malware detected on them.

We are still looking for a way to detect in system event logs whether this has occurred.  We are not normally on site, and as a result do not have perfect information about every time the problem occurs. Although we have asked for reports when it happens, we probably only get about a third, typically a day later.

Some questions:
1. Is it conceivable that group policy might have this effect?
2. How can we determine what this message is actually complaining about, since it is demonstrably incorrect?
3. Would boot logging be useful here?
4. How can we confirm whether this is caused by a virus?  I'm thinking monitoring network activity might be a good indicator, if I knew what to look for.  Online descriptions of the Virut virus may be indicative.

We could "nuke and pave" the entire site, but until we know for sure what is causing this behavior, it could just be a waste of time.

/kenw

[Attachment: Screen shot (via camera)]
Question by:wallewek
18 Comments
 
LVL 1

Assisted Solution

by:leebrumbaugh
leebrumbaugh earned 150 total points
ID: 36476523
That sounds like GPO is doing something funny to me.  Are you pushing files/applications through GPO's?  What I'm thinking is it might be trying to install something that it can no longer find.  I don't know exactly what that dll is for though.
You say you've rebuilt the boxes, and it doesn't happen until you connect them to the domain, correct?
Are you using a prebuilt Windows image or installing from scratch?
What if you build a machine and don't put it in your default Computers group, do you still get that error?
There is a way to turn on verbose boot messages in AD, you might want to try that.
 
LVL 66

Assisted Solution

by:johnb6767
johnb6767 earned 300 total points
ID: 36476977
I would look at your paths, to make sure they have the Windows, Windows\System32 and Windows\System32\Wbem directories listed.....
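A rough way to check that, sketched in Python under some assumptions: the PATH value has been exported as a single string, the system root is C:\WINDOWS, and the helper names (`normalize`, `missing_system_dirs`) are invented for this example.

```python
import ntpath

# The three directories mentioned above, written relative to %SystemRoot%.
REQUIRED_DIRS = (
    r"%SystemRoot%",
    r"%SystemRoot%\System32",
    r"%SystemRoot%\System32\Wbem",
)


def normalize(entry, system_root):
    """Expand a leading %SystemRoot% and normalize case and trailing slashes."""
    entry = entry.strip().strip('"')
    token = "%systemroot%"
    if entry.lower().startswith(token):
        entry = system_root + entry[len(token):]
    return ntpath.normcase(ntpath.normpath(entry))


def missing_system_dirs(path_value, system_root=r"C:\WINDOWS"):
    """Return the required directories that do not appear in `path_value`."""
    present = {
        normalize(entry, system_root)
        for entry in path_value.split(";")
        if entry.strip()
    }
    return [d for d in REQUIRED_DIRS if normalize(d, system_root) not in present]
```

An empty list from `missing_system_dirs` means all three directories are listed. It does not tell you whether the PATH is too long to be read properly, which is a separate problem.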
 
LVL 66

Expert Comment

by:johnb6767
ID: 36476979
And youncoukdmalsomuse evtcomb.exe from the Account Lockout Tools to scan the domain for specific log entries....
 
LVL 66

Expert Comment

by:johnb6767
ID: 36476982
I hate this spell checker on the iPads... Disregard gibberish above....

That was supposed to say....

You could also use....

Details here....

http://support.microsoft.com/kb/824209
 
LVL 6

Accepted Solution

by:joeyfaz
joeyfaz earned 300 total points
ID: 36478667
Most definitely a GPO issue, disable the GPOs and run a GPUPDATE on the workstations and then reboot them. The issue should go away. Once it does, just start with a fresh GPO.
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36479557
You're talking like this is a remote site.  Are you using MSP software?  
 

Author Comment

by:wallewek
ID: 36481420
Yes, this is a remote site, and we are using Kaseya to manage the site.   It's nearly an hour away if we need to go on site.   We are putting together a list of things for someone to do on a site visit, though.

One challenge is that the Winlogon/SFC_OS error is virtually invisible to us:  when it occurs, we have no remote access, and it leaves no trace we have so far found in the event logs.

My initial inclination was that this could not be a GPO issue, but I like joeyfaz's prescription for testing.  I've done some digging, and there's enough evidence to merit more.

More responses:
- We are not pushing files or applications via GP.
- We have one system which is not a domain member, which has not experienced the problem so far.   But it's only one system, which is different in a number of other ways as well (e.g. not running the same software).   Like many other aspects of this issue, it's more of an indication than solid proof of anything.
- Rebuilt boxes are full scratch builds (albeit without FDISK/MBR and power-down RAM wipes, TTBOMK).
- I'm not aware of a way to "turn on verbose boot messages in AD".  Are you perhaps referring to the way of doing it in safe mode boot, or BOOT.INI?

/kenw
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36482086
It sounds to me like a problem with Kaseya.  The first place I would be looking, since this only seems to affect this site, is what updates and modifications Kaseya is scheduled to make.  If you have a custom script set to run, for example, as part of doing automated updates, that could easily account for the behaviour you're describing.

You said in your original message that a system rollback 'lasts longer' but it shouldn't if it was a GPO being applied.  When the workstation reboots one of the first things it does is go look for updated domain group policies and applies them.  Therefore the only other logical conclusion is there's something else managing the systems that operates a) automatically and b) with administrative access, and c) on a schedule.

 

Author Comment

by:wallewek
ID: 36499917
Good call on the GPOs -- it turns out we needed to do some work there -- but that wasn't it.  

It appears that the issue was the system PATH environment variable.  johnb6767, if this stands up, you'll be getting the points on this one.  I have _never_ seen something like this before, and I would love to know what did it.

Get this:  we're finding system PATH entries in the registry (at HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment) that are over 32000 bytes long!   When they get that long, a SET command won't display them completely, and if you look in System Properties -> Environment Variables, it won't even display the path at all.   It shows the first couple of system environment variables and stops.

The actual contents of the variable are maybe a hundred or so bytes that look OK, and then many, many repetitions of what would actually be legitimate registry entries, nothing remarkable.
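To illustrate the kind of cleanup involved, here is a minimal Python sketch (not what we actually ran on site). It assumes the bloated PATH value has already been exported from the registry as one string; the function name `summarize_path` and the sample directories in the usage note are hypothetical. It counts characters, so the byte size as stored in the registry may differ.

```python
def summarize_path(path_value):
    """Report the size of a PATH string and build a copy without repeated entries."""
    entries = [entry.strip() for entry in path_value.split(";") if entry.strip()]
    seen = set()
    unique = []
    for entry in entries:
        # Compare case-insensitively and ignore a trailing backslash,
        # keeping the first occurrence so the original order is preserved.
        key = entry.lower().rstrip("\\")
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return {
        "original_length": len(path_value),
        "entry_count": len(entries),
        "unique_count": len(unique),
        "cleaned": ";".join(unique),
    }
```

Feeding it a value such as `r"C:\WINDOWS\system32;C:\WINDOWS"` followed by a couple of made-up directories repeated a few thousand times gives back a short `cleaned` string with each directory listed once. The cleaned value should be reviewed by hand before it is written back anywhere.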

I don't think it's possible for a logon script or an AUTOEXEC.BAT to make the PATH variable this long.  If a GPO did it, damned if I know how.  Some app, possibly...?   I'll definitely be watching this site closely in the future.

Does anybody here have any idea what could have done this?

/kenw
 
LVL 3

Expert Comment

by:WiReDWolf
ID: 36506272
Lots of programs will add themselves to the PATH statement by appending the path to include their executables.  There is a limitation on the length of the path statement though so frequently programs that add more to the end of the path statement end up not working as expected.  Because of that some programs will append to the BEGINNING of the path statement (such as Windows Live! products) pushing the system required elements to the middle or the end.

That's why I was asking if you were using MSP software.  I've used Kaseya (and got away from it) and LabTech and in each instance I've noticed the odd occasion where the agent has 'tweaked' the operating system.  For example I've set up most of the networks I manage to use GPO's to install the Labtech agent when the user logs in if the agent isn't already installed to the system.  An automated install like that, if scripted incorrectly, could actually do what you're experiencing where the PATH statement is being appended each time the user(s) log in rather than checking first to see if the PATH statement needs to be appended.

Of course I don't mean to say that Kaseya is definitely the culprit, just a really good place to start looking because Kaseya logs all the activities it performs on systems where it has an agent.
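The "check first" logic I mean looks roughly like this. It's only a sketch in Python, not taken from any Kaseya or LabTech script, and `append_if_missing` and the agent directory in the usage note are names I made up for the example.

```python
def append_if_missing(path_value, new_dir):
    """Append `new_dir` to a PATH string only if it is not already listed."""
    entries = [entry for entry in path_value.split(";") if entry.strip()]
    wanted = new_dir.strip().rstrip("\\").lower()
    for entry in entries:
        # Case-insensitive match, ignoring a trailing backslash.
        if entry.strip().rstrip("\\").lower() == wanted:
            return path_value  # already present, leave the PATH untouched
    entries.append(new_dir.strip())
    return ";".join(entries)
```

Running it a second time with the same directory (say a hypothetical `C:\Agent\bin`) returns the PATH unchanged, so a script that runs at every login can't make the variable grow.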
 
LVL 66

Expert Comment

by:johnb6767
ID: 36507656
Can you fix the path and reboot the machine, in an OU that doesn't have any policies applied?
 

Author Comment

by:wallewek
ID: 36512564
OK, we found the culprit. It was a group policy preferences entry, with a name indicating that it was a security setting, created on April 1, 2009, which tried to append a series of entries to the system path environment variable of XP systems.  At the time it was created, it apparently had no effect (and was probably forgotten about), and didn’t do anything until the Server 2008 GP extensions were eventually pushed out to those PCs recently when the new server was installed.

So it was both a PATH and a GPO issue.  You guys were a great help!  How the heck do I award points for this??

/kenw
 
LVL 6

Expert Comment

by:joeyfaz
ID: 36513840
Split them up if you wish, I don't mind.
 

Author Closing Comment

by:wallewek
ID: 36514345
It was a complex issue, no single response provided the whole answer, but the responses did provide key approaches for resolution.
 
LVL 66

Expert Comment

by:johnb6767
ID: 36514527
Glad you're fixed....
 

Author Comment

by:wallewek
ID: 36514563
Yes, now if I could figure out why it took two years for the GPO to start causing trouble.   Yes, we upgraded/migrated from Server 2008 to Server 2008 R2.   So what?

[You know, I'm getting really tired of having this EE user interface let me post a comment, and then throwing it away and making me retype the whole damn thing because it says I have to be logged in to post a comment.  And I keep telling it to remember me!]
 
LVL 1

Expert Comment

by:leebrumbaugh
ID: 36514588
I had an interesting thing like that happen recently to me as well where I had a bad script.  It'd probably been quickly dying for years, but a recent MS update somehow got it partly working so it would hang until it timed out.  A pain in the butt for sure.
 

Author Comment

by:wallewek
ID: 36514626
Yeah, I figure something like that.  My money is on something to do with the XP Group Policy extensions for Server 2008.   This had to do with a Group Policy Preferences policy.   Maybe there was an update for R2.  Might have something to do with XP SP3 too, but I don't think that's recent here.