A few weeks after we installed Server 2008 R2 on an existing site, transferred AD and removed the old server, virtually all of the PCs on that site (all Windows XP Pro SP3) started getting serious boot errors. (Not all at once; it started gradually.)
The exact text of the message is:
"winlogon.exe - Unable to Locate Component: This application has failed to start because sfc_os.dll was not found. Re-installing the application may fix this problem. [OK]"
The problem is occurring consistently on most systems whenever they are rebooted. Rebooting with F8 and selecting last known good allows the boot to proceed -- in fact, the users at this site have gotten used to doing it themselves -- but the problem often re-occurs on next boot. Doing a system state rollback (system restore) lasts longer but is apparently not permanent. There is one system on the site which is not a domain member: it alone appears to remain unaffected.
We have run MD5 checks on both winlogon.exe and sfc_os.dll on systems both before and after the error occurs (in the latter case, with immediate power-down and attachment as a secondary drive). They show clearly that these files are in fact both present and unmodified when the error occurs. That suggests either that the error message is misleading, or that something is actively interfering with the function of one or both of them.
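For anyone who wants to repeat this kind of check, it can be sketched roughly as follows. This is a minimal Python illustration, not the exact tool we used; the function names (`md5_of`, `compare_to_baseline`), the baseline dictionary, and the directory argument are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_baseline(baseline, system32_dir,
                        names=("winlogon.exe", "sfc_os.dll")):
    """Classify each file as 'missing', 'unchanged', or 'modified'.

    baseline is a dict of file name -> MD5 hex digest recorded while the
    machine was healthy (hypothetical input for this sketch).
    system32_dir is the system32 folder of the drive under test, for
    example on a disk attached as a secondary drive.
    """
    results = {}
    for name in names:
        candidate = Path(system32_dir) / name
        if not candidate.is_file():
            results[name] = "missing"
        elif md5_of(candidate) == baseline.get(name):
            results[name] = "unchanged"
        else:
            results[name] = "modified"
    return results
```

Recording the baseline on a healthy machine and then running `compare_to_baseline` against the attached drive gives a per-file verdict. In our case both files would come back as "unchanged".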
On a number of these systems we have rebooted with last known good and run SFC /SCANNOW. It does not resolve the issue. Two of the systems have been completely reinstalled, and the issue has recurred on both.
The whole site has been swept thoroughly for viruses and malware, using multiple tools (SUPERAntiSpyware, ComboFix, Malwarebytes, ESET NOD32, TDSSKiller, GMER, catchme, RootkitRevealer). In some cases we removed the drive and scanned it as an attached drive using rootkit detection software. Although some malware was found on some of the systems, it has been removed, none of it is known to be associated with this behavior, and a number of the systems experiencing this behavior have never had malware detected on them.
We are still looking for a way to detect from the system event logs whether this has occurred. We are not normally on site, and as a result do not have perfect information about every time the problem occurs. Although we have asked for reports when it happens, we probably hear about only a third of the occurrences, typically a day later.
1. Is it conceivable that group policy might have this effect?
2. How can we determine what this message is actually complaining about, since it is demonstrably incorrect?
3. Would boot logging be useful here?
4. How can we confirm whether this is caused by a virus? I'm thinking that monitoring network activity might be a good indicator, if I knew what to look for. Online descriptions of the Virut virus may be indicative.
We could "nuke and pave" the entire site, but until we know for sure what is causing this behavior, it could just be a waste of time.