Patch 70

asked on

Fix corrupted file system on (semi-functional) Server 2012 R2

We are attempting to assist someone with a migration from a VMware ESXi VM running Server 2012 R2 to a Server 2019 VM on Hyper-V 2019. The 2012 server is running a few medical databases, etc., so we have to coordinate with the developers to migrate everything to 2019.
This is a production machine.

We would like to move the 2012 VM to Hyper-V before the migration, but its OS is unstable and backups are failing, so I think that needs to be fixed before attempting to convert the VM from VMware to Hyper-V.

We cannot complete an SFC scan in regular Windows or in safe mode. In an admin cmd window, we get "Windows Resource Protection could not start the repair service."

All updates, cumulative patches, etc. are installed, and after the latest updates we get a blue screen on boot. We're not sure what the issue is, but we can let it blue-screen twice so it goes into recovery mode, then select "Disable Driver Signature Enforcement" under Windows Startup Options, and it boots fine for now. As I stated, this is a production machine, so we haven't had time to troubleshoot except late in the evenings.

In an admin PowerShell window, SFC will run, but we get "Verification 100% complete. Windows Resource Protection found corrupt files but was unable to fix some of them. Details are included in the CBS.Log windir\Logs\CBS\CBS.log. For example C:\Windows\Logs\CBS\CBS.log. Note that logging is currently not supported in offline servicing scenarios."

In the admin PowerShell window, if we attempt to run DISM, we get this:
PS C:\Windows\system32> Dism /Online /Cleanup-Image /RestoreHealth
Deployment Image Servicing and Management tool
Version: 6.3.9600.19408
Image Version: 6.3.9600.19397
[==========================100.0%==========================]
Error: 0x800f0906
The source files could not be downloaded.
Use the "source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077.
The DISM log file can be found at C:\Windows\Logs\DISM\dism.log

We have also attempted to point the source for DISM to a mounted 2012 R2 ISO, but that doesn't seem to work either.... maybe we don't have the correct build??

I saw a post saying to remove KB3022345, but it isn't installed... it must have been replaced by a CU.
We have tried restarting with a chkdsk /f scheduled on C:\.
We have gone through the process of resetting updates: removing/renaming the SoftwareDistribution folder, the Download folder, catroot2, etc.

All to no avail...

Please help!
Thank you in advance!

rindi

If it's a corrupt file system, you repair it with chkdsk /f. As long as you run it against NTFS, it is safe, as NTFS is a file system that creates transaction logs, from which chkdsk then repairs the corruption if possible. But during the repair the file system must not be mounted, which means chkdsk is either run during boot-up, or you can't access the volume's data until it is done. So you need to do it when no one is using the server, and also plan some downtime, since it can take a long time depending on the size of the file system and the severity of the corruption.
It is not a good idea to migrate the virtual machine to Hyper-V, as the issues will still exist.

But what is the issue with the virtual machine?

"but its OS is unstable and backups are failing,"

Why is it unstable? How, and what is happening? Errors, event logs?

Backups failing: how are they being performed, and with what? Veeam?

BSODs are not very good.
Patch 70

ASKER

@rindi - as I mentioned "We have tried to restart with a chkdsk /f on the C:\" - it did take a long time... but I'm not sure where to check to see what it did (CBS.log??)

@Andrew - I say it is unstable because of the reasons I mentioned: 1) SFC fails, 2) DISM fails, 3) BSOD on boot until we disable driver signature enforcement.
Backups are done with Acronis Cloud, and were working until about 3 weeks ago.
The CBS.log has many many errors upon trying to run sfc or DISM... should I post it?
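
Before posting the whole CBS.log, it may help to cut it down to the entries SFC itself wrote. SFC tags its lines with [SR], and the ones reporting a failed repair are usually the interesting part. Below is a minimal Python sketch of that filter. The function name and the sample lines are made up for illustration, and the exact message wording may differ on a given build.

```python
def sfc_unrepaired(lines):
    """Return CBS.log lines tagged [SR] that report a file SFC could not repair."""
    return [
        ln.rstrip("\n")
        for ln in lines
        if "[SR]" in ln and "cannot repair" in ln.lower()
    ]


# Hypothetical sample lines, only to show the shape of the filter.
sample = [
    '2023-03-01 21:14:02, Info  CSI  000004f1 [SR] Verifying 100 components',
    '2023-03-01 21:14:09, Info  CSI  000004f5 [SR] Cannot repair member file "example.dll"',
    '2023-03-01 21:14:10, Info  CBS  Session: 30000000_1 finalized',
]

unrepaired = sfc_unrepaired(sample)
for ln in unrepaired:
    print(ln)
```

On the real server the input would be the log itself, for example `sfc_unrepaired(open(r"C:\Windows\Logs\CBS\CBS.log", errors="replace"))` from an elevated session.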
But we don't often run SFC and DISM on our servers unless something points us to that!

E.g., I have never run them on most of our servers.

What is failing with the backups? Are these backing up a VMware ESXi server via vCenter Server?

What driver is the VMware vSphere VM using? Storage controller?

What driver is causing the issue?

So if you restore the VM from 3 weeks ago, is the VM okay? You can disable networking.

I'm really struggling to work out what the issue with your server is.

In the past, there have been 2012 R2 patches which have caused issues, but that's a wild stab in the dark, because those are not from the last 3 weeks, unless these are old patches.

Can you not just restore the VM from 3 weeks ago, when it all worked, and migrate the database (export)?
"We have also attempted to point the source for DISM to a mounted 2012 R2 ISO, but that doesn't seem to work either.... maybe we don't have the correct build??"
My experience has been that the version must match exactly.  For example, with Windows 10, I've not been able to get 19044.2604 to work as a source for 19044.1288 or vice versa.  The problem comes in that a clean install will have one release number, but anything in production seems to have a later release because of updates.

My solution has been to check the release of the problem drive (run DISM and read the release number for the installation, NOT the version of DISM) and try to recreate it on a new drive.  I'll install the last released version that the drive is using, check the release, do some updates, check release, etc.  I've even started keeping images of these different versions so I'll have them ready when needed.
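
As a rough illustration of reading the installation's release rather than the tool's, the Python sketch below pulls the "Image Version" line out of DISM output and compares two machines. The first sample is the header quoted in the question. The second is hypothetical output from a candidate source, and the function name is likewise only illustrative.

```python
import re


def image_version(dism_output):
    """Return the 'Image Version' build string from DISM output, or None if absent."""
    match = re.search(r"^Image Version:\s*([\d.]+)", dism_output, re.MULTILINE)
    return match.group(1) if match else None


# Header lines as quoted in the question.
broken = """Deployment Image Servicing and Management tool
Version: 6.3.9600.19408
Image Version: 6.3.9600.19397
"""

# Hypothetical output from a candidate source machine.
candidate = """Deployment Image Servicing and Management tool
Version: 6.3.9600.17031
Image Version: 6.3.9600.17031
"""

print(image_version(broken))
print(image_version(broken) == image_version(candidate))
```

The pattern is anchored to the start of the line so that the plain "Version:" line, which is the DISM tool's own version, is never picked up by mistake.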

The new drive can be used for the /source option.
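
For reference, a sketch of what pointing DISM at such a drive could look like, written as a small Python helper that only builds the command line. It assumes the matching installation's disk is attached as E: (a made-up drive letter). /LimitAccess is included so DISM does not fall back to Windows Update. The helper name is illustrative, and a path containing spaces would need quoting.

```python
def restore_health_command(source):
    """Build the DISM arguments for an online repair from a local source."""
    return [
        "Dism", "/Online", "/Cleanup-Image", "/RestoreHealth",
        "/Source:" + source,
        "/LimitAccess",  # do not contact Windows Update for repair files
    ]


# Hypothetical: the matching installation's Windows folder, attached as E:
cmd = restore_health_command(r"E:\Windows")
print(" ".join(cmd))
```

The printed line is what would be typed at an elevated prompt on the broken server. Nothing is executed by the sketch itself.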


"The DISM log file can be found at C:\Windows\Logs\DISM\dism.log"
What have you found in the log file?  If you post it here you may get some informed opinions about the issues.
"The 2012 server is running a few medical databases, etc... so we are having to coordinate with the developers to migrate everything to 2019.
This is a production machine"

Why fix the server? 2012 is going end of life. Mount the disks in a helper VM on VMware vSphere ESXi.

Extract the medical databases, build a new virtual machine on Hyper-V, and import the data.

There is no requirement to fix this server. If the server has faulty disks and issues that need chkdsk, can you even trust the files and databases?

Something tells me that there is a bad driver on the system, probably as the result of a bad physical-to-virtual migration. Virtual machines use the hypervisor's drivers.

Having to disable driver signature enforcement is definitely a problem. This would indicate to me that the driver has been changed, possibly by a malicious process on the system.

I would start off by cloning the virtual machine and working on the clone to fix things.

Which driver is causing the BSOD?

Have you tried an over-install of the OS to fix things?

I've never had to run chkdsk on a VM's disk. This would possibly indicate to me that the host system's disk subsystem has problems.



Something did point us to running the scans... I can't remember exactly what it was, but I think last week, when we attempted to run something in the cmd window (maybe chkdsk), it said that we were attempting to run a 32-bit app in a 64-bit command prompt or vice-versa, but I think one of the CUs may have fixed that.
In the next several weeks, I am going to run something like StarWind converter to convert from an ESXi VM to Hyper-V. Generally when I do this, I like to run sfc, dism, defrag, drive cleanup, etc. first, and I have never had an issue with sfc or dism failing to complete like this. So even if you don't run those often, they still should work.
Currently, there is an Acronis appliance VM that runs on the ESXi host and backs up the VM.
The database migration is not so easy. The databases are integrated into 2 medical applications and we do not have sa access to them, so we will have to go through the headache of coordinating 2 support teams for 2 different medical applications to make the migration. But why restore an entire machine if it is running AND IF there is a way to simply fix the system files?
So that raises the question: since I am now able to get a good Acronis backup, is there a path to simply restore that backup to a Hyper-V VM? That would solve the first problem (getting everything onto Hyper-V), and then we could coordinate the migration of the application platforms to a clean, new 2019 VM.
Here is the portion of the log from yesterday and today.
I know that the image versions are supposed to be exactly the same, and I see that most are not. I *DO* have a good, clean 2012 R2 server that is the same version, but I cannot figure out the proper DISM capture command to get the image from that one to try on this server.
If you have a restore-to-dissimilar-hardware feature, then yes, you can restore to Hyper-V.

Most backup applications have this technology now, and Acronis has had it since 2008. Otherwise you can use Veeam Free Agent; I wrote an article on how to do it.

So there is generally nothing wrong with the VM (although I'm not sure about the chkdsk and the BSOD, which do not seem right)? If the VM is otherwise running fine, the issue is just the inability to run DISM, which you like to do before a StarWind conversion?

So:

1. Backup and Restore
2. Microsoft Virtual Machine Converter (old and defunct, but still works for old hardware)
@Andrew,
Thank you, that is correct. It runs fine, other than a recent update forcing us to disable driver signature enforcement each time it boots. We may try to remove recent updates tonight to see if that problem gets fixed.

We have the full Acronis Cloud Platform, so we can use or download whatever we need, including Universal Restore, but could you please post a link to your article on Veeam, as well as the MS VM Converter?

Thanks again
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

This solution is only available to members.
Thanks Andrew... I will mark that as a solution because I have no doubt one would have worked, although we just created a new domain and added the machines to it since there were only 8 workstations.
Thank you