HeronTech
 asked on

VSS causing random lockups, Server 2008

Hello Experts

I'm having a hard time tracking down the source of intermittent freezing on one of our servers.

Occasionally (three times last month, once so far this month) the server completely 'locks up'.

We have a two-server site setup, so when it happens, we can connect to ServerB and ping ServerA.

We cannot, however, connect to any resources on ServerA (ServerA runs Exchange and some file shares).

It's a Dell PowerEdge R710 server, and when I connect to the DRAC, the console responds to the mouse but not the keyboard. The only remedy at this point is to restart the server.

Once the server starts back up, there is a physical 'gap' in the event logs: from the moment the server crashes until it is back up, nothing is recorded in any event log (System, Application, Security etc).

The only thing that seems to be happening is a VSS start command

Log Name:      System
Source:        Service Control Manager
Date:          16/03/2012 7:15:01 AM
Event ID:      7036
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      ServerA
Description:
The Volume Shadow Copy service entered the running state.

Shadow copies are disabled via 'My Computer'. However, we do run ShadowProtect as a backup solution; it runs hourly (at 15 minutes past the hour) and backs up the server volumes to a NAS over a gigabit network.
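
One way to test whether the lockups line up with the hourly backup is to look for gaps in exported event timestamps and check whether the last event before each gap falls near quarter past the hour. The following is only a minimal Python sketch: it assumes the timestamps have been exported as text in the same day/month/year format as the event above, and the function names, the 30-minute gap threshold and the 5-minute tolerance are illustrative assumptions, not part of any Windows or ShadowProtect tooling.

```python
from datetime import datetime, timedelta

# Assumed export format, matching the event shown above: "16/03/2012 7:15:01 AM"
EVENT_TIME_FORMAT = "%d/%m/%Y %I:%M:%S %p"


def parse_event_time(text):
    """Convert one exported timestamp string into a datetime."""
    return datetime.strptime(text.strip(), EVENT_TIME_FORMAT)


def find_log_gaps(timestamps, min_gap=timedelta(minutes=30)):
    """Return (last_event_before_gap, first_event_after_gap) pairs.

    min_gap is an assumed threshold; tune it to how chatty the log normally is.
    """
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier >= min_gap:
            gaps.append((earlier, later))
    return gaps


def near_backup_slot(moment, slot_minute=15, tolerance_minutes=5):
    """True if 'moment' falls within a few minutes after the hourly backup start."""
    return 0 <= moment.minute - slot_minute <= tolerance_minutes
```

If most gaps start with an event for which `near_backup_slot` is true, that supports the backup window as the trigger; if the gaps start at unrelated times, the VSS event may be a coincidence.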

It's only the odd occasion when the server locks up, but the symptoms are exactly the same

'vssadmin list writers' shows all writers as stable, all system volumes are 0% fragmented, and the Dell drivers for the RAID controller are up to date.
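
For anyone wanting to repeat the writer check regularly (for example just before each backup window), the text output of 'vssadmin list writers' can be scanned for writers that are not stable. This is a rough Python sketch, assuming the usual "Writer name / State / Last error" layout of that output; the function names are made up for illustration, and the output text has to be captured separately from an elevated command prompt.

```python
import re

# Patterns for the lines of interest in 'vssadmin list writers' output.
WRITER_RE = re.compile(r"Writer name:\s*'(?P<name>[^']+)'")
STATE_RE = re.compile(r"State:\s*\[(?P<code>\d+)\]\s*(?P<label>.+)")
ERROR_RE = re.compile(r"Last error:\s*(?P<error>.+)")


def parse_writers(output):
    """Parse captured vssadmin text into a list of writer dictionaries."""
    writers = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = WRITER_RE.match(line)
        if match:
            # A new writer block starts here.
            current = {"name": match.group("name"), "state": None, "last_error": None}
            writers.append(current)
            continue
        if current is None:
            continue  # Skip the banner lines before the first writer.
        match = STATE_RE.match(line)
        if match:
            current["state"] = match.group("label").strip()
            continue
        match = ERROR_RE.match(line)
        if match:
            current["last_error"] = match.group("error").strip()
    return writers


def unhealthy_writers(writers):
    """Names of writers that are not 'Stable' or that report an error."""
    return [
        w["name"]
        for w in writers
        if w["state"] != "Stable" or w["last_error"] != "No error"
    ]
```

An empty list from `unhealthy_writers` matches what is described above (all writers stable); any name in the list is worth checking against the time of the next lockup.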

Any other ideas?
Windows Server 2008, Storage Software, Dell

Last Comment
SupermanTB

8/22/2022 - Mon
brammer90

Hi
This is a tricky one.

I'm assuming the Exchange server is freezing intermittently when you try to back it up.
It's all a process of elimination.

You mention the drivers for the controllers are up to date, but have you checked for any backup software updates and service packs?

I would change the job to two separate ones.

Back up the server without the Exchange databases, then back up the databases as a separate job using the Exchange mailbox backup, rather than just backing up the EDB files like you would the other files.

See how you get on then; this could be a problem with the database being scanned by A/V as the snapshot is taken.

Initially I would perform a manual backup daily rather than leaving it automated until we've tracked down the problem.

If the backup fails, you will have a good idea which part is failing.

While it's manual, you have the opportunity to disable the A/V from scanning while you carry out the backup. That's what I would do, but that's up to you.

If you find it works OK as two separate jobs, then try it automated as two jobs for a while.

It's very important that your backup solution isn't actually 'file' backing up the Exchange database file itself; it should be backing up using Exchange backup.

Let me know how you get on with that and let's take it from there.

Good luck

Regards
Dave
HeronTech

ASKER
Hello

The backup isn't being run while any A/V 'scheduled' scans are running, and the Exchange EDB files are already excluded from the 'real time scan'.

The version of ShadowProtect is as up to date as we'd care it to be: version 4.05. I know version 4.2 is out now, but that in itself has issues (as did 4.1.5), causing SQL VDI 'errors' to be logged every backup.

There are already two backup jobs being run: a full backup (each Saturday) then incrementals every hour to the NAS, and full backups (each night) to an external USB drive.

It seems to be the network backup that it's falling over on; the server has not crashed once during the normal full backup to the external drives.

The backup jobs are identical, backing up both volumes. The only difference is the location.

For now, I have a job open with ShadowProtect as well as here, and I have disabled the backup to the NAS as a preventative measure.
HeronTech

ASKER
Hello

Going right back to basics: if a server has a pagefile configured that is too small and it runs out of virtual memory while trying to run VSS, will the whole server lock up as in the behaviour described?
ASKER CERTIFIED SOLUTION
HeronTech

HeronTech

ASKER
Solved externally by the 3rd party software provider.
SupermanTB

I'm having this exact same issue.  Would you mind sharing what 3rd party software you used?