Solved

SBS 2003 randomly freezes during backup but not all the time.

Posted on 2011-02-26
Last Modified: 2012-05-11
This is driving me nuts. I manage a lot of SBS boxes; this one I just inherited.  Dell PowerEdge SC1430 with SBS 2003 R2 SP2.  The server has randomly frozen several times: video goes out, and mouse, keyboard and network are all unresponsive, so the only option is a hard reboot.  There are no errors in the event logs (they are very clean), and this doesn't occur all the time.  I am using sbsbackup to an external USB hard drive. Here are the only consistencies:

2/5/11 ntbackup starts at 5:30, server hangs at 6:11
2/14/11 ntbackup starts at 6:30, server hangs at 6:57
2/23/11 ntbackup starts at 7:30, server hangs at 7:46

All ntbackup runs between those listed above are successful and usually complete about 11:00 pm.
Here is what I have done to troubleshoot:
Changed out the USB backup drive after the first time, but it happened again.
Changed the start time of the backup to 1 hour later each time.
Disabled antivirus (it is set to avoid the Exchange and backup drives anyway).
Checked server hardware (memory, hard drive).
Checked scheduled tasks for conflicts; no tasks other than system tasks are listed.
There are no other backup programs running, the server has updates, and there is plenty of space on both internal and external hard drives.  There were scheduled ntbackup routines running unsuccessfully when I took over the server, but I deleted them, removed the scripts, and set up sbsbackup to run instead.  It seems odd that this occurs almost every 9 days; maybe it's just a coincidence?  I am out of ideas; any help is appreciated!

Thanks
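
As a rough illustration, the timings listed in the question can be checked with a few lines of Python. This is only a sketch: the year 2011 is taken from the posting date, and since the post does not say AM or PM the clock times are entered as written, which does not change the differences.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# (backup start, server hang) pairs copied from the post.
# Assumption: year 2011; AM/PM is not stated and does not affect the deltas.
incidents = [
    ("2011-02-05 05:30", "2011-02-05 06:11"),
    ("2011-02-14 06:30", "2011-02-14 06:57"),
    ("2011-02-23 07:30", "2011-02-23 07:46"),
]

parsed = [(datetime.strptime(s, FMT), datetime.strptime(h, FMT)) for s, h in incidents]

# Minutes between the backup starting and the server hanging.
minutes_into_backup = [int((hang - start).total_seconds() // 60) for start, hang in parsed]

# Days between consecutive hangs.
days_between_hangs = [
    (later[1].date() - earlier[1].date()).days
    for earlier, later in zip(parsed, parsed[1:])
]

print("minutes into backup:", minutes_into_backup)
print("days between hangs:", days_between_hangs)
```

The gap between hangs is the same both times, while the point in the backup at which the hang occurs is not, which fits the poster's hunch that the interval matters more than what the backup happens to be reading.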
Question by:mjkisic
21 Comments
 
LVL 47

Expert Comment

by:dlethe
One of the root causes can be the use of desktop-class instead of enterprise-class HDDs.  You can get this when the disk goes into a deep recovery state due to a bad block.  Windows will not necessarily report a problem in the event log, and if the HDD recovers the data, the disk will pass all diagnostics.

NTbackup "causes" this to happen because it is most likely the only time during the week that you try to access the data.

A solution in this case would be to use either hardware or O/S-based RAID1 with enterprise-class disks.  
 
LVL 15

Expert Comment

by:markdmac
Have you tried enabling verbose logging in NTBackup to see if it is always hanging on the same files? Maybe a client has some files locked?

When you schedule the SBS backup, it creates an NTBackup job that you can modify.
 
LVL 42

Expert Comment

by:Davis McCarn
Random freezing indicates dirt, bad RAM, items needing reseating, bad capacitors, or a flaky power supply, pretty much in that order.
I quite literally take them outside and use a leaf blower, which cleans things out much better than Blow-Off.  Blow in through the two exit fan openings (P/S and CPU).  Don't try to get the fans going supersonic and you'll be OK.
I keep seeing 2-5 year old Samsung memory go bad, which bugs the heck out of me; I use the free ISO from http://www.memtest86.com to test.
If the system has one, lift the green shroud over the CPU heatsink and inspect for bad capacitors: http://www.google.com/imgres?imgurl=http://www.thenakedpc.com/dan/Bulging_Capacitors/close-up.jpg&imgrefurl=http://www.thenakedpc.com/dan/Bulging_Capacitors/index.html&usg=__I8uVpjpR_5DoPAsyyxnX-gEshcM=&h=526&w=786&sz=49&hl=en&start=0&zoom=1&tbnid=IXu-R6JwUeRtqM:&tbnh=146&tbnw=213&ei=9mdqTZmWGIausAO-vNGoBA&prev=/images%3Fq%3Dbad%2Bcapacitors%26hl%3Den%26safe%3Doff%26biw%3D1242%26bih%3D766%26gbv%3D2%26tbs%3Disch:1&itbs=1&iact=hc&vpx=363&vpy=101&dur=4609&hovh=184&hovw=275&tx=136&ty=110&oei=9mdqTZmWGIausAO-vNGoBA&page=1&ndsp=21&ved=1t:429,r:1,s:0  If you see them, either get a new system board or have someone who knows how to desolder replace them.
Next, reseat everything, especially the CPU chip, and put some new heatsink compound on it while you're at it.
And last, if that still hasn't fixed it, replace the power supply.
 

Author Comment

by:mjkisic
The system ran a full backup last night without issue, as well as the previous evening.  Markdmac, I will enable verbose logging to see if there is an issue with a locked or corrupt file.  They do use an Access-based database heavily during the day, and there have been intermittent problems requiring a database repair; however, these issues have been on days when there were no server freezes during the times mentioned.

My first thoughts were hardware, such as HD and RAM.  I have run chkdsk and Memtest without any errors reported.  I still find it odd that the three times this has happened have been almost exactly 9 days apart; hardware issues would be more random than that.

I will report back after collecting some ntbackup log details.

Thanks!
 
LVL 30

Expert Comment

by:pgm554
Run the SBS Best Practices Analyzer and see if anything stands out.

You are saying this only occurs during a backup, right?

Could be a memory leak.

To see if it is a memory leak, reboot the server daily and see if the issue goes away.
If so, you've got an errant program or device driver causing the issue.

Make sure all device drivers and the BIOS are up to date.
I see the Broadcom driver for that system has a memory leak, depending upon version.

Have you run a chkdsk and/or defrag?
 

Author Comment

by:mjkisic
pgm554, I have run Best Practices, and everything looks good.  As stated above, this has happened on a little more than three occasions, including 2/5, 2/14 and 2/23.  Sbsbackup is set to run every day; it is just on those days that the system has frozen.  I have run chkdsk, will do the defrag just in case, and will check the driver versions.
 
LVL 30

Expert Comment

by:pgm554
Just to be mathematical and see if it is a pattern:
Reboot the server halfway through the 9-day cycle; if it hangs 9 days from that reboot, you've got a memory leak.
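
The logic of that test can be sketched in Python. The last hang date comes from the question; the reboot date below is purely hypothetical, chosen as roughly mid-cycle, and the two labels are just names for the competing explanations.

```python
from datetime import date, timedelta

CYCLE = timedelta(days=9)          # interval observed between hangs
last_hang = date(2011, 2, 23)      # from the question
reboot = date(2011, 2, 27)         # hypothetical mid-cycle reboot date

def expected_hang(last_hang, reboot, cycle=CYCLE):
    """Return the next hang date predicted by each explanation."""
    return {
        # A leak grows with uptime, so the clock restarts at the reboot.
        "uptime_driven": reboot + cycle,
        # Something tied to the calendar ignores the reboot entirely.
        "calendar_driven": last_hang + cycle,
    }

for cause, when in expected_hang(last_hang, reboot).items():
    print(f"{cause}: {when.isoformat()}")
```

If the server hangs on the first date, the reboot reset whatever was accumulating; if it hangs on the second, uptime is probably not the trigger.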
 
LVL 30

Expert Comment

by:pgm554
Also you should get server logs emailed to you daily on the SBS and check for any process that looks like it may be growing.

Like this:

Top 5 Processes by Memory Usage

Process Name - ID       Memory Usage
sqlservr - 2052         204 MB
sqlservr - 1928         88 MB
services - 428          88 MB
sqlservr - 1980         42 MB
w3wp - 4556             39 MB

Top 5 Processes by CPU Usage

Process Name - ID       CPU Time
sqlservr - 2052         0.4 %
svchost - 876           0.3 %
wmiprvse - 3712         0.2 %
jqs - 1828              0.2 %
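
Checking for a growing process by eye gets tedious over several days, so here is a minimal Python sketch that pulls the memory rows out of report text like the sample above and compares two days. The first sample is the report quoted above; the second day's figures are invented for illustration, and the helper names are hypothetical.

```python
import re

# Matches rows such as "sqlservr - 2052       204 MB"; CPU rows ("0.4 %") do not match.
ROW = re.compile(r"^\s*(?P<name>\S+)\s+-\s+(?P<pid>\d+)\s+(?P<mb>\d+)\s*MB\s*$")

def parse_memory_rows(report_text):
    """Return {(process name, pid): memory in MB} for every memory row found."""
    usage = {}
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m:
            usage[(m["name"], int(m["pid"]))] = int(m["mb"])
    return usage

def growth(earlier, later):
    """Return the MB increase for processes present in both reports that grew."""
    return {
        key: later[key] - earlier[key]
        for key in later
        if key in earlier and later[key] > earlier[key]
    }

day1 = """\
Process Name - ID       Memory Usage
sqlservr - 2052         204 MB
sqlservr - 1928         88 MB
services - 428          88 MB
sqlservr - 1980         42 MB
w3wp - 4556             39 MB
"""

# Hypothetical next-day report, made up to show what growth would look like.
day2 = day1.replace("204 MB", "260 MB")

print(growth(parse_memory_rows(day1), parse_memory_rows(day2)))
```

Processes are keyed by name and ID, so a reboot (which changes the IDs) starts the comparison fresh; that seems reasonable when the question is whether something grows with uptime.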
 

Author Comment

by:mjkisic
If the 9-day pattern holds true, the next lockup will be March 4th, unless it is just a coincidence. Today is about halfway through, so I will reboot the server today.  I do get the reports emailed to me; I will review them for a process that is growing and will report back.
 
LVL 47

Expert Comment

by:dlethe
You know, since this is driving you nuts, and you have gotten a lot of wonderful advice on where to look ... you should consider a different strategy.   Something that the professionals use.

For around $100 you can get a diagnostic board.  Plug it into the backplane, run some software and it will usually (nothing is infallible, and the more money you spend the more things it will look at) tell you exactly what it is.  Even if it can't tell you what it is, it can certainly tell you what it ISN'T.   Just eliminating (truly eliminating CPU, memory, interrupts, hardware, etc..) will go a long way to finding the ultimate solution.

My gosh, your backups are failing, so this is pretty important to get resolved.  Google "PC diagnostic board", and choose something.  (Usually you can find a nice selection on ebay and equivalent auction sites).

Software-based hardware diagnostics can and WILL sometimes give you bad info.  A board in a slot with its own CPU and memory that monitors traffic on the bus can easily detect faults that would otherwise be missed.  Even relatively simple DOS-booted memory diagnostic programs will easily miss memory problems.

Also, if your PC is server class, is there a BIOS-accessible event log?

 

Author Comment

by:mjkisic
OK, in reviewing the server health reports, the top two memory-using processes are dbsrv9 and store; from 2/19 to 2/27 they fluctuate from 547 MB to 621 MB, and there is no pattern to the fluctuations.  On CPU usage, NTBackup consistently has the highest CPU usage, and there is a pattern here:

2/19      5.8%
2/20      6.3%
2/21      5.9%
2/22      4.9%
2/23      7.2%
2/24 server crashed, restarted
2/25 stopped nt backups for trouble shooting
2/26   resumed ntbackup    3.7%

These numbers may be just a reflection of the server freezing on the 24th?
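
One hedged way to judge whether the pre-crash CPU figures really trend upward is a simple least-squares slope. The sketch below uses only the five daily NTBackup values listed above (2/19 through 2/23); five samples is far too few for a firm conclusion, so treat the result as a sanity check only.

```python
# NTBackup CPU figures from the health reports, 2/19 through 2/23 (percent).
cpu = [5.8, 6.3, 5.9, 4.9, 7.2]
days = list(range(len(cpu)))

mean_x = sum(days) / len(days)
mean_y = sum(cpu) / len(cpu)

# Ordinary least-squares slope: change in CPU percentage points per day.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, cpu)) / sum(
    (x - mean_x) ** 2 for x in days
)

# Difference between the highest and lowest daily reading.
spread = max(cpu) - min(cpu)

print(f"slope: {slope:.2f} points/day, spread: {spread:.1f} points")
```

The fitted slope is small compared with the day-to-day spread, so on these numbers alone it is hard to separate a climb toward the crash from ordinary variation.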
 
LVL 30

Expert Comment

by:pgm554
 

Author Comment

by:mjkisic
Thanks pgm554, we did install SEP 12.0 on the 9th of Feb, but we had a crash before that, on the 5th.  However, I am going to restart the dbsrv9 service and see if that reduces the memory hogging.
 
LVL 30

Expert Comment

by:pgm554
Looking at some of the posts on the Symantec site and such leads me to believe that is the culprit.

Lots of people are complaining about that product.

I've had my issues with Symantec in the past, and their products are sometimes just barely tolerable.

They have a nasty tendency of acquiring a decent product and ruining the support and QA.
 

Author Comment

by:mjkisic
pgm554, it is sure easy enough to troubleshoot down that path. I will disable the dbsrv9 service for the week and see if that resolves the issue.  Perhaps it is the combination of dbsrv9 hogging memory and the CPU usage of the ntbackup routine that is causing the perfect storm.
I will report back on server status in the next couple of days and cross my fingers that that is the culprit.

Thanks for all of your input!
 
LVL 30

Accepted Solution

by:
pgm554 earned 500 total points
It also, according to what else I read, messes with SQL DBs like the one found in the built-in backup utility.

Which, if I remember correctly, is just a stripped-down version of BEX.

So you've got several instances of SQL running, all of which can be affected by the dbsrv9 issue.

That's the trouble with running everything off of one box: too many programs can cause issues with others.

The old Novell Small Business Server let you install up to 5 servers in one tree; the only limit was the seat license count.

Man, I miss their products.
 

Author Closing Comment

by:mjkisic
Thanks for the logical approach.
 

Author Comment

by:mjkisic
I disabled the dbsrv9 service altogether; after 9-plus days, the server has not locked up during a normal scheduled backup.  I think pgm554 nailed that one. You would think a brand-new, updated install of SEP would be a little better than that; I've run the program in one version or another on a lot of servers without that type of issue before.

Thanks!
 
LVL 30

Expert Comment

by:pgm554
Not a big Symantec fan.
I am trying to install BEX 2010 on an SBS 2008 box and am just having fun, fun, fun.
It's the same old, same old with Symantec:
Live updates not working or not finding installed products, cryptic errors, and interfering with service pack installs.
Man, between them and CA, I don't know who is more mediocre.
 

Author Comment

by:mjkisic
I know what you mean, but I do use BESR on my bigger machines and networks, and I have to say that when it does work well it has saved many a system.  I have been able to do a full restore of an entire SBS 2003 box supporting Exchange and an AD of 35, as well as about 200 GB of data, in less than 3 hours of downtime.  I have found BESR to be somewhat more reliable and use it instead of BEX; I haven't had any issues on SBS 2008.
 
LVL 30

Expert Comment

by:pgm554
>I have found the BESR to be somewhat more reliable

That's a PowerQuest product; thank them for the stable code base.

Tried it on SBS 2011 and was unable to do a restore; Symantec says it is still unsupported.

I like BEXSR when it works, and the dissimilar restore has worked pretty well on some upgrades, but when it screws up, it's usually the Symantec tweaks.

Live update breaks quite often in my experience.