Solved

SBS 2003 randomly freezes during backup but not all the time.

Posted on 2011-02-26
21
Medium Priority
Last Modified: 2012-05-11
This is driving me nuts. I manage a lot of SBS boxes, and this is one I just inherited: a Dell PowerEdge SC1430 with SBS 2003 R2 SP2. The server has randomly frozen several times: video goes out, and mouse, keyboard and network are all unresponsive, so the only option is a hard reboot. There are no errors in the event logs (they are very clean), and this doesn't occur all the time. I am using sbsbackup to an external USB hard drive. Here are the only consistencies:

2/5/11: ntbackup starts at 5:30, server hangs at 6:11
2/14/11: ntbackup starts at 6:30, server hangs at 6:57
2/23/11: ntbackup starts at 7:30, server hangs at 7:46

All ntbackups in between those listed above were successful and usually completed at about 11:00 pm.
Here is what I have done to troubleshoot:
Changed out the USB backup drive after the first time, but it happened again.
Changed the start time of the backup to 1 hour later each time.
Disabled antivirus (it is set to avoid the Exchange and backup drives anyway).
Checked server hardware (memory, hard drive).
Checked scheduled tasks for conflicts; no tasks other than system tasks are listed.
There are no other backup programs running, the server has its updates, and there is plenty of space on both the internal and external hard drives. There were scheduled ntbackup routines running unsuccessfully when I took over the server, but I deleted them, removed the scripts, and set up sbsbackup to run instead. It seems odd that this occurs almost every 9 days; maybe it is just a coincidence? I am out of ideas, any help is appreciated!
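For anyone who wants to check the spacing rather than eyeball it, here is a minimal Python sketch that works out the gap between the three freezes and how long each backup ran before the hang. The dates and times are copied from the list above; treating them as evening (PM) times is an assumption based on the backups normally finishing around 11:00 pm.

```python
from datetime import datetime

# (backup start, server hang) pairs from the post.
# Assumption: all times are PM, written here in 24-hour form.
events = [
    ("2011-02-05 17:30", "2011-02-05 18:11"),
    ("2011-02-14 18:30", "2011-02-14 18:57"),
    ("2011-02-23 19:30", "2011-02-23 19:46"),
]

FMT = "%Y-%m-%d %H:%M"
parsed = [(datetime.strptime(s, FMT), datetime.strptime(h, FMT)) for s, h in events]

# Minutes the backup had been running when the server froze.
minutes_to_hang = [int((hang - start).total_seconds() // 60) for start, hang in parsed]

# Calendar days between consecutive freezes.
days_between = [
    (later[0].date() - earlier[0].date()).days
    for earlier, later in zip(parsed, parsed[1:])
]

print("minutes into backup:", minutes_to_hang)
print("days between freezes:", days_between)
```

The day gaps come out identical, while the time into the backup differs on each occasion, which is worth keeping in mind when deciding whether a particular file or a slow build-up is the more likely trigger.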

Thanks
Question by:mjkisic
21 Comments
 
LVL 47

Expert Comment

by:David
ID: 34991480
One of the root causes can be the result of using desktop-class instead of enterprise class HDDs.  You can get this when the disk goes into a deep recovery state due to a bad block.   Windows will not necessarily report a problem in event log, and if the HDD recovers the data the disk will pass all diagnostics.

NTbackup "causes" this to happen because it is most likely the only time during the week that you try to access the data.

A solution in this case would be to use either hardware or O/S-based RAID1 with enterprise-class disks.  
 
LVL 15

Expert Comment

by:markdmac
ID: 34991589
Have you tried enabling verbose logging in NTBackup to see if it is always hanging on the same files? Maybe a client has some files locked?

When you schedule the SBS backup, it creates an NTBackup job that you can modify.
 
LVL 43

Expert Comment

by:Davis McCarn
ID: 34991712
The random freezing indicates dirt, bad RAM, items needing reseating, bad capacitors, or a flaky power supply, pretty much in that order.
I, quite literally, take them outside and use a leaf blower, which cleans things out much better than Blow-Off. Blow in through the two exit fan openings (P/S and CPU). Don't try to get the fans going supersonic and you'll be OK.
I keep seeing 2-5 year old Samsung memory go bad, which bugs the heck out of me; I use the free ISO from http://www.memtest86.com to test.
If the system has one, lift the green shroud over the CPU heatsink and inspect for bad capacitors: http://www.google.com/imgres?imgurl=http://www.thenakedpc.com/dan/Bulging_Capacitors/close-up.jpg&imgrefurl=http://www.thenakedpc.com/dan/Bulging_Capacitors/index.html&usg=__I8uVpjpR_5DoPAsyyxnX-gEshcM=&h=526&w=786&sz=49&hl=en&start=0&zoom=1&tbnid=IXu-R6JwUeRtqM:&tbnh=146&tbnw=213&ei=9mdqTZmWGIausAO-vNGoBA&prev=/images%3Fq%3Dbad%2Bcapacitors%26hl%3Den%26safe%3Doff%26biw%3D1242%26bih%3D766%26gbv%3D2%26tbs%3Disch:1&itbs=1&iact=hc&vpx=363&vpy=101&dur=4609&hovh=184&hovw=275&tx=136&ty=110&oei=9mdqTZmWGIausAO-vNGoBA&page=1&ndsp=21&ved=1t:429,r:1,s:0  If you see them, either get a new systemboard or have someone who knows how to desolder replace them.
Next, reseat everything, especially the CPU chip, and put some new heatsink compound on it while you're at it.
And last, if that still hasn't fixed it, replace the power supply.
 

Author Comment

by:mjkisic
ID: 34992222
The system ran a full backup last night without issue, as well as the previous evening. Markdmac, I will enable verbose logging to see if there is an issue with a locked or corrupt file. They do use an Access-based database heavily during the day, and there have been intermittent problems with it requiring a database repair; however, those issues have been on days where there were no server freezes during the times mentioned.

My first thoughts were hardware such as HD and RAM. I have run chkdsk and Memtest without any errors reported. I still find it odd that the three times this has happened have been almost exactly 9 days apart; hardware issues would be more random than that.

I will report back after collecting some ntbackup log details.

Thanks!
 
LVL 30

Expert Comment

by:pgm554
ID: 34992240
Run the SBS Best Practices Analyzer and see if anything stands out.

You are saying this only occurs during a backup, right?

Could be a memory leak.

To see if it is a memory leak, reboot the server daily and see if the issue goes away.
If so, you've got an errant program or device driver causing the issue.

Make sure all device drivers and the BIOS are up to date.
I see the Broadcom driver for that system has a memory leak, depending upon version.

Have you run a chkdsk and/or defrag?
 

Author Comment

by:mjkisic
ID: 34992375
pgm554, I have run Best Practices, and everything looks good. As stated above, this has happened on a little more than three occasions, including 2/5, 2/14 and 2/23. Sbsbackup is set to run every day; it is just on those days that the system has frozen. I have run chkdsk, will do the defrag just in case, and will check the driver versions.
 
LVL 30

Expert Comment

by:pgm554
ID: 34992434
Just to be mathematical and see if it is a pattern:
Reboot the server halfway through the 9 day cycle, and if it hangs 9 days from that reboot, you've got a memory leak.
 
LVL 30

Expert Comment

by:pgm554
ID: 34992459
Also you should get server logs emailed to you daily on the SBS and check for any process that looks like it may be growing.

Like this:

Top 5 Processes by Memory Usage

Process Name - ID     Memory Usage
sqlservr - 2052       204 MB
sqlservr - 1928       88 MB
services - 428        88 MB
sqlservr - 1980       42 MB
w3wp - 4556           39 MB

Top 5 Processes by CPU Usage

Process Name - ID     CPU Time
sqlservr - 2052       0.4 %
svchost - 876         0.3 %
wmiprvse - 3712       0.2 %
jqs - 1828            0.2 %
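If you would rather not compare those daily emails by eye, a rough Python sketch along these lines could do it. The parser matches the "name - ID, N MB" lines as they appear in the sample report above; the multi-day history at the bottom is made-up data purely for illustration, and the "rose every single day" rule is just one simple assumption about what a leak would look like.

```python
import re

# Memory section copied from the sample report above.
report = """\
sqlservr - 2052       204 MB
sqlservr - 1928       88 MB
services - 428       88 MB
sqlservr - 1980       42 MB
w3wp - 4556       39 MB
"""

# Matches lines of the form "<name> - <pid>   <number> MB".
LINE = re.compile(r"^\s*(?P<name>\S+) - (?P<pid>\d+)\s+(?P<mb>\d+) MB\s*$")

def parse_memory_section(text):
    """Return {(process name, pid): memory in MB} for every matching line."""
    usage = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            usage[(m["name"], int(m["pid"]))] = int(m["mb"])
    return usage

def is_steadily_growing(samples_mb, min_days=3):
    """True if there are enough samples and each day is higher than the last."""
    if len(samples_mb) < min_days:
        return False
    return all(b > a for a, b in zip(samples_mb, samples_mb[1:]))

today = parse_memory_section(report)

# HYPOTHETICAL daily values (MB), not taken from this thread.
hypothetical_history = {
    "leaky_service": [200, 260, 330, 410],
    "steady_service": [88, 90, 87, 88],
}
suspects = [name for name, s in hypothetical_history.items() if is_steadily_growing(s)]

print(today)
print("possible leak:", suspects)
```

Note that process IDs change after a reboot, so across several days it is safer to track by process name than by ID.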
 

Author Comment

by:mjkisic
ID: 34992489
If the 9 day pattern holds true, the next lockup will be March 4th, unless it is just a coincidence. Today would be about halfway through, so I will reboot the server today. I do get the reports emailed to me; I will review them for a process that is growing and will report back.
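The date arithmetic behind that plan can be sketched in a few lines of Python. The last freeze date and the 9 day cycle come from this thread; using a whole-day midpoint (4 days in) as the reboot day is an assumption made to keep the dates simple.

```python
from datetime import date, timedelta

last_freeze = date(2011, 2, 23)   # most recent freeze listed in the question
cycle_days = 9                    # observed gap between freezes

# If the pattern is tied to the calendar, the next freeze lands here.
next_expected = last_freeze + timedelta(days=cycle_days)

# Reboot roughly halfway through the cycle (assumption: whole days only).
reboot_day = last_freeze + timedelta(days=cycle_days // 2)

# If the cause is a leak that resets on reboot, the freeze should move to here.
leak_prediction = reboot_day + timedelta(days=cycle_days)

print("next freeze if nothing changes:", next_expected)
print("midpoint reboot:", reboot_day)
print("freeze date if it is a leak:", leak_prediction)
```

If the server still freezes on the first date despite the reboot, the leak theory looks weaker; if the freeze shifts to the later date, it looks stronger.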
 
LVL 47

Expert Comment

by:David
ID: 34992551
You know, since this is driving you nuts, and you have gotten a lot of wonderful advice on where to look ... you should consider a different strategy.   Something that the professionals use.

For around $100 you can get a diagnostic board.  Plug it into the backplane, run some software and it will usually (nothing is infallible, and the more money you spend the more things it will look at) tell you exactly what it is.  Even if it can't tell you what it is, it can certainly tell you what it ISN'T.   Just eliminating (truly eliminating CPU, memory, interrupts, hardware, etc..) will go a long way to finding the ultimate solution.

My gosh, your backups are failing, so this is pretty important to get resolved.  Google "PC diagnostic board", and choose something.  (Usually you can find a nice selection on ebay and equivalent auction sites).

Software-based hardware diagnostics can and WILL sometimes give you bad info. A board in a slot with its own CPU and memory that monitors traffic on the bus can easily detect faults that would otherwise be missed. Even relatively simple DOS-booted memory diagnostic programs will easily miss memory problems.

Also, if your PC is a server class, then is there a BIOS accessible event log?  

 

Author Comment

by:mjkisic
ID: 34992562
OK, in reviewing the server health reports, the top two memory-using processes are dbsrv9 and store. From 2/19 to 2/27 they fluctuate between 547 MB and 621 MB, and there is no pattern to the fluctuations. On CPU usage, NTBackup consistently has the highest CPU usage, and there is a pattern here:

2/19      5.8%
2/20      6.3%
2/21      5.9%
2/22      4.9%
2/23      7.2%
2/24 server crashed, restarted
2/25 stopped nt backups for trouble shooting
2/26   resumed ntbackup    3.7%

These numbers may be just a reflection of the server freezing on the 24th?
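One way to put a number on that "pattern" is a simple least-squares slope over the five readings before the crash. This is only a rough sketch: the values are the NTBackup CPU percentages listed above, and treating the days as evenly spaced points 0 to 4 is an assumption.

```python
# NTBackup CPU usage (%) for the five days before the crash, from the list above.
cpu = [("2/19", 5.8), ("2/20", 6.3), ("2/21", 5.9), ("2/22", 4.9), ("2/23", 7.2)]

ys = [value for _, value in cpu]
xs = list(range(len(ys)))  # day index 0..4 (assumes one reading per day)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Ordinary least-squares slope: change in CPU % per day.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)

# Spread between the highest and lowest reading, for comparison.
spread = max(ys) - min(ys)

print(f"slope: {slope:.2f} % per day, spread: {spread:.1f} %")
```

With only five points and a dip on 2/22, a small slope next to a much larger day-to-day spread would suggest the readings are closer to noise than to a steady climb, so they should not be over-interpreted.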
 
LVL 30

Expert Comment

by:pgm554
ID: 34992609
 

Author Comment

by:mjkisic
ID: 34992653
Thanks pgm554, we did install SEP 12.0  on the 9th of Feb, we had a crash previous to that though on the 5th.  However I am going to restart dbsrv9 service and see if that reduces memory hogging.
 
LVL 30

Expert Comment

by:pgm554
ID: 34992723
Looking at some of the posts on the Symantec site and elsewhere leads me to believe that is the culprit.

Lots of people are complaining about that product.

I've had my issues with Symantec in the past, and their products are sometimes just barely tolerable.

They have a nasty tendency of acquiring a decent product and ruining the support and QA.
 

Author Comment

by:mjkisic
ID: 34992795
pgm554, it is sure easy enough to troubleshoot down that path. I will disable the dbsrv9 service for the week and see if that resolves the issue. Perhaps it is the combination of the memory hogging of dbsrv9 and the CPU usage of the ntbackup routine that is causing the perfect storm.
I will report back on server status in the next couple of days and cross my fingers that that is the culprit.

Thanks for all of your input!
 
LVL 30

Accepted Solution

by:
pgm554 earned 2000 total points
ID: 34993444
It also, according to what else I read, messes with SQL DBs like the one found in the built-in backup utility.

Which, if I remember correctly, is just a stripped-down version of BEX.

So you've got several instances of SQL running, all of which can be affected by the dbsrv9 issue.

That's the trouble with running everything off of one box: too many programs can cause issues with each other.

The old Novell Small Business Server let you install up to 5 servers in one tree; the only limit was the seat license count.

Man I miss their products.
 

Author Closing Comment

by:mjkisic
ID: 35043876
Thanks for the logical approach.
 

Author Comment

by:mjkisic
ID: 35043898
I disabled the dbsrv9 service altogether, and after 9-plus days the server has not locked up during a normal scheduled backup. I think pgm554 nailed that one. You would think a brand new, updated install of SEP would be a little better than that; I've run the program in one version or another on a lot of servers without that type of issue before.

Thanks!
 
LVL 30

Expert Comment

by:pgm554
ID: 35044153
Not a big Symantec fan.
I am trying to install BEX 2010 on an SBS 2008 box and am just having fun, fun, fun.
It's the same old, same old with Symantec:
Live updates not working or not finding installed products, cryptic errors, and interference with service pack installs.
Man, between them and CA, I don't know who is more mediocre.
 

Author Comment

by:mjkisic
ID: 35044442
I know what you mean, but I do use BESR on my bigger machines and networks, and I have to say that when it does work well it has saved many a system. I have been able to do a full restore of an entire SBS 2003 box supporting Exchange and an AD of 35, as well as about 200 GB of data, in less than 3 hours of downtime. I have found BESR to be somewhat more reliable and use it instead of BEX; I haven't had any issues on SBS 2008.
 
LVL 30

Expert Comment

by:pgm554
ID: 35071564
>I have found the BESR to be somewhat more reliable

That's a PowerQuest product; thank them for the stable code base.

Tried it on SBS 2011 and was unable to do a restore; Symantec says it is still unsupported.

I like BEXSR when it works, and the dissimilar restore has worked pretty well on some upgrades, but when it screws up, it's usually the Symantec tweaks.

Live update breaks quite often in my experience.