• Status: Solved

Severe paging on SBS 2008 with 16GB memory (lots). What's wrong?!

Brand new DELL T610 with dual quad-core CPUs, 16GB memory, and 3x600GB SAS in RAID5.

Totally updated Windows SBS 2008 with all service packs and patches.

SQL Express Edition 2008 R2 and SQL 2008 running three databases.
Resource Monitor in the OS reports 36% physical memory in use.
Page file fixed size 24559MB

Antivirus on the server was Trend Micro Worry Free Business Advanced. Tried uninstalling it without any improvement on the issue.

10 client PCs.

Problem description:
The server is OK after a restart, but after a few days it starts paging. We see it in Performance Monitor on the counter "Page Faults/sec". It keeps paging in spikes every few seconds. Sometimes it's persistent, and then we also see a rise in "Avg. Disk Queue Length", which again indicates heavy paging.
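
A note on the counter itself: `Memory\Page Faults/sec` counts soft faults (pages found elsewhere in RAM) together with hard faults (pages that must be read from disk), so spikes in it do not by themselves prove disk paging. `Memory\Pages/sec` and `Memory\Page Reads/sec` track hard-fault activity more closely. Below is a minimal Python sketch for comparing the two, assuming the counters were logged to a CSV in the layout that `typeperf` writes (timestamp first, one column per counter). The sample rows and the server name `SBS01` are invented for illustration, not measurements from this server.

```python
import csv
import io
from statistics import mean

# Hypothetical sample in typeperf-style CSV layout; all values are invented.
SAMPLE_LOG = r'''"(PDH-CSV 4.0)","\\SBS01\Memory\Page Faults/sec","\\SBS01\Memory\Pages/sec"
"01/01/2011 10:00:00.000","4200.0","3.0"
"01/01/2011 10:00:05.000","3800.0","1.0"
"01/01/2011 10:00:10.000","4000.0","2.0"
'''

def summarize(log_text):
    """Return {counter name: (average, peak)} for each counter column."""
    rows = list(csv.reader(io.StringIO(log_text)))
    header, samples = rows[0], rows[1:]
    summary = {}
    for col in range(1, len(header)):           # column 0 is the timestamp
        name = header[col].rsplit("\\", 1)[-1]  # keep only e.g. "Pages/sec"
        values = []
        for row in samples:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # skip blank or malformed samples
        if values:
            summary[name] = (mean(values), max(values))
    return summary

if __name__ == "__main__":
    for name, (avg, peak) in summarize(SAMPLE_LOG).items():
        print(f"{name}: avg={avg:.1f} peak={peak:.1f}")
```

If Page Faults/sec is high while Pages/sec stays low, most faults are being resolved in memory, and the disk queue is more likely caused by something other than the page file.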

The server is so slow sometimes that the mouse pointer is jittery.

The users notice the slowness as SQL databases responding slowly, which in turn slows down their client software.

When I check Resource Monitor in the OS to see which processes are using memory, it changes all the time. When I use iexplore, that process is paging with up to several thousand page faults per minute. So I cannot pinpoint one culprit. It's just all over the OS.

This is a tough one, I'd give 2000 points if I could.

Thanks for helping!


Asked by: rakoczy
 
setasoujiro Commented:
Hi,
First off, you're living on the edge with your RAID 5 with just 3 disks :)
Can you see which app is hogging memory at those times?
Could it be a runaway program leaking memory?
 
aleghart Commented:
From past experience with my own server, I would re-install the AV, then perform a complete un-install, to make sure it's really gone.  Alternatively, contact the AV vendor to see if they have a removal tool.

Then check your NIC settings.  Make sure there are no options checked for TCP/IP offload.  Don't know why I had them enabled, but disabling them removed a major bottleneck with network access.

Last was a motherboard driver.  It was reading 1 CPU @100%, and fans were going crazy.  Once the correct hardware drivers were installed, that went down.  Doesn't sound like the same issue here though.
 
rakoczy (Author) Commented:
Thanks for the help so far!

setas: Please elaborate. 3 disks will get better read performance and slightly more IO-intensive writes. I don't think that's an issue on any server with 3 RAID 5 disks. I see the three SQL DBs hogging up to 2.5GB memory, but there's still plenty left. Also Exchange with its many processes is hogging memory as expected. But still no shortage of memory...

aleg: We have completely uninstalled the AV today and will test to see if it has any effect. Will check the TCP/IP offload settings. A good tip for anybody is that you need the BAC utility from Broadcom to do this if you have Broadcom adapters. Basically you'll need software from the NIC vendor.

Don't expect a closing of this Q for a few weeks ;)

Anyone know what is considered a high page fault value?
 
setasoujiro Commented:
No, I mean living on the edge in case of failure. Unless you have a spare configured in that array, you will suffer severe data loss if a disk fails.

 
aleghart Commented:
1 disk fail in a 3-disk RAID-5 array brings on some severe performance problems.  When you replace the drive, the controller will be reading from the remaining 2 "good" disks, writing to the replacement disks, while doing the parity calculations required to recreate the failed disk.  Some controllers (soft) use the system CPU to do the calcs, so you get application slowdown as well as storage I/O.
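
In the usual RAID-5 design, that parity calculation is a byte-wise XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block equals the XOR of the surviving blocks. Here is a minimal Python sketch for a single stripe on a 3-disk array. The block contents and function names are illustrative only, not an actual controller's code.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks):
    """Recreate the one lost block of a stripe from the surviving ones."""
    return xor_blocks(surviving_blocks)

# One stripe on a 3-disk array: two data blocks plus one parity block.
data_a = b"SQL data"
data_b = b"Exchange"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b fails: read the other two disks and XOR them.
recovered = rebuild_missing([data_a, parity])
```

Every rebuilt block needs a read from each surviving disk, which is why a rebuild competes with normal I/O for the whole time it runs.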

3x600 would run faster as 6x300 (more spindles), as well as faster replacement of a failed disk.  For that same 1.2TB of space, 4x600 in RAID-10 might do better.
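
The capacity side of that comparison can be checked with simple arithmetic. This is a Python sketch, assuming equal-size disks, no hot spare counted, and no formatting overhead; the function names are made up for illustration.

```python
def raid5_usable_gb(disks, size_gb):
    """RAID-5 keeps one disk's worth of parity, spread across the array."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * size_gb

def raid10_usable_gb(disks, size_gb):
    """RAID-10 mirrors every disk, so half the raw space is usable."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID-10 needs an even number of disks, at least 4")
    return disks // 2 * size_gb

layouts = {
    "3x600 RAID-5": raid5_usable_gb(3, 600),    # 1200 GB
    "6x300 RAID-5": raid5_usable_gb(6, 300),    # 1500 GB
    "4x600 RAID-10": raid10_usable_gb(4, 600),  # 1200 GB
}
```

Under these assumptions, 6x300 in RAID-5 yields 1.5TB rather than 1.2TB; five 300GB disks would match the 1.2TB of the 3x600 array.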

In any case, I'm in agreement about hot spares.  Friday before my last vacation, two drives failed within hours.  The hot spare was already re-building by the time we discovered the second failed drive.  RAID-10 and RAID-6 survived the 2 drives dead at the same time.  But, the hot spare saved quite a few hours.  Without a monkey to manually insert a new drive, it would have been another 10-12 hours before we could start the rebuild.
 
rakoczy (Author) Commented:
Tsss... RAID 5 for a small 10-client customer is as good as it gets. One disk failure will not make them suffer data loss at all. RAID 5 survives perfectly well with one failure. Yes, there is a small performance loss, but nothing that you cannot survive while waiting for a replacement. All disks are fine, by the way.

Also.. yes we have a hotspare. Always.

RAID 10 is too expensive for most of our customers since it doubles the cost per GB, although I agree it's the fastest solution. Good tip on the 6x300, although that would fill up the server, leaving only one slot for extra disks later on. Most of the small servers we sell have 8 disk bays.

Well.. Interesting discussion but no news.
 
setasoujiro Commented:
lol tss... RAID 5 indeed survives one failure, but when 1 disk fails, chances are good that the other ones are near the end of their life as well, so I wouldn't go about waiting for a replacement :) But you say you have a spare, so you're fine.
 
rakoczy (Author) Commented:
I've had engineers from DELL look at the server and they see some necessary BIOS and network updates that we need to install. I'll let you know if that solves it. BTW, you can call DELL support about any DELL server and send them a report called DSET, which they can use to completely analyze everything on the server.
 
rakoczy (Author) Commented:
We've done a lot of work on this machine and I can't really say what made a difference. We've uninstalled Trend Micro AV and that seemed to solve most of our issues, but we've also updated all firmware and to be sure changed the brand new switch for a new one.

It seems the culprit might have been Trend, even though we had excluded almost all directories from the Real Time Scan.
 
rakoczy (Author) Commented:
We've fixed the issue ourselves.