Solved

Outlook locking up when server Avg. Disk Queue Length spikes to 500-800

Posted on 2011-10-13
Medium Priority
554 Views
Last Modified: 2012-12-30
I inherited a small network 10 days ago and am having lock-up issues with Outlook on the terminal server; everyone in the main office is working fine. It's a small environment with 2 servers: one running Windows Server 2003 R2 with Exchange 2003, the other a Windows Server 2008 terminal server. The local workstations are a mixture of XP and Windows 7 (all Professional editions), all with Office 2007. There are 20 users, 6-8 of them remote using the terminal server.

So far I have tracked the lock-ups to periods when Avg. Disk Queue Length is above 150. The desktop users are not experiencing the problem because they have Outlook in cached mode; take them out of cached mode and they experience the same problem as the terminal server users.

Using perfmon and Process Explorer, I can see that the most active process is store.exe. When Outlook switches to "not responding" or reports that the server is not responding, the Avg. Disk Queue Length in perfmon is above 500, with readings as high as 800.

The disk spikes last anywhere from 15 seconds to 2 minutes, effectively locking Outlook and even the console of the server during the event. I am seeing a few ftdisk warnings in the event viewer (2 from a few days ago, 12 from a few weeks ago), but nothing during the event. It's happening every 40 to 70 minutes on the server.
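
A minimal sketch of how the spike pattern described above could be quantified from a perfmon counter log exported to CSV. It assumes the usual perfmon CSV layout (timestamp in the first column formatted like `10/13/2011 14:05:01.123`, one counter per following column); the function names, the column index, and the reuse of the 150 threshold from this post are illustrative assumptions, not something taken from the thread.

```python
import csv
from datetime import datetime

# Assumed perfmon CSV timestamp layout; adjust if your export differs.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def load_perfmon_csv(path, column=1):
    """Read (timestamp, value) pairs from one counter column of a CSV log."""
    samples = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row of counter paths
        for row in reader:
            try:
                stamp = datetime.strptime(row[0], TIMESTAMP_FORMAT)
                value = float(row[column])
            except (ValueError, IndexError):
                continue  # blank or malformed sample, skip it
            samples.append((stamp, value))
    return samples


def find_spikes(samples, threshold=150.0):
    """Group consecutive samples above the threshold into episodes.

    Returns a list of (start, end, peak) tuples, where start and end are
    the first and last timestamps observed above the threshold.
    """
    episodes = []
    start = end = None
    peak = 0.0
    for stamp, value in samples:
        if value > threshold:
            if start is None:
                start, peak = stamp, value
            end = stamp
            peak = max(peak, value)
        elif start is not None:
            episodes.append((start, end, peak))
            start = None
    if start is not None:  # log ended while still above the threshold
        episodes.append((start, end, peak))
    return episodes
```

Calling `find_spikes(load_perfmon_csv("queue.csv"))` (the file name is hypothetical) would give one tuple per spike, which makes it easier to compare spike duration and peak before and after a hardware or firmware change.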

My question: what's the best method to tell which Exchange process is causing the IO spike? Or am I dealing with a damaged store, or is this just the beginning of a hardware failure? The firmware is out of date and I plan on updating the disk, controller, and system board firmware this weekend.

The server is an older Dell SC1430 with a simple SATA RAID 1. No errors are reported by the controller, but it is listing a number of firmware initialization information notifications for some reason. I don't recall rebooting the server 17 times in the last few days, but the card is listing initializations occurring.

The nightly Exchange defrags are running and are listed as completing successfully in the event viewer.
Question by:Jeff_Creed
11 Comments
 

Author Comment

by:Jeff_Creed
ID: 36966613
Update: the RAID card hasn't listed a firmware initialization for the past 6 hours, but perfmon is showing 17 jumps above 500. They even appear to be coming every 15 to 17 minutes.
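
One way to check whether the jumps really recur on a regular cadence is to compute the gaps between spike start times. This is a hedged sketch: the function names are illustrative, and the start times would have to come from your own perfmon log.

```python
from datetime import datetime
from statistics import mean


def spike_intervals_minutes(starts):
    """Return the gaps, in minutes, between consecutive spike start times."""
    ordered = sorted(starts)
    return [(b - a).total_seconds() / 60.0 for a, b in zip(ordered, ordered[1:])]


def summarize_intervals(gaps):
    """Summarize the gaps; returns None when there are fewer than two spikes."""
    if not gaps:
        return None
    return {
        "count": len(gaps),
        "min": min(gaps),
        "mean": mean(gaps),
        "max": max(gaps),
    }
```

A narrow spread between `min` and `max` would point toward something periodic (a scheduled task or a controller routine) rather than user-driven load.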
 
LVL 11

Accepted Solution

by:
Paul S earned 2000 total points
ID: 36966677
I suspect hardware; be worried. Make sure you are backing up this server. I had an Exchange 2003 server with RAID 5 have a disk fail, and when we replaced the disk everything appeared fine until we found EDB store corruption two or three days later.

I would definitely update everything (BIOS, firmware, RAID driver, etc.). Maybe start locating a compatible RAID card to replace the current one in case it is failing. Do you have spare drives on site already? Do you have OpenManage installed? Can you download the RAID firmware logs from the card?
 
LVL 11

Expert Comment

by:Paul S
ID: 36966682
chkdsk c: /f might be a good idea too.

Also, read this:
http://www.msexchange.org/tutorials/exchange-isinteg-eseutil.html

 

Author Comment

by:Jeff_Creed
ID: 36969559
Thanks. I think I will perform a DR test restore of the system into a VM this evening instead of performing the firmware upgrade. I will reschedule the firmware upgrade for tomorrow evening.

Before I arrived, the store had not been correctly backed up since 6/19/11. They had uninstalled Backup Exec and switched to Mozy, which never ran correctly. Since Mozy wasn't running correctly, it didn't clear the log files, and circular logging had been disabled for the Symantec backups, so they had almost filled the drive with log files before I came on board. I used ntbackup to clear the log files, but I am now using StorageCraft, which uses VSS snapshot technology.

I will restore the server right now actually.

The Dell SC1430 line doesn't have OpenManage tools like the other systems; I checked with Dell. I have standard SATA drives on site.

The system drive was at 54% fragmentation; it took over 3 hours to defrag 9GB of data just a few nights ago. The drive with the Exchange store is at 26%.

On the plus side the store is mounting and dismounting nicely all the nights I have worked on the server.
 

Author Comment

by:Jeff_Creed
ID: 36976961
DR restore went clean. I have an ESXi host with an open-source iSCSI target for testing. The IO spiking that the physical server shows while idle is not happening in the restored virtual environment. So thanks for the insight on the hardware.

I moved forward with BIOS/firmware updates for the drives, controller, and system board. I also ran Exchange Server Analyzer and found some non-standard memory configurations. I made the following changes to the production system over the weekend.

http://support.microsoft.com/kb/315407
http://technet.microsoft.com/en-us/library/aa996786(EXCHG.80).aspx

Also ran chkdsk /f on all drives, twice on the Exchange volume, with no errors.

After the work, the IO looks very different and Outlook is responding differently. The administrator account had 22K warning emails in the inbox. Before, trying to select them all and delete them from the terminal server would send the disk latency immediately into the 500 range and cause Outlook to stop responding and show communication warnings.

This time it worked as expected and took 15 minutes. The disk latency, instead of shooting up to 500 and beyond, only rose to 30 to 40. Outlook only warned twice about loss of connection during the deletion. I don't like seeing the warning, but it is much better than having everyone taken out.

Going forward, I am going to watch tomorrow. I have two consumer-grade 500GB SATA drives and I am ordering a replacement controller card. I am entertaining the thought of just installing the two 500GB drives directly on the motherboard and doing a software mirror, but I am not sure whether software mirroring will keep up with, say, 20 Exchange users. Thoughts? The server has 4GB of RAM and dual quad-core 1.86GHz CPUs.

High hopes the firmware upgrade will buy me the time to get the new controller in and the user group will be patient.
 

Author Comment

by:Jeff_Creed
ID: 37013630
All drives replaced along with the controller card; still seeing spikes to 500. Now that I am on a new controller and hard drives, I am performing a defrag/integrity check on the store.
 

Author Comment

by:Jeff_Creed
ID: 37015403
The integrity check was successful and the server has been running like a charm all day.  The integrity check took several hours to complete but was successful.  Ran out of time last night to do the defrag, but will perform one this evening.
 
LVL 11

Expert Comment

by:Paul S
ID: 37020923
Sounds like you are making great progress. Have Outlook and/or the server frozen since the controller and disk change?
 

Author Comment

by:Jeff_Creed
ID: 37025556
The defrag of the store started running outside my window and I had to cancel it. I was getting around 3GB an hour, and the store with the streaming file is 40GB. It is scheduled for this weekend.
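
For planning the maintenance window, the figures above give a rough estimate: 40GB at about 3GB an hour is a little over 13 hours. A small sketch of that arithmetic follows; the function names and the example window lengths in the usage note are hypothetical, and the observed rate may not hold for the whole run.

```python
def estimate_defrag_hours(store_gb, rate_gb_per_hour):
    """Estimate offline defrag duration from store size and observed rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate_gb_per_hour must be positive")
    return store_gb / rate_gb_per_hour


def fits_window(store_gb, rate_gb_per_hour, window_hours):
    """Return True if the estimated run fits inside the maintenance window."""
    return estimate_defrag_hours(store_gb, rate_gb_per_hour) <= window_hours
```

With the numbers from this post, `estimate_defrag_hours(40, 3)` comes to roughly 13.3 hours, so an overnight window of, say, 8 hours would not be enough while a weekend window would.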

Yesterday the server ran great all day. There was only one jump to 250 on the server, and all users reported great performance and no lock-ups. But from 5pm last night until morning the old behavior was back, shooting up to 500.

I can see firmware initialization information listed on the new controller also, just like before.   There is no time stamp on the log for them just ##################.  

Controller ID: 0 MegaRAID firmware intialization started:    (PCI ID 0x1000/ 0x0054/ 0x1028  / 0x1f09)  have nine listed between 10/23 5AM to this morning.

Going to schedule a reboot of the server this evening if the IO spikes move into the day.
 

Author Comment

by:Jeff_Creed
ID: 37108259
Everything about this server had been goofed with in some fashion; regrettably, I am crying uncle. I am now planning a migration to Server 2008/Exchange 2010 this month instead of next year. The disk IO spikes return after a day or so, with store.exe and the System process listed as the heavy hitters, but a simple reboot clears the condition. The nightly Exchange defrags and weekly tasks are completing normally, chkdsk is clean, and offline integrity checks are passing for the store. All the DR restores have been successful.
 

Author Closing Comment

by:Jeff_Creed
ID: 38731662
Dumped the hardware completely and moved to a virtual environment: restored the server from backup into a VM, no issues since. The cause was failing hardware.