• Status: Solved

Exchange Server 2003 performance issues

Symptoms: Users have been getting tons of "Outlook is trying to receive data from the Exchange server" messages. In Performance Monitor, Pages/sec and % Processor Time shoot up to between 80 and 100. I ran the Exchange Troubleshooting Assistant and it reports:

Overview  
Items of severity Errors  
 
  Disk bottleneck found :  
 A potential performance issue was observed from the disk performance counters. One or more disks is exhibiting a performance bottleneck.
 
  Potential issue with RPC activity found :  
 A potential issue with the RPC activity for some MAPI operations was identified.
  Tell me more about this issue and how to resolve it.  
 
  Network interface performance issue found :  
 A performance issue was found with the network interface performance counters.
 
  Unusually high user activity detected :  
 RPC Operations per second rates indicate a user or users on this server are unusually active.
  Tell me more about this issue and how to resolve it.  
 
  Slow Local Security Authority Subsystem Service calls Area: Function Call log
 The Function Call log (FCL) shows some slow calls to LSASS.
  Tell me more about this issue and how to resolve it.  
 
Items of severity Warnings  
 
  The top 6 users account for 100% of the MAPI CPU usage on the server :  
 The top 6 users account for 100% of the MAPI CPU usage on the server.
  Tell me more about this issue and how to resolve it.  
 
Informational Messages  
 
  No issues found with location of Exchange data files :  
 No issues were found with the location of the Exchange data files, page file, TEMP directory or TMP directory, or with the amount of disk space on one or more drives.
 
  No Lightweight Directory Access Protocol (LDAP) performance issues identified :  
 No issue was found with the 'MSExchangeDSAccess Domain Controllers LDAP' performance counters.
 
  No processor or memory bottleneck found :  
 No processor or memory bottleneck was found on the server exchange.
 
  RPC activity distributed across many users :  
 The usage sample shows RPC activity is distributed across many users, rather than being caused by a single user.

I have also run the ExMon utility all day today (9am-4pm) and here are the results (sorry, I know they are hard to read; CSV is the only output format):
,Packets,Operations,CPU Time (ms),CPU (%),Avg. Server Latency (ms),Max. Server Latency (ms),Bytes In,Bytes Out,Client Versions,Client IP Addresses,Read Pages,PreRead Pages,Dirtied Pages,Log Bytes
,"29374","35044","132780","64.69%","10","11998","783962","634315600","11.8000.0 ","192.168.20.125 ","0","0","0","0"
,"34377","56840","13710","6.68%","0","1171","15866558","3547876","11.8000.0 ","192.168.20.103 ","0","0","0","0"
,"10965","32144","11775","5.74%","4","1046","2630751","24644847","11.8000.0 ","192.168.20.112 ","0","0","0","0"
,"14952","26379","9705","4.73%","2","984","743223","23591735","11.8000.0 ","192.168.20.133 ","0","0","0","0"
,"4148","7821","9345","4.55%","8","437","1660393","913717","11.8000.0 ","192.168.60.101 ","0","0","0","0"
,"2565","7333","6030","2.94%","4","859","6121983","7818026","11.8000.0 ","192.168.20.142 192.168.20.141 ","0","0","0","0"
,"8248","25737","3120","1.52%","0","406","1799508","10656224","11.8000.0 ","192.168.20.151 ","0","0","0","0"
,"2347","6400","2625","1.28%","5","843","273036","4015804","11.8000.0 ","192.168.20.111 ","0","0","0","0"
,"1910","7016","2085","1.02%","1","359","253654","519605","11.8000.0 ","192.168.20.108 ","0","0","0","0"
,"1256","3367","1530","0.75%","3","1796","6221415","2110106","11.8000.0 ","192.168.20.113 ","0","0","0","0"
,"2152","7867","1155","0.56%","1","828","211240","314517","11.8000.0 ","192.168.20.107 ","0","0","0","0"
,"3099","10745","1140","0.56%","0","406","208404","677557","11.8000.0 ","192.168.20.146 ","0","0","0","0"
,"2043","7222","1005","0.49%","1","218","304173","676437","11.8000.0 ","192.168.20.140 ","0","0","0","0"
,"961","3274","945","0.46%","1","171","159077","6521269","11.8000.0 ","192.168.20.149 ","0","0","0","0"
,"2074","7329","900","0.44%","0","78","956318","219452","11.8000.0 ","192.168.20.104 ","0","0","0","0"
,"922","2862","855","0.42%","2","234","3223659","1741194","11.8000.0 ","192.168.20.130 ","0","0","0","0"
,"831","3095","780","0.38%","2","203","92368","578719","11.8000.0 ","192.168.20.118 ","0","0","0","0"
,"623","2468","765","0.37%","3","109","172321","445318","11.8000.0 ","192.168.20.132 ","0","0","0","0"
,"623","2171","630","0.31%","4","437","316297","204120","11.8000.0 ","192.168.20.102 ","0","0","0","0"
,"900","3097","630","0.31%","2","124","898662","1283687","11.8000.0 ","192.168.20.145 ","0","0","0","0"
,"777","2884","435","0.21%","1","171","124192","834408","11.8000.0 ","192.168.20.106 ","0","0","0","0"
,"531","1640","360","0.18%","2","124","79113","3995442","11.8000.0 ","192.168.20.115 ","0","0","0","0"
,"388","1473","345","0.17%","2","140","24790","93269","11.8000.0 ","192.168.10.108 ","0","0","0","0"
,"229","898","315","0.15%","4","218","65791","60039","11.8000.0 ","192.168.60.108 ","0","0","0","0"
,"656","1720","300","0.15%","1","124","5620438","150226","11.8000.0 ","192.168.20.110 ","0","0","0","0"
,"270","928","285","0.14%","4","234","24654","510105","11.8000.0 ","192.168.20.124 ","0","0","0","0"
,"237","748","240","0.12%","3","140","137758","1208601","11.8000.0 ","192.168.20.119 ","0","0","0","0"
,"207","653","240","0.12%","3","109","114572","76407","11.8000.0 ","192.168.40.107 ","0","0","0","0"
,"212","953","225","0.11%","2","109","89427","607307","11.8000.0 ","192.168.20.109 ","0","0","0","0"
,"452","1082","210","0.10%","1","124","4899066","65925","11.8000.0 ","192.168.20.137 ","0","0","0","0"
,"829","1707","165","0.08%","0","124","43710","282035","11.8000.0 ","192.168.20.14 192.168.20.103 ","0","0","0","0"
,"210","765","135","0.07%","4","109","58878","30463","11.8000.0 ","192.168.20.135 ","0","0","0","0"
,"124","459","120","0.06%","2","46","47298","378671","11.8000.0 ","192.168.40.100 ","0","0","0","0"
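For anyone who would rather rank the ExMon output above than read raw CSV, here is a minimal Python sketch. It assumes the export keeps the header row shown above; only three of the rows are embedded as sample data, and the function name `rank_by_cpu` is made up for illustration (it is not part of ExMon).

```python
import csv
import io

# Header plus three rows copied from the ExMon export above. The first
# column is empty in this paste, so DictReader keys it as "".
SAMPLE = """\
,Packets,Operations,CPU Time (ms),CPU (%),Avg. Server Latency (ms),Max. Server Latency (ms),Bytes In,Bytes Out,Client Versions,Client IP Addresses,Read Pages,PreRead Pages,Dirtied Pages,Log Bytes
,"34377","56840","13710","6.68%","0","1171","15866558","3547876","11.8000.0 ","192.168.20.103 ","0","0","0","0"
,"29374","35044","132780","64.69%","10","11998","783962","634315600","11.8000.0 ","192.168.20.125 ","0","0","0","0"
,"1256","3367","1530","0.75%","3","1796","6221415","2110106","11.8000.0 ","192.168.20.113 ","0","0","0","0"
"""


def rank_by_cpu(csv_text):
    """Return (client_ips, cpu_percent, max_latency_ms) tuples, busiest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append((
            row["Client IP Addresses"].strip(),      # values carry a trailing space
            float(row["CPU (%)"].rstrip("%")),       # "64.69%" -> 64.69
            int(row["Max. Server Latency (ms)"]),
        ))
    # Sort on the CPU share, highest first.
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for ips, cpu, max_latency in rank_by_cpu(SAMPLE):
        print(f"{ips:<32} {cpu:6.2f}% CPU   max latency {max_latency} ms")
```

Sorted this way, the client at 192.168.20.125 stands out: 64.69% of the MAPI CPU time and a maximum server latency of 11998 ms.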

Background of the server:
Dell PowerEdge, dual Xeon 3 GHz processors, 2 GB RAM, RAID 5, 50 mailboxes; 8 users are using Exchange ActiveSync. 6 mailboxes are over 1 GB. I also use the Exchange journaling feature (we keep a year of email, so that mailbox is huge, about 17 or 18 GB). The disks have two partitions: C and D. C is the OS; all of Exchange is on D (I know, bad). Shared calendars are being used.

 
OK, what I have done so far:

>Implemented a 1 GB mailbox size limit, giving the users who are over it a week to clean up.

>Followed all recommendations in here: http://support.microsoft.com/kb/815372
boot.ini changes (/basevideo, /3GB, /USERVA), changed to the standard VGA driver. All the registry entries were already correct. (I haven't rebooted with the boot.ini changes yet. Tonight.)

>Increased the page file size to 1.5 times physical RAM (it is on D).

>The disks are defragged weekly. C is 15 GB and has 10 GB free. D is 120 GB and has 35 GB free.

That's all I can remember having done. I have been working on this all week, so I am fried.

Here are other things I am considering doing, but I want feedback first:

>Adding 2 GB more RAM.

>Adding another disk and putting the TEMP directory and page file on it.

I know RAID 5 is not great for disk performance. I know the logs and mailboxes should be on different drives. Can I move them?

I know the hardware config is not optimal for Exchange, but 50 mailboxes doesn't seem(?) like a lot for the hardware.


David Scott, MCSE
 
vtobusman commented:

Have you tried an offline defrag of the databases? If not, please try it. Also, adding a drive to the RAID 5 and moving the page file will always help system performance. Exchange is a memory user, so adding RAM will help up to a point. First check how much RAM is being used via Task Manager. Also check your antivirus: I had an issue where Symantec Corp was using a lot of processing power on the message in and out directories, so I excluded those directories from scanning. Also, what else are you running on the server? Or is this a standalone Exchange server?
 
redseatechnologies commented:
Don't do an offline defrag; that isn't going to help in the slightest.

I would also not add more RAM. Too much RAM will adversely affect Exchange. It is not the memory pig people think; it is a disk pig.

If you can add 2 disks in RAID 1 for the transaction logs, that will give you the best bang for the buck.

Also, how are your users accessing this server? RPC/HTTP, MAPI, POP3?
 
David Scott, MCSE (Network Administrator, Author) commented:
redseat: How would I move the transaction logs to the new RAID 1 array?

They are accessing via RPC/HTTP (mostly cached mode, excluding the Exchange ActiveSync users, as I understand that can be problematic).

vtob: I use McAfee VirusScan Enterprise and McAfee GroupShield for Exchange. I have the mailroot and mdbdata directories excluded from scanning.

I was running Backup Exec Continuous Protection Server on this server, but have stopped the jobs to see if performance improves. This server is also one of two domain controllers (I know, not recommended for Exchange).

 
David Scott, MCSE (Network Administrator, Author) commented:
I just reran the performance analyzer (with the boot.ini changes /3GB /USERVA /basevideo now in effect).
I only received the network interface "outbound packets beyond threshold" error and the slow LSASS calls.

For the network interface error, MS says it's a faulty NIC. I've got dual NICs on the server, and I already changed to the other NIC and still receive this error, so I doubt both NICs on the server are faulty?

For the slow LSASS calls, one thing it recommends is setting the "never ping" registry key to 1 on domain controllers, which I have already done.

I don't know if I didn't get the bottleneck issues because of the boot.ini changes or because all the sales people were in a meeting while it was running ;)
 
David Scott, MCSE (Network Administrator, Author) commented:
I just ran another Troubleshooting Assistant performance test.

I am getting this error about unusually high RPC activity. I only have 50 mailboxes, which doesn't seem like it should create "unusually high RPC activity". Is it possible it's Exchange ActiveSync?

The 15 GB journal mailbox: could that be contributing to the performance issues? I could look into a third-party app for email archiving, or set up the journal mailbox in Outlook, turn on AutoArchive, and save the archive on a file server.
 
David Scott, MCSE (Network Administrator, Author) commented:
I forgot to post this from the report:

RPC Performance Counter Data (Performance_RPCPerfCounters1.1.1.1.1.1.1.1.1.1.1.1.3.1.1.7)
   RPC Performance Counters
   10/05/2007 12:34:50 - 10/05/2007 12:39:44
  The ratio of active logons per mailbox on the server is 3.75.
  The user activity level is 0.353 operations per sec per user.  
   Since the RPC operations per second per user is greater than 0.25, the RPC operations per second rates indicate a user or users on this server are unusually active. The measured RPC operations per second per user rate is 0.353.  
  RPC health during the time range: 10/05/2007 12:34:50 - 10/05/2007 12:39:44
  Summary of RPC results
   If the users accessing the Exchange server are highly active, and you are unable to reduce the load on your server, and your server is exhibiting bottlenecks, you should consider moving some users to another server.
   RPC Operations per second rates indicate a user or users on this server are unusually active.
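The thresholds the analyzer applies can be written down as a small check. This is only a sketch of the rule as quoted in the reports in this thread (0.25 operations/sec/user here, and 0.15 in a later report); the function name and the label for the lowest band are made up, and the tool's real logic may involve more than this.

```python
# Thresholds as quoted in the analyzer reports in this thread.
UNUSUALLY_ACTIVE = 0.25   # "greater than 0.25 ... unusually active"
MODERATELY_ACTIVE = 0.15  # "greater than 0.15 ... moderately active"


def classify_rpc_activity(ops_per_sec_per_user):
    """Map an RPC operations/sec/user rate onto the analyzer's wording."""
    if ops_per_sec_per_user > UNUSUALLY_ACTIVE:
        return "unusually active"
    if ops_per_sec_per_user > MODERATELY_ACTIVE:
        return "moderately active"
    # The reports do not name this band; the label is an assumption.
    return "below reported thresholds"


if __name__ == "__main__":
    for rate in (0.353, 0.247):
        print(rate, "->", classify_rpc_activity(rate))
```

With the figures from this thread, 0.353 lands in the "unusually active" band and the later 0.247 reading in the "moderately active" band.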
 
MRR045 commented:
With no more users than you have hitting the server, your server configuration should be fine, even though it would be nice if you could set it up according to standards. I try to stick to the M$ way of setting things up just so I don't have to worry about whether my setup could be causing the problem.
I would start with the NIC issue. It is ENTIRELY possible that both NICs are bad. I have seen it more than once. All it takes is a small power surge...
If you think ActiveSync could be the issue, disable it for a couple of days. Just remember to tell your end users.
There is a link from MS that tells you how to move the logs, etc. I will find it and post it if someone else doesn't beat me to it.
 
David Scott, MCSE (Network Administrator, Author) commented:
Actually, I found the link for moving the logs. I will try another NIC.

I have had very few Outlook errors today. Maybe my efforts so far have helped. Or maybe it's just the Friday before a 3-day weekend!! Columbus Day!!!

I suppose I could try the ActiveSync thing, but it would be a tough sell, as the sales people are hooked on getting their email on their phones now.

Thanks for the input.
 
redseatechnologies commented:
We are obviously in opposing time zones :)

Anyway, you have the link to move the logs, which is good. I am assuming that you are also in possession of a new RAID 1 array.

Those RPC warnings are a bit of a worry, and could well be the problem.

Get ExMon running so you can see what is generating this traffic. It (hopefully) is one specific user -> http://www.msexchange.org/tutorials/Microsoft-Exchange-Server-User-Monitor.html

Also, as far as networking goes, change the cables and switch ports too. In fact, if the server is any good, you could probably enable NIC teaming and run BOTH network cards as one.

-red
 
David Scott, MCSE (Network Administrator, Author) commented:
No, I don't have a new RAID 1 array. I haven't decided to do that yet. I wanted to eliminate the possibility of a bad NIC first.

I have run ExMon and posted the results above. While there are a few "power users" with high usage, I am not sure one of them is responsible for the slowdowns. Take a look above if you could and tell me what you think.

I will look into the NIC teaming. Thanks.
 
David Scott, MCSE (Network Administrator, Author) commented:
Ran diags on both NICs. Both passed. Enabled NIC teaming. Ran the Exchange troubleshooter again. Here is the report. It looks like disk issues and a lot of synchronizations (from Exchange ActiveSync and/or Outlook cached mode?).

_____________________________________________________________

Performance Issues  
Area: Disk Drive and Exchange Data File Information  
 
Time Range: All  
 
  The transaction log files for storage group 'First Storage Group' do not have a dedicated drive Time Range: All
 The transaction log files for storage group 'First Storage Group' share drive D: with d:\program files\exchsrvr\mailroot\vsi 1\queue, d:\program files\exchsrvr\mdbdata\priv1.edb, d:\program files\exchsrvr\mdbdata\priv1.stm, d:\program files\exchsrvr\mdbdata\pub1.edb, d:\program files\exchsrvr\mdbdata\pub1.stm.
  Tell me more about this issue and how to resolve it.  
 
  SMTP server does not have a dedicated drive Time Range: All
 The queues for SMTP server 'Default SMTP Virtual Server' share drive D: with the following Exchange data files: D:\PROGRAM FILES\EXCHSRVR\MDBDATA\PRIV1.EDB, D:\PROGRAM FILES\EXCHSRVR\MDBDATA\PRIV1.STM, D:\PROGRAM FILES\EXCHSRVR\MDBDATA\PUB1.EDB, D:\PROGRAM FILES\EXCHSRVR\MDBDATA\PUB1.STM.
  Tell me more about this setting.  
 
Area: Disk Drive Health  
 
Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53  
 
  Logical disk performance issue on drive hosting SMTP server: Average write latency Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 SMTP drive: Average '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.01 (10 ms). The measured value is 0.016 (16 ms).
  Tell me more about this issue and how to resolve it.  
 
  Logical disk performance issue on drive hosting SMTP server: Maximum write latency Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 SMTP drive: Maximum '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.05 (50 ms). The measured maximum value is 0.064 (64 ms).
  Tell me more about this issue and how to resolve it.  
 
  Performance issue found on logical disk containing system page file Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 Page file drive: The average value for '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.01 (10 ms). The measured value is 0.016 (16 ms).
  Tell me more about this issue and how to resolve it.  
 
  TEMP drive write latencies are high Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 TEMP drive: The average value for '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.01 (10 ms). The measured value is 0.016 (16 ms).
  Tell me more about this issue and how to resolve it.  
 
  TEMP drive write latencies are high Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 TEMP drive: The maximum value for '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.05 (50 ms). The measured value is 0.064 (64 ms).
  Tell me more about this issue and how to resolve it.  
 
  TMP drive write latencies are high Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 TMP drive: The average value of '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.01 (10 ms). The measured value is 0.016 (16 ms).
  Tell me more about this issue and how to resolve it.  
 
  TMP drive write latencies are high Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 TMP drive: The maximum '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.05 (50 ms) for the TMP drive. The measured value is 0.064 (64 ms).
  Tell me more about this issue and how to resolve it.  
 
  Transaction log write latencies are high Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 Transaction log disk: The average value for '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.01 (10 ms). The measured value is 0.016 (16 ms).
  Tell me more about this issue and how to resolve it.  
 
  Transaction log write latencies are high Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 Transaction log disk: The maximum value for '\LogicalDisk(D:)\Avg. Disk sec/Write' should be less than 0.05 (50 ms). The measured value is 0.064 (64 ms).
  Tell me more about this issue and how to resolve it.  
 
Area: Network Usage  
 
Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53  
 
  Average 'Network Interface(Intel[R] Advanced Network Services Virtual Adapter)\Packets Outbound Errors' beyond error threshold Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 The average 'Network Interface(Intel[R] Advanced Network Services Virtual Adapter)\Packets Outbound Errors' is greater than 0 packets. The measured value is 1 packets.
  Tell me more about this issue and how to resolve it.  
 
Area: RPC Performance Counters  
 
Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53  
 
  Active RPC user activity Time Range: 10/12/2007 14:59:59 - 10/12/2007 15:04:53
 Since the RPC operations per second per user is greater than 0.15, the users are considered as 'moderately active'. The measured RPC operations per second/per user rate is 0.247.
  Tell me more about this issue and how to resolve it.  
 
Unclassified Items:  
 
MAPI Operation: FXSrcGetBuffer  
 
  High RPC synchronization operations MAPI Operation: FXSrcGetBuffer
 The Exchange Server User Monitor (ExMon) RPC data indicates that clients are performing synchronization operations. Synchronization operations for MAPI operation "FXSrcGetBuffer" account for 66.67% of the processor usage devoted to processing RPC requests.
  Tell me more about this issue and how to resolve it.  
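All of the disk warnings in this report come down to the same two limits on the '\LogicalDisk(D:)\Avg. Disk sec/Write' counter: an average under 0.01 s (10 ms) and a maximum under 0.05 s (50 ms). Here is a rough Python sketch of that check, assuming the counter samples have already been reduced to an average and a maximum in seconds; the function name is made up for illustration.

```python
# Limits quoted in the report above, in seconds (the counter's native unit).
AVG_LIMIT_SEC = 0.01   # 10 ms average write latency
MAX_LIMIT_SEC = 0.05   # 50 ms maximum write latency


def check_write_latency(avg_sec, max_sec):
    """Return a list of findings; an empty list means both limits are met."""
    findings = []
    if avg_sec >= AVG_LIMIT_SEC:
        findings.append(
            f"average write latency {avg_sec * 1000:.0f} ms "
            f"is not below {AVG_LIMIT_SEC * 1000:.0f} ms"
        )
    if max_sec >= MAX_LIMIT_SEC:
        findings.append(
            f"maximum write latency {max_sec * 1000:.0f} ms "
            f"is not below {MAX_LIMIT_SEC * 1000:.0f} ms"
        )
    return findings


if __name__ == "__main__":
    # Values measured for drive D: in the report above.
    for finding in check_write_latency(0.016, 0.064):
        print(finding)
```

Because the logs, databases, SMTP queues, page file, TEMP, and TMP all sit on D:, one slow volume trips every one of these checks at once, which is why the same 16 ms and 64 ms figures repeat throughout the report.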
 
 
David Scott, MCSE (Network Administrator, Author) commented:
Are the high sync operations normal considering the Exchange ActiveSync usage?

I believe my next step is to get the two disks, set up a RAID 1, and then move the transaction logs to that drive?
 
MRR045 commented:
You might consider moving your TEMP and TMP folders to a dedicated drive as well. I don't have the link in front of me that explains this. Do a search on TechNet for "Move Temp and TMP Directories". It has to do with Exchange competing for I/O in these directories with other apps and services.
At the end of your log, the "MAPI Operation: FXSrcGetBuffer" item looks like the problem. You need to click on "Tell me more about this issue and how to resolve it." and see what M$ recommends. I have resolved several performance issues by doing just that.
Another thing that causes high RPC is the MAPI connections. Your users might have multiple connections open and not realize it. I did the following to stop that:
Windows 2003 SP2 introduced a feature called scalable networking. Disabling it immediately fixed the problem of Outlook clients using too many sessions. To disable it without rebooting, enter the following at a command prompt:
netsh interface ip set chimney disabled (the same words can also be entered one at a time at the successive netsh prompts)
To permanently disable it (reboot required), set the following registry value to 0:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\EnableTCPChimney.
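If you want to keep the registry change in a file rather than editing by hand, the sketch below builds the text of a .reg file for it. It assumes EnableTCPChimney is a DWORD value (which "set to 0" suggests); the function name is made up, and the script only produces text, it does not touch the registry. Try it on a test box before importing anything on a production server.

```python
# Key and value name as given in the comment above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "EnableTCPChimney"


def chimney_reg_file(enabled=False):
    """Build .reg file text that sets EnableTCPChimney to 0 (default) or 1."""
    dword = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        f"[{KEY}]",
        f'"{VALUE_NAME}"=dword:{dword:08x}',      # assumption: DWORD value type
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(chimney_reg_file())
```

Saving the output as a .reg file and importing it would make the same change as editing the value by hand; the reboot mentioned above is still needed.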
 
David Scott, MCSE (Network Administrator, Author) commented:
I did the above (disabled scalable networking) and I followed this MS article:

http://support.microsoft.com/?kbid=818484

I did the first one and set it to 5.

I still have a disk bottleneck, network outbound packets beyond threshold, and high user RPC activity.

I have run ExMon for a period of time, and there are a couple of heavy email users who also have Windows Mobile devices syncing with Exchange.

i still have 4 or mailboxes over a gig (barely).  

And I have the journal mailbox (with envelope journaling), which is a monster at 17.5 GB.

We also have a staff calendar which gets a lot of usage.

I don't know; it's probably just all those things combined that are causing the issues. But users aren't getting the RPC errors in their Outlook clients anymore, even though the performance analyzer is still giving those three errors.

I think I might get 2 GB more RAM and add the two disks in a RAID 1 and put TMP/TEMP and the logs on them. See what happens.
 
David Scott, MCSE (Network Administrator, Author) commented:
One thing TechNet said for the outbound packet issue is this:

"Segment inter-server and global catalog traffic
When there is much traffic, and therefore overhead due to packet collision, you can improve network performance by separating inter-server and global catalog traffic from client traffic. You can do this by having servers and global catalogs with dual network adapters, and by building a separate network for the communication required by servers and global catalogs."


I'm not sure how to go about doing this?
 
David Scott, MCSE (Network Administrator, Author) commented:
I'm just getting exhausted with this. Maybe I will stop journaling for a minute, run the perf analyzer, and see if that makes any difference.
 
David Scott, MCSE (Network Administrator, Author) commented:
I followed the MS article already, and I'm still getting warnings in the event log that the server memory settings are not optimal for Exchange:

Event Type:      Warning
Event Source:      MSExchangeIS
Event Category:      General
Event ID:      9665
Date:            10/15/2007
Time:            4:06:04 PM
User:            N/A
Computer:      EXCHANGE
Description:
The memory settings for this server are not optimal for Exchange.

 For more information, click http://support.microsoft.com?kbid=815372

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 03 00 00 00               ....    
 
David Scott, MCSE (Network Administrator, Author) commented:
Sorry it took so long to post back. Crazy time of year around here. I put the two SCSI disks in and created a RAID 1 array, moved the log files there, and also put the temp folders on it. I ran the perf analyzer and got no disk bottlenecks, but still the network interface issue as described in this link:

http://technet.microsoft.com/en-us/library/aa997363.aspx

Maybe I will switch out both NICs?
 
David Scott, MCSE (Network Administrator, Author) commented:
OK, UNCLE.

I ran the Troubleshooting Assistant again during production hours and it still shows a disk bottleneck.

So I tried moving the database file (not the streaming file) to the RAID 1 as well, and now the bottleneck is on the RAID 1 array. I am going to move the database file back to the RAID 5 and leave the RAID 1 with the log files.

No one is complaining about issues and the RPC pop-ups are no longer happening.

I also added 2 GB of additional RAM.

I am also still getting the network interface issue. I called Dell and they stated that the diagnostics built into the NIC driver are accurate. I ran those before and the cards passed all tests.

I suppose since email performance seems good, I am not going to worry about the disk bottlenecks.
