Exchange 2003 Performance Issues

I have a single Exchange 2003 server running on Windows 2003 Standard SP1. Exchange has SP2 loaded, and a couple of security fixes. (Including the one released last week). I have 80 mailboxes, with about 40 of those as daily active users. The server has 2 GB of RAM, and two RAID-5 arrays on 10k drives, one array holds the OS, pagefile, transaction logs, etc. The other holds the private mailbox database and the streaming file for it. The server is also a DC and a GC, but does not hold any of the FSMO roles. It also runs DNS, WINS, and DHCP.  Most of my users are on Outlook XP or 2003, but I have between 5-10 users on Entourage. (OS X) Recently, I have had many complaints of mail responding extremely slowly. Running some performance logs, I have found some interesting numbers, and I don't know what else to look at. The CPU utilization is extremely low 3-7 %. I have almost a gig of free memory, and pages and interrupts are in the low-normal range. The average disk write queue length for my disk holding the mailbox store is extremely high though. It is common to see numbers >250. This is on a 3 spindle RAID-5. Read queue length stays in the normal range, from about .05 - 1.5; Read and write queue lengths stay normal on the other array, in about the same range as read queue on the database disk. Naturally, performance of the entire system goes downhill when the write queue gets this backed up. The strange part is, it fluctuates regularly, but not on 1-2 second intervals as you might expect for a high-load server. More like 10 minute intervals. For 10 minutes, the system is slammed, and the queue backs up. Then, for 10 minutes there is almost no activity and everything is running great. (The mail queues never back up during this time, there is rarely more than 1 piece of mail in any queue, and there is no mail in the badmail folder) I have traced everything that I can think of to try to find where this activity is coming from. 
Filemon reports that it is indeed store.exe writing to priv1.edb causing the backup. Performance logs of active users, SMTP counters, ExchangeIS counters for RPC times only show the obvious, that RPC latency is bad when the server is lagging due to the backed up write queue. Exchange Mailbox counters show that user activity is not varying obviously between periods of good and bad performance. The performance analyzer also shows the obvious, that RPC latency is high. The best practice analyzer shows a couple of things, that the transaction logs and temp files are on the same drive as the pagefile; and that the /3GB switch is not set. I actually do have the /3GB switch set, although I have /USERVA=2900 instead of /USERVA=3030 because I was having difficulties running some things (Such as Remote Desktop inbound), and decided maybe the OS needed just a bit more memory (changing it to 2900 did fix my issues). The Exchange User Monitor displays varying server latencies, as performance fluctuates. From 0-3 ms when performance is good, to >2000 ms when performance is bad. The amount of data sent and received during these times, according to the user mon, is relatively unchanged. DCDIAG comes back clean, and my event logs are almost perfectly clean. I loaded and ran our Symantec Anti-Virus on the server to check for viruses, and loaded and ran Windows Defender to check for spy-ware etc. Both came back clean, so I removed them again, since they were causing more disk activity. (We have a Barracuda hardware anti-virus/spyware device in front of the Exchange server)

One thing worth mentioning, as it is very non-standard, is that there are a bunch of ports open to the internet. Aside from the standard SMTP and POP ports, there are 445 (DS), 88 (Kerberos), 135 (DCOM), 3268 (GC), 389 (LDAP), and 53 (DNS). This was configured before I arrived (I have only worked here for 3 weeks). I have enabled RPC over HTTP and am in the process of moving remote users off of the old config (in place so users could access Exchange remotely through Outlook). Once I get management off of this old config, I will close the ports. I have considered that we may have gotten hacked, and have some sort of malicious program running in the background, but I have yet to find any evidence to support that. There are no new processes that appear during the high-load periods.

Does anyone have any idea what could be causing this up and down trend in disk writing? What other performance counters should I look to for more insight? Are there other settings that I have overlooked to further tune disk usage in exchange? Any ideas would be greatly appreciated!
Sembee Commented:
RAID 5 is fine for the Exchange databases, that is what I normally configure.
What is bad is if the Exchange databases and the logs are on the same physical array. Both of those parts are highly transactional and will therefore be thrashing the drive. If the new server allows you to use at least two arrays then do so.

What you need to do is close all the holes in the firewall and then start to secure from behind that closure.
Ideally I would suggest closing everything, except 25 inbound (SMTP) and possibly http outbound and then see whether anything appears in the logs.

If you do a pure Exchange data migration then you shouldn't bring anything else across. Start changing passwords, particularly on the administrator and any other accounts that have administrator or domain admin rights. That includes any local administrator accounts. Look for odd accounts in the local user database and the domain database, particularly if they have high privileges. If in doubt, disable them and see what breaks.
You may also want to look at the security policy, particularly on the domain controllers and ensure that nothing silly has been set like anonymous having the ability to do things it shouldn't - like log on as a service.

Otherwise, get the data off and then wipe the thing.

If 135 was open to the internet, then you should presume the machine has been hacked.
If someone has got a rootkit on to the machine, then you wouldn't see evidence of any malware.
I would be scheduling time to swing the data over to another server or workstation, so that the server can be wiped.

If you haven't got 135 closed, do it now. There is no valid reason to have the port open. I would also close the other ports as well, as they aren't required and are just exposing the network.
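One way to confirm the ports really are closed is to probe them from a machine outside the firewall. The following is only a minimal sketch: the function name `reachable_tcp_ports` and the `EXPOSED` list are made up for illustration, the port numbers are the ones listed in the question, and it only tests TCP (DNS and Kerberos also use UDP, which this does not cover).

```python
import socket

def reachable_tcp_ports(host, ports, timeout=2.0):
    """Return the ports on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

# Ports the question lists as open to the internet (besides SMTP and POP)
EXPOSED = [445, 88, 135, 3268, 389, 53]
```

Run from an external host, `reachable_tcp_ports("<public address of the server>", EXPOSED)` should come back empty once the firewall has been tightened. The address is deliberately left as a placeholder.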

Otherwise it sounds like the hard drives are being thrashed. First place I would look is the RAID controller configuration. Based on the firewall configuration it looks like some cowboy set up the network, so it wouldn't surprise me if the RAID card hasn't been configured correctly for a high transaction database like Exchange.

One other thing... you may want to consider breaking the text up - your question was very difficult to read.

itnaantiAuthor Commented:
Sorry about the format, I will be sure to add some line breaks in the future. It is a shame that I can't edit the post.

Anyway, I have closed off 135. I actually have a new server waiting to be installed, so I can change my plans a bit and migrate my Exchange data to that server, and then wipe the current Exchange server.

Is there any risk in migrating my current Exchange data? Would it be possible to accidentally move a rootkit exploit over to the new server with the data? Additionally, what is the risk of the rest of my servers being compromised? If there is a rootkit on my DC, would it be possible for the exploiter to have the domain user list, passwords, etc? What I am wondering, is do I need to wipe all of my servers systematically while disconnected from the internet, or would I be okay to just wipe the Exchange server. Having to reformat all 5 of my current production servers could be a serious hassle!

Prior to going through with migration and incineration, is there anything else to consider? Are there any other potential explanations to the high write-queue length coming from Exchange? I have read somewhere that Macs can cause high load, because of a MAPI conversion. Some of the Mac users have very large inboxes. (>5GB) (Remember, I just started working here, don't blame me for lack of quotas!) I am looking for some other explanation, prior to spending my weekend migrating a server because of a problem that I am not certain even exists, and have no real way of proving that it exists to any degree of certainty.

As for the RAID configuration, we have an HP Smart Array 5i, so there isn't much I can configure. There isn't even a BB Write Cache module available for it. I realize RAID-5 isn't the best choice for Exchange, but we don't have the room in the server to do 10.

If I do migrate though, the new server does have a better RAID card, with 128 MB BB write cache. That alone could help some of the issue, even if there is no rootkit on my server.
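For a rough sense of why a 3-spindle RAID-5 without write cache backs up so easily, here is a back-of-the-envelope sketch. The figure of 125 IOPS per 10k drive is an assumed rule of thumb, not something measured on this server, and the penalty factors (4 back-end I/Os per RAID-5 write, 2 per RAID-10 write) are the usual textbook values. The 4-drive RAID-10 is purely a hypothetical comparison.

```python
def effective_write_iops(spindles, iops_per_drive, write_penalty):
    """Approximate random-write IOPS an array sustains with no write cache."""
    return spindles * iops_per_drive / write_penalty

IOPS_PER_10K_DRIVE = 125   # assumption: typical rule-of-thumb figure
RAID5_WRITE_PENALTY = 4    # read data, read parity, write data, write parity
RAID10_WRITE_PENALTY = 2   # write to both halves of the mirror

# The current database array: 3 spindles in RAID-5
raid5_three_drives = effective_write_iops(3, IOPS_PER_10K_DRIVE, RAID5_WRITE_PENALTY)

# Hypothetical comparison: 4 spindles in RAID-10
raid10_four_drives = effective_write_iops(4, IOPS_PER_10K_DRIVE, RAID10_WRITE_PENALTY)

print(f"RAID-5, 3 drives:  ~{raid5_three_drives:.0f} write IOPS")
print(f"RAID-10, 4 drives: ~{raid10_four_drives:.0f} write IOPS")
```

Under these assumptions the existing array manages fewer than 100 random writes per second, so any burst above that rate has to queue. A battery-backed write cache absorbs such bursts, which is consistent with the expectation that the new controller would help.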

itnaantiAuthor Commented:
The new server will have two arrays, arranged as you suggest.

One more quick Q: To move Exchange, I have KB822945 in hand, but it doesn't spell out the procedure for when your box is also a DC. Should I:

1) DCPROMO down the current Exchange box.
2) Disconnect it from the network.
3) Connect my new box, join it to the network, DCPROMO it up.
4) Proceed with the KB article on restoring Exchange.

I will be performing the aforementioned process tomorrow morning, provided that my backups run smoothly tonight.

The backup / restore method is the highest risk method for doing a migration. I actually refuse to do them.
You would be much better doing a swing migration. Even in this case I would still do a swing migration.

You should never change the role of a server once Exchange is installed, so the DCPROMO part cannot be done.
You do know that it is best practise NOT to install Exchange on to a domain controller? Considering the background to this question I would seriously advise you not to install Exchange on to a DC.

itnaantiAuthor Commented:
Thanks for the comments. I performed a forklift migration over to the new server, and it is all up and running now. I also reset all of my admin passwords, and removed some members of upper-management from the admin group until I can have them reset their passwords. I also locked down outbound connections on our firewall. I am only permitting 80, 443, 21, and UDP 53 right now. I will check the logs later and see if anything interesting happened. I also did not DC the new Exchange server. I do have one question though, I have ~75 active users, and 5 servers. The servers are for: File Sharing, Accounting, Exchange, Backup, WSUS/print shares/install point/SAV, and TS. Currently, the only remaining DC is the Accounting server, which does not have many users on it, but does have database apps, and some file shares on it, in addition to being a DC. What is a best-practice recommendation for where the DCs should be? (I don't really have the budget for dedicated DCs) I am planning on promoing the backup server for now, since it has nothing on it other than B.U. Exec - but I would like to move it to a different server later, since it is really handy to be able to work on the backup server in the middle of the day without affecting users.
You have to look at what services are compatible with being on a DC.

Exchange is one that should be on a dedicated server.
Terminal Services should not be a DC because users are logging in to the actual machine and would be able to see the AD management tools.

Otherwise use the server that has the lowest use. I will often use file and print servers for DC roles.
Although apart from file sharing, in many cases I will get a dedicated DC and then look at what else it can do (rather than the other way round). One of my common tricks is to put WSUS on to a domain controller.

itnaantiAuthor Commented:
Now that I have had a chance to see the new server in action, I know a bit more information. First, the problems that the users have been complaining about have subsided. Mail is responding faster, and the "Waiting for information from Exchange server" dialog pops up less often.

The disk write queue length on the mailstore volume has not subsided though; I still get very large numbers in the write queue length. I suppose the write cache and slightly faster speed on the new server are helping the users' issues, but I am still seeing the underlying problem that I believe was causing my speed issues in the first place.

I found one other bit of information though; Entourage is syncing on a 10 minute cycle! That would explain my 10 minute on, 10 minute off problem. I am going to look into this a little further, and see if removing my Entourage clients fixes the issue.
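To check whether a counter log really follows a fixed cycle, one option is to look for the lag at which the samples correlate best with themselves. This is only a sketch on synthetic data: the function `dominant_period` is illustrative, and the series below simply models the pattern described earlier (one sample per minute, 10 minutes at a queue length of 250, then 10 minutes near idle), not real numbers from the server.

```python
def dominant_period(samples, min_lag=2):
    """Return the lag (in samples) with the highest autocorrelation, or None."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return None  # a flat series has no period
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, n // 2 + 1):
        # Correlate the series with a copy of itself shifted by `lag`
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic data: two hours of one-minute samples, 10 busy then 10 quiet
busy, quiet = [250] * 10, [1] * 10
queue_lengths = (busy + quiet) * 6

period = dominant_period(queue_lengths)
print(f"Dominant period: {period} samples")
```

With one sample per minute, the result is the length of one full busy-plus-quiet cycle in minutes. Real counter data would be noisier, so the peak would be less clean than with this made-up series.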
itnaantiAuthor Commented:
Although my problem is not completely fixed, I am on a warpath now, and I think I know what to do from here. Wiping the server was a good idea. Even though my problems were not root-kit related, I could have had a root kit anyway that could have caused problems down the road, and now I know that I don't have one. Thanks for all of your help! I will try to update this thread when I find an Entourage solution.
Question has a verified solution.
