Performance tuning in Linux

My email server (sendmail 8.10.1 on Red Hat 6.1) sometimes behaves very slowly.  Currently, I just use "top" to look at the statistics.  Is there any tool that can tell me more specifically which process is the culprit?

linuxwrangler Commented:
Please be more specific. Does "slowly" mean that when a client attempts to send mail the request takes a long time or times out, or do you mean that mail is queued and not delivered for a long time?

Also, do you notice any other patterns (same time of day, after sendmail has been running for 28 hours, only a particular client complains, etc.)?

Also, check /var/log/maillog and see what messages are being sent. Are a bunch of huge messages being sent when the machine slows down?

What else is running on the server? If users are popping mail off the server, check /var/mail (or /var/spool/mail, depending on distro) and look at the size of each mailbox. If you find unusually large ones, make sure those users are deleting mail from the server when they pop. Pop can suck up processor and disk when scanning a large mailbox, and if a few users hit it at the same time it spells trouble.

Note, the auto-check feature also forces pop to scroll through the mailbox to determine if there is new mail. If you get a bunch of users setting the check-mail time to a minute or two you can really load up the server.
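The mailbox size check suggested above can be scripted. What follows is only a minimal sketch, assuming one plain mbox file per user in the spool directory; the function name `largest_mailboxes`, the 50 MB threshold, and the /var/spool/mail path are illustrative choices, not anything taken from qpopper or sendmail.

```python
import os

def largest_mailboxes(spool_dir, min_bytes=50 * 1024 * 1024):
    """Return (size_in_bytes, filename) pairs for mailboxes at or above min_bytes, largest first."""
    found = []
    for entry in os.scandir(spool_dir):
        # mbox spools are plain files, one per user; skip directories and symlinks
        if entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            if size >= min_bytes:
                found.append((size, entry.name))
    return sorted(found, reverse=True)

# Assumed spool location; some distros use /var/mail instead.
SPOOL = "/var/spool/mail"

if __name__ == "__main__" and os.path.isdir(SPOOL):
    for size, name in largest_mailboxes(SPOOL):
        print(f"{size / 1024 / 1024:8.1f} MB  {name}")
```

Run from cron, something like this could flag oversized mailboxes before several users pop them at the same time.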
kevintsang (Author) Commented:
I'm still trying to relate the different occurrences of the problem and look for something in common in terms of users, clients, and date/time, with reference to /var/log/maillog and /var/log/messages.  Let me put the case this way.  There are times when mobile users try to connect to our mail server through the internet from different countries (through the iPass service): Outlook shows that it is downloading the messages, but it actually stalls there for a very long time even though there are just a few small messages in the mailbox.  When I check with netstat, the connection is established.  When we forwarded the mail from the mailbox to the mail server of the local ISP, the user got all of it without any problem.  This is an intermittent problem.  So far, it seems that only a few users report it.
On the other hand, and this may or may not be related to the problem above, our mail server occasionally shows a high load average of 3-6, especially during office hours.  Any action or command performed on the mail server in a telnet session responds very slowly.  When I go through the processes in "top", I cannot tell which process(es) are causing the slowness.
That's why I'm asking if there is any handy tool that I can install on the server to tell me what could be causing the slowness.
qpopper and some Perl scripts also run periodically on the same server.  These programs do use a lot of processing power when they run, but they don't cause the slowness every time.
There is no single tool that I know of that will tell you exactly what to do - only troubleshooting techniques using a variety of tools. (Thank goodness - otherwise we sysadmins would be in trouble :>)

Are you able to observe these events or are they only reported later? If you are near the server you can sometimes get a "gut-level" idea. For example, when the system is slow is the disk thrashing? Also, is the slowness only reported by users from outside the country?

How much RAM do you have and what do the memory and swap numbers (near the head of the "top" display) tell you? Also, what is your CPU speed?

3-6 feels like a bit of a high load average (what is your CPU speed?). I used to run a mail server with only 50 clients but frequently handling messages > 10 MB each and > 100 MB/day. This was on a 90MHz Pentium and I rarely saw the average approach 1. At the same job I ran a 1/4 Terabyte Linux/Samba server on a 350MHz machine, with typical files copied to/from the machine being 20-500MB, and it, too, had very low loads.

Having said that, load averages are one measure but only one. My home machine runs the RC5 cracking program during all idle times, so the average is always almost exactly 1.0 and performance is great. A single process can take all the machine's resources, and a gob of little ones can drive up the load average.
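To see which processes are behind a high load average at a given moment, one option is to look at task states directly. This is a rough sketch, assuming a Linux /proc filesystem: the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D, usually waiting on disk), so listing those gives a snapshot of the contributors. The function names are made up for illustration, and the scan covers processes only, not individual threads.

```python
import os

def task_state(stat_text):
    """Extract (command, state) from the contents of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST closing parenthesis.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return command, state

def load_contributors(proc_dir="/proc"):
    """List (pid, command, state) for processes that are runnable (R) or in uninterruptible sleep (D)."""
    busy = []
    for name in os.listdir(proc_dir):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_dir, name, "stat")) as f:
                command, state = task_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or its stat line was unreadable
        if state in ("R", "D"):
            busy.append((int(name), command, state))
    return busy

if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, command, state in load_contributors():
        print(pid, state, command)
```

A crowd of popper processes in state D during a slow spell would point at disk I/O rather than CPU. A single snapshot can miss short bursts, so it is worth running this several times while the load is high.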

So what to do?
Watch more stats including for starters:
cpu utilization
memory and swap utilization
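The statistics in the list above can also be logged over time instead of being watched by hand in "top". Here is a small sketch, assuming the Linux /proc/loadavg and /proc/meminfo files; the helper names are hypothetical, and the meminfo parser only keeps lines of the "Key: number" form, so any other layout is simply skipped.

```python
import os

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg contents."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields (values in kB as reported by the kernel)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Key:   12345 kB" style lines; ignore anything else.
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def swap_used_kb(info):
    """Swap currently in use, derived from the SwapTotal and SwapFree fields."""
    return info["SwapTotal"] - info["SwapFree"]

if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load averages:", parse_loadavg(f.read()))
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("swap used (kB):", swap_used_kb(info))
```

Appending a timestamped line to a log every minute gives something to compare against /var/log/maillog when users report slowness after the fact.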

Also, keep narrowing the problem - if local users can continue to work without delay at the same time that remote users experience slowness, then you probably have a network-related problem rather than a resource-based one. In that case a whole new set of tools will be pressed into use, including:
traceroute (in fact, if you have a slow user "on-the-line", try a traceroute back to their IP address)

Oh, another question as my brain churns: what is the exact connection when the users experience problems? Modem-ISP-internet-you, etc.?
kevintsang (Author) Commented:
I believe your first comment already identified the cause of the problem.  In the past two days, I found two users' mailboxes containing more than 200 MB.  Since I trimmed them, I have not seen frequent high load averages in top.
I found that the high load average occurs when the user.pop file is written back to the user's mailbox.

From, I found that the server mode setting can deal with large mailbox popping.
Setting disk quota for each mailbox can prevent this from happening again.
But from a system point of view, is there any tool that can dynamically allocate system resources (CPU, RAM, disk, etc.) to each process, with high and low thresholds?
If this can be done, I can make sure popper uses at most 70% of the CPU when it needs it, leaving 30% for other essential processes.  It looks like this kind of feature only exists in commercial versions of UNIX, such as HP-UX.

For your info, my mail server is an HP LH4 with a 600MHz CPU, 512MB RAM and 2 x 18GB HD for approximately 600 mailboxes.
Look at the "nice" command and at setting priority levels. Linux does do this, and it is actually better than reserving a set percentage, since you really want as close to 100% of resources used to run your process unless something else needs them. I am running the RC5DES cracking program on one of my machines and don't notice it, even though the average CPU usage is >99%, since anything I do is handled with higher priority than the RC5 program gets.
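As an illustration of the priority approach, here is a minimal sketch of starting a program at a lower priority from a script, which is the same effect as launching it with "nice". The wrapper name `run_niced` and the increment of 10 are arbitrary choices; `os.nice` and the `preexec_fn` argument are standard Python on POSIX systems.

```python
import os
import subprocess
import sys

def run_niced(argv, increment=10):
    """Run argv as a child process with its niceness raised by `increment`."""
    # preexec_fn runs in the child just before exec, so only the child is
    # deprioritised; the parent keeps its own priority. POSIX only.
    return subprocess.run(
        argv,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
    )

if __name__ == "__main__":
    # The child reports its own niceness; os.nice(0) returns it unchanged.
    child = [sys.executable, "-c", "import os; print(os.nice(0))"]
    result = run_niced(child, increment=10)
    print("child niceness:", result.stdout.strip())
```

A niced process still gets the whole CPU when nothing else wants it, which is the advantage over a fixed percentage cap. For a process that is already running, "renice" does the equivalent.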

Disk space, as you've noted, can be controlled with quotas (but fair warning - those users are gonna scream that they really NEED 200MB for mail).

What I want to look into is limiting memory usage by a user process - I don't know how it is done and I haven't researched it but have recently run into some situations where it would be useful.
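One mechanism that may fit here is the per-process resource limit (setrlimit, which is what the shell's "ulimit" builtin uses). The sketch below is an assumption about what would be useful, not something tested on the mail server in question: it caps the virtual address space of a child process with RLIMIT_AS, so allocations beyond the cap fail instead of pushing the machine into swap. Note that this limits address space per process, not resident memory per user, and the function name `run_with_memory_cap` is invented for the example.

```python
import resource
import subprocess
import sys

def run_with_memory_cap(argv, max_bytes):
    """Run argv with its virtual address space capped at max_bytes (POSIX only)."""
    def apply_limit():
        # Runs in the child before exec. RLIMIT_AS caps the total address
        # space, so allocations that would exceed it are refused.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return subprocess.run(argv, preexec_fn=apply_limit, capture_output=True, text=True)

# Child program for the demo: try to allocate 2 GiB and report what happened.
GREEDY_CHILD = """
try:
    block = bytearray(2 * 1024 ** 3)
    print("allocated")
except MemoryError:
    print("denied")
"""

if __name__ == "__main__":
    one_gib = 1024 ** 3
    result = run_with_memory_cap([sys.executable, "-c", GREEDY_CHILD], one_gib)
    print("child says:", result.stdout.strip())
```

Whether a program copes gracefully with a refused allocation depends on the program, so a cap like this is best tried on a test box first.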
Question has a verified solution.
