iPlanet Web Server 6.0: ns-httpd process is utilizing 98% CPU

Posted on 2004-04-22
Medium Priority
Last Modified: 2013-11-21
Dear Expert,

I have installed iPlanet 6.0 on Sun SPARC 2.8, running a Java application. Sometimes my web server process, ns-httpd, uses 98% of the CPU. Please suggest why this is happening and how to avoid it.

Thanks in advance
Anil Kumar K.
Question by:Kesdee

Accepted Solution

rama_krishna580 earned 200 total points
ID: 10947304
check here..


My first guess would be repeated recompilation of Java servlets. There is a setting that controls this (can't remember its name, maybe something like "development-mode"). This setting tells iWS how often to look for (and recompile) servlets. When running in pure development mode this setting is 0, meaning check for a new version of the servlet every time a request comes in and do a recompile. In pure production mode this setting is -1, meaning after startup and the initial compile, never look for a new version of the servlet and never do another recompile until the next restart. Other positive values are used to indicate the number of minutes between checks.

My second guess for high CPU utilization would be "thundering herd". If you have your RqThrottle set too high, you get lots of %SYS utilization as the socket readers compete to accept the connection.

Another thought would be to run Perfdump. It's defined in the iWS manual and there's an article on my website about how to set it up. I'd look at the ConnectionQueue, the KeepAliveInfo, the CacheInfo. The Performance Counters may show you what part of iWS or the app is consuming the CPU.

Another possibility is if iWS isn't configured to handle HTTP/1.1 (i.e. HTTP persistence). The KeepAliveInfo in perfdump should point that out as well as the access.log will show HTTP/1.0 instead of HTTP/1.1. This would require each new request to open a new socket connection which does consume lots of CPU.
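A quick way to check the access log for this is to count the protocol version on each request line. The sketch below is only an illustration: it assumes the log is in common log format with the request line in double quotes, and the sample lines are made up, not taken from a real server.

```python
import re
from collections import Counter

# Matches the protocol token at the end of a quoted request line,
# e.g. "GET /index.html HTTP/1.0" (common log format is assumed).
PROTOCOL_RE = re.compile(r'"[A-Z]+ \S+ (HTTP/\d\.\d)"')


def count_http_versions(lines):
    """Return a Counter of HTTP protocol versions seen in access-log lines."""
    counts = Counter()
    for line in lines:
        match = PROTOCOL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [22/Apr/2004:10:00:00 +0000] "GET /app/a.jsp HTTP/1.0" 200 512',
    '10.0.0.2 - - [22/Apr/2004:10:00:01 +0000] "GET /app/b.jsp HTTP/1.1" 200 734',
    '10.0.0.1 - - [22/Apr/2004:10:00:02 +0000] "POST /app/c HTTP/1.0" 200 90',
]
versions = count_http_versions(sample)
print(versions)
```

To run it against a real log, pass an open file object instead of the sample list. A log dominated by HTTP/1.0 entries would be consistent with the persistence problem described above, though it could also just reflect old clients or a proxy in front of the server.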

If your app does lots of CGI processing, be sure the iWS CGI-thread is automatically spawned. Can't remember what the name of the setting is. Pretty sure it's in magnus.conf. It causes iWS to spawn another process or thread at startup. This process/thread handles CGI requests without having to spawn a new process as each new request arrives. I'm pretty sure you'd see this in perfdump under cgi-stats if you had it set wrong.

Also, in the perfdump display the file-bucket counter and the cache-hit-ratio would let you know if iWS is having to do lots of disk I/O to service the request. Doubt that would show up as CPU time but maybe.

Sometimes I find useful info using the tools in /usr/proc/bin. For example, "pfiles PID" or "pmap PID" or "pstack PID" may show something that looks abnormal. Hmmm, that makes me think of something else regarding Solaris: make sure you're using the new LWP thread library. I think it's in /usr/lib/lwp but not sure (check http://docs.sun.com). I think you have to include it near the front of your LD_LIBRARY_PATH. Hmmmm, I guess you should also check your CLASSPATH to make sure your JAR and JSP files are located near the beginning. Maybe iWS is having to look through lots of large directories before it finds what it's looking for, but again that wouldn't show up as CPU time. (Can you tell I'm starting to grasp at straws ;-)

Finally I'd resort to truss'ing the iWS process. Use "ps -ef | grep http" to get iWS's PID. Then do "truss -p PID > /tmp/file 2>&1" and let it run for a minute. Then look through /tmp/file and see if some system call is being repeated over and over. Pretty low-level troubleshooting, but sometimes you see something that looks so abnormal you know it's got to be the problem.
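Scanning a minute of truss output by eye can be tedious, so a small script that tallies the call names may help. This is a rough sketch: it assumes each traced call starts with the call name followed by "(", optionally preceded by an LWP prefix such as "/4:", and the sample lines are invented for illustration. The function name top_syscalls is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: optional "/<lwp>:" prefix, then "name(" starting the call.
SYSCALL_RE = re.compile(r'^(?:/\d+:\s*)?([A-Za-z_][A-Za-z0-9_]*)\(')


def top_syscalls(lines, n=5):
    """Return the n most frequent system-call names as (name, count) pairs."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Invented sample lines, only to show the expected shape of the input.
sample = [
    "/4:\tpoll(0xFE5F9D30, 1, 0)\t\t\t= 0",
    "/4:\tpoll(0xFE5F9D30, 1, 0)\t\t\t= 0",
    "/2:\tread(12, 0x0012A3F0, 8192)\t\t= 0",
    "/4:\tpoll(0xFE5F9D30, 1, 0)\t\t\t= 0",
]
summary = top_syscalls(sample)
print(summary)
```

For the capture described above, something like `with open("/tmp/file") as f: print(top_syscalls(f))` would do. A single call that dwarfs all the others is the kind of repetition worth looking at more closely.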

Good luck, e-Ken

PS - I just re-read your question and see that this is a 2-CPU box. Check the context-switching counts. You can see the aggregate value in the "CS" column under vmstat, but I prefer the per-CPU view in the "CSW" and "ICSW" columns under mpstat. If these numbers are high, then your CPUs are most likely thrashing. Context switching shouldn't be greater than 100-200 times the number of CPUs, so in your case that would be 200-400 on your 2-CPU box. If the high values are in the CSW column as opposed to the ICSW column, that means you've got more processes/threads/LWPs running than your CPUs can handle. This will drive up the %SYS. If you see %SYS over 10%, that's really bad and usually means thrashing: the LWPs are spending most of their time switching onto and off of the CPU and not very much time actually doing work. This may also manifest itself as a very high number of waiting and blocked processes in the "R B W" columns under PROCS in the vmstat display. My rule of thumb is no more than 2 times the number of CPUs in the R-B-W columns.
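The rules of thumb above boil down to a few comparisons, which can be sketched as a small checker. The function name and parameters are hypothetical, the numbers passed in are invented, and two interpretations are assumed: the context-switch limit is applied to the aggregate vmstat value using the upper end of the range (200 per CPU), and the R-B-W limit is applied to each column separately.

```python
def check_rules_of_thumb(ncpus, cs_total, sys_pct, r, b, w):
    """Compare observed values with the rules of thumb; return findings."""
    findings = []

    # Assumed reading: aggregate context switches vs. 200 per CPU.
    cs_limit = 200 * ncpus
    if cs_total > cs_limit:
        findings.append(f"context switches {cs_total} > {cs_limit}")

    # %SYS above 10% is treated as a sign of thrashing.
    if sys_pct > 10:
        findings.append(f"%SYS {sys_pct} > 10")

    # Assumed reading: each of the r, b, w columns vs. 2 per CPU.
    queue_limit = 2 * ncpus
    for name, value in (("r", r), ("b", b), ("w", w)):
        if value > queue_limit:
            findings.append(f"vmstat {name}={value} > {queue_limit}")

    return findings


# Invented numbers for a 2-CPU box, for illustration only.
findings = check_rules_of_thumb(ncpus=2, cs_total=1500, sys_pct=35, r=9, b=0, w=0)
for finding in findings:
    print(finding)
```

These thresholds are one person's rules of thumb, not hard limits, so a finding here is a prompt to look closer rather than a diagnosis.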

If you see any of these symptoms, check the number of threads/LWPs. You can see this value in the THR column if you run "top", but I prefer using the new "prstat" command. With this I use "prstat -L -p PID" b/c the -L will show each of the LWPs within the process. Also I sometimes use the "-v" option to see context switching (ICX) and CPU utilization by LWP.

If you see a large number of LWPs (sorry, don't have a good rule of thumb for what's considered "large"), then consider reducing the RqThrottle and checking the number of Procs in magnus.conf. If you think those values are set OK, then I'd suspect some bad Java code. The usual culprit is a synchronized object (often a Hashtable or a Vector, both of which are synchronized classes in java.util). It's something that the programmers put in their code to synchronize access to a critical resource (most often a table in memory). When you have this in the Java code, the LWPs have to line up behind each other to get to the synchronized resource.

When the resource isn't available, rather than switch off the CPU the LWP may "spin" waiting for the resource to become available (not sure what causes it to decide whether to context switch off the CPU or to spin). The spinning is called a mutex spin (mutex stands for mutual exclusion). This value is shown in the SMTX column of the mpstat output. Any value over 200-300 times the number of CPUs is considered high. Some apps that perform well have very high SMTX counts (Oracle is the one I'm most familiar with), so don't assume high SMTX values are bad; however, if it's homegrown code it usually is a bad sign if this column gets up to 1000 or more (last week we had SMTX up to 60,000 on an 8-CPU box and %SYS was 50% and *no* work was getting done ... funny to watch).

The natural reaction when you see this slowdown is to add more processes or LWPs (i.e. increase RqThrottle), but all that does is put more things in the line awaiting the synchronized resource. This will just increase the CSW and SMTX. At this point, I go to my developers and ask them to scan their source code for a synchronized object and see if there is a way to implement the same goal using different logic.
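To see why adding threads doesn't help once everything queues on one synchronized resource, here is a back-of-the-envelope model. It is an idealized sketch, not a measurement: the function name and the timings are made up, and it ignores spin and context-switch overhead, which in practice make things worse than this bound suggests.

```python
def max_throughput(threads, work_s, locked_s):
    """Idealized upper bound on requests per second.

    work_s   -- seconds per request spent outside the lock (assumed value)
    locked_s -- seconds per request spent holding the lock (assumed value)
    """
    if threads < 1 or locked_s <= 0 or work_s < 0:
        raise ValueError("need threads >= 1, locked_s > 0, work_s >= 0")
    # Each thread can finish at most one request per (work_s + locked_s).
    parallel_bound = threads / (work_s + locked_s)
    # Only one thread at a time can be inside the lock.
    lock_bound = 1.0 / locked_s
    return min(parallel_bound, lock_bound)


# Hypothetical timings: 30 ms of ordinary work, 10 ms holding the lock.
for threads in (1, 4, 16, 64, 256):
    rate = max_throughput(threads, work_s=0.030, locked_s=0.010)
    print(f"{threads:4d} threads -> at most {rate:.0f} requests/s")
```

In this model the bound stops growing once the lock is saturated, so raising the thread count past that point only lengthens the queue. Shortening the time spent holding the lock is the only thing that raises the ceiling.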

best of luck..

