
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 743

iPlanet Web Server 6.0 ns-httpd process is utilizing 98% CPU

Dear Expert,

I have installed iPlanet 6.0 on a Sun SPARC box running Solaris 2.8, serving a Java application. Sometimes my web server process, ns-httpd, reaches 98% CPU utilization. Please suggest why this is happening and how to avoid it.

Thanks  in Advance
Anil Kumar K.
1 Solution
check here..

My first guess would be repeated recompilation of Java servlets. There is a setting that controls this (can't remember its name, maybe something like "development-mode"). This setting tells iWS how often to look for (and recompile) servlets. In pure development mode this setting is 0, meaning check for a new version of the servlet every time a request comes in and recompile if needed. In pure production mode this setting is -1, meaning after startup and the initial compile, never look for a new version of the servlet and never recompile again until the next restart. Other positive values indicate the number of minutes between checks.

My second guess for high CPU utilization would be "thundering herd". If you have your RqThrottle set too high, you get lots of %SYS utilization as the socket readers compete to accept the connection.

Another thought would be to run perfdump. It's described in the iWS manual, and there's an article on my website about how to set it up. I'd look at the ConnectionQueue, the KeepAliveInfo, and the CacheInfo sections. The performance counters may show you what part of iWS or the app is consuming the CPU.
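Working from memory (so check the performance-tuning chapter of your iWS 6.0 manual for the exact syntax), enabling perfdump amounts to mapping a URI onto the built-in dump service in obj.conf, something along these lines; the "/.perf" URI is my own choice:

```conf
# obj.conf -- sketch from memory, verify against the iWS 6.0 manual
# inside the default object, map a URI to a named object:
NameTrans fn="assign-name" from="/.perf" name="perf"

# then add an object that serves the perfdump report:
<Object name="perf">
Service fn="service-dump"
</Object>
```

After a restart, requesting /.perf from a browser should return the counters.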

Another possibility is that iWS isn't configured to handle HTTP/1.1 keep-alive (i.e. HTTP persistence). The KeepAliveInfo section in perfdump should point that out, and the access.log will show HTTP/1.0 instead of HTTP/1.1. This forces each new request to open a new socket connection, which consumes lots of CPU.
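A quick way to check is to tally the protocol versions in access.log. Here's a sketch against a fabricated three-line sample in Common Log Format; point the awk at your real access.log:

```shell
# fabricated access.log sample in Common Log Format (illustration only)
cat > /tmp/access.sample <<'EOF'
10.0.0.5 - - [10/Oct/2002:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
10.0.0.5 - - [10/Oct/2002:13:55:37 -0700] "GET /app/run HTTP/1.1" 200 512
10.0.0.6 - - [10/Oct/2002:13:55:38 -0700] "GET /app/run HTTP/1.0" 200 512
EOF

# field 8 is the protocol token (with a trailing quote); strip it and tally.
# a large HTTP/1.0 share means clients aren't getting persistent connections.
awk '{gsub(/"/, "", $8); print $8}' /tmp/access.sample | sort | uniq -c
```

On the sample this tallies 2 HTTP/1.0 requests against 1 HTTP/1.1.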

If your app does lots of CGI processing, be sure the iWS CGI thread is automatically spawned. Can't remember the name of the setting; pretty sure it's in magnus.conf. It causes iWS to spawn another process or thread at startup, and that process/thread handles CGI requests without having to spawn a new process as each request arrives. I'm pretty sure you'd see this in perfdump under cgi-stats if you had it set wrong.

Also, in the perfdump display, the file-bucket counter and the cache-hit ratio would let you know whether iWS is having to do lots of disk I/O to service requests. I doubt that would show up as CPU time, but maybe.

Sometimes I find useful info using the tools in /usr/proc/bin. For example, "pfiles PID", "pmap PID", or "pstack PID" may show something that looks abnormal. Hmmm, that makes me think of something else regarding Solaris: make sure you're using the new LWP thread library. I think it's in /usr/lib/lwp, but I'm not sure; check that, and I think you have to include it near the front of your LD_LIBRARY_PATH. Hmmm, you should also check your CLASSPATH to make sure your JAR and JSP files are listed near the beginning. Maybe iWS is having to look through lots of large directories before it finds what it's looking for, but again that wouldn't show up as CPU time. (Can you tell I'm starting to grasp at straws? ;-)

Finally I'd resort to truss'ing the iWS process. Use "ps -ef | grep http" to get iWS's PID. Then do "truss -p PID > /tmp/file 2>&1" and let it run for a minute. Then look through /tmp/file and see if some system call is being repeated over and over. Pretty low-level troubleshooting, but sometimes you see something that looks so abnormal you know it's got to be the problem.
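Rather than eyeballing the whole capture, you can tally syscall names and look at the ranking. A sketch against a fabricated truss excerpt; run the awk pipeline against your real /tmp/file:

```shell
# fabricated truss output (real truss lines look like "name(args) = ret")
cat > /tmp/truss.sample <<'EOF'
poll(0xFFBEF8C0, 1, 1000)                       = 1
read(5, " G E T   / ", 4096)                    = 11
read(5, 0x00123456, 4096)                       = 0
read(5, 0x00123456, 4096)                       = 0
close(5)                                        = 0
EOF

# the syscall name is everything before the first '('; tally and rank
awk -F'(' '{print $1}' /tmp/truss.sample | sort | uniq -c | sort -rn
```

On the sample, read dominates with 3 calls; on a real capture, a single syscall with a count wildly above the rest is your lead.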

Good luck, e-Ken

PS - I just re-read your question and see that this is a 2-CPU box. Check the context-switching counts. You can see the aggregate value in the "CS" column under vmstat, but I prefer the per-CPU view in the "CSW" and "ICSW" columns under mpstat. If these numbers are high, your CPU usage is most likely thrashing. Context switching shouldn't be greater than 100-200 times the number of CPUs, so in your case that would be 200-400 on your 2-CPU box.

If the high values are in the CSW column as opposed to the ICSW column, that means you've got more processes/threads/LWPs running than your CPUs can handle. This will drive up the %SYS. If you see %SYS over 10%, that's really bad and usually means thrashing: the LWPs are spending most of their time switching onto and off of the CPU and not much time actually doing work. This may also manifest itself in a very high number of waiting and blocked processes in the "R B W" columns under PROCS in the vmstat display. My rule of thumb is no more than 2 times the number of CPUs in the R-B-W columns.
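That rule of thumb is easy to script. The numbers here are hypothetical; on the real box take the CPU count from psrinfo and the cs value from the vmstat output:

```shell
# rule of thumb from above: context switches under ~200x the CPU count
ncpu=2     # the asker's box; on Solaris: psrinfo | wc -l
cs=550     # hypothetical value read from vmstat's "cs" column
limit=$((ncpu * 200))

if [ "$cs" -gt "$limit" ]; then
    echo "context switching looks high: $cs > $limit -- likely thrashing"
else
    echo "context switching OK: $cs <= $limit"
fi
```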

If you see any of these symptoms, check the number of threads/LWPs. You can see this value in the THR column if you run "top", but I prefer the new "prstat" command. I use "prstat -L -p PID" because the -L shows each of the LWPs within the process. I also sometimes use the "-v" option to see context switching (ICX) and CPU utilization by LWP.
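If prstat isn't handy, the raw LWP count alone is visible through ps with the nlwp keyword (available on Solaris and most modern Unixes). This sketch inspects the current shell via $$ so it runs anywhere; substitute the ns-httpd PID:

```shell
# print the LWP/thread count for a process; swap $$ for the ns-httpd PID
pid=$$
nlwp=$(ps -o nlwp= -p "$pid" | tr -d ' ')
echo "process $pid has $nlwp LWPs"
```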

If you see a large number of LWPs (sorry, don't have a good rule of thumb for what's considered "large"), then consider reducing RqThrottle and checking the number of Procs in magnus.conf. If you think those values are set OK, then I'd suspect some bad Java code. The usual culprit is a synchronized collection; both Hashtable and Vector are synchronized (HashMap and ArrayList are their unsynchronized counterparts). It's something the programmers put in their code to synchronize access to a critical resource (most often a table in memory). When you have this in the Java code, the LWPs have to line up behind each other to get to the synchronized resource. When the resource isn't available, rather than switch off the CPU, the LWP may "spin" waiting for it to become available (I'm not sure what causes it to decide whether to context switch off the CPU or to spin). The spinning is called a mutex spin (mutex stands for "mutual exclusion").

This value is shown in the SMTX column of the mpstat display. Any value over 200-300 times the number of CPUs is considered high. Some apps that perform well have very high SMTX counts (Oracle is the one I'm most familiar with), so don't assume high SMTX is bad; however, if it's homegrown code, it's usually a bad sign if this column gets up to 1000 or more (last week we had SMTX up to 60,000 on an 8-CPU box, %SYS was 50%, and *no* work was getting done ... funny to watch).

The natural reaction when you see this slowdown is to add more processes or LWPs (i.e. increase RqThrottle), but all that does is put more things in the line awaiting the synchronized resource. This will just increase the CSW and SMTX. At that point, I go to my developers and ask them to scan their source code for a synchronized object and see if there is a way to implement the same goal using different logic.
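To pick the hot CPUs out of an mpstat run, you can filter on the smtx column (column 10 in the Solaris 8 layout; verify against your own header line). The snapshot below is fabricated for illustration, and 300 is just a threshold chosen per the rule of thumb above:

```shell
# fabricated mpstat snapshot (column layout as on Solaris 8; verify yours)
cat > /tmp/mpstat.sample <<'EOF'
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    1   0    0   219  102  512   45    3  850    0  1234   40  50   0  10
  1    0   0    0   198   95  480   40    2   12    0  1100   35  10   0  55
EOF

# flag CPUs whose mutex-spin count (smtx, field 10) exceeds the threshold
awk 'NR > 1 && $10 > 300 {print "CPU " $1 " spinning: smtx=" $10}' /tmp/mpstat.sample
```

On the sample, only CPU 0 (smtx=850) is flagged; seeing most CPUs flagged across repeated intervals is when I start suspecting a synchronized object in the code.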

best of luck..
