Solved

HP Ux Superdome 100% CPU utilization and memory Utilization

Posted on 2004-11-17
Last Modified: 2013-12-06
Hi

I am running an HP-UX 11.x Superdome with 4 CPUs and 16 GB of memory in a cluster (two Superdomes: one running the app server and one running the DB server). 4 GB of RAM is for Oracle. I am running a banking app and the database is Oracle 9i. I also use Apache and Resin.

The number of logged-in users is about 150 at peak. Memory and CPU utilization hit the roof and hover between 85% and 100%. How should I go about diagnosing this and sorting it out? For this type of load it looks absurd.

The Resin side is also a bit confusing: Resin appears to be slow to start up. Are there standard guidelines for installing and configuring Resin on HP-UX 11.x?

Regards

Surya
Question by:suryapadma
9 Comments
 
LVL 21

Expert Comment

by:tfewster
ID: 12604231
To start from basics:
For CPU utilisation, run `top` to see the heaviest CPU users. Trace the "CPU hog" processes back to their parents and decide whether the CPU usage is reasonable.

Also look at the load averages `top` shows you. If possible, post the `top` output.

To check process memory usage, do:
UNIX95= ps -eo uid,pid,ppid,pcpu,state,sz,vsz,time,comm |sort -rnk7 |more
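A minimal sketch of one way to condense that listing, assuming it is easier to read memory use as a total per command name than per process. The function name `vsz_by_command` is hypothetical, and it expects the reduced column list `vsz,comm` on standard input rather than the full list above; command names containing spaces would be truncated.

```sh
#!/bin/sh
# vsz_by_command (hypothetical helper): total the VSZ column per command name.
# stdin: output of   UNIX95= ps -eo vsz,comm   (header line included)
# stdout: "<total vsz> <command>" lines, largest total first,
#         in whatever units ps reports for vsz.
vsz_by_command() {
    awk 'NR > 1 { total[$2] += $1 }                   # skip the header, sum per command
         END    { for (c in total) print total[c], c }' |
    sort -rn
}

# Example use on a live system:
#   UNIX95= ps -eo vsz,comm | vsz_by_command | more
```

If one command name (a JVM, say) dominates the totals, that is the place to start tracing back to the parent process.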

Of course, the App and DB systems will have very different profiles and will need tuning differently.

Start `sar` logging so you can build up a picture of resource usage over a few days.
 
LVL 38

Expert Comment

by:yuzh
ID: 12610765

 Follow tfewster's advice and use top or ps to find out which processes are eating up
your system resources.

It is possible for a box to use up all its CPU and RAM; it depends on what it is doing.
For example, some of my boxes are running neural network simulations, and in most
cases 99.99% of CPU time and 100% of RAM are used.

Find out the cause of the problem first and then decide what to do: upgrade the hardware,
split the workload across more boxes, or patch a software bug.

 
LVL 62

Expert Comment

by:gheist
ID: 12621060
Try $ vmstat 1 10
to identify whether the bottleneck is I/O wait or raw CPU power.
Depending on the result, there are many ways to improve things before buying expensive hardware.

Then try sync ; sync ; vmstat 1 10
If this improves the wait time, your system is simply swapping; tune your apps to use less memory.
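A rough sketch for reading the vmstat samples, assuming you want the average of one column (for example the page-out column, `po` on HP-UX) rather than eyeballing ten lines. The function name `vmstat_col_avg` is hypothetical. It finds the column by its header name instead of a fixed position, and it skips the first sample on the assumption that it reports averages since boot rather than current activity.

```sh
#!/bin/sh
# vmstat_col_avg (hypothetical helper): average one vmstat column by header name.
# $1    = column name exactly as printed in the vmstat header (e.g. po)
# stdin = output of   vmstat 1 10
vmstat_col_avg() {
    awk -v col="$1" '
        idx == 0 {                       # still in the header: look for the column name
            for (i = 1; i <= NF; i++) if ($i == col) { idx = i; break }
            next
        }
        { rows++ }                       # count data rows
        rows > 1 && $idx ~ /^[0-9]+$/ {  # skip the first sample, ignore non-numeric rows
            sum += $idx; n++
        }
        END {
            if (n) printf "%s avg=%.1f samples=%d\n", col, sum / n, n
            else   print col " not found or no samples"
        }'
}

# Example use on a live system:
#   vmstat 1 10 | vmstat_col_avg po
```

A page-out average that stays above zero under load points at memory pressure; a name that appears twice in the header (such as `sy`) resolves to its first occurrence.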

etc etc
 
LVL 21

Expert Comment

by:tfewster
ID: 12679222
Oops, just noticed this response from Surya that was posted as feedback instead of a comment:
(is anyone familiar with configuring/tuning Resin?)

      Author: suryapadma
      Date: 11/17/2004 02:48PM GMT
      I have collected the stats using Glance and OV, and I also have them as a file. I need some help drilling down into the problem and getting it sorted out. I am also running Resin 2.0.5, which is hogging memory and maybe CPU as well. Resin takes a long time to come up, and a cluster switchover fails. Do you have any input on installing and tuning Resin on HP-UX? I can send you the log files if you give me an address to send them to.

      Regards

      Surya
 
LVL 1

Accepted Solution

by:
sirjb earned 2000 total points
ID: 12760636
Hi  suryapadma,

From reading your posts I see 4 questions:
1. HIGH CPU utilization in general
2. HIGH MEMORY utilization in general
3. The Resin application is using a lot of CPU
4. Cluster switchover fails with Resin

Performance issues are not easy and will not be solved simply by using one or two tools.
First rule: if no one is complaining about performance, don't try to do or change too much.
(You complained about Resin, which I assume is running on the second Superdome, but you didn't complain about the performance of the Oracle database server.)

Second:
High CPU utilization is not necessarily a performance issue. Put simply, if a process gets all the resources it demands (enough memory, disk I/O, etc.), it will use up to 100% CPU to run its calculation and finish as soon as possible. If a process starts up, runs at 100% CPU for a few seconds or minutes and then disappears, that sounds OK to me. But if a single process keeps running at 100% CPU for a long time, it may be caused by a software fault or some other problem.
High memory utilization can be OK too. Check whether you have enough (or too much) swap, and check swap utilization, page-out frequency, I/O on the swap devices, etc.
As long as you have enough swap, NO PAGE OUTS!!!, and especially when no users are complaining, you don't have to worry! :)
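One way to separate a short burst from a process that stays pinned, sketched under the assumption that two `ps` snapshots taken some minutes apart are enough to tell the difference. The function name `sustained_hogs` and the default threshold of 90 are hypothetical choices, not values from this thread.

```sh
#!/bin/sh
# sustained_hogs (hypothetical helper): list processes at or above a %CPU
# threshold in BOTH of two snapshots.
# $1, $2 = files holding the output of   UNIX95= ps -eo pid,pcpu,comm
# $3     = %CPU threshold (assumed default: 90)
sustained_hogs() {
    awk -v limit="${3:-90}" '
        FNR == 1 { file++; next }                        # header of each file: skip it
        file == 1 && $2 + 0 >= limit { hot[$1] = 1 }     # busy in the first snapshot
        file == 2 && $2 + 0 >= limit && ($1 in hot) {    # still busy in the second
            print $1, $3
        }' "$1" "$2"
}

# Example use on a live system:
#   UNIX95= ps -eo pid,pcpu,comm > /tmp/ps.1 ; sleep 300
#   UNIX95= ps -eo pid,pcpu,comm > /tmp/ps.2
#   sustained_hogs /tmp/ps.1 /tmp/ps.2 90
```

A PID that shows up here is a candidate for closer inspection; a PID that was busy only once was probably just finishing its work.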

Third: collect all the relevant information!
Some ideas:
Use Glance to check global CPU and memory, but also disk I/O and swap utilization. Check the size of global shared memory, the number of open files (nfile), page-outs, system calls, and global wait stats / threads. Check CPU, memory and I/O for single processes, activity per CPU (maybe one CPU is broken?!), activity in the file systems (LVM), PVs, NFS, and network utilization. All of this can be done and viewed within Glance!
Using Glance you can determine whether the CPU time is spent in system mode (kernel) or user mode (outside the kernel).

If the performance issue is temporary and only happens within a certain time frame, use PerfView (MeasureWare) to collect data and put it in a graph to find a trend!

Check syslog for any unusual warnings.
Check EMS for any hardware warnings.
Use netstat to check for hanging open connections.
Check the kernel setup (kmtune): check the values of maxusers, maxfiles, maxdsiz, maxssiz, maxtsiz, max_thread_proc, nfile, nproc, nkthread, etc.
Use lsof (list open files) to track down what a process (the one constantly claiming 100% CPU / memory) is actually doing.
It is not standard HP software, but you can find it at http://hpux.connect.org.uk/hppd/hpux/Sysadmin/lsof-4.73/man.html
Check HP's latest quality pack. I would advise installing the latest half-year HW and quality packs. (Don't install the very newest one unless you have no choice, for example to solve a specific problem; a quality pack that is at least 6 months old provides enough stability and is almost 100% safe to install without headaches.)

So far, these checks are only useful if the performance issue is related to the HP-UX OS.
If you have an issue with Oracle, grab a DBA and ask him to troubleshoot together with you, then try the following:
Check the Oracle setup.
Does the HP kernel conform to the requirements of your specific version of Oracle? Check MetaLink for this.
Check the global logs for Oracle.
Check the listener logs.
Check other things such as data fragmentation in the datafiles, buffer cache hit rates, the number of concurrent users, etc.
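The kernel-versus-requirements check can be done mechanically once both sides are in a simple form. This is a sketch only: the function name `check_kernel_params`, the two-column "name value" input format and the requirements file are all assumptions, and the real minimums have to come from the install notes for your Oracle version. Values that are not plain decimal numbers (hex or formulas) are flagged for a manual look rather than guessed at.

```sh
#!/bin/sh
# check_kernel_params (hypothetical helper): compare kernel parameter values
# against required minimums.
# $1    = non-empty file of "name minimum" lines (you create this from the vendor's notes)
# stdin = "name value" lines for the running kernel (assumed format, e.g.
#         the first two columns extracted from your kmtune listing)
check_kernel_params() {
    awk '
        NR == FNR { min[$1] = $2; next }           # first input: load the minimums
        ($1 in min) {
            seen[$1] = 1
            if ($2 !~ /^[0-9]+$/)
                printf "%s: value %s is not a plain number, check by hand\n", $1, $2
            else if ($2 + 0 < min[$1] + 0)
                printf "%s: %s is below required %s\n", $1, $2, min[$1]
        }
        END {                                       # requirements never seen on stdin
            for (p in min) if (!(p in seen)) printf "%s: not reported\n", p
        }' "$1" -
}

# Example use (file names and numbers are placeholders):
#   check_kernel_params /tmp/oracle_minimums.txt < /tmp/current_params.txt
```

No output means every listed parameter was found and met its minimum.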

Do the same for other applications such as Resin. Read the install docs for Resin, for example:
http://www.sia12.net/hp_docs/getting.started  (good one)
http://www.caucho.com/download/resin-install.pdf
http://cocoon.apache.org/2.1/installing/#Installing+on+Resin+2.x
To begin with, check the base configuration.

Anyway, there is too much to mention.

About your specific problems:
1 and 2. First collect all the information described above to determine whether it's a global issue or limited to a single process / application, and whether it is hardware-related or only software-related. Is it a constant problem or does it only happen between 9:00 and 17:00? Gather more information and post it here; we will help you.

3. Resin is slow / using a lot of CPU.
Give us the results of checking the configuration, and show us more of the configuration of your HP-UX OS, such as the kernel settings.
Run lsof on the busy process. Maybe it's related to Apache. Start the application with no user load and see if the problem still exists. (Try to start it without MC/ServiceGuard.)

4. Cluster switchover failed. You have to show us the entire MC/ServiceGuard configuration.
MC/SG is very powerful but needs to be configured carefully. One small mistake and it will fail to start.
Collect the error log from ServiceGuard along with the configuration of the nodes / packages and post it here, please!


Just go and try things out a bit, post your questions in detail, and we'll see if we can help more.
If you'd like to learn more about HP-UX / Oracle setups, I have a few cookbooks ready for you.
Post your mail address and I can mail them to you. But they are big, a couple of MB each :)) FAT cookbooks.

Hope this guides you in the right direction!!

Good luck

JB
 
LVL 21

Expert Comment

by:tfewster
ID: 12913248
Points to sirjb for good generic info on investigating performance problems. Without detailed feedback it would be impossible to diagnose the real problem, but sirjb's suggestions will go a long way toward finding it.

Incidentally, I'd be interested in seeing JB's "cookbooks". I have a few links to HP-UX performance troubleshooting docs (some of which require an account on HP's ITRC and, naturally, there is a great deal of overlap):
HP-UX Performance Cookbook: http://www.interex.org/pubcontent/enterprise/nov01/grumann.jsp
Basics of Poor System Performance: http://www4.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000063203050
Determining the cause of system performance problems: http://www4.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000063210674
System Performance Tuning Guide: http://www4.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000073346765
 
LVL 1

Expert Comment

by:sirjb
ID: 12916652
Hi,

The link to the HP-UX Performance Cookbook posted by tfewster is good, BUT that document is outdated.
The original and latest version is:

http://h21007.www2.hp.com/dspp/files/unprotected/devresource/Docs/TechPapers/UXPerfCookBook.pdf  by Stephen Ciullo and Doug Grumann 27th May 2003
This is the latest one !

To tfewster:
I have some e-books on specific subjects such as MC/SG and LVM in combination with SAN storage such as XPs, VAs, etc.
What are you looking for, or in what area?
PS: thanks for the comment.

Regards,

JB
 
LVL 62

Expert Comment

by:gheist
ID: 12917524
The most likely problem is with the JVMs; one has to renice and tune them if they interfere with normal processes (and thus accounting etc.).
 
LVL 21

Expert Comment

by:tfewster
ID: 12918770
JB - Thanks for the link to the updated Performance Tuning pdf - I searched the ITRC for "cookbook" and found an MC/Serviceguard one as well, but if you have anything specific on tuning HP-UX for Oracle I'd be interested. My email address is in my profile (but posting a link would be better, so everyone can see it ;-)
