Throttling a process's CPU consumption on Linux (Red Hat, CentOS) and HP-UX: something better than nice


Q1:
When a running process hogs a CPU, I would like to be able to reduce its CPU consumption,
say to 50%.  Is there a command to do this, and is it possible on CentOS, HP-UX & Red Hat?

I know that if we start a process / script using "nice", say "nice ./scriptname", it gives the
process a lower scheduling priority so that other processes get the CPU first, but a niced
process will still chew 100% of the CPU when no other processes are competing for it.
Correct me if I'm wrong.
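
For a process that is already running, the same priority change can be made through the POSIX setpriority() call, which is what the "renice" command does. The sketch below is only an illustration: the function names renice_process and current_nice are made up for this example. As described above, this changes who wins when processes compete for the CPU; it does not cap usage on an otherwise idle machine.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set the nice value of a process (pid 0 means the calling process).
 * Returns 0 on success, -1 on failure with errno set by setpriority().
 * Raising the nice value (lower priority) needs no privilege; lowering
 * it usually requires root. */
int renice_process(pid_t pid, int nice_value) {
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_value);
}

/* Read the nice value back. -1 is a legal nice value, so errno has to be
 * cleared first and checked afterwards to tell an error from a result. */
int current_nice(pid_t pid, int *nice_out) {
    errno = 0;
    int value = getpriority(PRIO_PROCESS, (id_t)pid);
    if (value == -1 && errno != 0) return -1;
    *nice_out = value;
    return 0;
}
```

Calling renice_process(1313, 19) would push PID 1313 to the lowest priority, assuming the caller owns that process.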

My purpose is to avoid getting a lot of alerts. It would make more sense to
tune the monitoring tool's threshold, but there's a lot of bureaucracy involved in tuning and
it takes a while to get the change approved.  I'm looking for a quick interim fix to reduce
the overwhelming alerts, even though this does not address the root cause.

The best idea I have found so far is in the link below, which builds the throttling into the
code itself, but that still falls short of a quick fix:
http://unix.derkeiler.com/Newsgroups/comp.unix.solaris/2006-10/msg01405.html

Q2:
One thought: can a running app (e.g. Websphere, Weblogic, Tomcat, Oracle instances)
be set so that it uses only a specific processor in a multi-processor environment?
The value averaged over all the processors would then show a lower CPU utilization.
The less CPU-hungry processes could be shifted to one processor and the
savage ones restricted to 1 or 2 processors. I'm not sure if this manual reshuffling
is more efficient than letting the system manage it on its own; I'm just soliciting views.
In some cases I can't kill or restart resource-hungry processes, so I'm just looking for ways
to mitigate the situation.


Q3:
Can I say that if "uptime" does not show a load average above 0 (or 1?), then the
constant CPU hogging is nothing to worry about, and that the high CPU utilization
reported by the "top" utility is more of a CPU usage at that instant?
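
As a point of reference for Q3: on Linux the three figures printed by "uptime" come from /proc/loadavg and are averages over 1, 5 and 15 minutes, whereas the %CPU column in "top" covers only its last refresh interval. The sketch below (Linux only; the function names are made up for this example) reads those figures and relates them to the number of CPUs, since a load of 1.0 means something different on a 1-CPU box than on an 8-CPU box.

```c
#include <stdio.h>

/* Parse the three load averages (1, 5 and 15 minutes) from a line in
 * /proc/loadavg format, e.g. "0.50 1.25 2.00 1/100 1234".
 * Returns 0 on success, -1 if the line does not start with three numbers. */
int parse_loadavg(const char *line, double avg[3]) {
    return sscanf(line, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]) == 3 ? 0 : -1;
}

/* Read the current load averages from /proc/loadavg (Linux only). */
int read_loadavg(double avg[3]) {
    char line[128];
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL) return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? parse_loadavg(line, avg) : -1;
}

/* Load divided by the CPU count: a sustained value near or above 1.0
 * suggests that runnable work is queueing for a CPU. */
double load_per_cpu(double load, long ncpus) {
    return ncpus > 0 ? load / (double)ncpus : load;
}
```

The CPU count can be obtained with sysconf(_SC_NPROCESSORS_ONLN) on Linux. HP-UX has no /proc/loadavg, so this does not carry over as written.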

sunhux asked:
cjl7 (freelance for hire) commented:
Hi,

A CPU works on an "on/off" scheme: at any instant it is either ON or OFF. 100% CPU in itself isn't a bad thing!

What you need to watch for is the old trio of CPU/Disk IO/Memory.

If you have 100% CPU but free memory and no IO-wait then you are good. If you have 100% CPU and a lot of IO-wait then the CPU is waiting for the disks, which indicates a disk-IO problem (the CPU is faster than the disk, which is the normal case...)
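
To put a number on the IO-wait part of that check on Linux, two samples of the aggregate "cpu" line in /proc/stat can be compared. This is only a sketch: the struct and function names are made up for this example, and only the first five counters are used, so irq, softirq and steal time are left out of the total.

```c
#include <stdio.h>
#include <string.h>

/* Cumulative counters from the aggregate "cpu" line of /proc/stat, in clock ticks. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait;
};

/* Parse a line such as "cpu  100 0 50 800 50 0 0 0 0 0".
 * Per-core lines ("cpu0 ...") are rejected. Returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_times *t) {
    if (strncmp(line, "cpu ", 4) != 0) return -1;
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle, &t->iowait);
    return n == 5 ? 0 : -1;
}

/* Fraction (0..1) of the interval between two samples that was spent in iowait. */
double iowait_fraction(const struct cpu_times *before, const struct cpu_times *after) {
    unsigned long long total =
        (after->user - before->user) + (after->nice - before->nice) +
        (after->system - before->system) + (after->idle - before->idle) +
        (after->iowait - before->iowait);
    if (total == 0) return 0.0;
    return (double)(after->iowait - before->iowait) / (double)total;
}
```

Reading the first line of /proc/stat twice, a few seconds apart, and passing both samples to iowait_fraction gives the share of that interval spent waiting on IO.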

You shouldn't try to "shuffle load" around cores manually; this is what the operating system is there for!

'top' is good enough for a "snapshot" view of the performance. 'sar' is a good way to get better long-term stats about the system.

//jonas
sunhux (author) commented:
Thanks chaps.

What NabeelM gave is quite neat.
I think the following command meets my need to limit a running process's CPU consumption:
   cpulimit -p 1313 -l 30

Can anyone confirm whether the cpulimit package/software (I hope it's freeware) is supported
on HP-UX & CentOS?


Initially I thought of using a script to pause, resume, pause, resume, ... the CPU-hungry
process to average out its usage, but it's not that neat, i.e.:

pid=$1                             # pid of the busy process
while kill -STOP "$pid"; do        # pause execution; the loop ends once the process is gone
    sleep 0.1                      # fractional seconds need a sleep that accepts them (e.g. GNU sleep)
    kill -CONT "$pid"              # continue execution
    sleep 0.1
done
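
The same pause/resume idea can be sketched in C with a configurable duty cycle. This is a simplified illustration, not the cpulimit implementation: it uses a fixed run/pause split and never measures how much CPU the target really uses. The function names are made up for this example.

```c
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Split one control period into a run slice and a pause slice.
 * percent is the share of the period the target may run (clamped to 0..100). */
void split_period(long period_us, int percent, long *run_us, long *pause_us) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    *run_us = period_us * percent / 100;
    *pause_us = period_us - *run_us;
}

/* Sleep for a number of microseconds using nanosleep(). */
static void sleep_us(long us) {
    struct timespec ts = { us / 1000000, (us % 1000000) * 1000 };
    nanosleep(&ts, NULL);
}

/* Run one period: let the target run, then pause it for the rest.
 * Returns -1 once the target can no longer be signalled (e.g. it has exited). */
int throttle_one_period(pid_t pid, long period_us, int percent) {
    long run_us, pause_us;
    split_period(period_us, percent, &run_us, &pause_us);
    if (kill(pid, SIGCONT) != 0) return -1;
    sleep_us(run_us);
    if (pause_us > 0) {
        if (kill(pid, SIGSTOP) != 0) return -1;
        sleep_us(pause_us);
    }
    return 0;
}
```

A caller would loop on throttle_one_period(pid, 100000, 30) until it returns -1, and should send a final SIGCONT when it stops so that the target is not left suspended.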
sunhux (author) commented:

Also, can anyone provide a link / URL to download a ready-to-install-and-run cpulimit package
(one that I don't have to compile)?
sunhux (author) commented:

Actually, I need this for an HP-UX server (it is less critical for Linux, though good to have).

Is there an equivalent for HP-UX, or can the C source code be built with "make" or compiled
on HP-UX?

The source code from SourceForge follows:

==================================

/*
 * Author:  Angelo Marletta
 * Date:    26/06/2005
 * Version: 1.1
 * Last version at: http://marlon80.interfree.it/cpulimit/index.html
 *
 * Changelog:
 *  - Fixed a segmentation fault if controlled process exited in particular circumstances
 *  - Better CPU usage estimate
 *  - Fixed a <0 %CPU usage reporting in rare cases
 *  - Replaced MAX_PATH_SIZE with PATH_MAX already defined in <limits.h>
 *  - Command line arguments now available
 *  - Now is possible to specify target process by pid
 */


#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/resource.h>
#include <string.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>

//kernel time resolution (inverse of one jiffy interval) in Hertz
//i don't know how to detect it, then define to the default (not very clean!)
#define HZ 100

//some useful macro
#define min(a,b) ((a)<(b)?(a):(b))
#define max(a,b) ((a)>(b)?(a):(b))

//pid of the controlled process
int pid=0;
//executable file name
char *program_name;
//verbose mode
int verbose=0;
//lazy mode
int lazy=0;

//reverse byte search
void *memrchr(const void *s, int c, size_t n);

//return ta-tb in microseconds (no overflow checks!)
//static: a plain C99 "inline" provides no external definition and can fail to link
static inline long timediff(const struct timespec *ta,const struct timespec *tb) {
    long us = (ta->tv_sec-tb->tv_sec)*1000000 + (ta->tv_nsec/1000 - tb->tv_nsec/1000);
    return us;
}

int waitforpid(int pid) {
      //switch to low priority
      if (setpriority(PRIO_PROCESS,getpid(),19)!=0) {
            printf("Warning: cannot renice\n");
      }

      int i=0;

      while(1) {

            DIR *dip;
            struct dirent *dit;

            //open a directory stream to /proc directory
            if ((dip = opendir("/proc")) == NULL) {
                  perror("opendir");
                  return -1;
            }

            //read in from /proc and seek for process dirs
            while ((dit = readdir(dip)) != NULL) {
                  //get pid
                  if (pid==atoi(dit->d_name)) {
                        //pid detected
                        if (kill(pid,SIGSTOP)==0 &&  kill(pid,SIGCONT)==0) {
                              //process is ok! release the directory stream before leaving the loop
                              closedir(dip);
                              goto done;
                        }
                        else {
                              fprintf(stderr,"Error: Process %d detected, but you don't have permission to control it\n",pid);
                        }
                  }
            }

            //close the dir stream and check for errors
            if (closedir(dip) == -1) {
                  perror("closedir");
                  return -1;
            }

            //no suitable target found
            if (i++==0) {
                  if (lazy) {
                        fprintf(stderr,"No process found\n");
                        exit(2);
                  }
                  else {
                        printf("Warning: no target process found. Waiting for it...\n");
                  }
            }

            //sleep for a while
            sleep(2);
      }

done:
      printf("Process %d detected\n",pid);
      //now set high priority, if possible
      if (setpriority(PRIO_PROCESS,getpid(),-20)!=0) {
            printf("Warning: cannot renice.\nTo work better you should run this program as root.\n");
      }
      return 0;

}

//this function periodically scans process list and looks for executable path names
//it should be executed in a low priority context, since precise timing does not matter
//if a process is found then its pid is returned
//process: the name of the wanted process, can be an absolute path name to the executable file
//         or simply its name
//return: pid of the found process
int getpidof(const char *process) {

      //set low priority
      if (setpriority(PRIO_PROCESS,getpid(),19)!=0) {
            printf("Warning: cannot renice\n");
      }

      char exelink[20];
      char exepath[PATH_MAX+1];
      int pid=0;
      int i=0;

      while(1) {

            DIR *dip;
            struct dirent *dit;

            //open a directory stream to /proc directory
            if ((dip = opendir("/proc")) == NULL) {
                  perror("opendir");
                  return -1;
            }

            //read in from /proc and seek for process dirs
            while ((dit = readdir(dip)) != NULL) {
                  //get pid
                  pid=atoi(dit->d_name);
                  if (pid>0) {
                        sprintf(exelink,"/proc/%d/exe",pid);
                        int size=readlink(exelink,exepath,sizeof(exepath));
                        if (size>0) {
                              int found=0;
                              if (process[0]=='/' && strncmp(exepath,process,size)==0 && size==strlen(process)) {
                                    //process starts with / then it's an absolute path
                                    found=1;
                              }
                              else {
                                    //process is the name of the executable file
                                    //(the length check keeps the pointer inside exepath)
                                    if ((size_t)size>=strlen(process) && strncmp(exepath+size-strlen(process),process,strlen(process))==0) {
                                          found=1;
                                    }
                              }
                              if (found==1) {
                                    if (kill(pid,SIGSTOP)==0 &&  kill(pid,SIGCONT)==0) {
                                          //process is ok! release the directory stream before leaving the loop
                                          closedir(dip);
                                          goto done;
                                    }
                                    else {
                                          fprintf(stderr,"Error: Process %d detected, but you don't have permission to control it\n",pid);
                                    }
                              }
                        }
                  }
            }

            //close the dir stream and check for errors
            if (closedir(dip) == -1) {
                  perror("closedir");
                  return -1;
            }

            //no suitable target found
            if (i++==0) {
                  if (lazy) {
                        fprintf(stderr,"No process found\n");
                        exit(2);
                  }
                  else {
                        printf("Warning: no target process found. Waiting for it...\n");
                  }
            }

            //sleep for a while
            sleep(2);
      }

done:
      printf("Process %d detected\n",pid);
      //now set high priority, if possible
      if (setpriority(PRIO_PROCESS,getpid(),-20)!=0) {
            printf("Warning: cannot renice.\nTo work better you should run this program as root.\n");
      }
      return pid;

}

//SIGINT and SIGTERM signal handler
void quit(int sig) {
      //let the process continue if it's stopped
      kill(pid,SIGCONT);
      printf("Exiting...\n");
      exit(0);
}

//get jiffies count from /proc filesystem
int getjiffies(int pid) {
      static char stat[20];
      static char buffer[1024];
      sprintf(stat,"/proc/%d/stat",pid);
      FILE *f=fopen(stat,"r");
      if (f==NULL) return -1;
      fgets(buffer,sizeof(buffer),f);
      fclose(f);
      char *p=buffer;
      p=memchr(p+1,')',sizeof(buffer)-(p-buffer));
      int sp=12;
      while (sp--)
            p=memchr(p+1,' ',sizeof(buffer)-(p-buffer));
      //user mode jiffies
      int utime=atoi(p+1);
      p=memchr(p+1,' ',sizeof(buffer)-(p-buffer));
      //kernel mode jiffies
      int ktime=atoi(p+1);
      return utime+ktime;
}

//process instant photo
struct process_screenshot {
      struct timespec when;      //timestamp
      int jiffies;      //jiffies count of the process
      int cputime;      //microseconds of work from previous screenshot to current
};

//extracted process statistics
struct cpu_usage {
      float pcpu;
      float workingrate;
};

//this function is an autonomous dynamic system
//it works with static variables (state variables of the system), that keep memory of recent past
//its aim is to estimate the cpu usage of the process
//to work properly it should be called in a fixed periodic way
//perhaps i will put it in a separate thread...
int compute_cpu_usage(int pid,int last_working_quantum,struct cpu_usage *pusage) {
      #define MEM_ORDER 10
      //circular buffer containing last MEM_ORDER process screenshots
      static struct process_screenshot ps[MEM_ORDER];
      //the last screenshot recorded in the buffer
      static int front=-1;
      //the oldest screenshot recorded in the buffer
      static int tail=0;

      if (pusage==NULL) {
            //reinit static variables
            front=-1;
            tail=0;
            return 0;
      }

      //let's advance front index and save the screenshot
      front=(front+1)%MEM_ORDER;
      int j=getjiffies(pid);
      if (j>=0) ps[front].jiffies=j;
      else return -1;      //error: pid does not exist
      clock_gettime(CLOCK_REALTIME,&(ps[front].when));
      ps[front].cputime=last_working_quantum;

      //buffer actual size is: (front-tail+MEM_ORDER)%MEM_ORDER+1
      int size=(front-tail+MEM_ORDER)%MEM_ORDER+1;

      if (size==1) {
            //not enough samples taken (it's the first one!), return -1
            pusage->pcpu=-1;
            pusage->workingrate=1;
            return 0;
      }
      else {
            //now we can calculate cpu usage, interval dt and dtwork are expressed in microseconds
            long dt=timediff(&(ps[front].when),&(ps[tail].when));
            long dtwork=0;
            int i=(tail+1)%MEM_ORDER;
            int max=(front+1)%MEM_ORDER;
            do {
                  dtwork+=ps[i].cputime;
                  i=(i+1)%MEM_ORDER;
            } while (i!=max);
            int used=ps[front].jiffies-ps[tail].jiffies;
            float usage=(used*1000000.0/HZ)/dtwork;
            pusage->workingrate=1.0*dtwork/dt;
            pusage->pcpu=usage*pusage->workingrate;
            if (size==MEM_ORDER)
                  tail=(tail+1)%MEM_ORDER;
            return 0;
      }
      #undef MEM_ORDER
}

void print_caption() {
      printf("\n%%CPU\twork quantum\tsleep quantum\tactive rate\n");
}

void print_usage(FILE *stream,int exit_code) {
      fprintf(stream, "Usage: %s TARGET [OPTIONS...]\n",program_name);
      fprintf(stream, "   TARGET must be exactly one of these:\n");
      fprintf(stream, "      -p, --pid=N        pid of the process\n");
      fprintf(stream, "      -e, --exe=FILE     name of the executable program file\n");
      fprintf(stream, "      -P, --path=PATH    absolute path name of the executable program file\n");
      fprintf(stream, "   OPTIONS\n");
      fprintf(stream, "      -l, --limit=N      percentage of cpu allowed from 0 to 100 (mandatory)\n");
      fprintf(stream, "      -v, --verbose      show control statistics\n");
      fprintf(stream, "      -z, --lazy         exit if there is no suitable target process, or if it dies\n");
      fprintf(stream, "      -h, --help         display this help and exit\n");
      exit(exit_code);
}

int main(int argc, char **argv) {

      //get program name
      char *p=(char*)memrchr(argv[0],(unsigned int)'/',strlen(argv[0]));
      program_name = p==NULL?argv[0]:(p+1);
      //parse arguments
      int next_option;
      /* A string listing valid short options letters. */
      const char* short_options="p:e:P:l:vzh";
      /* An array describing valid long options (second field: 1 = the option takes an argument). */
      const struct option long_options[] = {
            { "pid", 1, NULL, 'p' },
            { "exe", 1, NULL, 'e' },
            { "path", 1, NULL, 'P' },
            { "limit", 1, NULL, 'l' },
            { "verbose", 0, NULL, 'v' },
            { "lazy", 0, NULL, 'z' },
            { "help", 0, NULL, 'h' },
            { NULL, 0, NULL, 0 }
      };
      //argument variables
      const char *exe=NULL;
      const char *path=NULL;
      int perclimit=0;
      int pid_ok=0;
      int process_ok=0;
      int limit_ok=0;

      do {
            next_option = getopt_long (argc, argv, short_options,long_options, NULL);
            switch(next_option) {
                  case 'p':
                        pid=atoi(optarg);
                        pid_ok=1;
                        break;
                  case 'e':
                        exe=optarg;
                        process_ok=1;
                        break;
                  case 'P':
                        path=optarg;
                        process_ok=1;
                        break;
                  case 'l':
                        perclimit=atoi(optarg);
                        limit_ok=1;
                        break;
                  case 'v':
                        verbose=1;
                        break;
                  case 'z':
                        lazy=1;
                        break;
                  case 'h':
                        print_usage (stdout, 0);
                        break;
                  case '?':
                        print_usage (stderr, 1);
                        break;
                  case -1:
                        break;
                  default:
                        abort();
            }
      } while(next_option != -1);

      if (!process_ok && !pid_ok) {
            fprintf(stderr,"Error: You must specify a target process\n");
            print_usage (stderr, 1);
            exit(1);
      }
      if ((exe!=NULL && path!=NULL) || (pid_ok && (exe!=NULL || path!=NULL))) {
            fprintf(stderr,"Error: You must specify exactly one target process\n");
            print_usage (stderr, 1);
            exit(1);
      }
      if (!limit_ok) {
            fprintf(stderr,"Error: You must specify a cpu limit\n");
            print_usage (stderr, 1);
            exit(1);
      }
      float limit=perclimit/100.0;
      if (limit<0 || limit >1) {
            fprintf(stderr,"Error: limit must be in the range 0-100\n");
            print_usage (stderr, 1);
            exit(1);
      }
      //parameters are all ok!
      signal(SIGINT,quit);
      signal(SIGTERM,quit);

      //time quantum in microseconds. it's split into a working period and a sleeping one
      int period=100000;
      struct timespec twork,tsleep;   //working and sleeping intervals
      memset(&twork,0,sizeof(struct timespec));
      memset(&tsleep,0,sizeof(struct timespec));

wait_for_process:

      //look for the target process..or wait for it
      if (exe!=NULL)
            pid=getpidof(exe);
      else if (path!=NULL)
            pid=getpidof(path);
      else {
            waitforpid(pid);
      }
      //process detected...let's play

      //init compute_cpu_usage internal stuff
      compute_cpu_usage(0,0,NULL);
      //main loop counter
      int i=0;

      struct timespec startwork,endwork;
      long workingtime=0;            //last working time in microseconds

      if (verbose) print_caption();

      float pcpu_avg=0;

      //here we should already have high priority, for time precision
      while(1) {

            //estimate how much the controlled process is using the cpu in its working interval
            struct cpu_usage cu;
            if (compute_cpu_usage(pid,workingtime,&cu)==-1) {
                  fprintf(stderr,"Process %d dead!\n",pid);
                  if (lazy) exit(2);
                  //wait until our process appears
                  goto wait_for_process;            
            }

            //cpu actual usage of process (range 0-1)
            float pcpu=cu.pcpu;
            //rate at which we are keeping active the process (range 0-1)
            float workingrate=cu.workingrate;

            //adjust work and sleep time slices
            if (pcpu>0) {
                  twork.tv_nsec=min(period*limit*1000/pcpu*workingrate,period*1000);
            }
            else if (pcpu==0) {
                  twork.tv_nsec=period*1000;
            }
            else if (pcpu==-1) {
                  //not yet a valid idea of cpu usage
                  pcpu=limit;
                  workingrate=limit;
                  twork.tv_nsec=min(period*limit*1000,period*1000);
            }
            tsleep.tv_nsec=period*1000-twork.tv_nsec;

            //update average usage
            pcpu_avg=(pcpu_avg*i+pcpu)/(i+1);

            if (verbose && i%10==0 && i>0) {
                  printf("%0.2f%%\t%6ld us\t%6ld us\t%0.2f%%\n",pcpu*100,twork.tv_nsec/1000,tsleep.tv_nsec/1000,workingrate*100);
            }

            if (limit<1 && limit>0) {
                  //resume process
                  if (kill(pid,SIGCONT)!=0) {
                        fprintf(stderr,"Process %d dead!\n",pid);
                        if (lazy) exit(2);
                        //wait until our process appears
                        goto wait_for_process;
                  }
            }

            clock_gettime(CLOCK_REALTIME,&startwork);
            nanosleep(&twork,NULL);            //now process is working      
            clock_gettime(CLOCK_REALTIME,&endwork);
            workingtime=timediff(&endwork,&startwork);

            if (limit<1) {
                  //stop process, it has worked enough
                  if (kill(pid,SIGSTOP)!=0) {
                        fprintf(stderr,"Process %d dead!\n",pid);
                        if (lazy) exit(2);
                        //wait until our process appears
                        goto wait_for_process;
                  }
                  nanosleep(&tsleep,NULL);      //now process is sleeping
            }
            i++;
      }

}
cjl7 (freelance for hire) commented:
I find it hard to believe that a "manual" scheduler would be better than the one provided! If you restrict the app's CPU usage, you are lowering its performance. Why not use nice if reduced performance is OK?

Anyway, on HP-UX there are tools to turn CPUs on or off, but the only valid use for them is licensing issues (IMO).
sunhux (author) commented:

Right, technically "renice" is a better option, but here I'm working around the bureaucracy:

a) the alerts from the monitoring tool keep coming
b) retuning and recalibrating the monitoring tool's threshold (CA Unicenter) is not an option
c) any other process can, out of the blue, hog the CPU during the night and I'll need to be able to
    stop this alert; in the past, we killed & restarted the Websphere/Weblogic processes
    and things went well after the restart, but this is not allowed anymore as it caused brief
    disruptions

Btw, is there something on HP-UX like "truss ./application_startup_script" or
"trace ./application_startup_script" which will give us a sort of thread dump or
some means to track why the process is chewing so much CPU?
