Abnormal CPU utilization on AWS Ubuntu server

I have an Ubuntu server running on AWS. The instance size is m4.2xlarge (8 cores and 32 GiB of memory). The server is not in production yet, but its CPU utilization is abnormal and varies for no apparent reason. I'm using RunCloud (https://runcloud.io/) for server management. The following is a week-long graph of CPU utilization. The peaks are a cron job running, so those are expected, but I can't figure out why the CPU utilization varies the rest of the time. I have already disabled the cron jobs, which did not solve the problem. I have run out of ideas for what to check. Can someone help me?


Here is what it looks like right now when using the top command
GerhardpetAsked:

dfkeCommented:
Hi,

I noticed 219 zombie processes on your server.

Can you issue the following command to see if there really are zombie processes? If so, you should start cleaning them out, as they might hurt performance:

ps aux | awk '$8 ~ /^[Zz]/ { printf("%s, PID = %d\n", $8, $2); }'
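
A variant of the same check that also prints each zombie's parent PID and command name can save a step later on. This is only a sketch, assuming the procps `ps` that ships with Ubuntu; the helper name `list_zombies` is made up for illustration and is not part of any tool.

```shell
# list_zombies (hypothetical helper): reads `ps -eo pid,ppid,stat,comm`
# output on stdin and prints "PID PPID COMMAND" for every zombie.
list_zombies() {
    # NR > 1 skips the header row; a STAT value starting with Z marks a zombie.
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage on a live system:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

Reading from stdin keeps the filter separate from the `ps` call, so it can be tried on saved output as well.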

Cheers
GerhardpetAuthor Commented:
Yes, it looks like it. Here is what I get. I'm not a Linux power user, so I'm not sure how to clean them up. I already rebooted the server, which did not solve it.
ubuntu@utilex:~$ ps aux | awk '"[Zz]" ~ $8 { printf("%s, PID = %d\n", $8, $2); }'
Z, PID = 558
Z, PID = 612
Z, PID = 621
Z, PID = 976
Z, PID = 996
Z, PID = 1077
Z, PID = 1364
Z, PID = 1473
Z, PID = 1490
Z, PID = 1755
Z, PID = 1847
Z, PID = 1915
Z, PID = 2315
Z, PID = 2320
Z, PID = 2462
Z, PID = 2720
Z, PID = 2727
Z, PID = 2821
Z, PID = 3073
Z, PID = 3188
Z, PID = 3201
Z, PID = 3446
Z, PID = 3531
Z, PID = 3690
Z, PID = 3885
Z, PID = 3932
Z, PID = 4060
Z, PID = 4310
Z, PID = 4353
Z, PID = 4533
Z, PID = 4754
Z, PID = 4818
Z, PID = 4922
Z, PID = 5131
Z, PID = 5260
Z, PID = 5388
Z, PID = 5541
Z, PID = 5616
Z, PID = 5763
Z, PID = 5940
Z, PID = 5997
Z, PID = 6129
Z, PID = 6318
Z, PID = 6468
Z, PID = 6582
Z, PID = 6691
Z, PID = 6865
Z, PID = 6956
Z, PID = 7176
Z, PID = 7213
Z, PID = 7352
Z, PID = 7515
Z, PID = 7590
Z, PID = 7797
Z, PID = 7858
Z, PID = 8068
Z, PID = 8247
Z, PID = 8320
Z, PID = 8494
Z, PID = 8599
Z, PID = 8715
Z, PID = 8873
Z, PID = 8964
Z, PID = 9088
Z, PID = 9289
Z, PID = 9418
Z, PID = 9511
Z, PID = 9670
Z, PID = 9818
Z, PID = 9888
Z, PID = 10092
Z, PID = 10245
Z, PID = 10318
Z, PID = 10561
Z, PID = 10679
Z, PID = 10700
Z, PID = 10928
Z, PID = 11031
Z, PID = 11153
Z, PID = 11287
Z, PID = 11414
Z, PID = 11523
Z, PID = 11676
Z, PID = 11884
Z, PID = 11917
Z, PID = 12120
Z, PID = 12260
Z, PID = 12367
Z, PID = 12375
Z, PID = 12514
Z, PID = 12607
Z, PID = 12716
Z, PID = 12759
Z, PID = 12944
Z, PID = 13036
Z, PID = 13129
Z, PID = 13364
Z, PID = 13504
Z, PID = 13571
Z, PID = 13721
Z, PID = 13869
Z, PID = 13953
Z, PID = 14119
Z, PID = 14190
Z, PID = 14273
Z, PID = 14321
Z, PID = 14502
Z, PID = 14589
Z, PID = 14635
Z, PID = 14703
Z, PID = 14751
Z, PID = 14871
Z, PID = 15017
Z, PID = 15109
Z, PID = 15205
Z, PID = 15364
Z, PID = 15494
Z, PID = 15529
Z, PID = 15795
Z, PID = 15908
Z, PID = 16098
Z, PID = 16187
Z, PID = 16348
Z, PID = 16365
Z, PID = 16408
Z, PID = 16482
Z, PID = 16726
Z, PID = 16738
Z, PID = 16820
Z, PID = 17103
Z, PID = 17476
Z, PID = 17544
Z, PID = 17669
Z, PID = 17698
Z, PID = 17880
Z, PID = 17923
Z, PID = 18037
Z, PID = 18169
Z, PID = 18282
Z, PID = 18484
Z, PID = 18689
Z, PID = 18728
Z, PID = 18910
Z, PID = 19084
Z, PID = 19338
Z, PID = 19507
Z, PID = 19630
Z, PID = 19672
Z, PID = 19726
Z, PID = 19853
Z, PID = 19860
Z, PID = 20088
Z, PID = 20233
Z, PID = 20411
Z, PID = 20541
Z, PID = 20615
Z, PID = 20704
Z, PID = 20971
Z, PID = 21078
Z, PID = 21246
Z, PID = 21344
Z, PID = 21416
Z, PID = 21424
Z, PID = 21449
Z, PID = 21454
Z, PID = 21617
Z, PID = 21646
Z, PID = 21768
Z, PID = 21819
Z, PID = 21835
Z, PID = 21883
Z, PID = 21892
Z, PID = 22013
Z, PID = 22072
Z, PID = 22076
Z, PID = 22151
Z, PID = 22350
Z, PID = 22525
Z, PID = 22744
Z, PID = 22907
Z, PID = 23156
Z, PID = 23613
Z, PID = 23786
Z, PID = 24111
Z, PID = 24485
Z, PID = 24918
Z, PID = 25304
Z, PID = 25678
Z, PID = 26131
Z, PID = 26461
Z, PID = 26856
Z, PID = 27311
Z, PID = 28750
Z, PID = 28752
Z, PID = 28985
Z, PID = 29002
Z, PID = 29411
Z, PID = 29479
Z, PID = 29761
Z, PID = 29818
Z, PID = 30169
Z, PID = 30260
Z, PID = 30644
Z, PID = 30713
Z, PID = 31001
Z, PID = 31033
Z, PID = 31096
Z, PID = 31360
Z, PID = 31424
Z, PID = 31486
Z, PID = 31772
Z, PID = 31790
Z, PID = 31950
Z, PID = 32214
Z, PID = 32222
Z, PID = 32297
Z, PID = 32597
Z, PID = 32666
Z, PID = 32701
ubuntu@utilex:~$

dfkeCommented:
Hi,

The key here is to find and kill the parent processes that cause these zombies.

Issue:
ps ef

You will see one or more trees of PIDs. Find one of the PIDs from your output list and look at its parent PID. Then try to kill the parent process; at the very least, this should tell you what is causing all these zombie processes to appear in the first place.
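
With a couple of hundred zombies, reading the tree by hand gets tedious, so the same lookup can be done by grouping zombies by parent PID. The following is a rough sketch, assuming procps `ps`; `zombie_parents` is a hypothetical helper name.

```shell
# zombie_parents (hypothetical helper): reads `ps -eo pid,ppid,stat` output
# on stdin and prints "COUNT PPID" for each parent that has zombie children,
# with the parent owning the most zombies listed first.
zombie_parents() {
    awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ }
         END { for (p in n) print n[p], p }' | sort -rn
}

# Usage on a live system, then inspect the worst parent by its PID:
#   ps -eo pid,ppid,stat | zombie_parents
#   ps -o pid,user,lstart,cmd -p <PPID>
```

If nearly all zombies share one parent, that process is the one to investigate first.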

Or, if this all seems like too much work (which it does sound like to me), you can try to kill all the parents by issuing:

ps -eo stat,ppid | awk '$1 ~ /^[Zz]/ && $2 > 1 && !seen[$2]++ { system(sprintf("kill -HUP %d", $2)); }'

If the zombies are still there you can replace -HUP with -9.

-9 should terminate for sure but it is usually considered bad practice.
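
That escalation can be scripted so the stronger signal is only used when the gentler one has not worked. This is a minimal sketch, assuming a POSIX shell; `stop_process` and its grace-period argument are hypothetical names chosen for illustration.

```shell
# stop_process (hypothetical helper): send SIGHUP to PID $1, wait up to
# $2 seconds (default 5) for it to exit, then fall back to SIGKILL (-9).
stop_process() {
    target=$1
    grace=${2:-5}
    # Ask politely first; give up if the PID does not exist or is not ours to signal.
    kill -HUP "$target" 2>/dev/null || return 1
    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    # Still there after the grace period: force termination.
    kill -9 "$target" 2>/dev/null
}

# Usage (PID and grace period are placeholders):
#   stop_process 5030 10
```

Note that a process that has exited but has not yet been reaped by its own parent still passes the `kill -0` test, so the helper may wait out the full grace period in that case.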

Cheers
GerhardpetAuthor Commented:
So a reboot would not clear this up? If I kill all parent processes will I need to reboot the server or will it still be operational after I kill all of them?
dfkeCommented:
Hi,

Rebooting will clear out the zombie processes, but in your case something that starts at boot is causing the zombie processes to appear again.

So it is essential to find out what the parent processes are.

Look at the example below:

PID TTY      STAT   TIME COMMAND
5102 pts/3    Ss     0:00 bash MANPATH=/usr/local/share/man:/usr/share/man:/usr
5229 pts/3    R+     0:00  \_ ps ef MANPATH=/usr/local/share/man:/usr/share/man
4929 tty1     S+     0:00 -bash TERM=linux HOME=/home/zymos SHELL=/bin/bash USE
4954 tty1     S+     0:00  \_ /bin/sh /usr/bin/startx MANPATH=/usr/local/share/
4970 tty1     S+     0:00      \_ xinit /home/zymos/.xinitrc -- -nolisten tcp -
4975 tty1     S      0:01          \_ icewm MANPATH=/usr/local/share/man:/usr/s
4977 tty1     S      0:01              \_ xterm MANPATH=/usr/local/share/man:/u
4987 pts/0    Ss     0:00              |   \_ bash MANPATH=/usr/local/share/man
5090 pts/0    Sl+    1:31              |       \_ ./hydranode MANPATH=/usr/loca
5092 pts/0    Sl+    0:29              |           \_ ./hydranode-core --disabl
4978 tty1     S      0:00              \_ xterm -e /home/zymos/.icewm/startup M
4986 pts/1    Ss+    0:00              |   \_ /bin/sh /home/zymos/.icewm/startu
4993 pts/1    S+     0:00              |       \_ gaim    MANPATH=/usr/local/sh
4994 pts/1    S+     0:00              |       \_ /bin/bash /usr/libexec/mozill
5030 pts/1    Sl+    1:29              |       |   \_ /usr/lib/mozilla-firefox/
5067 pts/1    Z+     0:00              |       |       \_ [netstat] <defunct>
4995 pts/1    S+     0:00              |       \_ xterm MANPATH=/usr/local/shar
5032 pts/2    Ss+    0:00              |       |   \_ bash MANPATH=/usr/local/s
5085 pts/1    S+     0:00              |       \_ boinc_client MANPATH=/usr/loc
4981 tty1     S      0:00              \_ icewmtray MANPATH=/usr/local/share/ma
5060 pts/1    S+     0:00 /usr/libexec/gconfd-2 13 MANPATH=/usr/local/share/man

In this example ps ef output you can see that PID 5067 is a zombie process. You can also see that its parent, PID 5030 (mozilla-firefox), has left netstat defunct for some reason. Killing PID 5030 will probably get rid of the zombie PID 5067.

So in your case you need to identify the application that causes the zombie processes. Killing the parent PIDs will get rid of the zombies, because once the parent is gone they are handed over to init, which reaps them. Once all zombie processes are gone, check the CPU load again and see if it has dropped. Do not reboot.
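
To confirm that the cleanup worked, the zombie count and the load averages can be checked side by side. A small sketch, assuming procps `ps` and a Linux `/proc` filesystem; `count_zombies` is a hypothetical helper name.

```shell
# count_zombies (hypothetical helper): reads `ps -eo stat` output on stdin
# and prints how many processes are in the zombie (Z) state.
count_zombies() {
    # n + 0 forces a numeric 0 when no zombie line was seen.
    awk 'NR > 1 && $1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Usage on a live system; the first three fields of /proc/loadavg are the
# 1, 5 and 15 minute load averages:
#   ps -eo stat | count_zombies
#   cut -d ' ' -f 1-3 /proc/loadavg
```

Running both before and after killing the parent process gives a simple before/after comparison.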

Then the next step is to find out why that particular application is causing zombie processes. It could be a bug so also check for software updates.

Cheers
GerhardpetAuthor Commented:
Killing the zombies fixed the problem. Thank you for your help!
dfkeCommented:
Hi,

No problem, happy to help.

Cheers