How to clear the memory segments associated with a Linux process?

Hello Linux experts, good morning/good day!

Please look at the attached screenshot of top output taken from a DB server. You will notice two uninterruptible processes (in state 'D') in it: PIDs 9900 and 11240. We often experience performance issues on this server. When we investigated at those times, we noticed one or more processes in D state occupying a huge chunk of memory and causing high I/O wait, and the server would run out of physical memory, which in turn caused more page swapping. Most of the time we ended up rebooting the server to bring it back to normal performance, because 'kill -9' doesn't work on an uninterruptible process.

Now, as a proactive measure, we are keeping a close eye on this server. Today I found two uninterruptible processes (in D state) in the top output. I waited for about an hour, but their state did not change to sleeping or running. I have the liberty to kill any process on this server, so I tried 'kill -9', but it didn't work. I then checked the memory segments occupied by these processes by running the 'pmap -x' command.

Now my question is: can I clear all the memory segments associated with these processes in D state? Please let me know. Thanks in advance!


[root@mlck-chi-pdb02 ~]# pmap -x 9900
9900:   oraclePMLCKI1 (LOCAL=NO)
Address           Kbytes     RSS    Anon  Locked Mode   Mapping
0000000000400000   99776       -       -       - r-x--  oracle
000000000666f000     532       -       -       - rw---  oracle
00000000066f4000     676       -       -       - rwx--    [ anon ]
0000000060000000   32768       -       -       - rw-s-    [ shmid=0x38002 ]
0000000080000000 4177920       -       -       - rw-s-    [ shmid=0x40003 ]
0000000180000000 4177920       -       -       - rw-s-    [ shmid=0x48004 ]
0000000280000000  575488       -       -       - rw-s-    [ shmid=0x50005 ]
0000002a95556000       8       -       -       - rw---    [ anon ]
0000002a95558000       4       -       -       - r-x--  libcwait.so
0000002a95559000    1020       -       -       - -----  libcwait.so
0000002a95658000       4       -       -       - rw---  libcwait.so
0000002a95659000     148       -       -       - r-x--  libskgxp10.so
0000002a9567e000    1024       -       -       - -----  libskgxp10.so
0000002a9577e000       8       -       -       - rw---  libskgxp10.so
0000002a95780000     940       -       -       - r-x--  libhasgen10.so
0000002a9586b000    1020       -       -       - -----  libhasgen10.so
0000002a9596a000      24       -       -       - rw---  libhasgen10.so
0000002a95970000      20       -       -       - rw---    [ anon ]
0000002a95975000       8       -       -       - r-x--  libskgxn2.so
0000002a95977000    1020       -       -       - -----  libskgxn2.so
0000002a95a76000       4       -       -       - rw---  libskgxn2.so
.
.
<output truncated>
Screen-shot-of-Top.jpg
ashsysad Asked:
rysic Commented:
Can you show the output of the following? (Sometimes you must try it more than once.)

strace -p 9900

ashsysad (Author) Commented:
@rysic, I ran the strace command against that PID and it produces a huge amount of output.
rysic Commented:
Show us the end.
ashsysad (Author) Commented:
It keeps running forever. Here's a portion of it:


getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 271046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
times(NULL)                             = 437168934
write(42, "\0F\0\0\6\0\0\0\0\0\7\1\200\0\0\2\301\37\0\0\1N\0\0\4\1\0\0\0\0\0\0"..., 70) = 70
read(42, "\0N\0\0\6\0\0\0\0\0\3\4\322\"\0\0\0\1\0\0\0\0\0\0\0\7\3>df\2\301"..., 2064) = 78
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
times(NULL)                             = 437168934
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
semtimedop(360456, 0x7fbfff83f0, 1, {2, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(360456, 0x7fbfff83f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(360456, 0x7fbfff83f0, 1, {4, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={6, 272046}, ru_stime={0, 835872}, ...}) = 0
semtimedop(360456, 0x7fbfff83f0, 1, {5, 0}) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_BLOCK, [], NULL, 8)  = 0
times(NULL)                             = 437170329
Gerwin Jansen (EE MVE, Topic Advisor) Commented:
The processes in D state are Oracle processes. What is the main function of this (production) server? How do the clients (applications) show that there is a performance issue? Can you disable clients or reduce the load in some way?
ashsysad (Author) Commented:
Hi gerwinjansen, we have been doing all we can. This is one of our core DB servers, which is accessed by a bunch of application servers. When it happens next time, I don't want to reboot the server. Instead, I'm looking for a way to clear all those D-state processes by clearing the memory segments associated with them. Is that possible? Please let me know.
ashsysad (Author) Commented:
By searching the Internet, I found a command called 'ipcrm'. Can we use that?
Gerwin Jansen (EE MVE, Topic Advisor) Commented:
I'm not familiar with ipcrm. I think you should add the 'Oracle' zone to your question to get the Oracle experts' attention.
ravenpl Commented:
> By searching in Internet, I found a command called 'ipcrm'. Can we use that ?
ipcrm is for cleaning up inter-process communication resources, that is, shared memory, message queues and semaphores. It has nothing to do with D state.

> Instead I'm looking for a solution to clear all those D state processes by clearing the Memory segments associated with it. Is that possible?
No.
State D (uninterruptible sleep) means the process (thread) is waiting for I/O, usually local disk access or swap. Consider monitoring disk usage; maybe your disks are simply too slow to handle the DB load. Another common issue is swapping: monitor swappiness and swap usage.

Clearing the memory segments of the process would lead to a segfault as soon as it returns from the syscall, so it seems unwise.
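
As a rough illustration of the monitoring suggested above, here is a minimal shell sketch. It assumes a Linux /proc filesystem and a procps-style ps; the function names swap_used_kb and d_state_only are made up for this example and are not part of any tool mentioned in this thread.

```shell
# Print swap currently in use, in kB.
# Reads meminfo-formatted input; defaults to /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END {print total - free}' "${1:-/proc/meminfo}"
}

# Keep only lines whose second column (the ps STAT field) starts with D,
# i.e. processes in uninterruptible sleep. Reads ps-style lines from stdin.
d_state_only() {
    awk '$2 ~ /^D/'
}
```

For example, `ps -eo pid,stat,wchan:32,comm | d_state_only` lists the processes currently in D state together with their kernel wait channel, `swap_used_kb` prints the swap in use, and `cat /proc/sys/vm/swappiness` shows the current swappiness setting. Sampling these every minute or so should show whether the D-state pileups coincide with growing swap usage.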
rysic Commented:
@ashsysad,
There is no way to kill a D-state process. The only way forward is to figure out what it is waiting for.
I suggest contacting Oracle support.
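
One rough way to see what a D-state process is waiting for is to read its kernel wait channel from /proc. This is only a sketch: proc_wait_info is a hypothetical helper name, its optional second argument exists purely so it can be pointed at a test directory instead of /proc, and /proc/PID/wchan may read 0 or be unavailable depending on the kernel.

```shell
# Report the scheduler state and kernel wait channel of one process.
# $1 = PID; $2 = proc root (assumed default /proc, overridable for testing).
proc_wait_info() {
    pwi_pid=$1
    pwi_root=${2:-/proc}
    # The State: line looks like "State:  D (disk sleep)"; keep the letter only.
    pwi_state=$(awk '/^State:/ {print $2}' "$pwi_root/$pwi_pid/status")
    # wchan holds the name of the kernel function the task is sleeping in.
    pwi_wchan=$(cat "$pwi_root/$pwi_pid/wchan" 2>/dev/null || true)
    echo "pid=$pwi_pid state=$pwi_state wchan=${pwi_wchan:-unknown}"
}
```

Running `proc_wait_info 9900` as root prints a single line with the state and wait channel for that PID. On kernels that provide it, /proc/PID/stack (readable by root only) shows the full kernel stack, which is usually more useful to a vendor support team than the wait channel alone.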
ashsysad (Author) Commented:
Thank you, I accept your answer on killing 'D' processes.
Question has a verified solution.
