Process in D state... should this cause system to not shutdown?

I have a process that gets hung in the D state. Red Hat says I just have to work with the maker of that process (which I am)... but it also seems like Red Hat should be able to shut down even if this process is hanging.

Thoughts?
Xetroximyn Asked:
tfewster Commented:
As you know, a process in the D state is waiting on I/O (normally disk I/O) and can't be interrupted or killed. It may be possible to fool the process, e.g. if it's waiting on a dead NFS server, bringing another box up with the IP of the dead server will cause the NFS call to return (with an error, but at least the process is out of the I/O wait-state).
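
To see which processes are currently in the D state, something like the following may help. It is only a sketch, assuming a Linux box with procps `ps`; the function name `d_state_only` is made up for illustration. It keeps the header row plus any row whose STAT column starts with `D`.

```shell
#!/bin/sh
# Filter `ps` output read from stdin: keep the header line and every row
# whose second column (STAT) starts with "D" (uninterruptible sleep).
# Assumes the columns are in the order pid,stat,... as requested below.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Live usage (assumes procps ps; wchan is the kernel function the task
# is sleeping in, which can hint at what it is waiting for):
#   ps -eo pid,stat,wchan:32,comm | d_state_only
```

Reading from stdin means the same filter can be run against saved `ps` output captured while the machine was hung.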

The problem with `shutdown` is it tries to use init to shut down all running processes cleanly. Unfortunately if it hangs on one process, it won't progress to the next shutdown script in the order. I don't know if systemd will be better or worse in this situation.

You can halt the server with `halt -f`, which won't call `shutdown`. So before that, close processes manually and unmount filesystems cleanly where possible. You may be able to just kill the hung shutdown script to allow the other shutdown scripts to run.

Be aware that `halt -f` may cause data loss or corruption, so it's a last resort!
Xetroximyn (Author) Commented:
Thanks! Any tips on how to kill all processes and unmount filesystems cleanly? Normally shutdown does this for me... seems like there should be a way to run "shutdown" and tell it "if a process won't die, just move forward with the rest of the shutdown". It just seems crazy to me that a process in D means you have to bring your system down uncleanly.
tfewster Commented:
It depends on what this rogue process is doing, but if it was, say,  an Oracle listener hung on comms to a remote system, you could shut down the Oracle database, making the listener obsolete.

`ps` should give you some clues - I'd expect to see a process like "service <myprogram> stop" or "/etc/init.d/<myprogram> stop" if the shutdown process is hung, and killing that stop "process" should allow shutdown to continue. If you can post the (anonymised) `ps -ef` output, that may include the process State info and we can help troubleshoot.
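
A rough way to pick those stop scripts out of the `ps` listing is sketched below. It assumes SysV-style init scripts under `/etc/rc?.d` or `/etc/init.d`, or a `service <name> stop` call; the function name `stop_scripts_only` is hypothetical.

```shell
#!/bin/sh
# Keep only lines from stdin that end in an init-script "stop" call, e.g.
#   /bin/sh /etc/rc6.d/K74ipmi stop
#   /bin/sh /etc/init.d/ipmi stop
#   service myprogram stop
stop_scripts_only() {
    grep -E '(/etc/(rc[0-9]\.d|init\.d)/[^ ]+|service [^ ]+) stop *$'
}

# Live usage:
#   ps -ef | stop_scripts_only
```

The PID to consider killing is in the second column of the matching line. Helper processes started by the script, which live outside those directories, are deliberately not matched.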

`fuser -c /myfilesystem` will show you what processes are holding myfilesystem open and preventing an unmount. Again, `fuser -k` and `umount -f` are last resorts.
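
For the network shares specifically, a dry-run plan could be generated along these lines. This is a sketch only: it assumes the mount table is in `/proc/mounts` format, treats only `nfs`, `nfs4` and `cifs` as network filesystems, and uses hypothetical function names. It prints commands rather than running them, because a normal `umount` can itself block if the share is unresponsive.

```shell
#!/bin/sh
# Read /proc/mounts-style lines on stdin and print the mount points of
# network filesystems, most recently mounted first, so that nested
# mounts come out before their parents.
network_mounts_reversed() {
    awk '$3 ~ /^(nfs|nfs4|cifs)$/ { mp[++n] = $2 }
         END { for (i = n; i >= 1; i--) print mp[i] }'
}

# Print (do not execute) the commands to inspect and unmount each share.
print_unmount_plan() {
    network_mounts_reversed | while read -r mp; do
        printf 'fuser -c %s\n' "$mp"   # who is holding it open?
        printf 'umount %s\n' "$mp"     # try a normal unmount first
    done
}

# Live usage:
#   print_unmount_plan < /proc/mounts
```

Review the printed commands and run them one at a time, so a command that hangs is easy to spot.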
Xetroximyn (Author) Commented:
Thanks!  I will try some of this out next time.  

FYI the rogue process is asamba (an Acronis samba process used to mount a share during a backup). The share is accessible and can be mounted normally on the server... so it's a bug in Acronis's asamba process...
Xetroximyn (Author) Commented:
So I did this again and it hangs on

/etc/rc6.d/K74ipmi stop

That script is
#!/bin/sh
#############################################################################
#
# ipmi:         OpenIPMI Driver init script
#
# Authors:      Matt Domsch <Matt_Domsch@dell.com>
#               Chris Poblete <Chris_Poblete@dell.com>
#
# chkconfig: - 13 87
# description: OpenIPMI Driver init script


Which I don't think is Acronis-specific... though this only happens when the asamba process is in state D.
Xetroximyn (Author) Commented:
Oh.. then after killing that it gets stuck at

/etc/rc6.d/K75netfs stop

then it gets stuck at

root      7519  0.0  0.0  65964  1284 ?        S    14:57   0:00 /bin/sh /etc/rc6.d/K76ipsec stop
root      7525  0.0  0.0  65964   572 ?        S    14:57   0:00 /bin/sh /etc/rc6.d/K76ipsec stop
root      7526  0.0  0.0  66100  1392 ?        S    14:57   0:00 /bin/sh /usr/libexec/ipsec/_realsetup stop

then here
root      7855  0.1  0.0  66360  1696 ?        S    14:58   0:00 /bin/bash /etc/rc6.d/K90network stop
root      7876  0.1  0.0  66232  1512 ?        S    14:58   0:00 /bin/bash /etc/init.d/netfs stop

then here
root      7876  0.0  0.0  66232  1512 ?        S    14:58   0:00 /bin/bash /etc/init.d/netfs stop
root      7961  0.0  0.0  66228  1520 ?        S    14:59   0:00 /bin/sh /etc/rc6.d/K99cpuspeed stop

then
root      8032  0.0  0.0  66228  1516 ?        S    15:00   0:00 /bin/sh /etc/init.d/cpuspeed stop

then
root      8050  0.0  0.0  66100  1404 ?        S    15:00   0:00 /bin/sh /etc/init.d/ipmi stop

then
root      8068  0.0  0.0  65964  1284 ?        S    15:00   0:00 /bin/sh /etc/init.d/ipsec stop
root      8074  0.0  0.0  65964   572 ?        S    15:00   0:00 /bin/sh /etc/init.d/ipsec stop
root      8075  0.0  0.0  66100  1392 ?        S    15:00   0:00 /bin/sh /usr/libexec/ipsec/_realsetup stop

Then I got disconnected
tfewster Commented:
Interesting. I can understand netfs and some other network related processes failing to stop as asamba will be dependent on them. But IPMI? cpuspeed? Why would they even check network processes?! Overly paranoid Linux developers?

The good news is that, from what you've said, it seems that "only" your backups are impacted by the asamba failure; as long as you're confident no important applications are running and/or accessing the share at shutdown time, there's little risk of data loss or corruption. The OS is fairly resilient to crashes, and an fsck on a modern filesystem is quick, as it just needs to check the intent logs have been flushed. (Obligatory warnings about unclean shutdowns/unmounts: at the very least, check your system logs for errors on startup and compare with a previous startup to ensure there are no NEW warnings.)
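
One way to do that comparison is sketched here. It assumes you have two syslog-style files, each holding the messages from a single boot, and that lines start with the usual "Mon DD HH:MM:SS host" prefix; the function names and the choice of `warn|error|fail` as search terms are assumptions, not a standard.

```shell
#!/bin/sh
# Reduce a syslog-style file to its distinct warning/error messages,
# dropping the "Mon DD HH:MM:SS host" prefix so two boots can be compared.
warnings_only() {
    grep -iE 'warn|error|fail' "$1" |
        awk '{ $1 = $2 = $3 = $4 = ""; sub(/^ +/, ""); print }' |
        sort -u
}

# Print messages that appear in the newer boot log ($2) but not in the
# older one ($1).
new_warnings() {
    old=$(mktemp) && new=$(mktemp) || return 1
    warnings_only "$1" > "$old"
    warnings_only "$2" > "$new"
    comm -13 "$old" "$new"
    rm -f "$old" "$new"
}

# Usage (file names are hypothetical):
#   new_warnings boot-before.log boot-after.log
```

Messages that embed PIDs or other changing numbers will show up as "new" even when they are not, so the output is a starting point for reading rather than a verdict.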

When you say "got disconnected" - was that because the server had completed the shutdown (of networking)?  And did it come back up again OK?
Xetroximyn (Author) Commented:
I just got disconnected... the server still needed a hard reboot. It did come back OK. Looks like asamba has been fixed by Acronis... let's hope it stays that way! Thanks for the help!