Question

Problems with nohup in the 2.6 kernel

Asked by: fklein23

I have been trying to put a complex multi-threaded process in the background. I am using CentOS 5.3 (a 2.6 kernel). Previously, using a 2.4 kernel, I used the following template successfully.

nohup ./my_process > nohup.08-06-09.out &

This always worked for me. I have conducted dozens of experiments with simple programs, and the template above works in these two cases:

1. The process is a simple script (Example #1 below).
2. The process is a compiled program, SimpleTask.c (Example #2 below), that frequently calls fflush(stdout).

#1 always works.
#2 works as long as I explicitly call fflush to flush the stdout stream.

If I eliminate the fflush call in #2, the "nohup.*.out" file is created, but always has zero length. If I include the fflush call, nohup.*.out is just what I expect, and contains all the terminal output from the process.
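One likely explanation, not confirmed in the thread: when stdout is redirected to a regular file, C stdio makes it fully buffered instead of line buffered, so each printf() only copies text into an in-memory buffer. The file grows only when that buffer fills, when fflush() is called, or when the program exits normally. Assuming a 4 KB buffer, Example #2 would need roughly 270 lines (about four and a half minutes) before its first write. The sketch below reproduces the effect on a temporary file instead of a redirected stdout; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <stdio.h>
#include <sys/stat.h>

/* Size in bytes of the file behind `fp`, as the kernel sees it (-1 on error). */
static long bytes_on_disk(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/*
 * Hypothetical demo: write one line to a fully buffered stream attached to a
 * regular file and record the file size before and after fflush().
 * Returns 0 on success, -1 if the temporary file cannot be created.
 */
int size_before_and_after_flush(long *before, long *after)
{
    FILE *fp = tmpfile();           /* regular file => fully buffered */
    if (fp == NULL)
        return -1;

    fputs("Hello there...\n", fp);  /* 15 bytes, still in the stdio buffer */
    *before = bytes_on_disk(fp);    /* nothing has reached the file yet */

    fflush(fp);                     /* hand the buffer to the kernel */
    *after = bytes_on_disk(fp);

    fclose(fp);
    return 0;
}
```

On a typical Linux/glibc system the "before" size is 0 and the "after" size is 15, which matches the empty nohup.*.out the asker sees when the fflush() call is removed.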

So now I need to put a large, complex program online that has minimal terminal output, but preserving that output is essential. This program is built from about 180 source files, has been in development for 4 years, and is ready for production, so I MUST background it. Two years ago, the 2.4-kernel demo version of this program worked fine with nohup.

The new 2.6 kernel version simply will not work with background attempts.

This method:

nohup ./big_process &

doesn't work because the nohup.out file is permanently empty. I don't know exactly what the process is doing, but it seems to be running.

This method:

./big_process &
disown -h

doesn't work because the disowned process is stuck somewhere.

I have tried every combination of things I can think of in terms of explicit redirection of output, redirection of input, ad nauseam...

I have a script that lists the open files of the complex process. Its output looks the same in both cases, except that the sizes of all the files held open by the backgrounded process stay frozen.
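A possibility the thread does not confirm: a background job that tries to read from its terminal (or write to it when `stty tostop` is set) is stopped by SIGTTIN or SIGTTOU, so it looks stuck and its files stop growing. On Linux, the state field of /proc/<pid>/stat reads 'T' for a process stopped by a signal. A minimal, Linux-specific sketch; `process_state` is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (Linux-specific): return the one-letter state of a
 * process from /proc/<pid>/stat, e.g. 'R' running, 'S' sleeping,
 * 'T' stopped by a signal such as SIGTTIN.  Pass 0 for the calling
 * process.  Returns '?' if the state cannot be read.
 */
char process_state(pid_t pid)
{
    char path[64];
    char buf[512];

    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/stat");
    else
        snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Layout is "pid (comm) state ...".  comm may itself contain ')',
     * so use the LAST ')' and skip the single space after it. */
    char *paren = strrchr(buf, ')');
    if (paren == NULL || paren[1] != ' ' || paren[2] == '\0')
        return '?';
    return paren[2];
}
```

For example, `process_state(pid)` returning 'T' for the backgrounded big_process would point to a job-control stop; `ps -o pid,stat,cmd -p <pid>` shows the same letter.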

Example #1: A script that always works:
 
#!/bin/sh
# This script prints the date and time once per second
while [ 1 ]
do
   date
   sleep 1
done
 
---------- end of script --------------------------------
 
Example #2: Executable that works as long as fflush is in the loop:
 
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* for sleep() */

int main (void)
{
    while (1)
    {
        printf("Hello there...\n");
        fflush(stdout);   // Without this, output collects in the stdio buffer and nohup.out stays empty until the buffer fills
        sleep(1);
    }
}
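When stdout is redirected to a file, C stdio normally buffers it fully, which is why Example #2 needs fflush(). A possible alternative for a large program, instead of adding fflush() calls throughout, is a single setvbuf() call at the top of main() that switches stdout to line buffering; it must run before any other output on that stream. The sketch below shows the effect on a temporary file; `line_buffered_bytes_written` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() */
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical demo: make a file-backed stream line buffered, write one
 * line WITHOUT calling fflush(), and report how many bytes reached the file.
 * Returns the file size in bytes, or -1 on error.
 */
long line_buffered_bytes_written(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    /* Must come before any other I/O on the stream.
     * In the real program: setvbuf(stdout, NULL, _IOLBF, BUFSIZ); */
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }

    fputs("Hello there...\n", fp);  /* the newline triggers the write */

    struct stat st;
    long size = (fstat(fileno(fp), &st) == 0) ? (long)st.st_size : -1;
    fclose(fp);
    return size;
}
```

With glibc, a line-buffered stream is written out at each newline, so all 15 bytes are on disk without fflush(). Unbuffered output (`_IONBF`) also works, at the cost of more write() system calls.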

                                  

This question has been solved and verified by the asker.

Asked on: 2009-08-06 at 09:13:46 (ID 24632004)
Tags: nohup, background process, Linux, hangup, SIGHUP
Topics: Linux Programming, CentOS, Linux Setup
Participating experts: 2 | Points: 500 | Comments: 12

Answers

 

by: cjl7 | Posted on 2009-08-06 at 09:28:04 | ID: 25035296

Well,

You could use screen as a way forward for now.

If all you want to do is detach the program, that will work.

As for nohup, I don't know why it isn't working for you.


//jonas

 

by: fklein23 | Posted on 2009-08-06 at 13:29:38 | ID: 25037713

We already considered this option.
I haven't delved deeply into screen, but my first impression is that it would be much more awkward for our operators to use screen. The nohup option can be easily embedded in a script. The operators are not linux-savvy and we distilled the whole process down to two aliased commands:

runbmc   --> starts the 'bmc' process
endbmc  --> ends the 'bmc' process

Internally, the runbmc script simply uses nohup.
Finding that nohup just didn't work was a really big shock.

We also considered using vncserver. That option is awkward for the operators, too. We'd prefer not to make the operators navigate among multiple windows to control the Linux applications. There are 4 apps that run continuously and interoperate; 3 of the 4 work fine in the background, but the 4th one has us stumped!

 

by: RBEIMS | Posted on 2009-08-06 at 14:05:50 | ID: 25038052

Just out of curiosity:
Did you try simply backgrounding the process without nohup? I ask because nohup is normally used to make sure the process does not die when the shell that started it dies. The nohup program only makes the process ignore the hangup signal (SIGHUP), and redirects output to nohup.out when stdout is a terminal.
If the process runs for a long time, but not for days, that could be a good option. The shell would then be free for other commands while the process runs (you could even start the same process several times with different output files).
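The core of what RBEIMS describes can be sketched in C: set SIGHUP to "ignored" and then exec() the real program, which inherits that setting because ignored signals stay ignored across exec(). GNU nohup does essentially this, plus the nohup.out redirection when stdout is a terminal. This is not nohup's actual source, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() */
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: ignore SIGHUP in the calling process.  A program
 * started afterwards with execvp() inherits the "ignored" disposition,
 * which is the core of what nohup does.  Returns 0 on success, -1 on error.
 */
int ignore_hangup(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGHUP, &sa, NULL);
}

/* Returns 1 if SIGHUP is currently ignored, 0 if not, -1 on error. */
int hangup_is_ignored(void)
{
    struct sigaction current;
    if (sigaction(SIGHUP, NULL, &current) != 0)
        return -1;
    return current.sa_handler == SIG_IGN;
}
```

A launcher would call `ignore_hangup()` and then `execvp(argv[1], &argv[1])`. This only protects against hangups; it does nothing about the stdout buffering described in the question.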

Another option is to create a Perl "daemon" script that calls your process with the system() function. You could then check the return value of system() to find out when and how the process ended, and perhaps write a useful message to your output file afterwards.

To create a daemon perl script, you could use this:
http://www.webreference.com/perl/tutorial/9/3.html
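C's standard system() behaves much like the Perl function RBEIMS mentions: it runs a command through /bin/sh, waits for it, and returns a wait status that the POSIX macros WIFEXITED() and WEXITSTATUS() decode. A minimal sketch of checking that return value; the helper name is hypothetical, and a long-running supervisor would more likely use fork()/waitpid() directly.

```c
#define _POSIX_C_SOURCE 200809L  /* for the <sys/wait.h> macros */
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `command` through /bin/sh, wait for it to end,
 * and return its exit code.  Returns -1 if the shell could not be started
 * or the command was terminated by a signal instead of exiting.
 */
int run_and_get_exit_code(const char *command)
{
    int status = system(command);
    if (status == -1)
        return -1;                   /* fork() or wait failed */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);  /* 127 usually means sh could not run it */
    return -1;                       /* killed by a signal */
}
```

For instance, a wrapper could call `run_and_get_exit_code("./big_process")` and append the result to its log when the process ends.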

 

by: fklein23 | Posted on 2009-08-06 at 14:39:58 | ID: 25038355

We have been running for over a year now with the processes running as foreground tasks under separate SSH shells (one per process). 9 times out of 10 this has been fine, but the 1 time in 10 that the processes actually die during an SSH disconnect could be disastrous.

"process will not die if the shell that called it dies" ---> this is exactly why we used nohup. The network connection we use to start the processes with SSH could have an interruption causing a hangup at any time. The processes must be completely immune to hangups, or ANY TCP/IP disconnect from any of its connected devices.

There are more than 4 processes that interoperate on one Linux computer. These in turn interact with an arbitrary number of connections coming in via TCP/IP.  The processes are each unique (there are never multiple instances of one of the processes).

They run on a Linux box that may be on the other side of the world, with only an SSH shell connecting via the internet (or an intranet, depending on the installation) to start or stop the processes. Once started, they must be autonomous. They must be immune to the shell closing, whether intentionally, accidentally, or due to "acts of God" like power outages, storms, earthquakes, or whatever.

We'd like the processes to run for 20 years (the useful life of the plant), only interrupting the processes for occasional S/W upgrades or maintenance procedures.

I am not ruling out the perl daemon script. I will look over the website you mentioned, but I am not quite sure I understand how that would solve our hangup problem.

If we don't find a solution soon, we are going to have to resort to using vncserver, but I don't really want to require our linux boxes to have to run X-Windows, for performance reasons.

 

by: RBEIMS | Posted on 2009-08-06 at 15:21:58 | ID: 25038661

I think the Perl script would solve the problem because it daemonizes itself: it forks, the original process exits, and the child is adopted by init. From then on it no longer depends on the shell.
This is what all server processes do; the only difference is that this one is written in Perl.
Note that the script does two things:
1. It calls fork() to create a new copy of the script process.
2. It calls setsid(), which creates a new session for the process.

From the description of the setsid() function:
"The setsid() function shall create a new session, if the calling process is not a process group leader. Upon return the calling process shall be the session leader of this new session, shall be the process group leader of a new process group, and shall have no controlling terminal. The process group ID of the calling process shall be set equal to the process ID of the calling process. The calling process shall be the only process in the new process group and the only process in the new session."

It would actually be better to modify your software to call these functions itself. That way you would have a true daemon process and avoid the fuss of using a script or another program to control it.
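A minimal C sketch of the fork() + setsid() approach, assuming the program should keep writing to a log file. `detach_from_terminal` and its `log_path` parameter are hypothetical. Unlike many daemon helpers, it returns the child's PID to the original process (like fork()) rather than calling exit() itself, and it skips optional steps such as a second fork(), chdir("/") and umask().

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX.1-2008 declarations */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: detach the calling process from its terminal.
 * Like fork(), it returns the child's PID in the original process (which
 * would normally exit), 0 in the detached child, and -1 if fork() fails.
 * The child ends up in a new session with no controlling terminal,
 * stdin from /dev/null, and stdout/stderr appended to `log_path`.
 */
pid_t detach_from_terminal(const char *log_path)
{
    fflush(NULL);                  /* avoid duplicating unflushed stdio data */

    pid_t pid = fork();
    if (pid != 0)
        return pid;                /* parent (> 0) or fork error (-1) */

    /* Child: not a process-group leader, so setsid() can succeed. */
    if (setsid() == -1)
        _exit(1);
    signal(SIGHUP, SIG_IGN);       /* extra safety, as nohup does */

    int in = open("/dev/null", O_RDONLY);
    int out = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (in == -1 || out == -1)
        _exit(1);

    dup2(in, STDIN_FILENO);
    dup2(out, STDOUT_FILENO);
    dup2(out, STDERR_FILENO);
    if (in > STDERR_FILENO)
        close(in);
    if (out > STDERR_FILENO)
        close(out);
    return 0;
}
```

In main(), the original process would normally just exit: `pid_t pid = detach_from_terminal("bmc.log"); if (pid != 0) exit(pid > 0 ? 0 : 1);` (the log name is made up). Since stdout then points at a regular file, it is fully buffered, so the fflush() question from the original post still applies.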

 

by: fklein23 | Posted on 2009-08-06 at 16:10:53 | ID: 25038912

An interesting concept.
I am not a fan of scripts in general and the last thing you mentioned is worth studying -- having the process daemonize itself. I rather like that.

However, I just encountered a new wrinkle.
I tried the exact same code on CentOS 5.0 instead of CentOS 5.3, and the problem went away!

This throws a whole new light on the problem.
More in the morning. I'm burnt out!

 

by: cjl7 | Posted on 2009-08-07 at 07:32:42 | ID: 25043090

If you expect the process to run that long without intervention, screen might be the fastest road to glory.

It isn't that difficult...

Ctrl-a is the default command prefix.

A sample session:
$ screen            (starts screen; you get a "normal" shell)
$ ./start_my_app

And that's all you need. (To leave it running, detach with Ctrl-a d or simply disconnect.)

Should your connection die, or the computer you connect from die, screen will still be alive the next time you log in. Just run 'screen -r' to reattach to the previous session.

I always use screen; IMO it's much better than nohup!

//jonas

 

by: RBEIMS | Posted on 2009-08-07 at 10:31:07 | ID: 25045106

I also use screen all the time and it works really well. You can even run
screen start_my_app
and the new session will be created with the program already running in it.
This avoids the whole mess of redirecting STDOUT/STDERR, since they are connected to screen and remain available even if your connection drops (as stated before).

 

by: fklein23 | Posted on 2009-08-07 at 11:39:18 | ID: 25045732

Thanks for your input. I am definitely planning to try screen.
I still wonder about this, however:

What if a process is started by user A in terminal session X?
User A closes his session, exits the SSH client, and turns his computer off, leaving the process running in the background.

When user B logs into another SSH session, Y, does screen somehow give this other user, with a different IP address and SSH session, the ability to attach to the running process and bring it into the foreground?

With nohup, ANYONE can see nohup.out at any time. It wasn't clear to me that screen could do that, unless you were restarting the same session.

 

by: RBEIMS | Posted on 2009-08-07 at 12:07:21 | ID: 25045960

You can follow this tutorial to configure screen for sharing sessions between users:
http://ubuntuforums.org/showthread.php?t=299286

 

by: fklein23 | Posted on 2009-08-07 at 12:26:00 | ID: 25046111

OK, I'm convinced.

screen is very cool. I think it fits our needs.

You guys both gave me lots of good input on this matter, so I am going to split the points.

Thanks again

 

by: fklein23 | Posted on 2009-08-07 at 12:28:02 | ID: 31612521

Thanks for your help. You were both correct about screen. Nohup just went into the dustbin!
