No child processes ERROR after calling the accept function.

mte01
Hello,
I am sometimes getting the error (ECHILD 10, No child processes) when I issue a system call (accept). Here are the code and the debug logs:

Code:
sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
<< errno << ")*************\n\n\n";

Logs:
*************AFTER ACCEPT(sock: 5/errno: 10)*************

Any idea or hint on what might cause this error? Again, this error occurs only in rare cases; most of the time the above code works as expected.
Thanks

Commented:
I don't see this error in the man page of the accept() system call.
From your comments, it looks like it is related to the SIGCHLD signal.
So, can you add this code:
signal(SIGCHLD, SIG_IGN);

before you create the socket, and see if you still get the error?

Also, are you using fork() or any other system call to create a child process?
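If the server does fork() a child per connection, a common alternative to `signal(SIGCHLD, SIG_IGN)` is a SIGCHLD handler that reaps finished children with `waitpid(..., WNOHANG)`. Such a handler keeps calling waitpid() until it returns 0 or fails with ECHILD (errno 10 on Linux), so it has to save and restore errno; otherwise the code it interrupted can find a stray ECHILD in errno. The following is a minimal sketch assuming a POSIX system; `onSigchld`, `installSigchldHandler` and the `reaped` counter are hypothetical names, not part of the poster's code.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical counter of reaped children, used here only for illustration.
static volatile sig_atomic_t reaped = 0;

// SIGCHLD handler: reap every child that has already terminated, without blocking.
// waitpid() returns 0 when the remaining children are still running, or -1 with
// errno == ECHILD when none are left; errno is restored in either case.
extern "C" void onSigchld(int /*signo*/) {
    const int savedErrno = errno;
    while (waitpid(-1, nullptr, WNOHANG) > 0)
        reaped = reaped + 1;
    errno = savedErrno;
}

// Install the handler. SA_RESTART asks the kernel to restart interrupted calls
// (on Linux this includes a blocking accept()) instead of failing them with EINTR;
// SA_NOCLDSTOP skips notifications for children that are merely stopped.
bool installSigchldHandler() {
    struct sigaction sa {};
    sa.sa_handler = onSigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

The idea is to call installSigchldHandler() once, before entering the accept() loop, so the parent never has to wait for its children explicitly.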
Top Expert 2016
Commented:
errno only indicates an error if you actually got a socket error (INVALID_SOCKET == -1). You got a valid socket, 5, so in my opinion errno has no meaning for this accept() call.

Sara

Author

Commented:
Hi, and thanks for the reply.

I am debugging an issue in old code and trying to figure out the cause of this problem. This is the complete while loop:

do {
    if (debug)
        COUT << "\n\n\n*************BEFORE ACCEPT*************\n";
    sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
    if (debug)
        COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
             << errno << ")*************\n\n\n";
    /* retry only when accept() failed (-1) with EINTR or ECHILD */
} while (sock == -1 && (errno == EINTR || errno == ECHILD));


Can you tell me how you knew that the error is related to SIGCHLD, and what it means?
I think the fork takes place later, when the system tries to use this connection to receive some information.

Author

Commented:
sarabande, when errno is 10 I get an error, and in my case the image is not transferred. I see in the logs that a successful transfer had errno = 0:

*************AFTER ACCEPT(sock: 5/errno: 0)*************

Note that this errno 10 happens very rarely.

Author

Commented:
ssnkumar, thanks for your reply.

I use fork() afterwards to handle the new connection.
Commented:
You are doing:
>  sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
>         if (debug)
>             COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
>             << errno << ")*************\n\n\n";
So, you are printing the value of errno whenever debug is non-zero.
But errno is relevant only when the preceding call returns an error.
That means you should check errno only if the accept() system call returns -1.
Otherwise, its value has no meaning.

Now, in your output:
> Logs:
> *************AFTER ACCEPT(sock: 5/errno: 10)*************
accept() is returning the value 5.
That is not an error, so you should not check the value of errno.
Whether it is 10, 15, or 100 doesn't matter.
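The rule above can be built into the accept loop itself: look at errno only on the -1 path, and copy it immediately, because debug output or any other library call may overwrite it. This is a minimal sketch assuming POSIX sockets; `acceptRetryingOnEintr` is a hypothetical helper, not part of the poster's code.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/socket.h>

// Hypothetical helper: accept() a connection, retrying only when the call was
// interrupted by a signal (EINTR). errno is inspected only after accept() has
// returned -1, and it is copied before anything else can overwrite it.
int acceptRetryingOnEintr(int listenSocket, sockaddr *from, socklen_t *len, bool debug) {
    const socklen_t initialLen = *len;  // accept() treats *len as an in/out value
    for (;;) {
        *len = initialLen;
        const int sock = accept(listenSocket, from, len);
        if (sock >= 0) {
            // Success: errno may still hold a stale value from an earlier call; ignore it.
            if (debug) std::cout << "AFTER ACCEPT (sock: " << sock << ")\n";
            return sock;
        }
        const int err = errno;  // meaningful only because accept() returned -1
        if (debug) std::cout << "ACCEPT FAILED: " << std::strerror(err) << '\n';
        if (err != EINTR) {
            errno = err;  // hand the real error back to the caller
            return -1;
        }
    }
}
```

It would be called as `sock = acceptRetryingOnEintr(listenSocket, &from, &len, debug);` in place of the do/while loop.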
Top Expert 2016

Commented:
ssnkumar, your last comment only repeated what I had already said.

But mte01 stated that it was an error when errno was 10 (in rare cases). That statement doesn't match the code and the reported output "sock: 5/errno: 10".

mte01, can you post a log file entry where sock was -1 and errno was 10?

Sara

Commented:
sarabande,

I agree that my reply said the same thing as yours.
Since mte01 didn't understand what you said, I explained it in more detail.

If something is not understood, there is nothing wrong with explaining it again in more words or with examples, right?

In this case, since accept() returns 5, it is clear that there is no error, and hence errno should not be checked.

Author

Commented:
Hi guys,
It was clear that errno 10 with sock = 5 means I should disregard errno; maybe it was just a coincidence that the problem occurs only when I have 5/10 and not when I have 5/0.
Anyway, here are the logs I have; maybe the problem occurs when reading the actual data.
Thanks for the effort to help.

Failure scenario logs:
--------------------------
PID(18047) ENTERED waitForAssociation Thu Mar 17 09:42:59 2011
*************BEFORE ACCEPT*************
*************AFTER ACCEPT(sock: 5/errno: 10)*************

DUL  FSM Table: State: 1 Event: 4DUL  Event:  Transport connection indication
DUL  Action: AE 5 Transport Connect Response
DUL  FSM Table: State: 2 Event: 16DUL  Event:  Transport connection closed
DUL  Action: AA 5 Stop ARTIM timer
PID(18047) Association Received ( 192.168.40.85:-> Thu Mar 17 09:43:13 2011
DUL  FSM Table: State: 1 Event: 7DUL  Event:  A-ASSOCIATE resp prim (reject)
DUL  Action:
0006:0303 DUL Finite State Machine Error: No action defined, state 1 event 7
=== S9: ASC_destroyAssociation
PID(18047) ENTERED waitForAssociation Thu Mar 17 09:43:13 2011


Working scenario logs:
-----------------------------
PID(18047) ENTERED waitForAssociation Thu Mar 17 11:41:54 2011
*************BEFORE ACCEPT*************
*************AFTER ACCEPT(sock: 5/errno: 0)*************

DUL  FSM Table: State: 1 Event: 4DUL  Event:  Transport connection indication
DUL  Action: AE 5 Transport Connect Response
Read PDU HEAD TCP:  01 00 00 00 01 03
Read PDU HEAD TCP: type: 01, length: 259 (103)
DUL  FSM Table: State: 2 Event: 5DUL  Event:  A-ASSOCIATE-RQ PDU (on tranport)
DUL  Action: AE 6 Examine Associate Request
PDU Type: Associate Request PDU Length: 265
  01  00  00  00  01  03  00  01  00  00  4f  55  54  53  49  44
  45  50  52  49  4f  52  53  20  20  20  44  43  4d  48  2d  4d
  33  20  20  20  20  20  20  20  20  20  00  00  00  00  00  00

Commented:
mte01,
Your question was: "No child processes ERROR after calling the accept function".
But from the log we can say very clearly that what you are treating as an error is not an error.

Still, accept() is not documented to set errno to 10 (ECHILD), and a successful call does not reset errno, so the 10 must be left over from an earlier call that failed.
Could it be a problem with something else?
Do you have another thread running in parallel that executed a system call which failed?
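A quick way to see how a leftover errno can appear: errno is per-thread, and POSIX never requires a successful call to reset it, so a value set by an earlier failed call in the same thread (for example inside a signal handler that calls waitpid()) can still be there after a later successful accept(). The sketch below assumes Linux/glibc, where successful system-call wrappers leave errno untouched. `staleErrnoAfterSuccess` is a hypothetical name, and socketpair() stands in for the successful accept().

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns the errno value observed right after a *successful* call that
// followed a failed one. Assumes the calling process has no child processes.
int staleErrnoAfterSuccess() {
    errno = 0;
    // Fails: with no children, waitpid() returns -1 and sets errno to ECHILD (10 on Linux).
    (void)waitpid(-1, nullptr, WNOHANG);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)  // stands in for a successful accept()
        return -1;
    const int observed = errno;  // still ECHILD: the successful call did not clear it
    close(sv[0]);
    close(sv[1]);
    return observed;
}
```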

Author

Commented:
The error is being generated by the read function:
in my case, bytesRead is 0, so DUL_NETWORKCLOSED is returned. errno 10 was misleading me.
read() returns 0 when the connection is closed on the remote end (do you guys agree?).
Thanks for your help.


do {
    bytesRead = connection->read((char*)b, size_t(l));
} while (bytesRead == -1 && errno == EINTR);

/* if we actually received data, move the buffer pointer to its own end, update the variable */
/* that determines the end of the first loop, and update the reference parameter return variable */
if (bytesRead > 0) {
    b += bytesRead;
    l -= (unsigned long) bytesRead;
    if (rtnLen != NULL)
        *rtnLen += (unsigned long) bytesRead;
} else {
    /* no data received: 0 means end-of-file (the peer closed the connection), */
    /* -1 means a read error; both cases are reported as DUL_NETWORKCLOSED */
    return DUL_NETWORKCLOSED;
}
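For comparison, a loop that keeps the three outcomes of read() apart, instead of mapping both 0 and -1 to DUL_NETWORKCLOSED, could look like the following. It is a minimal sketch assuming a blocking POSIX socket; `readFully` and `ReadResult` are hypothetical names and not part of DCMTK.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result codes, loosely mirroring the DUL_NETWORKCLOSED case.
enum class ReadResult { Ok, PeerClosed, Error };

// Read exactly `len` bytes from `fd`. read() may return fewer bytes than asked,
// so keep looping until the buffer is full. A return of 0 means end-of-file
// (the peer closed its end); -1 is a real error, except EINTR, which is retried.
ReadResult readFully(int fd, char *buf, size_t len, int *savedErrno) {
    size_t got = 0;
    while (got < len) {
        const ssize_t n = ::read(fd, buf + got, len - got);
        if (n > 0) {
            got += static_cast<size_t>(n);
        } else if (n == 0) {
            return ReadResult::PeerClosed;
        } else if (errno != EINTR) {
            if (savedErrno != nullptr) *savedErrno = errno;
            return ReadResult::Error;
        }
    }
    return ReadResult::Ok;
}
```

With this split, the caller can log the real errno for the Error case and treat PeerClosed as a normal end of the association.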

Author

Commented:
Solution was not complete.
Top Expert 2016

Commented:
So the event "Event: 16 DUL Event: Transport connection closed" is raised because the read function returns DUL_NETWORKCLOSED?

I wonder why you assume the network is closed when read() returns 0. There could be other reasons why a read fails, couldn't there?

Can you post the code of the read function?

Sara

Author

Commented:
Here is the read function implementation:

ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
    return ::read(getSocket(), (char *)buf, nbyte);
}



Top Expert 2016

Commented:
I assume your socket is non-blocking. Because you use ::read and not ::recv, the normal return is -1 (socket error) with errno EINTR; with ::recv you would get bytesRead = -1 and errno = EWOULDBLOCK. In both cases you shouldn't get a return of 0 bytes (though TCP/IP runs asynchronously, so a 0 return is not impossible), but I wouldn't handle it as a network-closed error. If the network really were closed, you would soon get a normal error return of -1.

Sara  
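For reference, POSIX specifies the same result for read() and recv() in this situation: on a socket marked O_NONBLOCK with no data waiting, both fail with -1 and errno set to EAGAIN or EWOULDBLOCK, while EINTR is reserved for calls interrupted by a signal. The following is a quick check, written as a minimal sketch assuming a POSIX system, with an AF_UNIX socket pair standing in for the TCP connection. `NonBlockingProbe` and `probeNonBlocking` are hypothetical names.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result record for the experiment below.
struct NonBlockingProbe {
    ssize_t readRet;
    int readErrno;
    ssize_t recvRet;
    int recvErrno;
};

// Put one end of a connected socket pair into non-blocking mode and try to
// receive while nothing has been sent, once with read() and once with recv().
bool probeNonBlocking(NonBlockingProbe *out) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return false;
    const int flags = fcntl(sv[0], F_GETFL, 0);
    const bool ok = flags != -1 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != -1;
    if (ok) {
        char c;
        out->readRet = ::read(sv[0], &c, 1);
        out->readErrno = errno;
        out->recvRet = recv(sv[0], &c, 1, 0);
        out->recvErrno = errno;
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```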

Commented:
> read returns 0 when connection is closed on the remote end (you guys agree?).
No, I don't agree.
It has to return -1 and set errno to one of these:
       ECONNRESET
              A read was attempted on a socket and the connection was forcibly closed by its peer.
       ENOTCONN
              A read was attempted on a socket that is not connected.
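For comparison, POSIX specifies that recv() returns 0 when the peer has performed an orderly shutdown, and read() on a socket behaves like recv() with no flags. ECONNRESET is what a read reports when the peer resets the connection instead. The sketch below shows both cases over a loopback TCP connection. It is a minimal sketch assuming Linux; `ReadOutcome` and `readAfterPeerClose` are hypothetical names, and SO_LINGER with a zero timeout is used only to force a reset.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct ReadOutcome { ssize_t ret; int err; };

// Open a loopback TCP connection, close the client end, and record what a
// blocking read() on the server end returns. abortive == false: normal close
// (FIN). abortive == true: SO_LINGER with timeout 0 makes close() send RST.
bool readAfterPeerClose(bool abortive, ReadOutcome *out) {
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    if (ls < 0) return false;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t alen = sizeof addr;
    int client = -1, server = -1;
    bool ok = bind(ls, reinterpret_cast<sockaddr *>(&addr), sizeof addr) == 0 &&
              listen(ls, 1) == 0 &&
              getsockname(ls, reinterpret_cast<sockaddr *>(&addr), &alen) == 0 &&
              (client = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
              connect(client, reinterpret_cast<sockaddr *>(&addr), sizeof addr) == 0 &&
              (server = accept(ls, nullptr, nullptr)) >= 0;
    if (ok && abortive) {
        linger lg{};
        lg.l_onoff = 1;
        lg.l_linger = 0;
        ok = setsockopt(client, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == 0;
    }
    if (ok) {
        close(client);
        client = -1;
        char buf[6];
        errno = 0;
        out->ret = ::read(server, buf, sizeof buf);  // blocks until FIN or RST arrives
        out->err = errno;
    }
    if (client >= 0) close(client);
    if (server >= 0) close(server);
    close(ls);
    return ok;
}
```

In the normal-close case the read returns 0. In the reset case it returns -1 with errno ECONNRESET.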

Commented:
The prototype of read is:
ssize_t read(int fildes, void *buf, size_t nbyte);
If the value passed for nbyte is 0, read() may detect and return errors, if there are any.
If there are no errors, read() returns 0.

So, to help with debugging, can you print the value of the 2nd argument you are passing to DcmTCPConnection::read()?
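Following that suggestion, a thin logging wrapper around the raw read() call makes a zero-length request visible in the log. This is a minimal sketch; `debugRead` is a hypothetical helper (it is not DcmTCPConnection::read), and a pipe stands in for the socket in the check.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical debugging wrapper: log the requested length and the outcome of
// read(). errno is reported only when read() actually failed.
ssize_t debugRead(int fd, void *buf, size_t nbyte) {
    const ssize_t n = ::read(fd, buf, nbyte);
    const int err = errno;
    if (n == -1)
        std::fprintf(stderr, "read(fd=%d, nbyte=%zu) failed: %s\n", fd, nbyte, std::strerror(err));
    else
        std::fprintf(stderr, "read(fd=%d, nbyte=%zu) returned %zd\n", fd, nbyte, n);
    errno = err;  // leave errno as read() set it, for the caller
    return n;
}
```

A log line showing `nbyte=0` together with a 0 return would show that the "closed connection" was really an empty request.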

Commented:
This man page describes when read() returns 0:
http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html

Author

Commented:
Thanks, guys, for the replies.
The nbyte value is hard-coded to 6.


I saw the following in the man page; these are the cases in which read() returns 0.

1 - When attempting to read from an empty pipe or FIFO: if no process has the pipe open for writing, read() shall return 0 to indicate end-of-file.
--- Maybe in this case the other end did not write anything on the socket?

2 - In addition, read() shall fail if the STREAM head had processed an asynchronous error before the call. In this case, the value of errno shall not reflect the result of read(), but reflect the prior error. If a hangup occurs on the STREAM being read, read() shall continue to operate normally until the STREAM head read queue is empty. Thereafter, it shall return 0.
--- This is what I meant by the connection being closed on the other end!

Is there any other scenario that I missed?
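One more case worth separating from the pipe/FIFO wording: on a stream socket, read() returns 0 once the peer has closed the connection or shut down its sending side, even if the socket is still usable in the other direction. A peer that simply has not written anything yet does not produce a 0 return. A blocking read() waits, and a non-blocking one fails with EAGAIN or EWOULDBLOCK. The sketch below assumes a POSIX system, with an AF_UNIX socket pair standing in for the TCP connection; `HalfCloseResult` and `halfCloseDemo` are hypothetical names.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct HalfCloseResult { ssize_t readAfterShutdown; ssize_t replyBytes; };

// Hypothetical demo: the peer shuts down only its sending side. The local
// read() then returns 0 (end-of-file), although the connection can still
// carry data in the opposite direction.
bool halfCloseDemo(HalfCloseResult *out) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return false;
    char buf[6];
    const bool ok = shutdown(sv[1], SHUT_WR) == 0;  // peer: "I will send nothing more"
    if (ok) {
        out->readAfterShutdown = ::read(sv[0], buf, sizeof buf);  // 0, not -1
        const ssize_t sent = ::write(sv[0], "ok", 2);             // still allowed
        out->replyBytes = (sent == 2) ? ::read(sv[1], buf, sizeof buf) : -1;
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```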


Top Expert 2016

Commented:
When a read is made on an open socket, in my opinion a return of 0 bytes read would happen when TCP/IP doesn't signal 'interrupted system call' (EINTR), because the socket is no longer blocking but has not yet received the message.

I prefer recv() over read(), called in a loop like this:

ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
    size_t nbyte_read = 0;
    ssize_t nread = 0;
    /* stop when the message is complete, on an error, or when recv() returns 0 (peer closed the connection) */
    while (nbyte_read < nbyte &&
           (nread = recv(getSocket(), (char *)buf + nbyte_read, nbyte - nbyte_read, 0)) > 0)
    {
        nbyte_read += (size_t)nread;
    }
    if (nbyte_read > 0 && (nread >= 0 || errno == EWOULDBLOCK || errno == EAGAIN))
        return (ssize_t)nbyte_read;   /* the whole message, or the part received so far */
    return (nread == 0) ? 0 : -1;     /* 0: peer closed the connection; -1: error, see errno */
}

The above takes into account that TCP/IP may deliver a message in parts.

Sara
