mte01

No child processes ERROR after calling the accept function.

Hello,
I sometimes get the error ECHILD (10, "No child processes") when I call accept(). The code and the debug log follow:

Code:
sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
<< errno << ")*************\n\n\n";

Logs:
*************AFTER ACCEPT(sock: 5/errno: 10)*************

Any idea or hint about what might cause this error? Again, it occurs only in rare cases; most of the time the code above works as expected.
Thanks
Narendra Kumar S S

I don't see this error in the manpage of the accept() system call.
From your description, it looks like it is related to the SIGCHLD signal.
So, can you add this code:
signal(SIGCHLD,SIG_IGN);

before you create the socket, and see if you still get the error?

Also, are you using fork() or any other system call to create a child process?
ASKER CERTIFIED SOLUTION
sarabande

This solution is only available to members.
mte01

ASKER

Hi, and thanks for the reply.

I am debugging an issue in old code and trying to figure out the cause of this problem. This is the complete loop:

 do {
        if (debug)
            COUT << "\n\n\n*************BEFORE ACCEPT*************\n";
        sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
        if (debug)
            COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
            << errno << ")*************\n\n\n";
    } while (sock == -1 && errno == EINTR && errno == ECHILD);


Can you tell me how you knew that the error is related to SIGCHLD, and what it means?
I think the fork takes place later, when the system tries to use this connection to receive some data.

mte01

ASKER

sarabande, when errno is 10 I get an error, and in my case the image is not transferred. In the logs I see that a successful transfer had errno = 0.

*************AFTER ACCEPT(sock: 5/errno: 0)*************

Note that errno 10 happens very rarely.
mte01

ASKER

ssnkumar, thanks for your reply.

I use fork afterwards to handle the new connection.
SOLUTION
This solution is only available to members.
ssnkumar, your last comment only repeated what I had already said.

But mte01 stated that there was an error when errno was 10 (in rare cases). That statement doesn't match the code and the reported output "sock: 5/errno: 10".

mte01, can you post a log file entry where sock was -1 and errno was 10?

Sara
sarabande,

I agree that my reply says the same as yours.
Since mte01 didn't understand what you said, I explained it in more detail.

If something is not understood, there is nothing wrong with explaining it again in more words or with examples, right?

In this case, since accept() returns 5, it is clear that there is no error, and hence errno should not be checked.
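
The point about accept() returning 5 can be sketched in code. errno is only meaningful right after a call has reported failure; a successful call is allowed to leave an old value such as 10 in place. The helper below is a minimal sketch, and its name `acceptRetrying` and its `err` out-parameter are assumptions made for illustration, not part of the code in this thread. It retries on EINTR alone: a condition such as `errno == EINTR && errno == ECHILD` can never be true, because errno holds one value at a time.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper, not part of the original code base.
// Retries accept() only when it failed with EINTR. On success errno is not
// consulted at all, because it may still hold a stale value.
// On failure, returns -1 and stores the errno of the failed call in *err.
int acceptRetrying(int listenSocket, int *err)
{
    struct sockaddr_storage from;
    int sock;
    do {
        socklen_t len = sizeof(from);   // must be reset before every call
        sock = accept(listenSocket,
                      reinterpret_cast<struct sockaddr *>(&from), &len);
    } while (sock == -1 && errno == EINTR);

    *err = (sock == -1) ? errno : 0;    // errno is read only after a failure
    return sock;
}
```

With this shape, a debug line would print `err` rather than the raw errno, so a successful accept never shows a leftover 10.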
mte01

ASKER

Hi guys,
It is clear that errno 10 with sock = 5 means that I should disregard errno. Maybe it was just a coincidence that the problem occurs only when I have 5/10 and not when I have 5/0.
Anyway, the logs I have follow. Maybe the problem occurs when reading the actual data.
Thanks for your efforts to help.

Failure scenario logs:
--------------------------
PID(18047) ENTERED waitForAssociation Thu Mar 17 09:42:59 2011
*************BEFORE ACCEPT*************
*************AFTER ACCEPT(sock: 5/errno: 10)*************

DUL  FSM Table: State: 1 Event: 4DUL  Event:  Transport connection indication
DUL  Action: AE 5 Transport Connect Response
DUL  FSM Table: State: 2 Event: 16DUL  Event:  Transport connection closed
DUL  Action: AA 5 Stop ARTIM timer
PID(18047) Association Received ( 192.168.40.85:-> Thu Mar 17 09:43:13 2011
DUL  FSM Table: State: 1 Event: 7DUL  Event:  A-ASSOCIATE resp prim (reject)
DUL  Action:
0006:0303 DUL Finite State Machine Error: No action defined, state 1 event 7
=== S9: ASC_destroyAssociation
PID(18047) ENTERED waitForAssociation Thu Mar 17 09:43:13 2011


Working scenario logs:
-----------------------------
PID(18047) ENTERED waitForAssociation Thu Mar 17 11:41:54 2011
*************BEFORE ACCEPT*************
*************AFTER ACCEPT(sock: 5/errno: 0)*************

DUL  FSM Table: State: 1 Event: 4DUL  Event:  Transport connection indication
DUL  Action: AE 5 Transport Connect Response
Read PDU HEAD TCP:  01 00 00 00 01 03
Read PDU HEAD TCP: type: 01, length: 259 (103)
DUL  FSM Table: State: 2 Event: 5DUL  Event:  A-ASSOCIATE-RQ PDU (on tranport)
DUL  Action: AE 6 Examine Associate Request
PDU Type: Associate Request PDU Length: 265
  01  00  00  00  01  03  00  01  00  00  4f  55  54  53  49  44
  45  50  52  49  4f  52  53  20  20  20  44  43  4d  48  2d  4d
  33  20  20  20  20  20  20  20  20  20  00  00  00  00  00  00
mte01,
Your question was: "No child processes ERROR after calling the accept function".
But the log shows clearly that what you consider an error is not an error.

Still, errno should not become 10 after the call to accept().
Could it be a problem with something else?
Do you have another thread that runs in parallel and has executed a system call that failed?
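
One possible source of a stray ECHILD, offered here only as an assumption because the thread never shows the program's signal handling, is a SIGCHLD handler that calls waitpid() until no children are left. The last waitpid() call then fails with ECHILD, and if the handler interrupts the main code, that value stays in errno. The sketch below shows a handler that saves and restores errno; the names `reapChildren` and `installSigchldReaper` are hypothetical.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical SIGCHLD handler: reaps every child that has already exited.
// Once no children remain, waitpid() fails with ECHILD, so errno is saved on
// entry and restored on exit to keep that value out of the interrupted code.
extern "C" void reapChildren(int)
{
    const int savedErrno = errno;
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
        // keep reaping until nothing is left to collect
    }
    errno = savedErrno;
}

// Installs the handler. SA_RESTART asks the kernel to restart interrupted
// system calls where it can. Returns 0 on success, -1 on failure.
int installSigchldReaper()
{
    struct sigaction sa = {};
    sa.sa_handler = reapChildren;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr);
}
```

Whether this applies depends on how the old code handles its forked children, which would need to be checked in the source.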
mte01

ASKER

The error is generated by the read function.
In my case, bytesRead is 0 and DUL_NETWORKCLOSED is raised. errno 10 was misleading me.
read returns 0 when the connection is closed on the remote end (do you guys agree?).
Thanks for your help.


        do {
            bytesRead = connection->read((char*)b, size_t(l));
        } while (bytesRead == -1 && errno == EINTR);

        /* if we actually received data, move the buffer pointer to its own end, update the variable */
        /* that determines the end of the first loop, and update the reference parameter return variable */
        if (bytesRead > 0) {
            b += bytesRead;
            l -= (unsigned long) bytesRead;
            if (rtnLen != NULL)
                *rtnLen += (unsigned long) bytesRead;
        } else {
            /* in case we did not receive data, an error must have occurred; return a corresponding result value */
            return DUL_NETWORKCLOSED;
        }
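
The statement that read returns 0 when the remote end closes can be checked with a small experiment. The sketch below is an illustration under assumptions: it uses a local socketpair() instead of a real TCP connection, and the function name `readAfterPeerClose` is hypothetical. POSIX documents the same end-of-file convention for stream sockets in general after an orderly shutdown by the peer.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo: returns what read() reports on a stream socket after
// the peer has closed its end without sending anything.
ssize_t readAfterPeerClose()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -2;                      // setup failed; distinct from read() results

    close(fds[1]);                      // the "remote" end closes the connection

    char buf[6];                        // 6 bytes, like the PDU header read in this thread
    ssize_t n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    return n;                           // 0 means end-of-file, not an error
}
```

A connection that is reset rather than closed cleanly would instead show up as -1 with errno set, so both cases are worth handling separately.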
mte01

ASKER

Solution was not complete.
So the event "Event: 16 DUL  Event:  Transport connection closed" is raised because the read function returns DUL_NETWORKCLOSED?

I wonder why you assume the network is closed when read returns 0. There could be more reasons why a read fails, couldn't there?

Can you post the code of the read function?

Sara

mte01

ASKER

Here is the read function implementation,

ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
    return ::read(getSocket(), (char *)buf, nbyte);
}



I assume your socket is non-blocking. Because you use ::read and not ::recv, the normal return is -1 (socket error) with errno EINTR. With ::recv you would get bytesRead = -1 and errno = EWOULDBLOCK. In both cases you shouldn't get a return of 0 bytes (though TCP/IP runs asynchronously and a 0 return is not impossible), but I wouldn't handle it as a network-closed error. If the network had really closed, you would soon get a normal error return of -1.

Sara  
> read returns 0 when connection is closed on the remote end (you guys agree?).
No, I don't agree.
It has to return -1 and set the errno to one of these:
       ECONNRESET
              A read was attempted on a socket and the connection was forcibly closed by its peer.
       ENOTCONN
              A read was attempted on a socket that is not connected.
Prototype of read is:
ssize_t read(int fildes, void *buf, size_t nbyte);
If the value passed for nbyte is 0, the read() function will return an error if there is one.
If there are no errors, read() will return 0.

So, to help debugging, can you print the value of the second argument you pass to DcmTCPConnection::read()?
This manpage describes when read() returns 0:
http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
mte01

ASKER

Thanks for the replies, guys.
The nbyte value is hard-coded to 6.


I saw the following in the manpage. These are the cases in which read returns 0:

1 - When attempting to read from an empty pipe or FIFO if no process has the pipe open for writing, read() shall return 0 to indicate end-of-file .
--- Maybe in this case the other end did not write anything to the socket?

2 - In addition, read() shall fail if the STREAM head had processed an asynchronous error before the call. In this case, the value of errno shall not reflect the result of read(), but reflect the prior error. If a hangup occurs on the STREAM being read, read() shall continue to operate normally until the STREAM head read queue is empty. Thereafter, it shall return 0.
--- This is what I meant by a closed connection on the other end!

Is there any other scenario that I missed?
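
The outcomes discussed so far (data, a 0 return, and -1 with an errno value) can be summarized in one small function. This is a sketch for illustration only: the enum `ReadOutcome` and the function `classifyRead` are hypothetical names, and the grouping of errno values into "retry" and "error" is an assumption about how a caller might want to react, not something the manpage prescribes.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>

enum class ReadOutcome { Data, EndOfStream, ZeroRequested, Retry, Error };

// Hypothetical helper: interprets the return value of read()/recv() on a
// stream socket. `err` is the errno captured right after the call and is
// only looked at when the call returned -1.
ReadOutcome classifyRead(ssize_t bytesRead, int err, std::size_t requested)
{
    if (bytesRead > 0)
        return ReadOutcome::Data;
    if (bytesRead == 0)
        return requested == 0 ? ReadOutcome::ZeroRequested   // nbyte was 0
                              : ReadOutcome::EndOfStream;    // peer closed
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return ReadOutcome::Retry;      // nothing went wrong, try again later
    return ReadOutcome::Error;          // e.g. ECONNRESET, ENOTCONN
}
```

With nbyte fixed at 6, a 0 return falls into the EndOfStream branch regardless of what errno happens to contain.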


When a read is made on an open socket, in my opinion a return of 0 bytes read would happen when TCP/IP doesn't signal 'interrupted system call' (EINTR) because the socket was no longer blocking but had not yet received the message.

I prefer recv over read, where you would ignore a 0-byte read, like this:

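ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
    size_t nbyte_read = 0;
    ssize_t nread = 0;
    /* keep receiving until the requested number of bytes has arrived;
       a 0-byte result is ignored and the call is simply retried */
    while (nbyte_read < nbyte &&
           (nread = recv(getSocket(), (char *)buf + nbyte_read, nbyte - nbyte_read, 0)) >= 0)
    {
        nbyte_read += nread;
    }
    /* partial data followed by EWOULDBLOCK still counts as a successful read */
    if (nbyte_read > 0 && (nread >= 0 || (nread == -1 && errno == EWOULDBLOCK)))
        return nbyte_read;
    return -1;
}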



The above takes into account that TCP/IP could deliver a message in parts, and it ignores 0-byte reads.

Sara
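
For comparison, here is an alternative sketch of a "read exactly nbyte bytes" loop that stops on a 0 return instead of retrying. It assumes a blocking socket and follows the POSIX description of recv(), where 0 means the peer has performed an orderly shutdown. The free function `recvFully` is a hypothetical name and is not part of DcmTCPConnection.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: reads until `nbyte` bytes have arrived, the peer
// closes the connection, or an error occurs. Returns the number of bytes
// read (less than nbyte only if the peer closed early), or -1 on error.
ssize_t recvFully(int sock, void *buf, std::size_t nbyte)
{
    char *p = static_cast<char *>(buf);
    std::size_t total = 0;
    while (total < nbyte) {
        ssize_t n = recv(sock, p + total, nbyte - total, 0);
        if (n > 0) {
            total += static_cast<std::size_t>(n);   // message arrived in parts
        } else if (n == 0) {
            break;                                  // peer closed the connection
        } else if (errno != EINTR) {
            return -1;                              // real error; errno describes it
        }
        // n == -1 with EINTR: interrupted by a signal, simply try again
    }
    return static_cast<ssize_t>(total);
}
```

A caller can then tell a short read (peer closed early) from a full read by comparing the result with nbyte.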