mte01
No child processes ERROR after calling the accept function.
Hello,
I sometimes get the error ECHILD 10 ("No child processes") when I make a system call (accept). Here are the code and the debug logs:
Code:
sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
     << errno << ")*************\n\n\n";
Logs:
*************AFTER ACCEPT(sock: 5/errno: 10)*************
Any idea or hint about what might cause this error? Again, it happens only in rare cases; most of the time the code above works as expected.
Thanks
ASKER CERTIFIED SOLUTION (available to Experts Exchange members only)
ASKER
Hi, and thanks for the reply.
I am debugging an issue in old code and trying to figure out the cause of this problem. This is the complete while loop:
do {
    if (debug)
        COUT << "\n\n\n*************BEFORE ACCEPT*************\n";
    sock = accept((*network)->networkSpecific.TCP.listenSocket, &from, &len);
    if (debug)
        COUT << "*************AFTER ACCEPT(sock: " << sock << "/errno: "
             << errno << ")*************\n\n\n";
} while (sock == -1 && (errno == EINTR || errno == ECHILD)); // errno holds one value at a time, so the retry conditions are OR-ed
Can you tell me how you knew that the error is related to SIGCHLD, and what that means?
I think the fork takes place later, when the system uses this connection to receive some data.
ASKER
sarabande, when errno is 10 I get an error, and in my case the image is not transferred. In the logs, a successful transfer had errno = 0:
*************AFTER ACCEPT(sock: 5/errno: 0)*************
Note that errno 10 happens very rarely.
ASKER
ssnkumar, thanks for your reply.
I use fork afterwards to handle the new connection.
SOLUTION (available to Experts Exchange members only)
ssnkumar, your last comment only repeated what I had already said.
But mte01 stated that it was an error when errno was 10 (in rare cases). That statement doesn't match the code and the reported output "sock: 5/errno: 10".
mte01, can you post a log file entry where sock was -1 and errno was 10?
Sara
sarabande,
I agree that my reply says the same thing as yours.
Since mte01 didn't understand what you said, I explained it in more detail.
If something is not understood, there is nothing wrong with explaining it again in more words or with examples, right?
In this case, since accept() returns 5, there is clearly no error, so errno should not be checked.
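A minimal sketch of that rule, assuming a POSIX system: the hypothetical helper `accept_retry` retries only when `accept()` itself returned -1 with EINTR, and reads errno only after a -1 return, because POSIX leaves the value of errno unspecified after a successful call. The log format is illustrative, not the original program's.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/socket.h>

// Hypothetical helper: accept() with an EINTR retry, inspecting errno only
// when the call reports failure by returning -1.
int accept_retry(int listen_fd, sockaddr *from, socklen_t *len)
{
    int sock;
    do {
        sock = accept(listen_fd, from, len);
    } while (sock == -1 && errno == EINTR);   // retry only an interrupted call

    if (sock == -1) {
        const int err = errno;                // valid: accept() just failed
        std::cerr << "accept failed: " << std::strerror(err)
                  << " (errno " << err << ")\n";
        errno = err;                          // keep it intact for the caller
    }
    // On success, errno may still hold a stale value (such as the 10 in the
    // logs above); it says nothing about this accept() call.
    return sock;
}
```

A caller that logs `sock` and `errno` together, as in the original loop, would print errno only in the `sock == -1` branch.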
ASKER
Hi guys,
It was clear that errno 10 with sock = 5 means I should disregard errno; maybe it was just a coincidence that the problem occurs only when I have 5/10 and not when I have 5/0.
Anyway, here are the logs I have; maybe the problem occurs when reading the actual data.
Thanks anyway for your efforts to help.
Failure scenario logs:
--------------------------
PID(18047) ENTERED waitForAssociation Thu Mar 17 09:42:59 2011
*************BEFORE ACCEPT*************
*************AFTER ACCEPT(sock: 5/errno: 10)*************
DUL FSM Table: State: 1 Event: 4DUL Event: Transport connection indication
DUL Action: AE 5 Transport Connect Response
DUL FSM Table: State: 2 Event: 16DUL Event: Transport connection closed
DUL Action: AA 5 Stop ARTIM timer
PID(18047) Association Received ( 192.168.40.85:-> Thu Mar 17 09:43:13 2011
DUL FSM Table: State: 1 Event: 7DUL Event: A-ASSOCIATE resp prim (reject)
DUL Action:
0006:0303 DUL Finite State Machine Error: No action defined, state 1 event 7
=== S9: ASC_destroyAssociation
PID(18047) ENTERED waitForAssociation Thu Mar 17 09:43:13 2011
Working scenario logs:
--------------------------
PID(18047) ENTERED waitForAssociation Thu Mar 17 11:41:54 2011
*************BEFORE ACCEPT*************
*************AFTER ACCEPT(sock: 5/errno: 0)*************
DUL FSM Table: State: 1 Event: 4DUL Event: Transport connection indication
DUL Action: AE 5 Transport Connect Response
Read PDU HEAD TCP: 01 00 00 00 01 03
Read PDU HEAD TCP: type: 01, length: 259 (103)
DUL FSM Table: State: 2 Event: 5DUL Event: A-ASSOCIATE-RQ PDU (on tranport)
DUL Action: AE 6 Examine Associate Request
PDU Type: Associate Request PDU Length: 265
01 00 00 00 01 03 00 01 00 00 4f 55 54 53 49 44
45 50 52 49 4f 52 53 20 20 20 44 43 4d 48 2d 4d
33 20 20 20 20 20 20 20 20 20 00 00 00 00 00 00
mte01,
Your question was: "No child processes ERROR after calling the accept function".
But from the log we can say clearly that what you think is an error is not an error.
Still, errno should not become 10 as a result of the call to accept.
Could it be a problem with something else?
Do you have another thread running in parallel that executed a system call which failed?
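One possible way a successful accept() ends up followed by errno 10 (ECHILD, "No child processes"), assuming the program reaps its forked children in a SIGCHLD handler: if the handler runs while accept() is blocked and its last waitpid() call fails with ECHILD, that value stays in errno unless the handler saves and restores it. A sketch of a handler that does so; the names `reap_children` and `install_sigchld_handler` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>

// Reaps all exited children without blocking. waitpid() ends with -1/ECHILD
// once no children remain, so errno is saved on entry and restored on exit
// to keep the interrupted code (e.g. accept()) from seeing that value.
void reap_children(int /*signo*/)
{
    const int saved_errno = errno;
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
        // keep reaping until no exited child is pending
    }
    errno = saved_errno;
}

// Installs reap_children for SIGCHLD. SA_RESTART asks for interrupted system
// calls such as accept() to be restarted; SA_NOCLDSTOP suppresses SIGCHLD for
// children that merely stop.
bool install_sigchld_handler()
{
    struct sigaction sa {};
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

Without the save/restore, the handler would leave errno at ECHILD exactly as a bare `waitpid(-1, nullptr, WNOHANG)` does in a process with no children.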
ASKER
The error is being generated by the read function:
do {
    bytesRead = connection->read((char*)b, size_t(l));
} while (bytesRead == -1 && errno == EINTR);
/* if we actually received data, move the buffer pointer to its own end, update the variable */
/* that determines the end of the first loop, and update the reference parameter return variable */
if (bytesRead > 0) {
    b += bytesRead;
    l -= (unsigned long) bytesRead;
    if (rtnLen != NULL)
        *rtnLen += (unsigned long) bytesRead;
} else {
    /* in case we did not receive data, an error must have occurred; return a corresponding result value */
    return DUL_NETWORKCLOSED;
}
In my case, bytesRead is 0 and DUL_NETWORKCLOSED is returned; errno 10 was misleading me.
read returns 0 when the connection is closed on the remote end (do you agree?).
Thanks for your help.
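To check that behaviour in isolation, a small self-contained test can use socketpair() as a stand-in for the TCP connection (an assumption; an orderly close by a TCP peer looks the same at the read() level): data queued before the peer closes is still delivered, and the next read() returns 0. The function name is hypothetical.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <utility>

// Hypothetical check: the "remote" end sends 3 bytes and then closes.
// Returns the results of two consecutive read() calls on the local end.
std::pair<ssize_t, ssize_t> reads_after_peer_close()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return {-1, -1};

    const char msg[3] = {'a', 'b', 'c'};
    ssize_t written = write(fds[1], msg, sizeof msg);
    close(fds[1]);                                   // orderly close by the peer

    char buf[6];
    ssize_t first = read(fds[0], buf, sizeof buf);   // queued data is still delivered
    ssize_t second = read(fds[0], buf, sizeof buf);  // then end-of-file: returns 0
    close(fds[0]);
    return {written == 3 ? first : -1, second};
}
```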
ASKER
Solution was not complete.
So the event "Event: 16 DUL Event: Transport connection closed" comes from the read function returning DUL_NETWORKCLOSED?
I wonder why you assume the network is closed when read returns 0. There could be other reasons why a read fails, couldn't there?
Can you post the code of the read function?
Sara
ASKER
Here is the read function implementation:
ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
    return ::read(getSocket(), (char *)buf, nbyte);
}
I assume your socket is non-blocking. Because you use ::read and not ::recv, the normal return is -1 (socket error) with errno EINTR. With ::recv you would get bytesRead = -1 and errno = EWOULDBLOCK. In both cases you shouldn't get a return of 0 bytes (though TCP/IP runs asynchronously, so a 0 return is not impossible), but I wouldn't handle it as a network-closed error. If the network really were closed, you would soon get a normal error return of -1.
Sara
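To see what a non-blocking socket actually reports when no data has arrived yet, the sketch below sets O_NONBLOCK on one end of a socketpair() (standing in for the TCP connection, an assumption) and reads from it with either read() or recv() with flags 0. On Linux both calls fail with -1 and errno EAGAIN/EWOULDBLOCK rather than returning 0; the Linux recv(2) page describes recv() with zero flags as generally equivalent to read(). The helper name is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical check: read from a non-blocking socket that has no data yet
// (the peer end stays open, so this is not end-of-file).
// Returns the errno of the failed call, or 0 if the call did not fail.
int empty_nonblocking_read_errno(bool use_recv)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;
    int flags = fcntl(fds[0], F_GETFL, 0);
    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

    char buf[6];
    ssize_t n = use_recv ? recv(fds[0], buf, sizeof buf, 0)   // flags == 0
                         : read(fds[0], buf, sizeof buf);
    int err = (n == -1) ? errno : 0;   // errno is only meaningful after -1
    close(fds[0]);
    close(fds[1]);
    return err;
}
```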
> read returns 0 when connection is closed on the remote end (you guys agree?).
No, I don't agree.
It has to return -1 and set the errno to one of these:
ECONNRESET
A read was attempted on a socket and the connection was forcibly closed by its peer.
ENOTCONN
A read was attempted on a socket that is not connected.
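The disagreement above comes down to which return value means what. Following POSIX semantics, a sketch that classifies a read()/recv() result on a stream socket: a positive count is data, 0 is end-of-file (the peer performed an orderly close), and only -1 makes errno meaningful, with ECONNRESET reported when the peer aborts the connection. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>

// Hypothetical classification of a read()/recv() result on a stream socket.
enum class ReadOutcome { Data, PeerClosed, Interrupted, WouldBlock, Reset, Error };

ReadOutcome classify_read(ssize_t n, int err)
{
    if (n > 0)
        return ReadOutcome::Data;
    if (n == 0)
        return ReadOutcome::PeerClosed;    // orderly close: no more data will come
    switch (err) {                         // n == -1: only now is err (errno) valid
    case EINTR:
        return ReadOutcome::Interrupted;   // retry the call
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return ReadOutcome::WouldBlock;    // non-blocking socket, no data yet
    case ECONNRESET:
        return ReadOutcome::Reset;         // peer aborted the connection
    default:
        return ReadOutcome::Error;         // e.g. ENOTCONN, EBADF
    }
}
```

In the DUL read loop, `Interrupted` would map to the existing EINTR retry and `PeerClosed` to DUL_NETWORKCLOSED.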
The prototype of read is:
ssize_t read(int fildes, void *buf, size_t nbyte);
If the value passed for nbyte is 0, read() may detect and return errors if there are any.
If there are no errors, read() returns 0.
So it would help debugging if you printed the value of the 2nd argument you pass to DcmTCPConnection::read().
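To illustrate the nbyte == 0 case, assuming Linux, where a zero-length read() on a socket returns 0 without consuming anything: the sketch queues the 6-byte PDU header seen in the working log (01 00 00 00 01 03) on a socketpair() and issues a zero-length read() before a normal one. By its return value alone, the zero-length read looks exactly like end-of-file. The helper name is hypothetical.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <utility>

// Hypothetical check: queue a 6-byte PDU header, then call read() with
// nbyte == 0 followed by read() with nbyte == 6. Returns both results.
std::pair<ssize_t, ssize_t> zero_length_then_full_read()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return {-1, -1};

    const char header[6] = {0x01, 0x00, 0x00, 0x00, 0x01, 0x03};
    ssize_t written = write(fds[1], header, sizeof header);

    char buf[6];
    ssize_t zero = read(fds[0], buf, 0);            // nbyte == 0: returns 0, consumes nothing
    ssize_t full = read(fds[0], buf, sizeof buf);   // the 6 queued bytes are still there
    close(fds[0]);
    close(fds[1]);
    return {zero, written == 6 ? full : -1};
}
```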
This man page gives the details of when read returns 0:
http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
ASKER
Thanks, guys, for the replies.
The nbyte value is hard-coded to 6.
I saw the following in the man page; these are the cases when read returns 0:
1 - When attempting to read from an empty pipe or FIFO, if no process has the pipe open for writing, read() shall return 0 to indicate end-of-file.
--- Maybe in this case the other end did not write anything on the socket?
2 - In addition, read() shall fail if the STREAM head had processed an asynchronous error before the call. In this case, the value of errno shall not reflect the result of read(), but reflect the prior error. If a hangup occurs on the STREAM being read, read() shall continue to operate normally until the STREAM head read queue is empty. Thereafter, it shall return 0.
--- This is what I meant by a closed connection on the other end!
Is there any other scenario that I missed?
When a read is made on an open socket, in my opinion a return of 0 bytes read would happen when TCP/IP doesn't signal 'interrupted system call' (EINTR) because the socket is no longer blocking but has not yet received the message.
I prefer recv over read, where you would ignore a 0-byte read, like this:
ssize_t DcmTCPConnection::read(void *buf, size_t nbyte)
{
    size_t nbyte_read = 0;
    ssize_t nread = 0;
    // recv() returns 0 when the peer has closed the connection; stopping on 0
    // (instead of looping while nread >= 0) avoids spinning forever at end-of-file
    while (nbyte_read < nbyte &&
           (nread = recv(getSocket(), (char *)buf + nbyte_read, nbyte - nbyte_read, 0)) > 0)
    {
        nbyte_read += (size_t)nread;
    }
    if (nbyte_read > 0 && (nread >= 0 || errno == EWOULDBLOCK || errno == EAGAIN))
        return (ssize_t)nbyte_read;
    return -1;
}
The above takes into account that TCP/IP may deliver a message in parts, and it does not treat a 0-byte read as a network-closed error.
Sara
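For a blocking socket, a common alternative (often called readn) is to loop until exactly nbyte bytes have arrived, retrying on EINTR and stopping on end-of-file, so that a header split across TCP segments is still read whole. A minimal sketch, assuming the 6-byte PDU header is read this way; `read_fully` is a hypothetical name, not a DCMTK function.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read exactly nbyte bytes from a blocking descriptor
// unless end-of-file or an error occurs first.
// Returns the number of bytes read (< nbyte only on EOF) or -1 on error.
ssize_t read_fully(int fd, void *buf, size_t nbyte)
{
    size_t done = 0;
    while (done < nbyte) {
        ssize_t n = read(fd, static_cast<char *>(buf) + done, nbyte - done);
        if (n > 0) {
            done += static_cast<size_t>(n);
        } else if (n == 0) {
            break;                      // peer closed: return what we have
        } else if (errno != EINTR) {
            return -1;                  // real error: errno is valid here
        }                               // EINTR: retry the read
    }
    return static_cast<ssize_t>(done);
}
```

With this shape, a return of 0 unambiguously means the peer closed before sending anything, while a short positive count means it closed mid-header.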
From your comments, it looks like this is related to the SIGCHLD signal.
So, can you add this code:
signal(SIGCHLD, SIG_IGN);
before you create the socket, and see whether you still get the error?
And are you using fork() or any other system call to create a child process?
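A sketch of what that suggestion changes, assuming Linux or another XSI-conformant system: with SIGCHLD set to SIG_IGN, exited children are reaped automatically, so a later waitpid() for them blocks until they exit and then fails with ECHILD, which is errno 10 in the logs above. Any wait()/waitpid() calls left in the program would then need to expect that error. The function name is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical check: ignore SIGCHLD, fork a child that exits immediately,
// then wait for it. Returns the errno seen by waitpid(), or 0 if waitpid()
// did not fail (or fork() failed).
int waitpid_errno_with_sigchld_ignored()
{
    void (*old)(int) = signal(SIGCHLD, SIG_IGN);

    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                          // child: exit at once
    int err = 0;
    if (pid > 0 && waitpid(pid, nullptr, 0) == -1)
        err = errno;                       // expected: ECHILD, child was auto-reaped
    signal(SIGCHLD, old);                  // restore the previous disposition
    return err;
}
```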