
RHE Linux 3.0: Not getting EOF on TCP/IP socket connection after shutdown of machine closes it...

Hi:

I wrote a simple TCP/IP client/server program that currently runs successfully on both HPUX 11.0 and RH8.0 machines. However, I have a problem running it on a RHE 3.0 machine under certain conditions. The following scenario causes my problem.

The Server program is executing on NODE A that is running OS RHE 3.0.
The Client program is executing on NODE B that is running OS RHE 3.0.
The Client program continuously sends text messages to the Server through a TCP/IP socket connection, and the Server program receives these messages through a read() of the socket and prints them out. Pretty straightforward, right?
However, if the NODE B machine is shut down, the TCP/IP connection between the Client and the Server programs will be lost. After a short period of time (less than 1 minute), the Server program running on NODE A should receive a zero from a call to read() on the socket (indicating end-of-file [EOF]). On RedHat 8.0 machines, the Server receives the expected EOF, but on RedHat Enterprise 3.0 machines, read() does not return zero. Instead, every call to read() returns negative one (-1), with errno indicating that the resource is temporarily unavailable. My Server program is expecting this EOF so that it may terminate successfully under these conditions.

Has anyone experienced something similar? If so, any solution?
What should I do in researching this problem further?
What I am really looking for is why this happens and a possible solution.

I am in the process of contacting RedHat to see if there are any known bugs in their kernel w.r.t. TCP/IP software.

BTW, you may ask why I am concerned about a problem in a simple piece of software. Well, I wrote the TCP/IP program as a test program to isolate and mimic the behavior of a similar problem in a much larger piece of software.

Thanks in advance for your help,

PGM.

jlevie

Have you examined the man page for read() (man 2 read)? It pretty clearly states "On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number." and "On error, -1 is returned, and errno is set appropriately."

In the case you describe I'd expect a -1, not an EOF, since the other end of the connection has "gone away" and the read() can't successfully complete. What is in errno at that point?

iharpsoln

ASKER

Hi jlevie:

Thank you for your reply.

I have examined the man page on read() and I accept what it says. However, when a socket connection is established and there is nothing to read, it should return -1 and set errno to some value. In this situation, errno is set to value 11 ("Resource temporarily unavailable"), as stated in my original posting. I expect the behavior of the Server program (RHE 3.0) when the machine that the Client is running on (also RHE 3.0) is shut down to be the same as the behavior on the RH8.0 and HPUX 11.0 platforms. That is, read() returns zero with errno set to zero (end-of-file, EOF). This seems to be the case for RH8.0 and HPUX 11.0, but unfortunately, I am not seeing it for RHE 3.0. Furthermore, I see nothing in the man pages that indicates that read() behavior should be any different between RH8.0 and RHE 3.0.

One thing I neglected to mention in my original posting is that the Server program is set up to perform reads in a NON_BLOCKING fashion.

jlevie

Hmm, it would seem to me that the RH 8.0 & HPUX 11.0 behavior would be wrong. EOF is usually taken to mean that all of the data to be expected has been read, not an error condition caused by the client "disappearing" as in this case. Certainly in the case of files, as compared to a network socket, EOF means you've read all of the data in a file, and an I/O error on the file should not return an EOF. Given that, the RHEL result, where a read() on a broken network link returns -1 and sets errno, would be consistent.

It's been a while, but I seem to remember that Solaris and Irix behave like RHEL 3.0. And I could be misinterpreting it, but my reading of the POSIX standard would imply that the RHEL behavior is correct.

iharpsoln

ASKER

Hi jlevie:

As a followup to your last comment, I would like to say that I find it difficult to accept that the RH 8.0 & HPUX 11.0 behavior is wrong. Keep in mind that I wrote the simple TCP/IP Client/Server program to isolate and identify a problem I encountered in my API software. This API software has been running with this behavior for many years on an HPUX platform and for the last couple of years on RH7.1 and RH8.0 platforms. Now that the API software has been ported to RHEL 3.0, I have encountered a problem where the behavior is different, which causes the API to function incorrectly.

So, the bottom line question is:
Why does a Server read() of a socket connection to a shut-down Client return zero and set errno to zero when running on HPUX and RH8.0 platforms, but not do the same on a RHEL 3.0 machine?

One last thing....
I just want to stress the fact that the problem I am having only happens when the client's machine node is shutdown/rebooted. If we just terminate the Client process (e.g. kill -9), the programs behave the same across all platforms, i.e. the Server read() returns a zero (indicating EOF) and sets the errno to zero.

jlevie

While I can't yet cite an authoritative reference as to what POSIX would require in this situation I can offer the results of an informal poll. I posed the question:

"Given that read() returns the number of bytes read, 0 on End Of File, or -1 (and sets errno), what should happen when reading from a network socket with non-blocking I/O if the client connection is severed?"

The unanimous result from people heavily involved with network application development on Solaris, Irix, & RedHat platforms was that read() should return a -1 if the client connection is lost for any reason (powerdown, reboot, cutting the wire, etc.), since the server would not receive a FIN from the client. On the other hand, a well-behaved client OS will properly terminate the connection with a FIN if the client application is terminated, which would result in read() returning EOF.

Viewed in the context of the presence or absence of a FIN the behavior of RHEL would seem to be correct.

SOLUTION

aleric


aleric

Obviously I meant setsockopt(2), not getsockopt(2).

iharpsoln

ASKER

aleric:

In regards to your post, you mention a timeout may have something to do with it. There is no timeout in my original code; I was referring to the fact that the rebooting machine took less than 1 minute to shutdown/reboot.

Anyways, since my last posting, I have contacted RedHat and I am in the process of getting a complete and satisfactory answer (I hope). What RedHat has informed me of so far is the following:

The behavior of RHEL 3.0 continuously returning -1 and setting errno to 11 in relation to my problem scenario is correct and that the behavior of RH8.0 eventually returning 0 and setting errno to 0 is incorrect. They (RedHat) quoted the RHEL 3.0 documentation on the read() function in support of their claim.
I can accept what the documentation is saying, but it does not adequately address my original question that I have stated to RedHat and stated here in a previous reply to this topic.

In the light of new information, let me rephrase the question. In RHEL 3.0 the system read() CONTINUOUSLY returns a -1 and sets errno to 11 when the client machine at the other end of a TCP/IP connection is shutdown/rebooted (not kill -9 or Ctrl-C on the client). Then, HOW will the RHEL 3.0 read() ever detect that the other end of the connection is dead/lost, if it CONTINUOUSLY returns -1 and sets errno to 11?
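
One general way to make a silently dead peer visible to read() is TCP keepalive, enabled with setsockopt(2). The sketch below is an assumption about what could help here, not something confirmed by RedHat or by the other posts in this thread; the function name `enable_keepalive` and its parameters are hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT options are Linux-specific. With keepalive on, the kernel probes an idle connection. If the probes go unanswered, a later read() should fail with -1 and errno ETIMEDOUT instead of EAGAIN; if the peer has rebooted and answers a probe with a reset, read() should fail with ECONNRESET.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable TCP keepalive on a TCP socket (Linux options).
 *   idle_s     - seconds of silence before the first probe is sent
 *   interval_s - seconds between unanswered probes
 *   count      - unanswered probes before the connection is declared dead
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    /* Turn keepalive probing on for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;

    /* Override the system-wide defaults for this socket only. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) == -1)
        return -1;

    return 0;
}
```

For example, `enable_keepalive(fd, 30, 5, 3)` on the accepted socket would have a dead peer reported after roughly 30 + 5 * 3 = 45 seconds of silence. Without the per-socket overrides, the Linux default idle time is two hours, which is far longer than the "less than 1 minute" window described above.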

ASKER CERTIFIED SOLUTION

jlevie
