From: Tanaka Akira
Date: 2010-07-06T15:32:25+09:00
Subject: [ruby-core:31068] Re: [Bug #3540] IO.copy_stream fails to detect client disconnect w/sendfile

2010/7/6 Eric Wong:
> UNIX domain sockets are easy to do notification for since they're always
> on the same host. TCP might be harder to detect (and thus the Linux
> folks choose not to bother at all) because the client is on a different
> machine and it might lose a physical connection.

If the kernel cannot detect the disconnect, how does the kernel cause EPIPE?

> How does FreeBSD or Solaris behave if a client is on a different machine
> and has the network cable pulled out? In the case of physically
> disconnected network cable, the client TCP stack has no way to notify
> the server of a disconnect. "kill -9" or even normal OS shutdown would
> give the TCP stack a chance to properly shutdown the connection.

I am not talking about that kind of physical disconnection.
I am describing the situation where the kernel knows that the connection
is disconnected.

The connection is disconnected by a RST packet.
The RST packet is generated when a normal packet is sent to a closed port.

% ruby -rsocket -e '
def netstat
  s = `netstat -n`
  s.each_line {|line|
    puts line if /State\s*$|127.0.0.1:8888/ =~ line
  }
  puts
end
serv = TCPServer.open("127.0.0.1", 8888)
s1 = TCPSocket.open("127.0.0.1", 8888)
s2 = serv.accept
netstat
s2.close
netstat
s1.write "a" rescue p $!
netstat
s1.write "a" rescue p $!
p IO.select(nil, [s1], nil, 0)
'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:8888          127.0.0.1:34516         ESTABLISHED
tcp        0      0 127.0.0.1:34516         127.0.0.1:8888          ESTABLISHED

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:8888          127.0.0.1:34516         FIN_WAIT2
tcp        1      0 127.0.0.1:34516         127.0.0.1:8888          CLOSE_WAIT

Proto Recv-Q Send-Q Local Address           Foreign Address         State

#<Errno::EPIPE: Broken pipe>
nil

At the first netstat call, the TCP states of s1 (local address
127.0.0.1:34516) and s2 (local address 127.0.0.1:8888) are both ESTABLISHED.
s2.close sends a FIN packet to s1.
s1 receives it and sends an ACK packet back to s2.
This changes s2 to FIN_WAIT_2 and s1 to CLOSE_WAIT.

The first s1.write "a" sends a normal data packet to s2.
Since the write system call doesn't wait for the result of the packet,
the system call itself succeeds.
But s2 has already been closed and cannot accept any data.
So the kernel sends a RST packet back to s1 and changes the state of s2
to CLOSED.
Then s1 receives the RST packet, which changes the state of s1 to CLOSED.

The second s1.write "a" fails with EPIPE.
This is because the kernel knows s1 is CLOSED.

Now the kernel knows that write() on s1 doesn't block.
(It fails with an error immediately.)
So FreeBSD and Solaris report it through select().
But Linux doesn't.

I think this is a problem in Linux.
-- 
Tanaka Akira
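The sequence described above can also be checked without netstat. The following is only a minimal sketch, not the script from this report: the method name probe_closed_peer, the hash keys, and the 0.2 second settle time are assumptions made for illustration. It lets the kernel pick a free port instead of using 8888, and replaces the delay that the netstat calls provided with a short sleep.

```ruby
require "socket"

# Hypothetical helper: replays the close / write / write / select sequence
# over loopback and reports what each step did.
def probe_closed_peer(settle = 0.2)
  serv = TCPServer.open("127.0.0.1", 0)      # port 0: the kernel picks a free port
  s1 = TCPSocket.open("127.0.0.1", serv.addr[1])
  s2 = serv.accept

  s2.close                                    # FIN goes to s1
  sleep settle                                # assumption: long enough for the FIN to arrive

  first = begin
    s1.write("a")                             # data for a closed socket; the peer answers with RST
  rescue SystemCallError => e
    e
  end
  sleep settle                                # assumption: long enough for the RST to come back

  second = begin
    s1.write("a")                             # expected to fail, typically with Errno::EPIPE
  rescue SystemCallError => e
    e
  end

  # Zero timeout: only ask whether the kernel reports s1 as writable right now.
  reported = !IO.select(nil, [s1], nil, 0).nil?

  { first_write: first, second_write: second, select_reported: reported }
ensure
  [s1, s2, serv].each { |io| io.close if io && !io.closed? }
end

if __FILE__ == $0
  result = probe_closed_peer
  p result[:first_write]
  p result[:second_write]
  p result[:select_reported]
end
```

The value of :select_reported is the point under discussion: whether select() reports the socket in the write set after the failed write depends on the operating system and kernel version, so the sketch only records it and does not assume either answer.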