Networking 101: TCP In More Depth
Last week's introduction to TCP promised that this article would enlighten, entertain and obviate all other documentation. Well, the last one isn't quite possible in this much space, but now that we know a little about what TCP actually is, let's go ahead and take a look at TCP operational issues.
We said that TCP gets "connected" before any data can be sent. To make that work, the side that initiates a TCP connection first sends a SYN packet (remember the Flags field). This is simply a packet with no data, the SYN flag turned on, and an initial sequence number. If the other side is willing to accept a connection on the port the SYN arrived on, it sends back a SYN+ACK: the SYN and ACK flags set, its own initial sequence number, and an acknowledgment number equal to the initiator's sequence number plus one, which acknowledges the SYN. Finally, the initiator sends one last ACK to confirm receipt of the SYN+ACK. This SYN, SYN+ACK, ACK sequence is called the three-way handshake, and once it completes, the connection is established. The connection then stays open until either side closes it with a FIN, or until it is aborted by a reset (RST) or a timeout.
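To make the sequence-number bookkeeping concrete, here is a minimal Python sketch of the three-way handshake as plain data. The `Segment` class and `handshake` function are hypothetical names for illustration, not a real networking API; a real operating system chooses the initial sequence numbers, and the values used below are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical, stripped-down TCP segment: just flags and numbers."""
    flags: frozenset
    seq: int
    ack: int = 0


def handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of a TCP three-way handshake, in order."""
    # 1. Initiator -> listener: SYN carrying the initiator's initial sequence number.
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # 2. Listener -> initiator: SYN+ACK. A SYN consumes one sequence number,
    #    so the listener acknowledges client_isn + 1.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn, ack=syn.seq + 1)
    # 3. Initiator -> listener: ACK of the listener's SYN (server_isn + 1).
    ack = Segment(frozenset({"ACK"}), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for seg in handshake(client_isn=1000, server_isn=5000):
    print(sorted(seg.flags), seg.seq, seg.ack)
```

Running it prints the SYN (seq 1000), the SYN+ACK (seq 5000, ack 1001) and the final ACK (seq 1001, ack 5001).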
Closing a TCP connection can be done from either side, and it requires that each side send a FIN to close its own direction of communication. One side can close before the other, or both can close at the same time. In the usual case, when one side sends a FIN, the other side ACKs it; when that second side is finished too, it sends its own FIN (possibly in the same segment as the ACK, as a FIN+ACK). The side that sent the first FIN then ACKs the second FIN, and the other side knows the connection is closed. There is no way for the side that sent the first FIN to get an ACK back for that last ACK. You might want to reread that now. So the side that initially closed the connection enters the TIME_WAIT state, in case the other side never received the final ACK and still thinks the connection is open (it would then retransmit its FIN). TIME_WAIT lasts twice the maximum segment lifetime (2×MSL), which in practice is anywhere from one minute (Linux uses 60 seconds) to four minutes (RFC 793 suggests an MSL of two minutes).
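The close is easier to follow as a table of state transitions. This Python sketch lists the four segments of an ordinary (non-simultaneous) close using the state names from the TCP specification; side A closes first, and each row records both sides' states once that segment has been delivered. The `CLOSE_STEPS` table and `time_wait_seconds` helper are hypothetical names for illustration.

```python
# Hypothetical table: (sender, segment, A's state, B's state) once the segment
# has been delivered. A closes first (the "active close"); B closes second.
CLOSE_STEPS = [
    ("A", "FIN", "FIN_WAIT_1", "CLOSE_WAIT"),  # B enters CLOSE_WAIT on receiving the FIN
    ("B", "ACK", "FIN_WAIT_2", "CLOSE_WAIT"),  # B ACKs A's FIN; B may still send data
    ("B", "FIN", "TIME_WAIT", "LAST_ACK"),     # B's application closes; A will ACK it
    ("A", "ACK", "TIME_WAIT", "CLOSED"),       # the final ACK; nothing ever ACKs it
]


def time_wait_seconds(msl_seconds: int) -> int:
    """TIME_WAIT lasts twice the maximum segment lifetime (2*MSL)."""
    return 2 * msl_seconds


for sender, segment, a_state, b_state in CLOSE_STEPS:
    print(f"{sender} sends {segment:3}  A={a_state:<11} B={b_state}")

# With the two-minute MSL suggested by RFC 793, TIME_WAIT lasts four minutes.
print(time_wait_seconds(120), "seconds")
```

If B has nothing left to send when the first FIN arrives, the second and third rows can collapse into a single FIN+ACK segment.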
And we've come to our first problem. If someone, say an attacker, leaves half-open connections (the SYN was answered, but the final ACK never arrives) or half-closed connections lying around on your Web server, this could be bad news. Each pending connection uses up memory, and opening thousands of bogus TCP connections could bring a server to its knees. Of course, you can't simply shorten the TCP timers without affecting the proper operation of TCP. If you've ever heard of a TCP SYN attack (a SYN flood), this is what it is: a flood of SYNs whose handshakes are never completed. To limit the damage, most operating systems cap the number of half-open connections; on Linux, for example, the cap is the net.ipv4.tcp_max_syn_backlog sysctl, whose default depends on the kernel version and the amount of memory.
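Here is a rough Python model of why the cap matters: each SYN creates an entry in a table of half-open connections, the final ACK moves the entry out, and once the table is full new SYNs are dropped. The `HalfOpenTable` class, its methods and the tiny limit are hypothetical; a real kernel's data structures and policies are more elaborate.

```python
class HalfOpenTable:
    """Hypothetical model of a bounded table of half-open connections."""

    def __init__(self, limit: int):
        self.limit = limit
        self.pending = set()      # peers that got a SYN+ACK but haven't ACKed yet
        self.established = set()  # peers that completed the handshake
        self.dropped = 0          # SYNs refused because the table was full

    def on_syn(self, peer) -> bool:
        """Handle an incoming SYN; return True if a SYN+ACK would be sent."""
        if peer in self.pending:
            return True  # retransmitted SYN for an entry we already hold
        if len(self.pending) >= self.limit:
            self.dropped += 1  # table full: drop it, the client may retry later
            return False
        self.pending.add(peer)
        return True

    def on_ack(self, peer) -> bool:
        """Handle the handshake's final ACK; return True if it completes one."""
        if peer not in self.pending:
            return False
        self.pending.remove(peer)
        self.established.add(peer)
        return True


table = HalfOpenTable(limit=4)
# An attacker sends SYNs from spoofed addresses and never sends the final ACK.
for port in range(10):
    table.on_syn(("203.0.113.7", port))
# A legitimate client is now turned away until the bogus entries time out.
print(table.on_syn(("198.51.100.1", 50000)), len(table.pending), table.dropped)  # False 4 7
```

The memory use stays bounded, but the price is that legitimate clients are refused while the table is full, which is why a SYN flood still hurts.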
Now, since we promised to talk about the everlasting flow control problems, let's get into windowing. TCP uses "positive ACK with retransmission" to guarantee reliability: the sender waits a certain amount of time, and if it doesn't get back an ACK for the data it sent, it retransmits it. There are a bazillion timers in TCP, by the way; this is just another one. ACKs matter for flow control too, because without TCP's sliding window protocol the ping-pong of data and ACKs would be very inefficient. If TCP sent one packet and then waited for its ACK before sending the next, it could move at most one packet per round-trip time, no matter how fast the link is.
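Positive ACK with retransmission in its simplest, stop-and-wait form can be sketched in a few lines of Python. The `lost` set is a hypothetical stand-in for the network that says which transmissions go missing, and the function names are made up; real TCP timers adapt to the measured round-trip time, which this sketch ignores.

```python
def send_stop_and_wait(segments, lost, max_tries=5):
    """Positive ACK with retransmission, one segment at a time.

    `lost` is a hypothetical stand-in for the network: a set of
    (segment_index, attempt) pairs whose segment or ACK goes missing.
    Returns how many transmissions were needed in total.
    """
    transmissions = 0
    for i, _payload in enumerate(segments):
        for attempt in range(max_tries):
            transmissions += 1
            if (i, attempt) not in lost:
                break  # the ACK came back before the retransmission timer fired
            # No ACK: the retransmission timer expires and we try again.
        else:
            raise TimeoutError(f"segment {i} unacknowledged after {max_tries} tries")
    return transmissions


def stop_and_wait_rate(segment_bytes: int, rtt_ms: int) -> int:
    """Best-case bytes per second when every segment waits one RTT for its ACK."""
    return segment_bytes * 1000 // rtt_ms


# Segment 1 is lost twice, so it takes three transmissions.
print(send_stop_and_wait(["a", "b", "c"], lost={(1, 0), (1, 1)}))  # 5
# 1460-byte segments over a 100 ms round trip, however fast the link:
print(stop_and_wait_rate(1460, 100))  # 14600 bytes per second
```

The 1460-byte segment and 100 ms round trip are illustrative numbers; the point is that the rate depends only on segment size and round-trip time, not on link speed.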
Ideally, we can send many packets at once, and then get back an ACK for all of them, probably piggybacked on data from the other side. But how do we know how much to send? Well, the TCP window size controls how much data (measured in bytes, not packets) can be in the "sent but not ACKed" state at any moment. If the window is large, we can send a lot of data without waiting for an ACK. On the surface, this doesn't look like flow control, but it certainly is.
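A minimal Python sketch of the sender's side of the window: it tracks the oldest unacknowledged byte and the next byte to send (the `snd_una` and `snd_nxt` names echo SND.UNA and SND.NXT from RFC 793), and only transmits while the bytes in flight fit inside the advertised window. The `WindowSender` class is hypothetical and ignores retransmission and congestion control.

```python
class WindowSender:
    """Hypothetical sliding-window sender; sequence numbers count bytes."""

    def __init__(self, window: int, mss: int):
        self.window = window  # receiver's advertised window, in bytes
        self.mss = mss        # largest segment we will send, in bytes
        self.snd_una = 0      # oldest byte sent but not yet ACKed
        self.snd_nxt = 0      # next byte to send

    def in_flight(self) -> int:
        """Bytes currently in the "sent but not ACKed" state."""
        return self.snd_nxt - self.snd_una

    def send(self, total_bytes: int) -> list:
        """Send as much as the window allows; return (seq, length) pairs."""
        sent = []
        while self.snd_nxt < total_bytes:
            room = self.window - self.in_flight()
            if room <= 0:
                break  # window full: we must wait for an ACK
            length = min(self.mss, room, total_bytes - self.snd_nxt)
            sent.append((self.snd_nxt, length))
            self.snd_nxt += length
        return sent

    def on_ack(self, ack: int, window: int) -> None:
        """Cumulative ACK: every byte below `ack` has arrived."""
        self.snd_una = max(self.snd_una, ack)
        self.window = window  # the receiver may change the window with each ACK


s = WindowSender(window=4000, mss=1000)
print(s.send(10000))        # four 1000-byte segments, then the window is full
s.on_ack(2000, window=4000)
print(s.send(10000))        # two more segments slide into the freed space
```

An ACK that advertises a window of zero makes `send` return nothing at all, which is how the receiver throttles the sender.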
The receiving side is the one that controls the window size. If it advertises zero, the sender cannot send any more data at all. If the window only holds one segment's worth of data, then we're back to the simple "send and wait for ACK" protocol. After a zero window, the sender sends small probes to find out when the window has opened again. If the sender never gets an ACK, it just keeps trying until, you guessed it, a timer expires. Remember, the window size is a 16-bit field in the TCP header, so it can't advertise more than 65,535 bytes. For a larger window, the TCP "window scale" option tells the other side to multiply the advertised value by a power of two (a left shift of up to 14 bits, for a maximum of about 1 GB). Without an extremely large window size, TCP has no hope of filling a gigabit link: the window must be at least the link's bandwidth times the round-trip time, and at 1 Gbit/s with a 100 ms round trip that is 12.5 MB. You should now be better prepared to understand the gigabit tuning article, too.
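The window a path needs is its bandwidth-delay product: the link rate times the round-trip time, which is how much data must be in flight to keep the link busy. This Python sketch computes that figure, the smallest window-scale shift that can advertise it (using the 16-bit field and the shift limit of 14 from RFC 7323), and the ceiling an unscaled window imposes. The function names are hypothetical and the 1 Gbit/s and 100 ms figures are illustrative.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the 16-bit window field tops out at 65,535
MAX_SHIFT = 14             # RFC 7323 caps the window scale shift at 14


def bandwidth_delay_product(bits_per_second: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a link of this speed busy."""
    return bits_per_second * rtt_ms // (8 * 1000)


def window_scale_shift(window_bytes: int) -> int:
    """Smallest shift such that 65535 << shift covers the desired window."""
    shift = 0
    while (MAX_WINDOW_FIELD << shift) < window_bytes:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("window larger than TCP can advertise")
    return shift


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms


bdp = bandwidth_delay_product(1_000_000_000, 100)
print(bdp)                              # 12500000 bytes, about 12.5 MB
print(window_scale_shift(bdp))          # 8: 65535 << 8 = 16,776,960 bytes
print(max_throughput_bps(65_535, 100))  # 5242800 bit/s without window scaling
```

Without window scaling, the same 100 ms path is capped at roughly 5 Mbit/s, whatever the link speed.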
On the subject of TCP flow control, we can't neglect mention of the Nagle algorithm. Interactive traffic such as a telnet session sends each keystroke as soon as it is typed, and a 1-byte payload travels with 40 bytes of header (20 for IP, 20 for TCP), so telnet can add to congestion out of all proportion to the data it carries. RFC 896 defines the Nagle algorithm to cut down on these tiny packets. The idea is that we should give data a chance to pile up before sending, to be more efficient: a connection may have only one unacknowledged small segment outstanding, and further small writes are held back until that ACK arrives or enough data accumulates to fill a full-sized segment. The catch is that holding data back is a major problem for real-time applications; you'd type a command, then wait and wait and wait for a response. So telnet and interactive ssh connections turn Nagle off with the TCP_NODELAY socket option, so that you get an immediate response when you press a key.
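The core of Nagle's rule fits in one function: send now if a full-sized segment is ready or if nothing is waiting for an ACK; otherwise hold the data back. This Python sketch models that decision (`nagle_should_send` and the 1460-byte MSS are assumptions, and real stacks add further conditions), then uses the standard `socket` module's `TCP_NODELAY` option to turn the algorithm off on a real socket.

```python
import socket


def nagle_should_send(buffered_bytes: int, unacked_bytes: int, mss: int) -> bool:
    """Hypothetical, simplified Nagle rule (RFC 896)."""
    if buffered_bytes >= mss:
        return True            # a full-sized segment is always worth sending
    return unacked_bytes == 0  # small data goes out only when nothing is in flight


# Typing "ls" one key at a time, with "l" still waiting for its ACK:
print(nagle_should_send(buffered_bytes=1, unacked_bytes=0, mss=1460))  # True: "l" is sent
print(nagle_should_send(buffered_bytes=1, unacked_bytes=1, mss=1460))  # False: "s" waits

# Disabling Nagle on a real socket, as telnet and interactive ssh do.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```

The sketch compares against zero rather than 1 when reading the option back, since some platforms report a different nonzero value for an enabled flag.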
Of course, we've neglected so many things about TCP. With the understanding from these two articles, however, you should be prepared to understand other literature that assumes you already know TCP. Congestion control, which is different from flow control, wasn't covered here. You may want to read the TCP specification (RFC 793, now superseded by RFC 9293) if you're truly interested in knowing how it all works, in excruciating detail.