Why packet lossage kills speed

It is a well-known fact that packet loss has a huge impact on transmissions over long-distance links when using standard TCP. The reason for this is less well known, mostly because information about this specific problem is hard to find and, once found, often hard to read.

This document describes what happens when packets are lost on a high-speed, long-distance link (i.e. with large TCP windows) where the speed is limited somewhere in the network. For example, the end-to-end speed is 1 Gbit/sec, but some link between the endpoints has less bandwidth available (say, only 900 Mbit/sec).

The amount of data that has been sent but not yet acknowledged is limited by two variables: the receiver window size (rcv_wnd) and the congestion window (cwnd). The sender may have at most the smaller of the two outstanding. There is also a third important variable, the slow-start threshold (ssthresh), which tells the sender whether to use slow start (cwnd below ssthresh) or congestion avoidance (cwnd at or above ssthresh).
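A minimal sketch of how these three variables gate the sender, assuming a simplified byte-counting model that ignores details such as Nagle's algorithm or silly-window avoidance. The function names are hypothetical; the slow-start rule follows RFC 5681, which leaves the case cwnd == ssthresh to the implementation (this sketch picks congestion avoidance).

```python
def usable_window(rcv_wnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still put on the wire right now.

    Simplified model (assumption): at most min(rcv_wnd, cwnd) bytes may be
    outstanding, i.e. sent but not yet acknowledged.
    """
    limit = min(rcv_wnd, cwnd)
    return max(0, limit - bytes_in_flight)


def use_slow_start(cwnd: int, ssthresh: int) -> bool:
    """True -> slow start, False -> congestion avoidance."""
    # RFC 5681: slow start while cwnd < ssthresh; at equality either may be
    # used, and this sketch chooses congestion avoidance.
    return cwnd < ssthresh


# Example: a 4 MB receiver window does not help when cwnd is only 1 MB.
print(usable_window(rcv_wnd=4_000_000, cwnd=1_000_000, bytes_in_flight=900_000))
```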

When a TCP transmission starts and the transmission speed ramps up (see Problems with bursts), the connection will end up losing multiple packets, and because of the large number of packets in flight (due to the large window size), many duplicate ACKs (dupacks) will be returned. When the dupacks start to come back, the sender lowers ssthresh and retransmits the missing segment (Fast Retransmit). Because so many packets are outstanding, many more dupacks arrive, and each one inflates cwnd, which lets the sender inject new packets into the network (Fast Recovery).
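The sender's reaction to dupacks can be sketched as below. This is a simplified model of the standard Reno Fast Retransmit/Fast Recovery rules in RFC 5681, not a model of any particular TCP stack; the 1460-byte segment size and the class and method names are assumptions for illustration only.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes (not given in the text)


@dataclass
class RenoSender:
    cwnd: int
    ssthresh: int
    flight_size: int          # bytes sent but not yet acknowledged
    dupacks: int = 0
    in_fast_recovery: bool = False

    def on_dupack(self) -> list:
        """Handle one duplicate ACK; return the actions taken (illustrative)."""
        actions = []
        self.dupacks += 1
        if self.dupacks == 3 and not self.in_fast_recovery:
            # Fast Retransmit: lower ssthresh to half the data in flight
            # (at least two segments) and resend the missing segment.
            self.ssthresh = max(self.flight_size // 2, 2 * MSS)
            # Inflate cwnd by the three segments that have left the network.
            self.cwnd = self.ssthresh + 3 * MSS
            self.in_fast_recovery = True
            actions.append("retransmit missing segment")
        elif self.in_fast_recovery:
            # Fast Recovery: every further dupack means another segment left
            # the network, so cwnd grows by one MSS and new data may be sent.
            self.cwnd += MSS
            actions.append("cwnd inflated; new data may be sent")
        return actions


sender = RenoSender(cwnd=1_460_000, ssthresh=10**9, flight_size=1_460_000)
for _ in range(4):
    print(sender.on_dupack(), sender.cwnd, sender.ssthresh)
```

With a large window, every extra dupack keeps inflating cwnd, which is why so much new data is pushed into an already congested path.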
After a while no more packets can be sent (cwnd is full), and the retransmitted segments cannot trigger the three duplicate ACKs needed for Fast Retransmit. The sender then has to wait for a retransmission timeout, which sets cwnd to 1 segment.
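The timeout step can be sketched as follows, using the RTO rules from RFC 5681 (ssthresh set to half the data in flight, but at least two segments; cwnd set to a loss window of one segment). The 1460-byte segment size and the function name are assumptions.

```python
MSS = 1460  # assumed segment size in bytes


def on_retransmission_timeout(flight_size: int) -> tuple:
    """Return (cwnd, ssthresh) after a retransmission timeout.

    flight_size: bytes outstanding when the timer fired.
    """
    ssthresh = max(flight_size // 2, 2 * MSS)
    cwnd = 1 * MSS  # the loss window: one segment
    return cwnd, ssthresh


print(on_retransmission_timeout(flight_size=1_460_000))
```

Whatever the window was before, the sender restarts from a single segment.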

After the timeout, the next missing segment is retransmitted, and because the receiver's buffer has lots of "holes" in it, some dupacks will be generated again. This sets ssthresh to a really low value, and from now on it is the congestion avoidance algorithm that controls how packets are injected into the network.
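Why ssthresh ends up so low can be illustrated with the RFC 5681 rule ssthresh = max(FlightSize / 2, 2 * SMSS): when the new loss is detected, only a few segments are in flight, so ssthresh collapses to about two segments. The scenario below (four segments outstanding, 1460-byte segments) is hypothetical, and so is the helper name.

```python
MSS = 1460  # assumed segment size in bytes


def ssthresh_after_loss(flight_size: int) -> int:
    # RFC 5681: ssthresh = max(FlightSize / 2, 2 * SMSS)
    return max(flight_size // 2, 2 * MSS)


# Hypothetical: shortly after the timeout only four segments are outstanding
# when the next loss is detected.
ssthresh = ssthresh_after_loss(flight_size=4 * MSS)
cwnd = ssthresh  # window once recovery from this loss is complete

print(ssthresh // MSS, "segments")
print("slow start" if cwnd < ssthresh else "congestion avoidance")
```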

Because of the small ssthresh value, the sender will not use slow start (which grows cwnd exponentially) but congestion avoidance, which increases cwnd additively. On every ACK received, the increase in cwnd is calculated as

cwnd = cwnd + (segsz*segsz/cwnd)
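Applying this formula once per ACK for one round trip's worth of ACKs shows that cwnd grows by at most about one segment per RTT. The sketch below assumes that the number of ACKs per RTT is fixed by the window at the start of the round trip; the `segments_per_ack=2` option is an assumption modelling delayed ACKs (one ACK per two segments), which halves the growth. Function names are hypothetical.

```python
def ca_increase(cwnd: float, segsz: int) -> float:
    """Congestion avoidance: additive increase applied on every ACK."""
    return cwnd + segsz * segsz / cwnd


def cwnd_after_one_rtt(cwnd: float, segsz: int, segments_per_ack: int = 1) -> float:
    """Apply the per-ACK rule for one RTT's worth of ACKs.

    Assumes the number of ACKs in this RTT is set by the window at the
    start of the RTT; segments_per_ack=2 models delayed ACKs.
    """
    acks = int(cwnd // (segsz * segments_per_ack))
    for _ in range(acks):
        cwnd = ca_increase(cwnd, segsz)
    return cwnd


start = 100 * 1460  # a window of 100 segments
print(cwnd_after_one_rtt(start, 1460) - start)       # just under one segment
print(cwnd_after_one_rtt(start, 1460, 2) - start)    # about half a segment
```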

which, in our example with a 200 ms RTT, increases cwnd by only about 4 kbyte/s. From this point it would therefore take about 1.5 hours(!) to reach full speed.
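The arithmetic behind these figures can be reproduced under stated assumptions. The text does not give the segment size or the ACK policy, so this sketch assumes 1460-byte segments, delayed ACKs (one ACK per two segments, so about half a segment of growth per RTT), a start from a near-empty window, and a target window equal to the bandwidth-delay product of the 900 Mbit/sec bottleneck at 200 ms RTT. The function name is hypothetical.

```python
def time_to_fill(target_rate_bps: float, rtt_s: float, mss: int,
                 segments_per_ack: int) -> tuple:
    """Return (cwnd growth in bytes/s, seconds to grow from ~0 to the BDP)."""
    bdp_bytes = target_rate_bps / 8 * rtt_s         # window needed for full speed
    growth_per_rtt = mss / segments_per_ack          # ~1 MSS per RTT, halved by delayed ACKs
    growth_per_s = growth_per_rtt / rtt_s
    return growth_per_s, bdp_bytes / growth_per_s


growth, seconds = time_to_fill(900e6, 0.200, 1460, segments_per_ack=2)
print(f"cwnd grows ~{growth / 1000:.2f} kbyte/s, "
      f"full speed after ~{seconds / 3600:.2f} hours")
```

With these assumptions the sketch gives roughly 3.65 kbyte/s and a bit over 1.7 hours, the same order of magnitude as the figures above. Without delayed ACKs (`segments_per_ack=1`) the growth doubles and the time halves, which is still far too slow to recover a 22.5 MB window.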