Building a Transport Protocol on Top of UDP
This article explains the core networking concepts that make a reliable transport protocol work, using AeroUDP as a concrete reference implementation. AeroUDP is a reliable, ordered, congestion-controlled transport protocol layered on top of unreliable UDP, essentially a stripped-down hybrid of TCP and QUIC written in idiomatic asynchronous Rust.
Source code: https://github.com/Mo7ammedd/AeroUDP
Why Build a Protocol on Top of UDP?
The Transport Layer
The transport layer sits between the raw network (IP) and your application. It is responsible for turning a best-effort packet delivery service into something programs can actually rely on. The two classic choices are TCP and UDP.
TCP (Transmission Control Protocol):
- Reliable, ordered, connection-oriented byte stream
- Built-in congestion and flow control
- Implemented in the kernel, hard to modify or experiment with
UDP (User Datagram Protocol):
- Unreliable, unordered, connectionless datagrams
- No congestion control, no retransmission, no ordering
- A thin wrapper over IP: fast, minimal, and fully programmable in user space
The Middle Ground
Modern protocols like QUIC (which powers HTTP/3) are built on top of UDP in user space, re-implementing reliability and congestion control where they can evolve quickly without kernel changes.
AeroUDP takes the same approach for educational and experimental purposes. It uses UDP as a dumb packet pipe and rebuilds the guarantees people expect from TCP:
- Reliable, in-order, bidirectional delivery
- Detection and recovery from loss, duplication, reordering, and corruption
- Sliding-window flow control
- NewReno-style congestion control
- RTT-driven adaptive timeouts
- A full connection lifecycle (handshake, graceful close, reset, keepalive)
Packets and the Wire Format
What is a Packet Header?
Before any bytes travel across a network, they must be wrapped in a header, a fixed structure of metadata that tells the receiver how to interpret the payload. The header answers questions like: Which connection is this? What sequence number? What is being acknowledged? How much buffer space is available?
AeroUDP Header
Every AeroUDP datagram is exactly one UDP payload carrying one packet with a 28-byte header:
+---------------+---------------+---------------+---------------+
|  magic=0xAE   |   version=1   |  packet_type  |     flags     |
+---------------+---------------+---------------+---------------+
|                         conn_id (u32)                         |
|                         seq_num (u32)                         |
|                         ack_num (u32)                         |
|                          window (u32)                         |
|       payload_len (u16)       |        reserved (u16)         |
|          checksum (u32, CRC32 over header + payload)          |
|                          payload ...                          |
+---------------------------------------------------------------+
Key design choices:
- Magic byte + version: lets the receiver reject garbage or mismatched protocol versions immediately.
- Big-endian fields: this is "network byte order", the conventional wire encoding, so machines with different native byte orders agree on every value.
- 1200-byte max payload: chosen to fit inside common network MTUs (Maximum Transmission Unit) and avoid IP fragmentation.
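To make the layout concrete, here is a minimal Rust sketch of encoding and decoding a header with the field order shown above. The struct, constant, and method names are illustrative assumptions, not AeroUDP's actual API; only the field sizes, the magic byte, the version, and the big-endian encoding come from the description.

```rust
/// Hypothetical constants matching the layout described above.
pub const MAGIC: u8 = 0xAE;
pub const VERSION: u8 = 1;
pub const HEADER_LEN: usize = 28;

/// Hypothetical in-memory view of the header (magic and version are implied).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub packet_type: u8,
    pub flags: u8,
    pub conn_id: u32,
    pub seq_num: u32,
    pub ack_num: u32,
    pub window: u32,
    pub payload_len: u16,
    pub checksum: u32,
}

impl Header {
    /// Serialize into 28 bytes, every multi-byte field big-endian.
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut b = [0u8; HEADER_LEN];
        b[0] = MAGIC;
        b[1] = VERSION;
        b[2] = self.packet_type;
        b[3] = self.flags;
        b[4..8].copy_from_slice(&self.conn_id.to_be_bytes());
        b[8..12].copy_from_slice(&self.seq_num.to_be_bytes());
        b[12..16].copy_from_slice(&self.ack_num.to_be_bytes());
        b[16..20].copy_from_slice(&self.window.to_be_bytes());
        b[20..22].copy_from_slice(&self.payload_len.to_be_bytes());
        // b[22..24] is the reserved field and stays zero.
        b[24..28].copy_from_slice(&self.checksum.to_be_bytes());
        b
    }

    /// Parse a header, rejecting short buffers, a bad magic byte,
    /// or an unknown version before looking at anything else.
    pub fn decode(buf: &[u8]) -> Option<Header> {
        if buf.len() < HEADER_LEN || buf[0] != MAGIC || buf[1] != VERSION {
            return None;
        }
        let u32_at =
            |i: usize| u32::from_be_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
        Some(Header {
            packet_type: buf[2],
            flags: buf[3],
            conn_id: u32_at(4),
            seq_num: u32_at(8),
            ack_num: u32_at(12),
            window: u32_at(16),
            payload_len: u16::from_be_bytes([buf[20], buf[21]]),
            checksum: u32_at(24),
        })
    }
}
```

Checking the magic byte and version first means garbage datagrams are rejected after reading two bytes, before any other field is interpreted.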
Error Detection with Checksums
Networks corrupt bits. A checksum is a small value computed over the data so the receiver can detect corruption.
How AeroUDP does it:
- The checksum field is set to zero.
- CRC32 is computed over the entire packet (header + payload).
- The result is written back into the checksum field.
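On receipt, the receiver repeats the computation: it zeroes the checksum field, recomputes CRC32 over the packet, and compares the result with the value that arrived. On a mismatch the packet is silently dropped. From the protocol's perspective it never arrived, and the reliability mechanisms recover it like any other loss.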
Why CRC32?
- Fast to compute in hardware and software
- Excellent at catching burst errors
- Not cryptographic: it detects accidental corruption, not malicious tampering
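The three steps above, and the matching check on receipt, might look like the following sketch. It assumes the common IEEE 802.3 CRC-32 variant and a checksum field at byte offset 24 (derived from the header layout); the function names are hypothetical, and a real implementation would more likely use a table-driven or hardware-accelerated CRC than this bit-by-bit loop.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// Assumption: this is the CRC32 variant in use.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Assumed offset of the 4-byte checksum field inside the 28-byte header.
const CHECKSUM_OFFSET: usize = 24;

/// Sender side: zero the field, compute the CRC over header + payload,
/// then write the result back. Panics if `packet` is shorter than a header.
pub fn seal(packet: &mut [u8]) {
    packet[CHECKSUM_OFFSET..CHECKSUM_OFFSET + 4].fill(0);
    let sum = crc32(packet);
    packet[CHECKSUM_OFFSET..CHECKSUM_OFFSET + 4].copy_from_slice(&sum.to_be_bytes());
}

/// Receiver side: recompute the CRC with the field zeroed and compare it
/// with the value that arrived. `false` means "drop the packet".
pub fn verify(packet: &[u8]) -> bool {
    if packet.len() < CHECKSUM_OFFSET + 4 {
        return false;
    }
    let field = CHECKSUM_OFFSET..CHECKSUM_OFFSET + 4;
    let mut stored = [0u8; 4];
    stored.copy_from_slice(&packet[field.clone()]);
    let mut copy = packet.to_vec();
    copy[field].fill(0);
    crc32(&copy) == u32::from_be_bytes(stored)
}
```

Zeroing the field before both computations is what lets the checksum cover the whole packet, including the header that contains it.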
Reliability: Turning "Best Effort" into "Guaranteed"
UDP can lose, duplicate, or reorder packets. Reliability is the machinery that hides all of that from the application.
Sequence Numbers
Concept:
- Each packet that occupies sequence space gets a monotonically increasing sequence number.
- The receiver uses these to reassemble data in the correct order and to detect gaps (loss) or repeats (duplication).
In AeroUDP:
- Each side picks a random 32-bit Initial Sequence Number (ISN) at connection time. Randomizing the ISN prevents old packets from a previous connection from being mistaken for new ones.
- SYN, FIN, and DATA each consume exactly one sequence slot. Pure ACKs consume none.
- Comparisons use modular 32-bit arithmetic, so the space wraps around cleanly.
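A common way to compare sequence numbers that wrap is to subtract them modulo 2^32 and look at the sign of the result. The helpers below are a sketch of that idea; the names `seq_lt` and `seq_leq` are assumptions, and the comparison is only meaningful while the two numbers are less than 2^31 apart.

```rust
/// True if `a` comes strictly before `b` in 32-bit sequence space.
/// The wrapping difference is reinterpreted as signed, so a value just
/// before the wrap point still compares as "earlier" than one just after.
pub fn seq_lt(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}

/// True if `a` is equal to or before `b` in sequence space.
pub fn seq_leq(a: u32, b: u32) -> bool {
    a == b || seq_lt(a, b)
}
```

With a random ISN near `u32::MAX`, a plain `a < b` would misorder packets as soon as the counter wraps; the signed-difference form does not.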
Cumulative Acknowledgements (ACKs)
Concept:
An ACK tells the sender "I received everything up to here." AeroUDP uses cumulative ACKs: the ack_num field means "the next in-order sequence number I expect."
Example:
Receiver has: 1, 2, 3, 4 (missing 5)
ack_num sent: 5 (expecting 5 next)
Later receives: 6, 7 (still missing 5, out of order)
ack_num stays: 5 (cannot advance past the gap)
Advantages of cumulative ACKs:
- Simple and robust, a single number summarizes everything received
- A lost ACK is harmless if a later ACK gets through (it supersedes the old one)
Disadvantage:
- Cannot tell the sender which specific later packets arrived after a gap. That is what Selective ACK (SACK) solves, a feature AeroUDP intentionally omits for simplicity.
Automatic Repeat reQuest (ARQ)
The general strategy of "detect loss, then resend" is called ARQ. AeroUDP implements it with a retransmission queue:
How it works:
- Every DATA packet is stored in an in-flight queue, keyed by sequence number.
- A cumulative ACK evicts everything strictly below ack_num; those packets are confirmed delivered.
- If a packet is not acknowledged in time, it is resent (with a RETX flag set for observability).
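The queue described above could be sketched as follows. All type and method names here are hypothetical, and the sketch makes two simplifying assumptions: a per-packet timer, and a `BTreeMap` whose key order ignores wraparound (eviction itself uses a wrap-aware comparison).

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// One unacknowledged DATA packet (hypothetical shape).
pub struct InFlight {
    pub payload: Vec<u8>,
    pub sent_at: Instant,
    pub retransmitted: bool, // would drive the RETX flag on resend
}

/// In-flight queue keyed by sequence number.
#[derive(Default)]
pub struct RetxQueue {
    packets: BTreeMap<u32, InFlight>,
}

impl RetxQueue {
    /// Remember a freshly sent DATA packet until it is acknowledged.
    pub fn on_send(&mut self, seq: u32, payload: Vec<u8>, now: Instant) {
        self.packets
            .insert(seq, InFlight { payload, sent_at: now, retransmitted: false });
    }

    /// Cumulative ACK: evict everything strictly below `ack_num`
    /// (wrap-aware) and return how many packets were confirmed.
    pub fn on_ack(&mut self, ack_num: u32) -> usize {
        let before = self.packets.len();
        self.packets
            .retain(|&seq, _| (seq.wrapping_sub(ack_num) as i32) >= 0);
        before - self.packets.len()
    }

    /// Sequence numbers whose timer has run out. Each one is marked as
    /// retransmitted and its timer is restarted.
    pub fn expired(&mut self, now: Instant, rto: Duration) -> Vec<u32> {
        let mut due = Vec::new();
        for (&seq, p) in self.packets.iter_mut() {
            if now.saturating_duration_since(p.sent_at) >= rto {
                p.retransmitted = true;
                p.sent_at = now;
                due.push(seq);
            }
        }
        due
    }

    /// Sequence numbers still awaiting acknowledgement.
    pub fn in_flight(&self) -> Vec<u32> {
        self.packets.keys().copied().collect()
    }
}
```

Keeping the `retransmitted` flag on each entry is also what later makes it possible to skip RTT samples from resent packets.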
Example:
In-flight queue: {5, 6, 7, 8}
ACK arrives with ack_num = 7
→ evict 5 and 6 (confirmed)
→ 7 and 8 remain in flight
Round-Trip Time and Retransmission Timeouts
The Core Problem
How long should a sender wait before deciding a packet was lost? Too short, and it retransmits packets that were merely slow, wasting bandwidth. Too long, and it stalls after real losses.
The answer is to measure the network and adapt. The Round-Trip Time (RTT) is how long a packet takes to be sent and acknowledged. The Retransmission Timeout (RTO) is derived from it.
RFC 6298 Estimation
AeroUDP follows the standard TCP algorithm from RFC 6298. It tracks a smoothed RTT (SRTT) and the RTT variation (RTTVAR):
First sample R:
SRTT = R
RTTVAR = R / 2
Each later sample R':
RTTVAR = (1 - 1/4) * RTTVAR + (1/4) * |SRTT - R'|
SRTT = (1 - 1/8) * SRTT + (1/8) * R'
RTO = clamp(SRTT + 4 * RTTVAR, min_rto, max_rto)
Why include variance?
A network with steady 50 ms RTT and one with wildly swinging 20–200 ms RTT can have the same average. Adding 4 * RTTVAR makes the timeout generous when the network is jittery and tight when it is stable.
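The update rules translate almost line for line into code. The sketch below uses `std::time::Duration` arithmetic; the `RttEstimator` type and its method names are assumptions for illustration, while the constants (1/4, 1/8, the factor 4) are the ones given in the formulas.

```rust
use std::time::Duration;

/// RFC 6298-style estimator (hypothetical type, not AeroUDP's own).
pub struct RttEstimator {
    srtt: Option<Duration>, // None until the first sample arrives
    rttvar: Duration,
    initial_rto: Duration,
    min_rto: Duration,
    max_rto: Duration,
}

impl RttEstimator {
    pub fn new(initial_rto: Duration, min_rto: Duration, max_rto: Duration) -> Self {
        Self { srtt: None, rttvar: Duration::ZERO, initial_rto, min_rto, max_rto }
    }

    /// Feed one RTT measurement `r` into the estimator.
    pub fn on_sample(&mut self, r: Duration) {
        match self.srtt {
            // First sample: SRTT = R, RTTVAR = R / 2.
            None => {
                self.srtt = Some(r);
                self.rttvar = r / 2;
            }
            // Later samples: update RTTVAR first, using the old SRTT.
            Some(srtt) => {
                let diff = if srtt > r { srtt - r } else { r - srtt };
                self.rttvar = self.rttvar * 3 / 4 + diff / 4;
                self.srtt = Some(srtt * 7 / 8 + r / 8);
            }
        }
    }

    /// Current RTO: the initial value before any sample, otherwise
    /// SRTT + 4 * RTTVAR clamped to [min_rto, max_rto].
    pub fn rto(&self) -> Duration {
        match self.srtt {
            None => self.initial_rto,
            Some(srtt) => (srtt + self.rttvar * 4).clamp(self.min_rto, self.max_rto),
        }
    }
}
```

As a worked example, two consecutive 100 ms samples give RTTVAR = 50 ms and RTO = 300 ms after the first, then RTTVAR = 37.5 ms and RTO = 250 ms after the second: the timeout tightens as the path proves stable.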
Karn's Algorithm
The ambiguity problem: if a packet is retransmitted and then an ACK arrives, was it acknowledging the original or the retransmission? You cannot tell, so the RTT sample is unreliable.
Karn's rule: never take an RTT sample from a retransmitted segment. AeroUDP enforces exactly this: only packets acknowledged on their first transmission update SRTT and RTTVAR.
Exponential Backoff
When an RTO fires, AeroUDP doubles the RTO (RTO *= 2) before retrying. This exponential backoff prevents a sender from hammering an already-congested or broken network. The backoff resets on the next fresh RTT sample.
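Karn's rule and exponential backoff interact, so a small sketch may help. The `RtoTimer` type below is hypothetical: it takes the freshly computed RTO as a parameter rather than owning an estimator, and it assumes the retry limit is checked against the number of consecutive timeouts.

```rust
use std::time::Duration;

/// Backoff state for the retransmission timer (hypothetical type).
pub struct RtoTimer {
    base_rto: Duration,  // RTO from the last clean RTT sample
    timeouts: u32,       // consecutive RTO expirations so far
    max_rto: Duration,
    max_retries: u32,
}

impl RtoTimer {
    pub fn new(initial_rto: Duration, max_rto: Duration, max_retries: u32) -> Self {
        Self { base_rto: initial_rto, timeouts: 0, max_rto, max_retries }
    }

    /// Effective timeout: the base RTO doubled once per consecutive
    /// timeout, capped at `max_rto`.
    pub fn current(&self) -> Duration {
        let factor = 1u32.checked_shl(self.timeouts).unwrap_or(u32::MAX);
        self.base_rto.saturating_mul(factor).min(self.max_rto)
    }

    /// The RTO fired. Returns `false` once the retry budget is spent,
    /// meaning the connection should be torn down.
    pub fn on_timeout(&mut self) -> bool {
        self.timeouts += 1;
        self.timeouts <= self.max_retries
    }

    /// An ACK confirmed a packet. Karn's rule: if that packet was ever
    /// retransmitted, ignore it for timing purposes and keep the backoff.
    /// Returns whether the sample was accepted.
    pub fn on_ack(&mut self, was_retransmitted: bool, fresh_rto: Duration) -> bool {
        if was_retransmitted {
            return false;
        }
        self.base_rto = fresh_rto;
        self.timeouts = 0; // a clean sample resets the backoff
        true
    }
}
```

Note that an ACK for a retransmitted packet leaves the backed-off timeout in place; only a clean, first-try acknowledgement brings it back down.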
Reference defaults:
initial_rto = 300 ms
min_rto / max_rto = 100 ms / 10 s
max_retries = 12 (then the connection is torn down)
Flow Control: Don't Overwhelm the Receiver
The Problem
A fast sender talking to a slow receiver (or one whose application is slow to read) will overflow the receiver's buffer, forcing it to drop data. Flow control prevents this by letting the receiver throttle the sender.
Sliding Window
Concept:
The receiver advertises how much buffer space it currently has. This is the receive window (the window field in the header, measured in packets in AeroUDP). The sender must never have more unacknowledged data in flight than the window allows.
How it works:
Receiver advertises window = 8 packets
Sender may have at most 8 unacknowledged packets in flight.
As the application reads and drains the buffer, the window grows.
As the buffer fills, the window shrinks toward zero.
window = 0 → sender pauses entirely until space frees up.
The window "slides" forward as ACKs confirm old data and new data is sent, hence sliding window.
Out-of-Order Buffering
Because UDP can reorder packets, the receiver may get packet 7 before packet 5. AeroUDP buffers out-of-order packets until the gap fills, then delivers everything to the application in strict order. The application never sees the reordering.
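A receiver that buffers out-of-order packets and produces cumulative ACKs can be sketched in a few lines. The `Receiver` type is hypothetical, and it simplifies the window accounting by assuming in-order data is handed to the application immediately, so only out-of-order packets occupy buffer slots.

```rust
use std::collections::BTreeMap;

/// Receive side of one connection (hypothetical sketch).
pub struct Receiver {
    next_expected: u32,                    // also the ack_num we advertise
    capacity: u32,                         // buffer size in packets
    out_of_order: BTreeMap<u32, Vec<u8>>,  // packets waiting for a gap to fill
}

impl Receiver {
    pub fn new(initial_seq: u32, capacity: u32) -> Self {
        Self { next_expected: initial_seq, capacity, out_of_order: BTreeMap::new() }
    }

    /// Handle one DATA packet; returns the payloads that can now be
    /// delivered to the application, in order (possibly none).
    pub fn on_data(&mut self, seq: u32, payload: Vec<u8>) -> Vec<Vec<u8>> {
        // Distance ahead of the next expected packet, modulo 2^32.
        // Already-delivered duplicates wrap to a huge value and are dropped,
        // as are packets beyond the advertised window.
        let offset = seq.wrapping_sub(self.next_expected);
        if offset >= self.capacity {
            return Vec::new();
        }
        // Keep the first copy if the same packet arrives twice.
        self.out_of_order.entry(seq).or_insert(payload);

        // Drain every packet that is now contiguous.
        let mut ready = Vec::new();
        while let Some(p) = self.out_of_order.remove(&self.next_expected) {
            ready.push(p);
            self.next_expected = self.next_expected.wrapping_add(1);
        }
        ready
    }

    /// Cumulative ACK: the next in-order sequence number expected.
    pub fn ack_num(&self) -> u32 {
        self.next_expected
    }

    /// Free buffer slots to advertise in the window field.
    pub fn window(&self) -> u32 {
        self.capacity - self.out_of_order.len() as u32
    }
}
```

Receiving 6 and 7 while 5 is missing leaves `ack_num()` at 5 and shrinks the window by two; when 5 finally arrives, all three payloads are released together.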
Congestion Control: Don't Overwhelm the Network
Flow control protects the receiver. Congestion control protects the network itself: the shared routers and links between the two endpoints. Without it, many senders can collectively cause congestion collapse, where the network is so overloaded that almost nothing gets through.
AeroUDP implements a NewReno-style controller, the classic TCP congestion algorithm. Its central variable is the congestion window (cwnd): the sender's own estimate of how much it may safely have in flight. The true limit is:
in_flight <= min(cwnd, peer_window)
That is: respect both the network (cwnd) and the receiver (peer_window), whichever is smaller.
Phase 1: Slow Start
How it works:
- Start with a small cwnd (default: 10 packets).
- For every ACK that advances the cumulative ACK, grow cwnd by the number of segments acknowledged.
- This doubles cwnd roughly every RTT: exponential growth.
- Exit when cwnd >= ssthresh (the slow-start threshold).
Why "slow"? It starts small (slow) even though it grows fast. The name reflects the conservative starting point, not the growth rate.
cwnd: 10 → 20 → 40 → 80 ... (exponential, per RTT)
Phase 2: Congestion Avoidance
How it works:
- Once cwnd >= ssthresh, switch to cautious linear growth.
- cwnd grows by roughly 1 packet per RTT (additive increase).
cwnd: 64 → 65 → 66 → 67 ... (linear, per RTT)
This is the additive-increase half of the AIMD principle (Additive Increase, Multiplicative Decrease): probe gently for more bandwidth while staying ready to back off hard.
Phase 3: Reacting to Loss
Loss is the signal that the network is congested. AeroUDP distinguishes two kinds of loss, and reacts to them very differently.
Fast Retransmit / Fast Recovery (mild loss)
Trigger: three duplicate ACKs. When the receiver gets out-of-order packets, it keeps re-sending the same ack_num. Three duplicates strongly suggest a single packet was lost while later ones arrived.
Reaction:
- Immediately retransmit the missing segment; do not wait for the RTO.
- ssthresh = max(cwnd / 2, 2)
- cwnd = ssthresh (halve, don't reset)
- Enter fast recovery; exit when a new cumulative ACK passes the recovery point.
This is the "multiplicative decrease": a measured halving, because the network is still delivering some packets and so is not badly congested.
Timeout (severe loss)
Trigger: the RTO fires, meaning no ACK arrived in time.
Reaction (much harsher):
- ssthresh = max(cwnd / 2, 2)
- cwnd = 1 (collapse the window)
- Return to slow start
A full timeout means the network may be severely congested or the path broken, so AeroUDP starts over from near-zero.
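The three phases fit into one small state holder. The following is a simplified sketch, not AeroUDP's controller: the type and method names are assumptions, window sizes are counted in packets, and NewReno's partial-ACK handling during fast recovery is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    SlowStart,
    CongestionAvoidance,
    FastRecovery,
}

/// Simplified NewReno-style controller (hypothetical names).
pub struct Congestion {
    pub cwnd: u32,
    pub ssthresh: u32,
    pub phase: Phase,
    dup_acks: u32,
    acked_in_ca: u32,    // packets ACKed since the last +1 in avoidance
    recovery_point: u32, // highest sequence sent when loss was detected
}

impl Congestion {
    pub fn new(initial_cwnd: u32, ssthresh: u32) -> Self {
        Self {
            cwnd: initial_cwnd,
            ssthresh,
            phase: Phase::SlowStart,
            dup_acks: 0,
            acked_in_ca: 0,
            recovery_point: 0,
        }
    }

    /// May one more packet be sent? Respects both network and receiver.
    pub fn can_send(&self, in_flight: u32, peer_window: u32) -> bool {
        in_flight < self.cwnd.min(peer_window)
    }

    /// The cumulative ACK advanced to `ack_num`, confirming `newly_acked` packets.
    pub fn on_new_ack(&mut self, newly_acked: u32, ack_num: u32) {
        self.dup_acks = 0;
        match self.phase {
            Phase::SlowStart => {
                self.cwnd += newly_acked; // doubles cwnd roughly once per RTT
                if self.cwnd >= self.ssthresh {
                    self.phase = Phase::CongestionAvoidance;
                }
            }
            Phase::CongestionAvoidance => {
                self.acked_in_ca += newly_acked;
                if self.acked_in_ca >= self.cwnd {
                    self.acked_in_ca -= self.cwnd;
                    self.cwnd += 1; // about +1 packet per RTT
                }
            }
            Phase::FastRecovery => {
                // Leave recovery once the ACK passes the recovery point.
                if (ack_num.wrapping_sub(self.recovery_point) as i32) > 0 {
                    self.phase = Phase::CongestionAvoidance;
                }
            }
        }
    }

    /// A duplicate ACK arrived. Returns true on the third one, which is
    /// the signal to fast-retransmit the missing packet.
    pub fn on_dup_ack(&mut self, highest_sent: u32) -> bool {
        self.dup_acks += 1;
        if self.dup_acks != 3 || self.phase == Phase::FastRecovery {
            return false;
        }
        self.ssthresh = (self.cwnd / 2).max(2);
        self.cwnd = self.ssthresh; // halve, don't reset
        self.recovery_point = highest_sent;
        self.phase = Phase::FastRecovery;
        true
    }

    /// The RTO fired: collapse the window and restart slow start.
    pub fn on_timeout(&mut self) {
        self.ssthresh = (self.cwnd / 2).max(2);
        self.cwnd = 1;
        self.dup_acks = 0;
        self.acked_in_ca = 0;
        self.phase = Phase::SlowStart;
    }
}
```

The two loss handlers differ in a single assignment, `cwnd = ssthresh` versus `cwnd = 1`, which is exactly the gap between a mild and a severe congestion signal.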
The Sawtooth
Put together, these phases produce TCP's characteristic sawtooth pattern: cwnd climbs, hits loss, halves, and climbs again, continuously probing for the maximum safe rate:
cwnd
| /| /| /|
| / | / | / |
| / | / | / |
| / | / | /
| / | _/ | _/
|___/_____|/______|/________ time
(loss) (loss) (loss)
Connection Lifecycle
A connection-oriented protocol has a well-defined birth, life, and death, modeled as a state machine. AeroUDP's states mirror TCP's.
The Three-Way Handshake
Before data flows, both sides must agree they are connected and synchronize sequence numbers.
Client                        Server
  |                              |
  | ---- SYN(seq=ISN_a) -------> |   LISTEN → SYN_RECEIVED
  | <- SYN_ACK(seq=ISN_b,        |
  |        ack=ISN_a+1) -------- |
  | ---- ACK(ack=ISN_b+1) -----> |
  v                              v
ESTABLISHED                 ESTABLISHED
Why three messages? Each side must prove it can both send and receive. The SYN proves the client can send; the SYN_ACK proves the server received and can send back; the final ACK proves the client received the server's message. Only then is two-way communication confirmed.
The handshake itself uses the same retransmission-with-backoff machinery, capped by handshake_timeout (5 s) and max_retries.
Graceful Close (FIN Exchange)
A connection is a two-way street, so each direction is closed independently ("half-close"):
Initiator                  Peer
    | ------- FIN --------> |   → CLOSE_WAIT
FIN_WAIT_1                  |
    | <----- FIN_ACK ------ |   (peer may keep sending)
FIN_WAIT_2                  |
    | <------- FIN -------- |   LAST_ACK
TIME_WAIT                   |
    | ------- ACK --------> |   → CLOSED
    |
(wait 2 s to absorb stragglers)
    v
  CLOSED
Why TIME_WAIT? After the last ACK, the initiator waits (close_wait, default 2 s) before fully closing. This absorbs any delayed retransmissions of the peer's FIN so they don't leak into a future connection reusing the same ports.
Abort (RST)
If something goes catastrophically wrong, either side sends a RST (reset). The receiver immediately transitions to CLOSED and surfaces a PeerReset event: no graceful exchange, just an abrupt teardown.
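The handshake, graceful close, and reset can be summarized as a pure transition function. This is a sketch under assumptions: the enum and function names are hypothetical, the client-side SYN_SENT state is borrowed from TCP (the diagrams only name the server-side handshake states), and timers and retransmissions are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Listen,
    SynSent, // assumed, by analogy with TCP
    SynReceived,
    Established,
    FinWait1,
    FinWait2,
    CloseWait,
    LastAck,
    TimeWait,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Connect,         // application opens a connection (SYN is sent)
    Close,           // application closes (FIN is sent)
    RecvSyn,
    RecvSynAck,
    RecvAck,
    RecvFin,
    RecvFinAck,
    RecvRst,
    TimeWaitExpired, // the TIME_WAIT timer ran out
}

/// Next state for a given (state, event) pair. Events that do not apply
/// in the current state are ignored.
pub fn next(state: State, event: Event) -> State {
    use Event::*;
    use State::*;
    match (state, event) {
        // A reset tears the connection down from any state.
        (_, RecvRst) => Closed,
        // Three-way handshake.
        (Closed, Connect) => SynSent,
        (Listen, RecvSyn) => SynReceived,
        (SynSent, RecvSynAck) => Established,
        (SynReceived, RecvAck) => Established,
        // Graceful close, initiator side.
        (Established, Close) => FinWait1,
        (FinWait1, RecvFinAck) => FinWait2,
        (FinWait2, RecvFin) => TimeWait,
        (TimeWait, TimeWaitExpired) => Closed,
        // Graceful close, peer side.
        (Established, RecvFin) => CloseWait,
        (CloseWait, Close) => LastAck,
        (LastAck, RecvAck) => Closed,
        // Anything else leaves the state unchanged.
        (s, _) => s,
    }
}
```

Writing the machine as a function from (state, event) to state keeps every legal transition in one place and makes illegal ones visibly absent.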
Keepalive (PING / PONG)
To detect a peer that has silently vanished (crash, cable pull), AeroUDP sends keepalive probes:
- If nothing is received for keepalive_interval (15 s), send a PING.
- The peer replies with PONG.
- After keepalive_probes (3) unanswered probes, declare the peer dead and close.
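The probing logic above might be tracked with a small helper like this one. The `Keepalive` type and its methods are hypothetical, and the sketch assumes successive probes are spaced one `keepalive_interval` apart, which the description does not specify.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum KeepaliveAction {
    Idle,     // nothing to do yet
    SendPing, // time to probe the peer
    PeerDead, // too many unanswered probes: close the connection
}

/// Keepalive bookkeeping for one connection (hypothetical sketch).
pub struct Keepalive {
    interval: Duration,
    max_probes: u32,
    last_activity: Instant, // last packet received or last probe sent
    unanswered: u32,
}

impl Keepalive {
    pub fn new(interval: Duration, max_probes: u32, now: Instant) -> Self {
        Self { interval, max_probes, last_activity: now, unanswered: 0 }
    }

    /// Any packet from the peer (a PONG included) proves it is alive.
    pub fn on_receive(&mut self, now: Instant) {
        self.last_activity = now;
        self.unanswered = 0;
    }

    /// Call periodically from the connection's timer loop.
    pub fn poll(&mut self, now: Instant) -> KeepaliveAction {
        if now.saturating_duration_since(self.last_activity) < self.interval {
            return KeepaliveAction::Idle;
        }
        if self.unanswered >= self.max_probes {
            return KeepaliveAction::PeerDead;
        }
        self.unanswered += 1;
        self.last_activity = now; // give the peer one interval to answer
        KeepaliveAction::SendPing
    }
}
```

With the defaults quoted above (15 s, 3 probes) and this spacing assumption, a silent peer would be declared dead roughly a minute after its last packet.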
Observability: Seeing Inside the Protocol
A protocol you cannot measure is a protocol you cannot debug or tune. AeroUDP exposes a ConnectionMetrics snapshot backed by atomic counters:
Traffic:
- Packets and bytes sent/received in both directions
Reliability health:
- Retransmissions, fast retransmissions
- Duplicate ACKs, duplicate packets, out-of-order packets
- Checksum failures, timeouts
- Derived loss rate
Timing and control state:
- Mean RTT, smoothed RTT (SRTT), RTTVAR, current RTO
- Current cwnd, ssthresh, in-flight count
It also emits structured tracing events (aeroudp::state, aeroudp::engine, aeroudp::handshake, and more), so you can watch the state machine and congestion controller make decisions in real time.
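As a rough illustration of counters backed by atomics, here is a minimal sketch. Only the name `ConnectionMetrics` comes from the text; the fields, the methods, and the way the loss rate is derived (retransmissions divided by packets sent) are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// A few lock-free counters that any task can bump through a shared reference.
/// Field names are hypothetical.
#[derive(Default)]
pub struct ConnectionMetrics {
    pub packets_sent: AtomicU64,
    pub retransmissions: AtomicU64,
    pub checksum_failures: AtomicU64,
}

impl ConnectionMetrics {
    /// Count one transmitted packet, and one retransmission if applicable.
    pub fn record_send(&self, is_retransmission: bool) {
        self.packets_sent.fetch_add(1, Ordering::Relaxed);
        if is_retransmission {
            self.retransmissions.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Count one packet dropped because its CRC32 did not match.
    pub fn record_checksum_failure(&self) {
        self.checksum_failures.fetch_add(1, Ordering::Relaxed);
    }

    /// Assumed derivation: share of transmissions that were retransmissions.
    pub fn loss_rate(&self) -> f64 {
        let sent = self.packets_sent.load(Ordering::Relaxed);
        if sent == 0 {
            return 0.0;
        }
        self.retransmissions.load(Ordering::Relaxed) as f64 / sent as f64
    }
}
```

Relaxed ordering is enough here because each counter is an independent statistic; no other memory access is synchronized through it.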
Testing Against a Hostile Network
Real networks misbehave rarely and unpredictably, which makes bugs hard to reproduce. AeroUDP ships a configurable network simulator that deliberately drops, reorders, delays, and jitters packets:
aeroudp-analyzer proxy \
--listen 127.0.0.1:9500 \
--upstream 127.0.0.1:9000 \
--loss 0.05 --reorder 0.03 \
--min-latency-ms 20 --max-latency-ms 60 --jitter-ms 10
Pointing a client at this proxy forces every reliability and congestion mechanism to actually work, turning "it passes on localhost" into "it survives 5% loss with reordering and jitter."
Putting It All Together
A single send() call in AeroUDP quietly exercises every concept in this article:
- The payload is wrapped in a 28-byte header with a sequence number and CRC32 checksum.
- The sender checks flow control (peer window) and congestion control (cwnd); it only transmits if in_flight < min(cwnd, peer_window).
- The packet is stored in the retransmission queue and sent over UDP.
- If it arrives cleanly, the receiver buffers or delivers in order and returns a cumulative ACK, advancing its window.
- The ACK confirms the packet, updates the RTT/RTO estimate, and grows cwnd (slow start or congestion avoidance).
- If it is lost, either three duplicate ACKs (fast retransmit) or an RTO timeout triggers retransmission and shrinks cwnd.
None of this is visible to the application, which simply sees a reliable, ordered byte stream: exactly the illusion a transport protocol exists to provide.
Further Reading
Books:
- "TCP/IP Illustrated, Volume 1" by W. Richard Stevens
- "Computer Networking: A Top-Down Approach" by Kurose and Ross