TCP Internals: 3-way Handshake and Sequence Numbers Explained
In this article, I will explain and show you what really happens during a TCP 3-way handshake as captured by the tcpdump tool.
We'll go deeper into the details of the TCP 3-way handshake (SYN, SYN/ACK and ACK) and how Sequence Numbers and Acknowledgement Numbers actually work.
Moreover, I'll also briefly explain, using real data, how the TCP Receive Window and Maximum Segment Size play an important role in a TCP connection.
As a side note, I will not touch on TCP SACK and TCP Timestamps this time, as they will be covered in a future article about TCP retransmissions.
For context, the TCP capture was generated by a simple HTTP/1.1 GET request to BIG-IP for a file called script.pl in the /cgi-bin/ directory:
BIG-IP then responds with HTTP/1.1 200 OK with the requested data.
This is not very relevant as we'll be looking at the TCP layer, but knowing the capture's context makes it easier to follow what's going on.
This is what a TCP 3-way handshake looks like on Wireshark:
As we can see, the first 3 packets are exchanged less than 1 second apart from each other.
The IN/OUT portion of Info field on BIG-IP's capture tells us if the packet is coming IN or being sent OUT by BIG-IP (as capture was taken on BIG-IP).
As this is a slightly more in-depth explanation of TCP internals, I am assuming you know at least what a TCP 3-way handshake is conceptually.
The TCP SYN, SYN/ACK and ACK Segments
We can see that the first packet is [SYN], the second one is [SYN/ACK] and the last one is [ACK], as displayed on Wireshark.
The Info section as a whole only shows the summary of the most relevant fields copied from the TCP header.
It is just enough to make us understand the context of the TCP segment.
Let's now have a look at what these fields mean, with the exception of SACK_PERM and TSval.
When we double click on the [SYN] packet below, we find the same information again in the actual TCP header:
The most important thing to understand here is that [SYN], [SYN/ACK] and [ACK] are all part of the Flags field of the TCP header above. They're just 1's and 0's.
When the SYN flag is enabled (i.e. its value is 1), the receiving end (in this case BIG-IP) should automatically understand that someone (my client PC in this case) is trying to establish a TCP connection.
The response from BIG-IP (SYN/ACK) is an acknowledgement to the SYN packet and therefore it has both SYN and ACK flags set to 1.
Client's last response is just an ACK as seen below:
As per RFC, both sides should now assume a TCP connection is established.
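To make the "just 1's and 0's" point concrete, here is a minimal Python sketch that decodes the flags byte of a TCP header. The bit values are the standard ones from the TCP specification; the helper name decode_flags is my own and is not something Wireshark or BIG-IP exposes.

```python
# Bit masks of the classic TCP flags inside the header's flags byte.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_flags(flags_byte: int) -> list[str]:
    """Return the names of the flags whose bit is set to 1 (hypothetical helper)."""
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]


# The three segments of the handshake, expressed as flag bytes.
print(decode_flags(0x02))  # ['SYN']
print(decode_flags(0x12))  # ['SYN', 'ACK']
print(decode_flags(0x10))  # ['ACK']
```

The same function also explains the (PSH, ACK) label that shows up later on data segments: that is simply 0x18.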
For the plain-text HTTP/1.1 protocol, there should now be a GET request in another layer as the payload of (or encapsulated by) the TCP layer.
If our traffic is protected by TLS, then the TLS layer should come first as the payload of the TCP layer, and HTTP would be the payload of the TLS layer.
Does it make sense?
That's how things work in the real world.
TCP Sequence numbers
As a side note, Wireshark shows that our first SYN segment's Sequence number is 0 (Seq=0):
It also labels it as a relative sequence number, which means this is not the real TCP sequence number.
Wireshark automatically zeroes it for you to make it easier to visualise and/or troubleshoot.
In reality, the real sequence number is a 32-bit value, and the initial one is chosen by your OS using the current time and other hard-to-predict inputs for security purposes.
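As a rough illustration of the relative numbering idea, the sketch below subtracts the initial sequence number (ISN) seen in the SYN from each raw sequence number, wrapping at 32 bits. The function name and the ISN value are made up for the example; this is the general idea only, not Wireshark's actual code.

```python
SEQ_MODULUS = 2**32  # TCP sequence numbers are 32-bit and wrap around


def relative_seq(raw_seq: int, initial_seq: int) -> int:
    """Distance of raw_seq from the connection's ISN, modulo 2**32."""
    return (raw_seq - initial_seq) % SEQ_MODULUS


isn = 3_905_000_000  # hypothetical ISN taken from the SYN segment

print(relative_seq(isn, isn))        # 0 -> the SYN itself
print(relative_seq(isn + 1, isn))    # 1 -> first byte after the SYN
print(relative_seq(5, 4_294_967_290))  # 11 -> still correct after wrap-around
```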
This is how we see the real sequence number in Wireshark:
Now back to business. Some people say if Client sends a TCP segment to BIG-IP, BIG-IP's ACK should be client's sequence number + 1 right? Wrong!
Instead of +1, it should be the client's sequence number plus the number of payload bytes last received from the peer, or +1 for SYN or FIN segments.
To clarify, here's the full Flow Graph of our capture using relative sequence numbers to make it easier to grasp (.135 = Client and .143 = BIG-IP):
On the 4th segment above (PSH, ACK - Len: 93), the client sends a TCP segment with Seq = 1 and a TCP payload (comprised of the HTTP layer) of 93 bytes.
In this case, BIG-IP's response is not ACK = 2 (1 + 1) as some might think.
Instead, BIG-IP responds with whatever client's last Sequence number was plus number of bytes last received.
As last sequence number was 1 and client also sent a TCP payload of 93 bytes, then ACK is 94!
This is the most important concept to grasp for understanding sequence numbers and ACKs.
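SEQs and ACKs only increment when there is a TCP payload involved (by the number of bytes).
The exceptions are the SYN and FIN flags: each one counts as 1 byte for SEQs/ACKs even though it carries no payload. A zero-window probe may also carry 1 byte of data, depending on the implementation.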
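The rule can be written down in a few lines of Python. This is only a sketch of the arithmetic using relative sequence numbers; the function name expected_ack is my own.

```python
def expected_ack(seq: int, payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """ACK number the peer should send back after receiving this segment.

    The payload advances the number by its length in bytes; SYN and FIN
    each consume one extra sequence number. Wraps at 32 bits.
    """
    return (seq + payload_len + int(syn) + int(fin)) % 2**32


# Client's SYN: Seq=0, no payload -> BIG-IP answers with Ack=1.
print(expected_ack(0, 0, syn=True))  # 1

# Client's HTTP GET: Seq=1, Len=93 -> BIG-IP answers with Ack=94, not 2.
print(expected_ack(1, 93))  # 94

# A pure ACK with no payload does not advance anything.
print(expected_ack(1, 0))  # 1
```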
I added a full analysis using real TCP SEQs/ACKs to an Appendix section if you'd like to go deeper into it.
For the moment let's shift our attention towards TCP Receive Window.
TCP Receive Window and Maximum Segment Size (MSS)
During 3-way handshake, the Receive Window (Window size value on Wireshark) tells each side of the connection the maximum receiving buffer in bytes each side can handle:
So it's literally like this (read red lines first please):
[1] → Hey, BIG-IP! My receiving buffer size is 29200 bytes. That means, you can initially send me up to 29200 bytes before you even bother waiting for an ACK from me to send further data.
[2] → This should be the same as [1], unless the Window Scale TCP Option is active. Window Scale should be the subject of a different article but I briefly touch on it in [3].
[3] → The original TCP Window Size field is limited to 16 bits, so the maximum buffer size is just 65,535 bytes, which is too little for today's speedy connections. This option left-shifts the 16-bit value by a scale factor of up to 14 bits, allowing windows of up to about 1 GiB. Because BIG-IP did not advertise the Window Scale option for this connection, it is disabled, as both sides must support it for it to be used.
[4] → Hey, client! My receiving buffer size is 4380 bytes. That means, you can initially send me up to 4380 bytes before you even bother waiting for an ACK from me to send further data.
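Here is a small sketch of how the Window Scale option described in [3] changes the calculated window. The function name and the shift count of 7 are assumptions for illustration only; the capture in this article has no scaling in effect.

```python
MAX_WINDOW_SHIFT = 14  # largest shift count allowed for the Window Scale option


def effective_window(window_field: int, own_shift: int, peer_sent_wscale: bool) -> int:
    """Receive window in bytes advertised by one side of the connection.

    window_field     -- raw 16-bit "Window size value" from the TCP header
    own_shift        -- shift count this side sent in its own SYN
    peer_sent_wscale -- whether the other side also sent the option

    Scaling is only used when both sides sent the option, and the window
    carried in the SYN segments themselves is never scaled.
    """
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not peer_sent_wscale:
        return window_field
    return window_field << min(own_shift, MAX_WINDOW_SHIFT)


# Our capture: BIG-IP did not advertise Window Scale, so no scaling happens.
print(effective_window(29200, 7, peer_sent_wscale=False))  # 29200

# Hypothetical connection where both sides support it (shift count 7).
print(effective_window(229, 7, peer_sent_wscale=True))  # 29312

# Upper bound with the maximum shift count: roughly 1 GiB.
print(effective_window(0xFFFF, 14, peer_sent_wscale=True))  # 1073725440
```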
The reason why the word initially is underlined on [1] and [4] is that the Window size typically changes during the connection.
For example, client's initial window size is 29200 bytes, right?
This means that if it receives 200 bytes from BIG-IP, it should go down to 29000 bytes.
Easy, eh? But that's not what always happens in real life.
In fact, in our capture it's the opposite!
Bytes in flight column shows the data BIG-IP (*.143) is sending in bytes to our client (*.135) that has not yet been acknowledged.
I've added a column with Window Size value to make it easier to spot how variable this field is:
It is the OS TCP Flow control implementation that dictates the Receive Window size taking into account the current "health" of its TCP stack and of course your configuration.
Yes, in many cases, especially in the middle of a connection, the Window Size does decrease based on amount of data received/buffered so our first explanation also makes sense!
How does BIG-IP know that the client has freed up its buffer again?
As we can see above, when the client ACKs the receipt of BIG-IP's data, it also advertises the current size of its buffer in the Window Size value field.
That's how BIG-IP knows how much data it can send to Client before it receives another ACK.
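From the sender's point of view, the calculation boils down to a subtraction. The sketch below is deliberately simplified: it only looks at the receiver's advertised window and ignores the congestion window that real TCP stacks also apply. The function name is hypothetical.

```python
def usable_window(advertised_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still send before it has to wait for an ACK.

    advertised_window -- latest Window Size value received from the peer
    bytes_in_flight   -- bytes already sent but not yet acknowledged
    """
    return max(0, advertised_window - bytes_in_flight)


# BIG-IP has 334 unacknowledged bytes against the client's 29200-byte window.
print(usable_window(29200, 334))  # 28866

# Once the window is fully used, the sender must wait for an ACK.
print(usable_window(4380, 4380))  # 0
```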
What about the Maximum Segment Size?
Each side also advertises a TCP Option - Maximum Segment Size of 1460 bytes.
This tells the peer the maximum TCP payload that side is willing to receive at a time (per TCP segment).
Looking at the picture above, BIG-IP sent 334 bytes of TCP payload to client, right?
In theory, this could've been up to 1460 bytes as it's also within client's initial buffer of 29200 bytes.
So apart from telling each other about the maximum buffer, each side also advertises the maximum size of a TCP segment.
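The sketch below shows where a value like 1460 usually comes from and how a larger payload would be cut into segments. It assumes a 1500-byte Ethernet MTU and 20-byte IPv4 and TCP headers without options; none of that is stated in the capture, and both function names are my own.

```python
def mss_from_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS left over after the IPv4 and TCP headers (no options assumed)."""
    return mtu - ip_header - tcp_header


def segment_sizes(payload_len: int, mss: int) -> list[int]:
    """Payload size of each TCP segment needed to carry payload_len bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


print(mss_from_mtu(1500))         # 1460
print(segment_sizes(334, 1460))   # [334] -> fits in a single segment
print(segment_sizes(4000, 1460))  # [1460, 1460, 1080]
```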
TCP Len vs Bytes in Flight Column (BIF)
If we look at our last picture, we can see that whatever is in Len field matches what's in our BIF column, right?
Are they the same? No!
Len shows the current size of TCP payload (excluding the size of TCP header).
Remember that TCP payload in this case is the whole HTTP portion that our TCP segment is carrying.
Bytes in flight is not really part of TCP header but that's something Wireshark adds to make it easier for us to troubleshoot.
It just means the number of bytes sent that have not yet been acknowledged by receiver.
In our capture, data is acknowledged immediately so both Len and BIF are the same.
I've picked a different capture here where 3 TCP segments are sent with no acknowledgement, so the BIF column increments for each unacknowledged data segment but goes back to zero as soon as an ACK arrives from the receiver:
Notice that BIF values now differ from TCP payload (the equivalent to Len in Info column).
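To see why Len and BIF diverge, here is a simplified model of the bookkeeping: BIF is the gap between the next byte to send and the oldest unacknowledged byte. The segment sizes below are made up for illustration, sequence wrap-around is ignored, and this is a sketch of the idea rather than Wireshark's actual implementation.

```python
def track_bytes_in_flight(start_seq: int, events: list[tuple[str, int]]) -> list[int]:
    """Return bytes in flight after each event.

    events is a list of ("send", payload_len) or ("ack", ack_number) tuples.
    """
    snd_una = start_seq  # oldest byte not yet acknowledged
    snd_nxt = start_seq  # next byte the sender will use
    history = []
    for kind, value in events:
        if kind == "send":
            snd_nxt += value
        elif kind == "ack":
            # An ACK never moves backwards or past what was actually sent.
            snd_una = max(snd_una, min(value, snd_nxt))
        else:
            raise ValueError(f"unknown event: {kind}")
        history.append(snd_nxt - snd_una)
    return history


# Three hypothetical 1460-byte segments, then one ACK covering all of them.
events = [("send", 1460), ("send", 1460), ("send", 1460), ("ack", 4381)]
print(track_bytes_in_flight(1, events))  # [1460, 2920, 4380, 0]
```

Len stays at 1460 for every data segment here, while BIF keeps growing until the ACK arrives.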
That's it for now. The next article will be about TCP retransmissions.
Appendix - Going in depth into TCP sequence numbers!
Here's a full explanation about what actually takes place on TCP layer from the point of view of BIG-IP:
Just follow along from [1] to [10]. That's it.