Do you ever look at specification sheets for technology products, and think to yourself, "hmm... these numbers are bigger than those numbers. That's good, right?" This is a reasonable response in the absesnce of precise information. However, an incomplete understanding often results in applying the wrong solution to a problem. Consider that a peregrine falcon is faster than a person on a bicycle, but a person on a bicycle can perform far more work per trip due to an exponentially larger carrying capacity. The falcon would be great for carrying a single message on a slip of paper. It'd be lousy for carrying a stack of books.
Selecting the device to use, be it falcon or bicycle, requires at least two metrics. We need to understand both the expected velocity and the carrying capacity. Units of networking equipment, be they virtual or physical, typically have published specifications that one can use to compare performance between solutions. My goal is not to suggest one solution or another. Instead, my goal is to give you the ability to understand what the metrics truly mean.
The OSI model provides a logical framework for discussing network concepts. This table describes what you can find at each layer:
|5||Session||TLS (strangely debatable)|
|4||Transport||TCP / UDP|
|3||Network||IPv4 / IPv6|
|2||Data Link||Ethernet frames|
This gives us a common framework for explaining things like "What is the difference between connections per second (CPS) and transactions per second (TPS)?" or "What does infinite encryption session reuse mean?" The key point is that, in the general case, every layer can support multiple interactions of the layer above it. Consider that I can consume multiple HTTP objects over a single TCP connection when using HTTP/1.1. It is possible to request a large HTTP object by using multiple TCP connections, but the way that actually happens is through sending multiple HTTP requests, each with its own TCP connection, specifying different parts of the same object.
Now that we have established a common framework for understanding the intent behind the different metrics, I will define some of the common types.
|CPS||connections per second||The number of L4 connections established per second.|
|TPS||transactions per second||The number of L5-L7 transactions completed per second.|
|RPS||requests per second||The number of L5-L7 requests satisfied per second, often interchangeable with TPS.|
|RPC||requests per connection||The number of L5-L7 requests completed for a single L4 connection. Example: HTTP/1.1 with 100 transactions per TCP connection would be 100-RPC.|
|bits per second||The network bandwidth in bits per second. Some people use GB/s, Gb/s and Gbps interchangeably, but the unit is decidedly important. 1 GB/s (gigabyte per second) is 8 Gbps (gigabits per second) See: data-rate units. Things get a bit pedantic with GiB/s vs GB/s, but 8:1 bytes:bits is correct when working from the same base unit.|
|fps||frames per second||A measure of L2 frames per second, often used for core switching or firewalls. The distinction is that not all network frames contain valid data packets or datagrams.|
|pps||packets per second||A measure of the L3/L4 packets per second, typically for TCP and UDP traffic.|
|CC||concurrent connections||The number of concurrently established L4 connections. Example: if you have 100 CPS, and each connection has a lifetime of 100 s, then the resultant CC is 10,000 (connection rate * connection lifetime).|
As you might expect, there are additional statistics in the wild. The objective is to take a specification sheet label and number, like "800 angry orchids per second", and decompose it. "Where do angry orchids fit in the OSI model? If angry orchids are an L7 concept, how many angry orchids are served through a single TCP session?" This method will allow you to understand any specification sheet metric that is constructed from logical data. More importantly, it will equip you to question metrics that make no sense.
You might think this: "Okay, so this specification sheet for a product says that it can do foo CPS and bar Gbps. Surely it can perform those feats at the same time." You would be wrong. A peregrine falcon cannot dive at maximum velocity while carrying a book - the book has a lower terminal velocity in atmosphere. Specification sheet numbers for things in the networking world are generated by a test designed to exercise a single function.
Consider a TCP connections per second (CPS) test. The harness will be configured such that a TCP handshake is established, a client will send a single request for an object, a server will reply with an object small enough to fit within a single TCP packet, and then the TCP connection is torn down gracefully. The maximum sustained CPS number typically coincides with the maximum CPU capacity for a physical or virtual device.
Now consider a maximum throughput test. The harness will be configured such that a TCP handshake is established, a client will attempt to reuse this TCP connection with HTTP/1.1 to request an object, a server will respond with a large object (typically >= 128 kB), and the TCP connection will not be torn down until multiple HTTP transactions have occurred. If there are no limitations in physical connectivity, and no hardware assistance, then the maximum sustained throughput typically coincides with the maximum CPU capacity for a physical or virtual device.
If both of those tests require all of the available CPU resources, then it is generally impossible to satisfy both conditions at the same time. Therefore, if your workload involves any application level inspection or decisions, then your available CPU resources for L4 connection handling will also be reduced.
This table describes the test process for some of the core metrics. The labels and meanings are repeated from the previous table.
|CPS||connections per second||Determine the maximum number of fully functional (establishment, single small transaction, graceful termination) connections that can be established per second. Each transaction is a small object such that it will not result in a full size network frame.|
|TPS / RPS||transactions per second / requests per second||Determine the maximum number of complete transactions that can be completed over a small number of long-lived connections. Each transaction requests a small object such that it will not result in a full size network frame.|
|bps||bits per second||Determine the maximum sustained throughput for a workload using long-lived connections and large (128 kB or higher) object sizes. This is often expressed for L4 and L7, and the numbers are typically different.|
Note that ADC industry metrics use a "single count" method for measuring throughput, such that traffic between the client and server is counted, but not the traffic between the client and ADC or between the ADC and the server.
|fps||frames per second||Determine the maximum rate at which network frames can be inspected, routed, or mitigated without unintended drops. This is particularly useful when comparing network defense performance, such as TCP SYN flood mitigation.|
|CC||concurrent connections||Determine the maximum number of concurrent connections that can be established without forcefully terminating any of the connections. This is typically tested by establishing millions of connections over a long period of time, and leaving them in an established state. The test is valid only if the connections can successfully complete one transaction before a graceful termination. Stability is achieved when the new connection rate matches the connection termination rate due to old connections being satisfied.|
Example: 10,000 CPS with a 300 second wait time and 1-RPC for a 128 byte object will result in a moving plateau of 3.0M CC. The test should run for a minimum of 300 seconds after the initial connections have been established. These tests take a long time, as there are three phases - establishment (300 seconds), wait and transaction (300 seconds), graceful shutdown (300 seconds).
I used to have the role of customer-facing performance testing. This means that I would use problem descriptions, sample configurations, and other details and constraints from a sales team to create an approximate test harness in the lab. There were two main goals: help the customer to understand how the device will perform with their given workloads, and help the customer and sales team understand what capacity is required for current and future traffic.
However, performance is more dependent upon the actual traffic characteristics than the finer points of tuning a configuration. My questions often focused on what the traffic looks like in the target network environment. Anybody that replied with "it's just IMIX" was wrong; nobody has production traffic that looks like IMIX.
Defining the device configuration is relatively simple. A customer would have some defined deployment scenarios, like "client-ssl with HTTP inspection and compression for IPv6 traffic" or "NAT44 with policy enforcement and logging." The sales engineering talent would work with the customer engineering talent to build an acceptable configuration. This defined the harness layout and traffic types.
However, real traffic has many applications and protocols running concurrently. Clients and servers may have intermittent packet loss or other issues due to over-subscribed resources. Some users have a three-digit number of mostly idle connections. Some users have two video streams incoming and one outgoing. And so on in that fashion. The real mix of network traffic varies wildly between services, and even between points of presence (PoP) within a single service. Real data is required.
Your existing network infrastructure is a valuable source of information. Statistical data from your switches and routers can tell you interesting things for every network segment in your environment:
Your firewalls, ADCs, and other advanced infrastructure can provide information regarding client behavior, transport protocols, and applications:
The trick is collecting all of this data, then asking meaningful questions against it. The full discussion of what you can learn goes far beyond the scope of this article. However, here are a handful of common traffic questions and suggestions for how to determine the answers.
A: Concurrency is the product of the average connection lifetime and the average new connection rate.
Example: We have determined that a particular application receives 35,000 new connections per second (CPS). We also determined that the average connection lifetime for this application is 100 seconds. The calculation for typical concurrency is 35,000 TCP conn/s x 100 s. The units cancel out to give us 3.5M TCP connections.
A: This question is strangely common, and typically pointless. UDP is a stateless protocol. If required, source / destination hashing or something in the payload can be used to ensure that the same client goes to the same back end resources.
A: This can be determined by looking at the average connection lifetime and average bytes in / out for a typical session. The method is to take the total bytes for an average transaction and divide it by the average connection lifetime. This gives you the amount of data per second. Then you divide the data rate capacity by that number.
Example: for a typical user session, our application receives 100 kB and transmits 200 kB. The average session length is 10 seconds. This means that, for a typical session, we are receiving 10 kB/s and transmitting 20 kB/s. We convert this to 80 kbps and 160 kbps (bytes to bits). We use the larger value, in this case transmission back to the client, to determine the maximum session capacity.
1.0 Gbps (capacity) divided by 160 kbps (per-session requirement) gives a value of 6,250 sessions.
A: Absolutely not! Network traffic, much like commuter traffic, tends to have a daily pattern. You should understand your traffic mix and application behavior for at least two points: peak traffic, average traffic. I recommend automating the analysis, and running it hourly for a week. You will find unexpected trends. If you have historical data, analyze that too so you can project growth rates.
Consider how your personal internet habits change throughout the day. At work you have long-lived connections to email and other internal services, probably a music streaming service, maybe an instant message service or three, and a lot of open browser tabs. What about when you get home? Media streaming, maybe internet gaming, possibly a VoIP session, and a lot of open browser tabs. Which usage pattern is more correct for capacity planning? You need data to understand which usage pattern is more resource intensive.
I hope that this information allows you to make precise decisions around resource planning and resource allocation. The goal is that your capacity planning should cease to be driven solely by CPU load thresholds for auto-scaling. Instead, you should be actively considering the resource allocation of every component of an application and the supporting infrastructure. Use the knowledge above to explain precisely where resources are misapplied and how to correct it. Go make the world a little more efficient.