S3 Traffic Optimization with F5 BIG-IP & NetApp StorageGRID
The S3 protocol is seeing tremendous growth, often in projects involving AI where storage is a critical component, ranging from model training to inference with RAG. Model training projects usually need large, readily available datasets to keep GPUs operating efficiently. Additionally, storing sizable checkpoints—detailed snapshots of the model—is essential for these tasks. These are vital for resuming training after interruptions that could otherwise imperil weeks or months-long projects.
Regardless of the AI ecosystem, S3 increasingly comes into play in some manner. For example, it is common for one tier of the storage involved, perhaps an outer tier, to be tasked with network loads to accrue knowledge sources. This data acquisition role is normally achieved today using S3 protocol transactions.
Why does a F5 BIG-IP, an industry-leading ADC solution, work so well for optimizing S3 flows that are directed at a S3 storage solution such as StorageGRID? An interesting aspect of S3 is just how easily extensible it is, to a degree other protocols may not be. Take for example, routing protocols, like OSPF or BGP-4; are governed by RFC’s controlled and published by the IETF (Internet Engineering Task Force). Unwavering compliance to RFC specifications is often non-negotiable with customers. Similarly, storage protocols are often governed by SNIA (Storage Networking Industry Association) and extensibility may not have been top of mind when standards were initially released.
S3, as opposed to these examples, is proactively steered by Amazon. In fact, S3 (Simple Storage Service) was one of the earliest AWS services, launched in 2006. When referring to “S3” today, many times, the reference is normally to the set of S3 API commands that the industry has adopted in general, not specifically the storage service of AWS.
Amazon’s documentation for S3, the starting point is found here, is easily digested, not clinical sets of clauses and archaic “must” or “should” directives. An entire category of user defined metadata is made available, and encourages unbounded extensibility:
“User-defined metadata is metadata that you can choose to set at the time that you upload an object. This user-defined metadata is a set of name-value pairs”
Why S3’s Flexibility Amplifies the Possibilities of BIG-IP and StorageGRID
The extensibility baked into S3 is tailor made for the sophisticated and unique capabilities of BIG-IP. Specifically, BIG-IP ships with what has become an industry standard in Data Plane programmability: F5 iRules. For years, iRules have let IT administrators define specific, real-time actions based on the content and characteristics of network traffic. This enables them to do advanced tasks like content steering, protocol manipulation, customized persistence rules, and security policy enforcement.
StorageGRID from NetApp allows a site of clustered storage nodes to use S3 protocol to both ingest content and deliver requested objects. Automatic backend synchronization allows any node to be offered up as a target by a server load balancer like BIG-IP. This allows overall storage node utilization to be optimized across the node set and scaled performance to reach the highest S3 API bandwidth levels, all while offering high availability to S3 API consumers.
Should any node go off-line, or be taken out of service for maintenance, sophisticated health checks from BIG-IP will remove that node from the pool used by F5, and customer S3 traffic will flow unimpeded. A previous article available here dove into the setup details and value delivered out of the box by BIG-IP for S3 clusters, such as NetApp StorageGRID.
Sample BIG-IP and StorageGRID Configuration – S3 Storage QoS Steering
One potential use case for BIG-IP is to expedite S3 writes such that the initial transaction is directed to the best storage node. Consider a workload that requires perhaps the latest in SSD technology, reflected by the lowest latency and highest IOPS. This will drive the S3 write (or read) towards the fastest S3 transactional completion time possible. A user can add this nuance, the need for this differentiated service level, easily as S3 metadata. The BIG-IP will make this custom field actionable, directing traffic to the backend storage node type required. As per normal StorageGRID behavior, any node can handle subsequent reads due to backend synchronization that occurs post-S3 write.
Here is a sample scenario where storage nodes have been grouped into QoS levels, gold, silver and bronze to reflect the performance of the media in use.
To demonstrate this use case, a lab environment was configured with three pools. This simulated a production solution where each pool could have differing hardware vintages and performance characteristics.
To understand the use of S3 metadata fields, one may look at a simple packet trace of a S3 download (GET) directed towards an StorageGRID solution. Out of the box, a few header fields will indicate that the traffic is S3.
By disabling TLS, a packet analyzer on the client, Wireshark, can display the User-Agent field value as highlighted above. Often S3 traffic will originate with specific S3 graphical utilities like S3 Browser, Cyberduck, or FileZilla. The indicator of S3 is also found with the presence of common HTTP X-headers, in the example we observe x-amz-content-sha256, and x-amz-date highlighted.
The guidance for adding one’s own headers is to preface the new headers with “x-amz-meta-“. In this exercise, we have chosen to include the header “x-amz-meta-object-storage-qos” as a header and have BIG-IP search out the values gold, silver, and bronze to select the appropriate storage node pool. Traffic without this header, including S3 control plane actions, will see traffic proxied by default to the bronze pool.
To exercise the lab setup, we will use the “s3cmd”, a free command-line utility available for Linux and macOS, with about 60 command line options; s3cmd easily allows for header inclusions.
Here is our example, which will move a 1-megabyte file using S3 into a StorageGRID hosted bucket on the “Silver” pool:
s3cmd put large1megabytefile.txt s3://mybucket001/large1megabytefile.txt --host=10.150.92.75:443 --no-check-certificate --add-header="x-amz-meta-object-storage-qos:silver"
BIG-IP Setup and Validation of QoS Steering
The setup of a virtual server on BIG-IP is straightforward in our case. S3 traffic will be accepted on TCP port 443 of the server address, 10.150.92.75, and independent TLS sessions will face both the client and the backend storage nodes. As seen in the following BIG-IP screenshot, three pools have been defined.
An iRule can be written from scratch or, alternatively, can easily be created using AI. The F5 AI Assistant, a chat interface found within the Distributed Cloud console, is extensively trained on iRules. It was used to create the following (double-click image to enlarge).
Upon issuing the client’s s3cmd command to upload the 1-megabyte file with “silver” QoS requirement, we immediately observe the following traffic to the pools, having just zeroed all counters.
Converting the proxied 8.4 million bits to megabytes confirms that the 1,024 KB file indeed was sent to the correct pool. Other S3 control plane interactions make up the small amount of traffic proxied to the bronze, spinning disks pool.
BIG-IP and XML Tag Monitoring for Security Purposes
Beyond HTTP header metadata, S3 user consoles can be harnessed to apply content tags to StorageGRID buckets and objects within buckets. In the following simple example, a bucket “bucketaugust002” has been tied to various XML user-defined tags, such as “corporate-viewer-permissions-level: restricted”.
When a S3 client lists objects within a bucket tagged as restricted, it may be prudent to log this access. One approach would be iRules, but it’s not the optimal path, as the entirety of the HTTP response payload would need to be scoured for bucket listings. iRules could then apply REGEX scanning for “restricted”, as one simple example.
A better approach is to use the Data Guard feature of the BIG-IP Advanced WAF module. Data Guard is built into the TMM (Traffic Management Microkernel) of BIG-IP at great optimization, whereas the TCL engine of iRules is not. Thus, iRules is fine when the exact offset of a pattern in known, and really good with request and response header manipulation, but for launching scans throughout payloads Data Guard is even better.
Within AWAF, one need simply build a policy and under the “Advanced” option put in Data Guard strings of interest in a REGEX format, in our example (?:^|\W)restricted(?:$|\W), as seen below.
Although the screenshot indicates “blocking” is the enforcement mode, in our use case, alerting was only sought. As a S3Browser user explores the contents of a bucket flagged as “restricted” the following log is seen by an authorized BIG-IP administrator.
Other use cases for Data Guard and S3 would include scanning of textual data objects being retrieved, specifically for the presence of fields that might be classified as PII, such as credit card numbers, US social security numbers, or any other custom value an organization is interested in. A multitude of online resources provide REGEX expressions matching the presence of any of a variety of potentially sensitive information mandated through international standards. Occurrences within retrieved data, such as data formats matching drivers’ licenses, national health care values, and passport IDs are all quickly keyed in on. The ability of Data Guard to obfuscate sensitive information, beyond blocking or alerting, is a core reason to use BIG-IP in line with your data.
BIG-IP Advanced Traffic Management - Preserve Customer Performance
The BIG-IP offers a myriad of safeguards and enforcement options around the issue of high-rate traffic. The ways include any one traffic source maliciously or inadvertently overwhelming the StorageGRID cluster with unreasonable loads. The possible protections are simply touched upon in this section, starting with some simple features, enabled per virtual server, with a few clicks.
A BIG-IP deployment can be as simple as one virtual server directing all traffic to a backend cluster of StorageGRID nodes. However, just as easily, it could be another extreme: a virtual server entry reserved for one company, one department, even one specific high value client machine. In the above screenshot, one sees a few options. A virtual server may have a cap set on how many connections are permitted, normally TCP port 443-based for S3. This is a primitive but effective safeguard against denial-of-service attacks that attempt to overwhelm with bulk connection loads.
As seen, an eviction policy can be enabled that closes connections that exhibit negative characteristics, like clients who become unresponsive. Lastly, it is seen that connection rate safeguards are also available and can be applied to consider source IP addresses. This can counteract singular, rogue clients in larger deployments.
Another set of advanced features serve to protect the StorageGRID infrastructure in terms of clamping down on the total traffic that the BIG-IP will forward to a cluster. These features are interesting as many can apply not just at the per virtual server level, but some controls can be applied to the entirety of traffic being sent on particular VLANs or network interfaces that might interconnect to backend pools.
BIG-IP’s advanced features include the aptly named Bandwidth Controllers, and are made up of two varieties: static and dynamic. Static is likely the correct choice for limiting the total bandwidth a given virtual server can direct at the backend pool of Storage Nodes. The setup is trivial, just visit the “Acceleration” tab of BIG-IP and create a static bandwidth controller of the magnitude desired.
At this point, nothing further is required beyond simply choosing the policy from the virtual server advanced configuration screen.
For even more agile behavior, one might choose to use a dynamic bandwidth controller. As seen below, the options now extend into individual user traffic-level metering. With ancillary BIG-IP modules like Access Policy Manager (APM) traffic can be tied back to individual authenticated users by integration with common Identity Providers (IdP) solutions like Microsoft Active Directory (AD) servers or using the SSO framework standard SAML 2.0, others exist as well. However, with LTM by itself, we can consider users as sessions consisting of source IP and source (ephemeral) TCP port pairs and can apply dynamic bandwidth controls upon this definition of a user.
The following provides an example of the degree to which traffic management can be imposed by BIG-IP with dynamic controllers, including per user bandwidth.
The resulting dynamic bandwidth solution can be tied to any virtual server with the following few lines of a simple iRule:
when CLIENT_ACCEPTED { set mycookie [IP::remote_addr]:[TCP:: remote_port]
BWC::policy attach No_user_to_exceed_5Mbps $mycookie}
Summary
The NetApp StorageGRID multi-node S3 compatible object storage solution fits well with a high-performance server load balancer, thus making the F5 BIG-IP a good fit. There are various features within the BIG-IP toolset that can open up a broad set of StorageGRID’s practical use cases. As documented, the ability exists for a BIG-IP iRule to isolate in upon any HTTP-level S3 header, including customized user meta-data fields. This S3 metadata now becomes actionable – the resulting options are bounded only by what can be imagined. One potential option, selecting a specific pool of Storage Nodes based upon a signaled QoS value was successfully tested.
Other use cases include using the advanced WAF capabilities of BIG-IP, such as the Data Guard feature, analyzing real-time response content and alerting upon xml tags or potentially obfuscating or entirely blocking transfers with sensitive data fields.
A rich set of traffic safeguards were also discussed, including per virtual server connection limits and advanced bandwidth controls. When combined with optimizations discussed in other articles, such as the OneConnect profile that can drastically suppress the number of concurrent TCP sessions handled by StorageGrid nodes, one quickly sees that the performance and security improvements attainable make the suggested architecture compelling.