Scality RING and F5 BIG-IP: High-Performance S3 Object Storage
The load balancing of F5 BIG-IP, both locally within a site and for global traffic steering to an optimal site across large geographies, works effectively with Scality RING, a modern and massively scalable object storage solution. The RING architecture takes an innovative "bring-your-own Linux" approach to turning highly performant servers, equipped with ample disks, into a resilient, durable storage solution. BIG-IP can scale in lock step with offered S3 access loads, for use cases such as AI data delivery for model training, preventing any single RING node from becoming a hot spot with proven load balancing algorithms like "Least Connections" or "Fastest", to name just a couple. From a global server load balancing perspective, BIG-IP DNS can apply similar advanced logic, for instance steering S3 traffic to the optimal RING site based on the geographic locale of the traffic source or on ongoing latency measurements from those source sites.

Scality RING – High Capacity and Durability for Today's Object Storage

The Scality solution is well known for its ability to grow an enterprise's storage capacity with agility: simply license the usable storage needed today and upgrade on an as-needed basis as business warrants. RING supports both object and file storage; however, the focus of this investigation is object. Industry drivers of object storage growth include its prevalence in AI model training, specifically for content accrual that will in turn feed GPUs, as well as data lakehouse implementations. There is an extremely long-tailed distribution of other use cases, such as video clip retention in the media and entertainment industry, medical imaging repositories, updates to traditional uses like NAS offload to S3, and the evolution of enterprise storage backups. At minimum, a 3-node site with 200 TB of storage serves as a starting point for a RING implementation.
The underlying servers typically run RHEL 9 or Rocky Linux on x86 architectures (Intel or AMD), and a representative server offers disk bays, front or back, with loaded disks totaling anywhere from 10 to dozens of disk units. Generally, S3 objects are stored on spinning hard disk drives (HDD), while the corresponding metadata warrants a subset of flash drives in a typical Scality deployment. A representative diagram of BIG-IP in support of a single RING site would be as follows. One of the known attributes of a well-engineered RING solution is 100 percent data availability. In industry terms, this is an RPO (recovery point objective) of zero, meaning that no data is lost between the moment a failure occurs and the moment the system is restored to its last known good state. This is achieved through means like multiple nodes, multiple disks, and often multiple sites, combining replication for small objects, such as retaining 2 or 3 copies of objects smaller than 60 kilobytes, with erasure coding (EC) for larger objects. Erasure coding is a nuanced topic within the storage industry; Scality uses a sophisticated take on erasure coding known as ARC (Advanced Resiliency Coding). Aligned with availability is the durability of data that can be achieved through RING. This is to say, how "intact" can I believe my data at rest is? The Scality solution offers fourteen 9's of durability, exceeding most other advertised values, including that of AWS. What the 9's correspond to in terms of downtime in a single year can be found here, although it is telling that Wikipedia, as of early 2026, does not even provide calculations beyond twelve 9's. Finally, in keeping with sound information lifecycle management (ILM), the Scality site may offer an additional server running XDM (eXtended Data Management) to act as a bridge between on-premises RING and public clouds such as AWS and Azure.
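The arithmetic behind those 9's figures is straightforward; a quick sketch of the downtime-per-year mapping referenced above (illustrative only, not an official Scality calculation):

```python
def downtime_per_year(nines):
    """Seconds of expected downtime per year for N nines of availability."""
    seconds_per_year = 365.25 * 24 * 3600
    availability = 1 - 10 ** (-nines)
    return seconds_per_year * (1 - availability)

# Five nines is the classic "about five minutes a year" figure.
print(round(downtime_per_year(5), 1))   # roughly 315.6 seconds
# Fourteen nines works out to well under a microsecond per year.
print(downtime_per_year(14))
```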
XDM enables a tiering approach, where older, "cold" data is moved off-site. Archive-to-tape solutions are also available options.

Scality – Quick Overview of Data at Rest Protection

Large single- or multi-site RING deployments protect data at rest by combining two principal approaches: replication and erasure coding. Replication is simple to understand: for smaller objects, an operator simply chooses the number of replicas desired. If two replicas are chosen, indicated by class of service (COS) 2, two copies are spread across nodes; for COS 3, three copies are spread across nodes. A frequent rule of thumb is the three percent rule: roughly that fraction of files across a full object storage environment are 60 kilobytes or less and are therefore replicated, with replicas available in case of hardware disruptions on a given node. Erasure coding is an adjustable technique where larger objects are divided into data chunks, sometimes called data shards or data blocks, and spread (or "striped") across many nodes. To add resilience against one or more hardware issues with nodes, or disks within nodes, additional parity chunks are mathematically derived. This way, cleverly and by design, only a subset of the data chunks and parity chunks are required in a solution under duress, and the original object is still easily provided upon an S3 request. In smaller deployments, it is possible to treat a single RING server as two entities by dividing storage into two "disk groups." However, for an ideal, larger RING site, the approach depicted is preferred. The erasure coding depicted, normally referred to with the nomenclature EC(9,3), leads into a deeper design consideration where storage overhead is traded off against data resiliency. In the diagram, as many as 3 nodes holding portions of the data could become unreachable and still the erasure-coded object would be available.
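The resiliency and overhead trade-off of an EC(data, parity) layout reduces to simple arithmetic; a minimal sketch:

```python
def ec_profile(data_chunks, parity_chunks):
    """Failure tolerance and storage overhead for an EC(data, parity) layout."""
    return {
        # Any `parity_chunks` of the data+parity chunks can be lost
        # and the object can still be reconstructed.
        "tolerates_failures": parity_chunks,
        # Extra capacity consumed beyond the raw object size, as a percentage.
        "overhead_pct": round(100 * parity_chunks / data_chunks, 1),
    }

print(ec_profile(9, 3))   # {'tolerates_failures': 3, 'overhead_pct': 33.3}
print(ec_profile(8, 4))   # {'tolerates_failures': 4, 'overhead_pct': 50.0}
```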
The overhead can be considered 33 percent, as 3 additional parity chunks were created and stored beyond the 9 data chunks. For more risk-averse operators, an EC of, say, EC(8,4) would tolerate even more, four points of failure. The trade-off, in this case, would be a 50 percent overhead to achieve that increased resiliency. The overhead is still much less than replication, which can see hundreds of percent in overhead, which is why replication is the logical choice for only small objects. Together, replication and EC lead to an overall storage efficiency number. Considering a 3 percent small-objects environment, EC(9,3) plus COS 3 replication might tactically lead to a palatable long-term data protection posture, all for a total cost of 41 percent additional storage overhead. The ability to scale out and protect the S3 data in flight is the domain of BIG-IP and what we will review next.

BIG-IP – Bring Scale and Traffic Control to Scality RING

A starting point for any discussion around BIG-IP is the rich set of load balancing algorithms and the ability to drop unhealthy nodes from an origin pool, transparent to users who only interact with the configured virtual server. Load balancing for S3 involves avoiding "hot spots", where a single RING node might otherwise be overly tasked by users communicating with it directly while other nodes remain vastly underutilized. By steering DNS resolution of S3 services to BIG-IP and its configured virtual servers, traffic can be spread across all healthy nodes in accordance with interesting algorithms. Popular ones for S3 include:

Least Connections – RING nodes with fewer established TCP connections will receive proportionally more of the new S3 transactions, towards a goal of balanced load in the server cluster.

Ratio (member) – Although sound practice would be for all RING members to have similar compute and storage makeup, in some cases perhaps two vintages of server exist.
Ratio will allow proportionally more traffic to target the newer, more performant class of Scality nodes.

Fastest (Application) – The number of "in progress" transactions any one server in a pool is handling is considered. If traffic steered to all members is generally similar over time, a member with the fewest transactions actively in progress will be considered a faster member in the pool, and new transactions can favor such low-latency servers.

The RING nodes are contacted through Scality "S3 Connectors"; in an all-object deployment the connector resides on the storage node itself. For some configurations, perhaps one with file-based protocols like NFS running concurrently, the S3 Connectors can also be installed on VMs or 1U appliances. Of course, an unhealthy node should be excluded from an origin pool, and low-impact HTTP-based health monitors, such as the HTTP HEAD method, are frequently used to check that an endpoint is responsive. With BIG-IP Extended Application Verification (EAV), one can move towards even more sophisticated health checks: an S3 access key and secret key pair installed on BIG-IP can be harnessed to perpetually upload and download small objects to each pool member, assuring the BIG-IP administrator that S3 is unequivocally healthy on each pool member.

BIG-IP – Control-Plane and Data-Plane Safeguards

A popular topic in a Scality software-defined distributed storage solution is that of the noisy neighbor when multiple tenants are considered. Perhaps one tenant has an S3 application which consumes disproportionate amounts of shared resources (CPU, network, or disk I/O), degrading performance for other tenants; controls are needed to counter this. With BIG-IP, a simple control-plane threshold can be invoked with a straightforward iRule, a programmatic rule which can limit the source from producing more than, say, 25 S3 requests over 10 seconds. An iRule is a powerful but normally short, event-driven script.
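The original article shows this control as an iRule; as an illustration only, here is the same leaky-bucket idea modeled in Python (names, addresses, and thresholds are illustrative, and this is not F5 iRule syntax):

```python
import time
from collections import defaultdict

LIMIT = 10      # transactions allowed per client...
WINDOW = 6.0    # ...within this many seconds

_hits = defaultdict(list)   # client address -> timestamps of recent requests

def allow_request(client_addr, now=None):
    """Sliding-window counter: reject once a client exceeds LIMIT per WINDOW."""
    now = time.monotonic() if now is None else now
    window = _hits[client_addr]
    # Drop entries older than the window (the bucket "leaks" over time,
    # replenishing the client's credits for future transactions).
    window[:] = [t for t in window if now - t < WINDOW]
    if len(window) >= LIMIT:
        return False        # where BIG-IP would reject the S3 command
    window.append(now)
    return True

# The first 10 requests in a window pass; the 11th is rejected.
results = [allow_request("203.0.113.7", now=0.0) for _ in range(11)]
print(results.count(True), results[-1])   # 10 False
```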
Most modern generative AI solutions are well-versed in F5 iRules and can summarize even the most advanced scripts into digestible terms. The iRule examines the address of an application ("client_addr") that connects to a BIG-IP virtual server and starts a counter; after 10 transactions within 6 seconds, further S3 commands will be rejected. The approach is that of a leaky bucket, and the application will be replenished with credits for future transactions over time. Whereas iRules frequently target layer 7, HTTP-layer activity, a wealth of layer 3 and layer 4 controls exist to limit excessive data-plane consumption. Take, for example, the static bandwidth controller concept: simply create a profile such as the following 10 Mbps example. This bandwidth controller can then be applied against a virtual server, including a virtual server supporting, say, lower-priority S3 application traffic. Focusing on layer 4, the TCP layer, a number of BIG-IP safeguards exist, among them those that can defend against orphaned S3 connections, including connections intentionally set up and left open by a bad actor to try to deplete RING resources. Another safeguard is the ability to re-map DiffServ code points or Type of Service (TOS) precedence bits. In this manner, a source that exceeds ideal traffic rates can be passed without intervention; however, by remapping heavy upstream traffic, BIG-IP enables network infrastructure adjacent to Scality RING nodes to police or discard such traffic if required.

Evolving Modern S3 Traffic with Fresh Takes on TLS

TLS underwent a major improvement with the first release of TLS 1.3 in 2018. It removed a number of antiquated security components from official support, such as RSA-style key agreements, SHA-1 hashes, and 3DES encryption. From a performance point of view, however, the upgrade to TLS 1.3 is equally significant.
When establishing a TLS 1.2 session, perhaps towards the goal of an S3 transaction with RING, an application with a TCP connection established can expect 2 round-trip times to pass the TLS negotiation phase and move forward with encrypted communications. TLS 1.3 cuts round trips in half: a new TLS 1.3 session can proceed to encrypted data exchange with a single round-trip time. In fact, when resuming a previously established TLS 1.3 session, 0-RTT is possible, meaning the first resumption message from the client can itself carry encrypted data. The following packet trace demonstrates 1-RTT TLS 1.3 establishment. To turn on this feature, simply use a client-facing TLS profile on BIG-IP and remove the "No TLS1.3" option. Another advancement in TLS, which requires TLS 1.3 to be enabled to start with, is quantum-computing resistance in shared key agreement algorithms. This is a foundational building block of post-quantum cryptography (PQC), and the most well-known of these techniques is NIST FIPS-203 ML-KEM. The concern with not supporting PQC today is that traffic in flight, which may be surreptitiously siphoned off and stored long term, will be readable in the future with quantum computers, perhaps as early as 2030. This risk stems from thought leadership like Shor's algorithm, which indicates that public key (asymmetric) cryptography, foundational to shared key establishment between TLS parties, is at risk: large-scale, fault-tolerant quantum computers could potentially crack elliptic curve cryptography (ECC) and Diffie-Hellman (DH) algorithms. This risk, the so-called Harvest Now, Decrypt Later threat, means sensitive data like tax records, medical information, and anything with longer-term retention value requires protection today. It cannot be put off safely; action needs to be taken now.
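Returning to the transport upgrade itself: one way to confirm an endpoint now negotiates TLS 1.3 is a client that refuses anything older. A sketch using Python's ssl module; the hostname in the commented section is hypothetical:

```python
import ssl

# Build a client context that will only ever negotiate TLS 1.3.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# Connecting to a virtual server now fails unless it speaks TLS 1.3:
#   import socket
#   with socket.create_connection(("s3.acme.com", 443)) as sock:
#       with ctx.wrap_socket(sock, server_hostname="s3.acme.com") as tls:
#           print(tls.version())   # "TLSv1.3"

print(ctx.minimum_version == ssl.TLSVersion.TLSv1_3)   # True
```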
FIPS-203 ML-KEM is deployed in a hybrid approach to shared key derivation, after which TLS parties can safely continue to use symmetric encryption algorithms like AES, which are thought to be far less susceptible to quantum attacks. Updating our initial one-site topology, we can consider the following improvements. A key understanding is that a hybrid key agreement scheme is used with FIPS-203: a parallel set of crypto operations using a traditional key agreement like the X25519 ECDH key exchange is performed simultaneously with the new MLKEM768 quantum-resistant key encapsulation approach. The net result is that a significant amount of crypto is carried out, with two sets of calculations and a final combining of outcomes to arrive at an agreed-upon shared key. The conclusion is that this load is likely best suited for only a subset of S3 flows, those with objects housing PII of high long-term value. A method to achieve this balance, the trade-off between security and performance, is to use multiple BIG-IP virtual servers: a regular set of S3 endpoints with classical TLS support, and higher-security S3 endpoints for selective use. The latter would support the PQC provisions of modern TLS. A full article on configuring BIG-IP for PQC, including a video demonstration of the click-through to add support to a virtual server, can be found here.

Multi-site Global Server Load Balancing with BIG-IP and Scality RING

An illustrative diagram showing two RING sites, asynchronously connected and offering S3 ingestion and object retrieval, is shown below. Note that BIG-IP DNS, although frequently deployed independently from BIG-IP LTM appliances, can also operate on the same, existing LTM appliances. In this example, an S3 application physically situated in Phoenix, Arizona, in the American southwest, will use its configured local DNS resolver (frequently shortened to LDNS) to resolve S3 targets to IP addresses.
Think finance.s3.acme.com or humanresources.s3.acme.com. In F5 terms, these example domain names are referred to as "Wide IPs". An organization such as the fictitious acme.com will delegate the relevant sub-domains to F5 DNS, such as s3.acme.com in our example, meaning the F5 appliances in San Francisco and Boston hold the DNS name server (NS) resource records for the S3 domain in question and can answer the client's DNS resolver authoritatively. The DNS A queries required by the S3 application will land on either BIG-IP DNS platform, San Francisco or Boston; the pair serve for redundancy purposes, and both can provide an enterprise-controlled answer. In other words, should the S3 application target be resolved to Los Angeles or New York City? The F5 solution allows for a multitude of considerations when providing the answer to that question. Interesting options and their impact on our topology diagram:

Global Availability – A common disaster recovery approach. The BIG-IP DNS appliance distributes DNS name resolution requests to the first available virtual server in a pool list the administrator configures. BIG-IP DNS starts at the top of the list of virtual servers and sends requests to the first available virtual server in the list; only when that virtual server becomes unavailable does BIG-IP DNS send requests to the next virtual server in the list. If we want S3 traffic generally to travel to Los Angeles, and only utilize New York when application availability problems arise, this would be a good approach.

Ratio – In a case where we would like, say, an 80/20 split between S3 traffic landing in Los Angeles versus New York, this would be a sound method. Perhaps market reasons make the cost of ingesting traffic in New York more expensive.

Round Robin – The logical choice where we would like both data centers to receive, generally, over time, the same amount of S3 transactions.
Topology – BIG-IP DNS distributes DNS name resolution requests using proximity-based load balancing, determining the proximity of the resource by comparing location information derived from the DNS message to the topology records in a topology statement. A great choice if data centers are of similar capacity and S3 transactions are best serviced by the closest physical data center. Note that the source IP address of the application's DNS resolver is what is analyzed; if a centralized DNS service is used, perhaps it is not in Phoenix at all. Techniques like EDNS0 client subnet exist to try to place the actual locality of the application.

Round Trip Time – An advanced algorithm that is dynamic, not static. BIG-IP DNS distributes DNS name resolution requests to the virtual server with the fastest measured round-trip time between that data center and a client's LDNS. This is achieved by having sites send low-impact probes, from "prober pools", to each application's DNS resolver over time. For new DNS resolution requests, BIG-IP DNS can therefore tap into real-world latency knowledge to direct S3 traffic to the site demonstrably known to offer the lowest latency. This again works best when the application and DNS resolver are in the same location.

BIG-IP DNS, when selecting between virtual servers, such as those in Los Angeles and New York City in our simple example, can have a primary algorithm, a secondary algorithm, and a fall-back, hard-coded IP. For instance, consider that the first two algorithms are, in order, dynamic approaches: prober pools measuring round-trip time and, as a second approach, the measurement of active hop counts between sites and the application's LDNS. Should both methods fail to provide results, an IP address of last resort, perhaps in our case Los Angeles, will be provided through the configured fall-back IP.
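The primary/secondary/fallback decision chain can be modeled simply. This is a conceptual sketch of the selection logic only, not BIG-IP's implementation; site names and measurements are made up:

```python
def pick_site(latency_ms, hop_count, fallback="LosAngeles"):
    """Try dynamic methods in order; fall back to a hard-coded site."""
    # Primary: measured round-trip time from prober pools to the LDNS.
    if latency_ms:
        return min(latency_ms, key=latency_ms.get)
    # Secondary: active hop counts between sites and the LDNS.
    if hop_count:
        return min(hop_count, key=hop_count.get)
    # Last resort: the configured fall-back site.
    return fallback

print(pick_site({"LosAngeles": 23, "NewYork": 61}, {}))   # LosAngeles
print(pick_site({}, {}))                                  # LosAngeles (fallback)
```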
Key takeaway: what F5 and Scality provide is "intelligent" DNS; traffic is not directed to sites based merely upon basic network reachability to Los Angeles or New York. In reality, the solution looks behind the local load balancing tier and is aware of the health of each Scality RING member. Thus, traffic is steered in accordance with back-end application health monitoring, something a regular DNS solution would not offer.

Multi-site Solutions for Global Deployments and Geo-Awareness

One potentially interesting use case for F5 BIG-IP DNS and Scality RING sites would be to tier all data centers into pools based upon wider geographies. Consider a use case such as the following, with Scality RING sites spread across both North America and Europe. The BIG-IP DNS solution can handle this higher layer of abstraction: the first layer involves choosing between a pool of sites, before delving down one more layer into the pool of virtual servers spread across the sites within the optimal region. Policy drives the response to a DNS query for S3 services all the way through these two layers. To explore all load balancing methods is an interesting exercise but beyond the scope of this article; the manual here drills into the possible options. To direct traffic at the country or even continent level, one can use the "Topology" algorithm to first select the correct site pool. Persistence can be enabled, allowing future requests from the same LDNS resolver to follow prior outcomes. First, it is good practice to ensure the geo-IP database of BIG-IP is up to date; a brief video here steps a user through the update. The next thing to create is regions. In this diagram the user has created an "Americas" and a "Europe" region. In fact, in this particular setup, the Europe region is seen to match all traffic with DNS queries originating outside of North and South America, per the list of member continents.
With regions defined, one then creates simple topology records to control DNS responses for S3 services based upon the source IP of DNS queries made on behalf of S3 applications. The net result is a worldwide set of controls over which Scality site S3 transactions will land upon; the decision can fully consider enterprise objectives around geographies like continents or individual countries. In our example, once a source region has been decided upon for an inbound DNS request, any of the previous algorithms can kick in. This would include options like global availability for DR within the selected region, or perhaps measured latency to steer traffic to the most performant site in the region.

Summary

Scality RING is a software-defined object and file solution that supports data resiliency at levels expected by risk-averse storage groups, all with contemporary Linux-friendly hardware platforms selected by the enterprise. The F5 BIG-IP application delivery controller complements S3 object traffic involving Scality through massive scale-out of nodes coupled with innovative algorithms for agile spreading of the traffic. The health of RING nodes is perpetually monitored so as to seamlessly bypass any troubled system. When moving to multi-site RING deployments, within a country or even across continents, BIG-IP DNS is harnessed to steer traffic to the optimal site, potentially factoring in geo-IP rules, proximity between user and data center, and established baseline latencies offered by each site to the S3 application's home location.

YouTube RSS Newsletter in n8n Root Cause: Why the Ollama Node Broke My Agent
Hey community—Aubrey here. I want to talk about a failure I ran into while building an n8n workflow, because this one cost me some real time and I think it’s going to save you an afternoon if you’re headed down the same road. The short version: I had a workflow working great with OpenAI, and I wanted to swap in Ollama so I could run the LLM locally. Same prompt, same data, same structured output requirements. In my head, that should’ve been a clean plug-and-play change. It wasn’t. It broke in a way that looked like “the model isn’t returning valid JSON,” but the real root cause was something else entirely—and it’s actually documented.

What broke (and where it broke)

The failure always showed up in the Structured Output Parser. n8n would run the flow, then the parser would throw: "Model output doesn't fit required format," which is a super reasonable error if your model is rambling, adding commentary, wrapping JSON in markdown, returning tool traces, whatever. So that’s where my head went first: “Okay, I need to tighten the prompt. Maybe the schema is too strict. Maybe Ollama’s being weird.” But here’s the thing: this wasn’t one of those “LLM didn’t obey” moments. This was repeatable and consistent, and it didn’t really matter how I tuned the prompt. The OpenAI version worked; the Ollama version failed, and the parser was just the first place it showed up.

The first big clue: the 5-minute wall

As I dug in, I started seeing a pattern: a hard failure at exactly five minutes. Not “about five minutes,” not “sometimes,” but right on the dot. That error often surfaced as: "fetch failed." So now we’re not talking about a formatting issue anymore—we’re talking about the request itself failing. That matters, because if the model call dies mid-stream, the structured parser downstream is going to be handed something incomplete or empty or error-shaped, and it’s going to complain that it doesn’t match the schema. That’s not the parser being wrong. That’s the parser doing its job.
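For anyone chasing the same thing: the 5-minute kill shows up in Ollama's server logs as a 500 at exactly 5m0s. A tiny hypothetical helper (not part of the workflow, just for grepping your own logs) to flag that signature:

```python
import re

# Matches log entries shaped like: 500 | 5m0s | POST "/api/chat"
LINE = re.compile(r'(?P<status>\d{3}) \| (?P<duration>\S+) \| (?P<method>\w+) "(?P<path>[^"]+)"')

def is_timeout_kill(log_line):
    """True when a request died with a 500 at exactly the 5-minute mark."""
    m = LINE.search(log_line)
    return bool(m) and m["status"] == "500" and m["duration"] == "5m0s"

print(is_timeout_kill('500 | 5m0s | POST "/api/chat"'))   # True
print(is_timeout_kill('200 | 7m0s | POST "/api/chat"'))   # False
```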
This 5-minute behavior is also what other folks reported in n8n issue #13655—the Ollama chat model node timing out after 5 minutes even when people tried to change the “keep alive” setting.

Reproducing the behavior with logs (and why it matters)

One of the most useful things I found in that issue thread was simple: Ollama’s own logs clearly show the request dying at 5 minutes when driven by n8n’s AI nodes. You’ll see entries like:

failure case: 500 | 5m0s | POST "/api/chat"

Then an n8n community member swapped the same payload into a manual HTTP Request node in n8n (which does let you set a timeout), and suddenly the same call works:

success case: 200 | 7m0s | POST "/api/chat"

That’s a huge diagnostic move, because it tells you the model isn’t “incapable” or “too slow”—it tells you the client behavior is the problem (timeout / abort / cancellation), not your prompt, not your JSON schema, not the content. And that lined up perfectly with what I was seeing on my side.

Getting serious: tcpdump, FIN/ACK, and “context canceled”

At some point I wanted proof of what was actually happening on the wire, so I ran a tcpdump against the Ollama port. And yeah—this is where it got real. What I saw was:

- n8n connects to Ollama fine
- Data flows for a while (so we’re not talking about “can’t reach host”)
- At the ~5 minute mark, n8n sends a TCP FIN/ACK (client closes the connection)
- An HTTP 500 follows, containing an error like "context canceled"

In the issue thread, you can literally see an example of that pattern: client FIN from n8n → Ollama, then 'HTTP/1.1 500 Internal Server Error' with a body indicating 'context canceled.' So when I originally said “the structured output parser fails because Ollama’s tool call output isn’t close to what’s expected,” I wasn’t totally wrong about the symptom. But the deeper “why” is: the request is being canceled, and what comes back is not valid structured model output.
The parser is just where it becomes obvious when you force the node to return data back to the agent.

The root cause (and the part I want everyone to notice)

Now here’s the punchline, and this is the part I want to underline, bold, highlight, put on a billboard: the n8n Ollama model node does not work with LLM Tools implementations. That’s not a rumor. That’s in the n8n docs! After a quick recap and discussion, JRahm pointed me to the documentation for the Ollama model integration, and it straight-up says the Ollama node does not support tools, and recommends using Basic LLM Chain instead.

What I’m doing next (and what you should do)

I’m not done with Ollama—I’m just done trying to use it the wrong way, and this is going to spawn two follow-up efforts for me:

- Attempt to rebuild the same idea using Basic LLM Chain with Ollama, the way the docs recommend.
- Write a deeper explainer on LLM Tools—what they are, why agents use them, and how that’s different than RAG (because those concepts get mashed together constantly).

So if you’re out there wiring up an Agent with structured output and you’re thinking “I’ll just switch the model to Ollama,” don’t do what I did. Read that doc line first. If you need tools, pick a model/node combo that supports tools. If you’re using Ollama, design for the Basic LLM Chain path and you’ll save yourself the five-minute timeout rabbit hole and the structured-parser blame game.

Using the Model Context Protocol with Open WebUI
This year we started building out a series of hands-on labs you can do on your own in our AI Step-by-Step repo on GitHub. In my latest lab, I walk you through setting up a Model Context Protocol (MCP) server and the mcpo proxy to allow you to use MCP tools in a locally-hosted Open WebUI + Ollama environment. The steps are well-covered there, but I wanted to highlight what you learn in the lab.

What is MCP and why does it matter?

MCP is a JSON-based open standard from Anthropic that (shockingly!) is only about 13 months old now. It allows AI assistants to securely connect to external data sources and tools through a unified interface. The key delivery that led to its rapid adoption is that it solves the fragmentation problem in AI integrations—instead of every AI system needing custom code to connect to each tool or database, MCP provides a single protocol that works across different AI models and data sources.

MCP in the local lab

My first exposure to MCP was using Claude and Docker tools to replicate a video Sebastian_Maniak released showing how to configure a BIG-IP application service. I wanted to see how F5-agnostic I could be in my prompt and still get a successful result, and it turned out that the only domain-specific language I needed, after it came up with a solution and deployed it, was to specify the load balancing algorithm. Everything else was correct. Kinda blew my mind. I spoke about this experience throughout the year at F5 Academy events and at a solutions days event in Toronto, but more so, I wanted to see how far I could take this in a local setting, away from the pay-to-play tooling offered at that time. This was the genesis for this lab.
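Under the hood, an MCP tool call is just a JSON-RPC 2.0 message. A minimal sketch of the request shape (the tool name and arguments here are hypothetical, not from the lab's server):

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request, as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

msg = mcp_tool_call(1, "get_weather", {"city": "Seattle"})
print(json.dumps(msg, indent=2))
```

The mcpo proxy's job in the lab is essentially to translate between this JSON-RPC shape and the OpenAPI-style HTTP endpoints Open WebUI expects.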
Tools

In this lab, you'll use the following tools:

- Ollama
- Open WebUI
- mcpo
- custom MCP server

Ollama and Open WebUI are assumed to already be installed; those labs are also in the AI Step-by-Step repo:

- Installing Ollama
- Installing Open WebUI

Once those are in place, you can clone the repo and deploy in docker or podman; just make sure the containers for Open WebUI are in the same network as the repo you're deploying.

Results

Success in getting your Open WebUI inference through the mcpo proxy and the MCP servers (mine is very basic, just for test purposes; there are more that you can test or build yourself) depends greatly on your prompting skills and the abilities of the local models you choose. I had varying success with llama3.2:3b. But the goal here isn't production-ready tooling, it's to build and discover and get comfortable in this new world of AI assistants, leveraging them where it makes sense to augment our toolbox. Drop a comment below if you build this lab and share your successes and failures. Community is the best learning environment.
The Fast Path to Safer Labs: CycloneDX SBOMs for F5 Practitioners
Quick note up front about my intent with this lab... I built it to quickly help F5 practitioners keep their lab environments safe from hidden threats. Fast, approachable, and useful on day one. We used the bundled Dependency-Track container because it's trivial to stand up in a lab. In production, please deploy Dependency-Track backed by a production-grade database and tune it for scale and durability. Lab-first, but think ahead to enterprise-ready.

Now, let's talk about why I chose CycloneDX for the SBOM we generated with Trivy, and why it's the accepted standard I recommend for modern, AI-heavy workloads. At a high level, an SBOM is your ingredient list for software. Containers that host LLM apps are layered: base OS, GPU drivers and CUDA, language runtimes, Python packages, app binaries, plus external services you call (hosted inference, embeddings, vector databases). If you don't know what's in that stack, you can't manage risk when new CVEs land. CycloneDX gives you that visibility and does it with a security-first design. Here's why CycloneDX is such a good fit:

- Security-first schema. CycloneDX was born into the AppSec world at OWASP. It bakes in identifiers that vulnerability tooling actually uses—package URLs (purls), CPEs, hashes—and a proper dependency graph. That graph matters when the vulnerable thing isn't your top-level app but the library three layers deep.

- Broad component coverage, including services. Real LLM apps don't stop at "libraries." CycloneDX can represent applications, libraries, containers, operating systems, files, and services. That service support is huge: if you depend on an external inference API, a hosted vector DB, or a third-party embedding service, CycloneDX can document that right in your SBOM. Your risk picture is no longer just what's "in the image," but also what the image calls.

- VEX support to cut noise.
CycloneDX supports VEX (Vulnerability Exploitability eXchange), which lets you annotate "not affected" or "mitigated" when a CVE shows up in your base image but is not exploitable in your specific deployment. That's how you keep the signal high and the noise low.
- Toolchain adoption. The path we used in the lab—Trivy generates CycloneDX JSON in a single command, Dependency-Track ingests it cleanly—is exactly what you want. Fewer conversions, fewer surprises, more time looking at risk with a project-centric view.

So how does that map to LLM app security, specifically?

- Containers and drivers: CycloneDX captures the full container context—OS packages, runtime layers, GPU driver stacks—so when you rebuild to pick up a CUDA or base image update, your SBOM reflects the change and your risk dashboard stays current.
- Python ecosystems: For model-serving and data pipelines, CycloneDX tracks the Python libraries and their transitive dependencies, so when a popular package pushes a patch for a nasty CVE, you'll see the impact across your projects.
- Model artifacts and files: CycloneDX can represent file components with hashes. If you pin or verify model files, that checksum data helps you detect drift or tampering.
- External services: Many LLM apps rely on hosted endpoints. CycloneDX's service component type lets you document those dependencies, so governance isn't blind to the parts of your "system" that live outside your containers.

Now, let's compare CycloneDX to other SBOM standards you'll hear about.

SPDX (Software Package Data Exchange)

- Strengths: It's a Linux Foundation standard with deep traction, especially for license compliance. Legal and compliance teams love it for moving license information through CI/CD.
- Tradeoffs for AppSec: SPDX can represent dependencies and has added security-relevant fields, but its heritage is compliance rather than vulnerability analysis.
Modeling external services is less natural, and a lot of AppSec tooling (like the Trivy -> Dependency-Track workflow we used) is tuned for CycloneDX. If your primary goal is security visibility and CVE triage for containerized AI apps, CycloneDX tends to be the smoother path.

SWID tags (ISO/IEC 19770-2)

- Strengths: Vendor-provided software identification for asset management—who installed what, what version, and how it's licensed.
- Tradeoffs: Limited open tooling, and not a great fit for layered containers or fast-moving dependency graphs. You won't get the rich, developer-centric view you need for daily AppSec in LLM environments.

And a quick reality check: package manifests and lockfiles (pip freeze, requirements.txt, package-lock.json) are useful, but they're not SBOMs. They miss OS packages, drivers, and container layers. CycloneDX gives you the whole picture.

Practically speaking, here's the loop we ran—and why CycloneDX makes it painless:

- Generate: Use Trivy to scan your AI container and spit out CycloneDX JSON. It's trivial—one line, usually under a minute.
- Ingest: Push that SBOM into Dependency-Track via the API. You get components, licenses, vulnerability scores, dependency graphs, and a clean project/version history.
- Act: Watch for new CVEs. Use VEX to mark what's not exploitable in your context. Rebuild, rescan, repeat. Automate it in CI so your SBOM stays fresh without manual babysitting.

Production note again, because it matters: the bundled Dependency-Track container is perfect for labs and demos. In production, deploy Dependency-Track with a production-grade database, persistent storage, backups, and access controls that match your enterprise standards.

Bottom line: SPDX and CycloneDX are both legitimate, widely accepted SBOM standards. If your priority is license compliance, SPDX is an excellent fit.
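The generate and ingest steps above are each a single command. A sketch under assumed names (the image name, project name, server URL, and API key are placeholders):

```shell
# Generate a CycloneDX SBOM for a container image with Trivy
trivy image --format cyclonedx --output sbom.cdx.json myorg/llm-app:latest

# Ingest it into Dependency-Track (multipart BOM upload; autoCreate makes the project)
curl -X POST "http://localhost:8081/api/v1/bom" \
  -H "X-Api-Key: ${DT_API_KEY}" \
  -F "autoCreate=true" \
  -F "projectName=llm-app" \
  -F "projectVersion=latest" \
  -F "bom=@sbom.cdx.json"
```

Wire those two commands into CI and each rebuild refreshes the project's SBOM automatically.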
If your priority is application security for modern, service-heavy, containerized LLM apps, CycloneDX gives you security-first modeling, service coverage, VEX, and an ecosystem that lets you move fast without sacrificing visibility. Voila—grab Trivy, generate CycloneDX, feed Dependency-Track, and start getting signals instead of noise. Fresh installs often look green on day one, but when something changes tomorrow, you'll see it. That's the whole game: make hidden threats visible, then make them go away. If you'd like to try the lab, it's located here. If you want to check out the video of the lab instead, try this one:
Introducing F5 AI Red Team
F5 AI Red Team simulates adversarial attacks such as prompt injection and jailbreaks at unprecedented speed and scale, enabling continuous assessment throughout the application lifecycle. It provides insight into threats and integrates with F5 AI Guardrails to convert those insights into security policies.
Key Steps to Securely Scale and Optimize Production-Ready AI for Banking and Financial Services
This article outlines three key actions banks and financial firms can take to securely scale, connect, and optimize their AI workflows, demonstrated through a scenario of a bank taking a new AI application to production.

I Tried to Beat OpenAI with Ollama in n8n—Here's Why It Failed (and the Bug I'm Filing)
Hey, community. I wanted to share a story about how I built the n8n Labs workflow. It watches a YouTube channel, summarizes the latest videos with AI agents, and sends a clean HTML newsletter via Gmail. In the video, I show it working flawlessly with OpenAI. But before I got there, I spent a lot of time trying to copy the same flow using open source models through Ollama with the n8n Ollama node. My results were all over the map. I really wanted this to be a great “open source first” build. I tried many local models via Ollama, tuned prompts, adjusted parameters, and re‑ran tests. The outputs were always unpredictable: sometimes I’d get partial JSON, sometimes extra text around the JSON. Sometimes fields would be missing. Sometimes it would just refuse to stick to the structure I asked for. After enough iterations, I started to doubt whether my understanding of the agent setup was off. So, I built a quick proof inside the n8n Code node. If the AI Agent step is supposed to take the XML→JSON feed and reshape it into a structured list—title, description, content URL, thumbnail URL—then I should be able to do that deterministically in JavaScript and compare. I wrote a tiny snippet that reads the entries array, grabs the media fields, and formats a minimal output. And guess what? Voila. It worked on the first try and my HTML generator lit up exactly the way I wanted. That told me two things: one, my upstream data (HTTP Request + XML→JSON) was solid; and two, my desired output structure was clear and achievable without any trickery. With that proof in hand, I turned to OpenAI. I wired the same agent prompt, the same structured output parser, and the same workflow wiring—but swapped the Ollama node for an OpenAI chat model. It worked immediately. Fast, cheap, predictable. The agent returned a perfectly clean JSON with the fields I requested. My code node transformed it into HTML. The preview looked right, and Gmail sent the newsletter just like in the demo. 
So at that point, I felt confident the approach was sound and the transcript you saw in the video was repeatable—at least with OpenAI in the loop. Where does that leave Ollama and open source models? I'm not throwing shade—I love open source, and I want this path to be great. My current belief is that the failure is somewhere inside the n8n Ollama node code path. I don't think it's the models themselves in isolation; I think the node may be mishandling one or more of these details: how messages are composed (system vs. user); whether "JSON mode" or a grammar/format hint is being passed; token/length defaults that cause truncation; stop settings that let extra text leak into the output; or the way the structured output parser's constraints are communicated. If you've worked with local models, you know they can follow structure very well when you give them a strict format or grammar. If the node isn't exposing that (or is dropping it on the floor), you get variability. To make sure this gets eyes from the right folks, my intent is to file a bug with n8n for the Ollama node. I'll include a minimal, reproducible workflow: the same RSS fetch, the same XML→JSON conversion, the same agent prompt and required output shape, and a comparison run where OpenAI succeeds and Ollama does not. I'll share versions, logs, model names, and settings so the team can trace exactly where the behavior diverges. If there's a missing parameter (like format: json) or a message-role mix-up, great—let's fix it. If it needs a small enhancement to pass a grammar or schema to the model, even better. The net-net is simple: for AI agents inside n8n to feel predictable with Ollama, we need the node to enforce reliably structured outputs the same way the OpenAI path does. That unlocks a ton of practical automation for folks who prefer local models. In the meantime, if you're following the lab and want a rock-solid fallback, you can use the Code node to do the exact transformation the agent would do.
Here's the JavaScript I wrote and tested in the workflow:

const entries = $input.first().json.feed?.entry ?? [];

function truncate(str, max) {
  if (!str) return '';
  const s = String(str).trim();
  return s.length > max ? s.slice(0, max) + '…' : s;
  // If you want total length (including …) to be max, use:
  // return s.length > max ? s.slice(0, Math.max(0, max - 1)) + '…' : s;
}

const output = entries.map(entry => {
  const g = entry['media:group'] ?? {};
  return {
    title: g['media:title'] ?? '',
    description: truncate(g['media:description'], 60),
    contentUrl: g['media:content']?.url ?? '',
    thumbnailUrl: g['media:thumbnail']?.url ?? ''
  };
});

return [{ json: { output } }];

That snippet proves the data is there and your HTML builder is fine. If OpenAI reproduces the same structured JSON as the code, and Ollama doesn't, the issue is likely in the node's request/response handling rather than your workflow logic. I'll keep pushing on the bug report so we can make agents with Ollama as predictable as they need to be. Until then, if you want speed and consistency to get the job done, OpenAI works great. If you're experimenting with open source, try enforcing stricter formats and shorter outputs—and keep an eye on what the node actually sends to the model. As always, I'll share updates, because I love sharing knowledge—and I want the open-source path to shine right alongside the rest of our AI, agents, n8n, Gmail, and OpenAI workflows. As always, community, if you have a resolution and can pull it off, please share!
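One data point worth gathering while the bug is open: Ollama's own REST API accepts a format parameter that constrains output to JSON, which is exactly the kind of hint I suspect isn't reaching the model through the node. A quick sketch for testing the model outside n8n (the model name is whatever you have pulled locally):

```shell
# Ask Ollama directly for JSON-constrained, non-streaming output
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Return a JSON object with title and description fields for a video about load balancing.",
  "format": "json",
  "stream": false
}'
```

If this returns clean JSON while the n8n node does not, that narrows the problem to the node's request handling rather than the model.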
How I did it.....again "High-Performance S3 Load Balancing with F5 BIG-IP"
Introduction

Welcome back to the "How I did it" series! In the previous installment, we explored the high-performance S3 load balancing of Dell ObjectScale with F5 BIG-IP. This follow-up builds on that foundation with BIG-IP v21.x's S3-focused profiles and how to apply them in the wild. We'll also put the external monitor to work, validating health with real PUT/GET/DELETE checks so your S3-compatible backends aren't just "up," they're truly dependable.

New S3 Profiles for the BIG-IP…..well kind of

A big part of why F5 BIG-IP excels is its advanced traffic profiles, like TCP and SSL/TLS. These profiles let you fine-tune connection behavior—optimizing throughput, reducing latency, and managing congestion—while enforcing strong encryption and protocol settings for secure, efficient data flow. Available with version 21.x, the BIG-IP now includes new S3-specific profiles (s3-tcp and s3-default-clientssl). These profiles are based on existing default parent profiles (tcp and clientssl, respectively) that have been customized, or "tuned," to optimize S3 traffic. Let's take a closer look.

Anatomy of a TCP Profile

The BIG-IP includes a number of pre-defined TCP profiles that define how the system manages TCP traffic for virtual servers, controlling aspects like connection setup, data transfer, congestion control, and buffer tuning. These profiles allow administrators to optimize performance for different network conditions by adjusting parameters such as initial congestion window, retransmission timeout, and algorithms like Nagle's or Delayed ACK. The s3-tcp profile (see below) has been tweaked with respect to data transfer, congestion window sizes, and memory management to optimize typical S3 traffic patterns (e.g., high-throughput data transfer, varying request sizes, and large payloads).

Tweaking the Client SSL Profile for S3

Client SSL profiles on BIG-IP define how the system terminates and manages SSL/TLS sessions from clients at the virtual server.
They specify critical parameters such as certificates, private keys, cipher suites, and supported protocol versions, enabling secure decryption for advanced traffic handling like HTTP optimization, security policies, and iRules. The s3-default-clientssl profile (see below) has been modified from the default client SSL profile to optimize SSL/TLS settings for high-throughput object storage traffic, ensuring better performance and compatibility with S3-specific requirements.

Advanced S3-compatible health checking with EAV

Has anyone ever told you how cool BIG-IP Extended Application Verification (EAV), aka external monitors, are? Okay, I suppose "coolness" is subjective, but EAVs are objectively cool. Let me prove it to you. Health monitoring of backend S3-compatible servers typically involves making an HTTP GET request to either the exposed S3 ingest/egress API endpoint or a liveness probe. Get a 200 and all's good. Wouldn't it be cooler if you could verify a backend server's health by confirming it can actually perform the operations as intended? Fortunately, we can do just that using an EAV monitor. Therefore, based on the transitive property, EAVs are cool. —mic drop

The bash script located at the bottom of the page performs health checks on S3-compatible storage by executing PUT, GET, and DELETE operations on a test object. The health check creates a temporary file with a timestamp, retrieves the file to verify read access, and removes the test file to clean up. If all three operations return the expected HTTP status codes, the node is marked up; otherwise, the node is marked down.

Installing and using the EAV health check

Import the monitor script

Save the bash script (with a .sh extension, located at the bottom of this page) locally and import the file onto the BIG-IP. Log in to the BIG-IP Configuration Utility and navigate to System > File Management > External Monitor Program File List > Import. Use the file selector to navigate to and select the newly created
bash file, provide a name for the file, and select 'Import'.

Create a new external monitor

Navigate to Local Traffic > Monitors > Create. Provide a name for the monitor, select 'External' for the type, and select the previously uploaded file for the 'External Program'. The 'Interval' and 'Timeout' settings can be modified or left at the default as desired. In addition to the backend host and port, the monitor must pass three (3) additional variables to the backend:

- bucket - The name of an existing bucket where the monitor can place a small text file. During the health check, the monitor will create a file, request the file, and delete the file.
- access_key - An S3-compatible access key with permissions to perform the above operations on the specified bucket.
- secret_key - The corresponding S3-compatible secret key.

Select 'Finished' to create the monitor.

Associate the monitor with the pool

Navigate to Local Traffic > Pools > Pool List and select the relevant backend S3 pool. Under 'Health Monitors', select the newly created monitor and move it from 'Available' to 'Active'. Select 'Update' to save the configuration.

Additional Links

- How I did it - "High-Performance S3 Load Balancing of Dell ObjectScale with F5 BIG-IP"
- F5 BIG-IP v21.0 brings enhanced AI data delivery and ingestion for S3 workflows
- Overview of BIG-IP EAV external monitors

EAV Bash Script

#!/bin/bash
################################################################################
# S3 Health Check Monitor for F5 BIG-IP (External Monitor - EAV)
################################################################################
#
# Description:
#   This script performs health checks on S3-compatible storage by
#   executing PUT, GET, and DELETE operations on a test object. It uses AWS
#   Signature Version 4 for authentication and is designed to run as a BIG-IP
#   External Application Verification (EAV) monitor.
#
# Usage:
#   This script is intended to be configured as an external monitor in BIG-IP.
#   BIG-IP automatically provides the first two arguments:
#     $1 - Pool member IP address (may be IPv6-mapped format: ::ffff:x.x.x.x)
#     $2 - Pool member port number
#
#   Additional arguments must be configured in the monitor's "Variables" field:
#     bucket     - S3 bucket name
#     access_key - Access key for authentication
#     secret_key - Secret key for authentication
#
# BIG-IP Monitor Configuration:
#   Type: External
#   External Program: /path/to/this/script.sh
#   Variables:
#     bucket="your-bucket-name"
#     access_key="your-access-key"
#     secret_key="your-secret-key"
#
# Health Check Logic:
#   1. PUT    - Creates a temporary health check file with timestamp
#   2. GET    - Retrieves the file to verify read access
#   3. DELETE - Removes the test file to clean up
#   Success: All three operations return expected HTTP status codes
#   Failure: Any operation fails or times out
#
# Exit Behavior:
#   - Prints "UP" to stdout if all checks pass (BIG-IP marks pool member up)
#   - Silent exit if any check fails (BIG-IP marks pool member down)
#
# Requirements:
#   - openssl (for SHA256 hashing and HMAC signing)
#   - curl (for HTTP requests)
#   - xxd (for hex encoding)
#   - Standard bash utilities (date, cut, sed, awk)
#
# Notes:
#   - Handles IPv6-mapped IPv4 addresses from BIG-IP (::ffff:x.x.x.x)
#   - Uses AWS Signature Version 4 authentication
#   - Logs activity to syslog (local0.notice)
#   - Creates temporary files that are automatically cleaned up
#
# Author: [Gregory Coward/F5]
# Version: 1.0
# Last Modified: 12/2025
#
################################################################################

# ===== PARAMETER CONFIGURATION =====
# BIG-IP automatically provides these
HOST="$1"                       # Pool member IP (may include ::ffff: prefix for IPv4)
PORT="$2"                       # Pool member port
BUCKET="${bucket}"              # S3 bucket name
ACCESS_KEY="${access_key}"      # S3 access key
SECRET_KEY="${secret_key}"      # S3 secret key
OBJECT="${6:-healthcheck.txt}"  # Test object name (default: healthcheck.txt)

# Strip IPv6-mapped IPv4 prefix if present
# (::ffff:10.1.1.1 -> 10.1.1.1)
# BIG-IP may pass IPv4 addresses in IPv6-mapped format
if [[ "$HOST" =~ ^::ffff: ]]; then
  HOST="${HOST#::ffff:}"
fi

# ===== S3/AWS CONFIGURATION =====
ENDPOINT="http://$HOST:$PORT"  # S3 endpoint URL
SERVICE="s3"                   # AWS service identifier for signature
REGION=""                      # AWS region (leave empty for S3 compatible such as MinIO/Dell)

# ===== TEMPORARY FILE SETUP =====
# Create temporary file for health check upload
TMP_FILE=$(mktemp)
printf "Health check at %s\n" "$(date)" > "$TMP_FILE"

# Ensure temp file is deleted on script exit (success or failure)
trap "rm -f $TMP_FILE" EXIT

# ===== CRYPTOGRAPHIC HELPER FUNCTIONS =====

# Calculate SHA256 hash and return as hex string
# Input: stdin
# Output: hex-encoded SHA256 hash
hex_of_sha256() {
  openssl dgst -sha256 -hex | sed 's/^.* //'
}

# Sign data using HMAC-SHA256 and return hex signature
# Args: $1=hex-encoded key, $2=data to sign
# Output: hex-encoded signature
sign_hmac_sha256_hex() {
  local key_hex="$1"
  local data="$2"
  printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" | awk '{print $2}'
}

# Sign data using HMAC-SHA256 and return binary as hex
# Args: $1=hex-encoded key, $2=data to sign
# Output: hex-encoded binary signature (for key derivation chain)
sign_hmac_sha256_binary() {
  local key_hex="$1"
  local data="$2"
  printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" -binary | xxd -p -c 256
}

# ===== AWS SIGNATURE VERSION 4 IMPLEMENTATION =====

# Generate AWS Signature Version 4 for S3 requests
# Args:
#   $1 - HTTP method (PUT, GET, DELETE, etc.)
#   $2 - URI path (e.g., /bucket/object)
#   $3 - Payload hash (SHA256 of request body, or empty hash for GET/DELETE)
#   $4 - Content-Type header value (empty string if not applicable)
# Output: pipe-delimited string "Authorization|Timestamp|Host"
aws_sig_v4() {
  local method="$1"
  local uri="$2"
  local payload_hash="$3"
  local content_type="$4"

  # Generate timestamp in AWS format (YYYYMMDDTHHMMSSZ)
  local timestamp=$(date -u +"%Y%m%dT%H%M%SZ" 2>/dev/null || gdate -u +"%Y%m%dT%H%M%SZ")
  local datestamp=$(date -u +"%Y%m%d")

  # Build host header (include port if non-standard)
  local host_header="$HOST"
  if [ "$PORT" != "80" ] && [ "$PORT" != "443" ]; then
    host_header="$HOST:$PORT"
  fi

  # Build canonical headers and signed headers list
  local canonical_headers=""
  local signed_headers=""

  # Include Content-Type if provided (for PUT requests)
  if [ -n "$content_type" ]; then
    canonical_headers="content-type:${content_type}"$'\n'
    signed_headers="content-type;"
  fi

  # Add required headers (must be in alphabetical order)
  canonical_headers="${canonical_headers}host:${host_header}"$'\n'
  canonical_headers="${canonical_headers}x-amz-content-sha256:${payload_hash}"$'\n'
  canonical_headers="${canonical_headers}x-amz-date:${timestamp}"
  signed_headers="${signed_headers}host;x-amz-content-sha256;x-amz-date"

  # Build canonical request (AWS Signature V4 format)
  # Format: METHOD\nURI\nQUERY_STRING\nHEADERS\n\nSIGNED_HEADERS\nPAYLOAD_HASH
  local canonical_request="${method}"$'\n'
  canonical_request+="${uri}"$'\n\n'  # Empty query string (double newline)
  canonical_request+="${canonical_headers}"$'\n\n'
  canonical_request+="${signed_headers}"$'\n'
  canonical_request+="${payload_hash}"

  # Hash the canonical request
  local canonical_hash
  canonical_hash=$(printf "%s" "$canonical_request" | hex_of_sha256)

  # Build string to sign
  local algorithm="AWS4-HMAC-SHA256"
  local credential_scope="$datestamp/$REGION/$SERVICE/aws4_request"
  local string_to_sign="${algorithm}"$'\n'
  string_to_sign+="${timestamp}"$'\n'
  string_to_sign+="${credential_scope}"$'\n'
  string_to_sign+="${canonical_hash}"

  # Derive signing key using HMAC-SHA256 key derivation chain
  #   kSecret  = HMAC("AWS4" + secret_key, datestamp)
  #   kRegion  = HMAC(kSecret, region)
  #   kService = HMAC(kRegion, service)
  #   kSigning = HMAC(kService, "aws4_request")
  local k_secret
  k_secret=$(printf "AWS4%s" "$SECRET_KEY" | xxd -p -c 256)
  local k_date
  k_date=$(sign_hmac_sha256_binary "$k_secret" "$datestamp")
  local k_region
  k_region=$(sign_hmac_sha256_binary "$k_date" "$REGION")
  local k_service
  k_service=$(sign_hmac_sha256_binary "$k_region" "$SERVICE")
  local k_signing
  k_signing=$(sign_hmac_sha256_binary "$k_service" "aws4_request")

  # Calculate final signature
  local signature
  signature=$(sign_hmac_sha256_hex "$k_signing" "$string_to_sign")

  # Return authorization header, timestamp, and host header (pipe-delimited)
  printf "%s|%s|%s" \
    "${algorithm} Credential=${ACCESS_KEY}/${credential_scope}, SignedHeaders=${signed_headers}, Signature=${signature}" \
    "$timestamp" \
    "$host_header"
}

# ===== HTTP REQUEST FUNCTION =====

# Execute HTTP request using curl with AWS Signature V4 authentication
# Args:
#   $1 - HTTP method (PUT, GET, DELETE)
#   $2 - Full URL
#   $3 - Authorization header value
#   $4 - Timestamp (x-amz-date header)
#   $5 - Host header value
#   $6 - Payload hash (x-amz-content-sha256 header)
#   $7 - Content-Type (optional, empty for GET/DELETE)
#   $8 - Data file path (optional, for PUT with body)
# Output: HTTP status code (e.g., 200, 404, 500)
do_request() {
  local method="$1"
  local url="$2"
  local auth="$3"
  local timestamp="$4"
  local host_header="$5"
  local payload_hash="$6"
  local content_type="$7"
  local data_file="$8"

  # Build curl command with required headers
  local cmd="curl -s -o /dev/null --connect-timeout 5 --write-out %{http_code} \"$url\""
  cmd="$cmd -X $method"
  cmd="$cmd -H \"Host: $host_header\""
  cmd="$cmd -H \"x-amz-date: $timestamp\""
  cmd="$cmd -H \"x-amz-content-sha256: $payload_hash\""

  # Add optional headers
  [ -n "$content_type" ] && cmd="$cmd -H \"Content-Type: $content_type\""
  cmd="$cmd -H \"Authorization: $auth\""
  [ -n "$data_file" ] && cmd="$cmd --data-binary @\"$data_file\""

  # Execute request and return HTTP status code
  eval "$cmd"
}

# ===== MAIN HEALTH CHECK LOGIC =====

# ===== STEP 1: PUT (Upload Test Object) =====
# Calculate SHA256 hash of the temp file content
UPLOAD_HASH=$(openssl dgst -sha256 -binary "$TMP_FILE" | xxd -p -c 256)
CONTENT_TYPE="application/octet-stream"

# Generate AWS Signature V4 for PUT request
SIGN_OUTPUT=$(aws_sig_v4 "PUT" "/$BUCKET/$OBJECT" "$UPLOAD_HASH" "$CONTENT_TYPE")
AUTH_PUT=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT")
DATE_PUT=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT")
HOST_PUT=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT")

# Execute PUT request (expect 200 OK)
PUT_STATUS=$(do_request "PUT" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_PUT" "$DATE_PUT" "$HOST_PUT" "$UPLOAD_HASH" "$CONTENT_TYPE" "$TMP_FILE")

# ===== STEP 2: GET (Download Test Object) =====
# SHA256 hash of empty body (for GET requests with no payload)
EMPTY_HASH="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Generate AWS Signature V4 for GET request
SIGN_OUTPUT=$(aws_sig_v4 "GET" "/$BUCKET/$OBJECT" "$EMPTY_HASH" "")
AUTH_GET=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT")
DATE_GET=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT")
HOST_GET=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT")

# Execute GET request (expect 200 OK)
GET_STATUS=$(do_request "GET" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_GET" "$DATE_GET" "$HOST_GET" "$EMPTY_HASH" "" "")

# ===== STEP 3: DELETE (Remove Test Object) =====
# Generate AWS Signature V4 for DELETE request
SIGN_OUTPUT=$(aws_sig_v4 "DELETE" "/$BUCKET/$OBJECT" "$EMPTY_HASH" "")
AUTH_DEL=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT")
DATE_DEL=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT")
HOST_DEL=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT")

# Execute DELETE request (expect 204 No Content)
DEL_STATUS=$(do_request "DELETE" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_DEL" "$DATE_DEL" "$HOST_DEL" "$EMPTY_HASH" "" "")

# ===== LOG RESULTS =====
# Log all operation results for troubleshooting
#logger -p local0.notice "S3 Monitor: PUT=$PUT_STATUS GET=$GET_STATUS DEL=$DEL_STATUS"

# ===== EVALUATE HEALTH CHECK RESULT =====
# BIG-IP considers the pool member "UP" only if this script prints "UP" to stdout
# Check if all operations returned expected status codes:
#   PUT:    200 (OK)
#   GET:    200 (OK)
#   DELETE: 204 (No Content)
if [ "$PUT_STATUS" -eq 200 ] && [ "$GET_STATUS" -eq 200 ] && [ "$DEL_STATUS" -eq 204 ]; then
  #logger -p local0.notice "S3 Monitor: UP"
  echo "UP"
fi

# If any check fails, script exits silently (no "UP" output)
# BIG-IP will mark the pool member as DOWN

F5 BIG-IP and NetApp StorageGRID - Providing Fast and Scalable S3 API for AI apps
F5 BIG-IP, an industry-leading ADC solution, can provide load balancing services for HTTPS servers, with full security applied in-flight and performance levels to meet any enterprise's capacity targets. Specific to the S3 API, the object storage and retrieval protocol that rides upon HTTPS, an aligned partnering solution exists from NetApp, which allows a large-scale set of S3 API targets to ingest and provide objects. Automatic backend synchronization allows any node to be offered up as a target by a server load balancer like BIG-IP. This allows overall storage node utilization to be optimized across the node set, and scaled performance to reach the highest S3 API bandwidth levels, all while offering high availability to S3 API consumers. If one node fails or is undergoing maintenance, the overall service continues. S3-compatible storage is becoming popular for AI applications due to its superior performance over traditional protocols such as NFS or CIFS, as well as its ability to enable repatriation of data from the cloud to on-prem. These are scenarios where the amount of data involved is large, which drives the requirement for new levels of scalability and performance; S3-compatible object stores such as NetApp StorageGRID are purpose-built to reach such levels.

Sample BIG-IP and StorageGRID Configuration

This document is based upon tests and measurements using the following lab configuration. All devices in the lab were virtual machine-based offerings. The S3 service to be projected to the outside world, depicted in the above diagram and delivered to the client via the external network, uses a BIG-IP virtual server (VS) tied to an origin pool of three large-capacity StorageGRID nodes. The BIG-IP verifies the health of the NetApp nodes with frequent HTTP-based health checks. Should an unhealthy node be detected, it will be dropped from the list of active pool members.
When content is written via the S3 protocol to any node in the pool, the other members are synchronized to serve up that content should they be selected by BIG-IP for future read requests. The key recommendations and observations in building the lab include:

- Set up a local certificate authority so that all nodes can be trusted by the BIG-IP. Typically, the local CA-signed certificate will incorporate every node's FQDN and IP address within the listed subject alternative names (SANs), streamlining the backend solution with one single certificate.
- Different F5 profiles, such as FastL4 or FastHTTP, can be selected to reach the right tradeoff between the absolute capacity of stateful traffic load balanced and rich layer 7 functions like iRules or authentication.
- Modern techniques such as multi-part uploads or HTTP Ranges for downloads can take large objects and concurrently move smaller pieces across the load balancer, lowering total transaction times and spreading work over more CPU cores.

The S3 protocol, at its core, is a set of REST API calls. To facilitate testing, the widely used S3Browser (www.s3browser.com) was used to quickly and intuitively create S3 buckets on the NetApp offering and send/retrieve objects (files) through the BIG-IP load balancer.

Setup the BIG-IP and StorageGRID Systems

The StorageGRID solution is an array of storage nodes, provisioned with the help of an administrative host, the "Grid Manager". For interactive users, no thick client is required, as on-board web services allow a streamlined experience entirely through an Internet browser. The following is an example of Grid Manager, taken from a Chrome browser; one sees that the three Storage Nodes have been successfully added. The load balancer, in our case the BIG-IP, is set up with a virtual server to support HTTPS traffic and distribute that traffic, which is S3 object storage traffic, to the three StorageGRID nodes.
The following screenshot demonstrates that the BIG-IP is set up in a standard HA (active-passive pair) configuration and that the three pool members are healthy (green, health checks are fine) and receiving/sending S3 traffic, as the byte counts seen in the image are non-zero. On the internal side of the BIG-IP, TCP port 18082 is being used for S3 traffic. To test the solution, including features such as multi-part uploads and downloads, a popular S3 tool, S3Browser, was downloaded and used. The following shows the entirety of the S3Browser setup. Simply create an account (StorageGRID-Account-01 in our example) and point the REST API endpoint at the BIG-IP Virtual Server that is acting as the secure front door for our pool of NetApp nodes. The S3 Access Key ID and Secret values are generated at turn-up time of the NetApp appliances. All S3 traffic will, of course, be SSL/TLS encrypted. BIG-IP will intercept the SSL traffic (high-speed decrypt) and then re-encrypt when proxying the traffic to a selected origin pool member. Other valid load balancer setups exist; one might include an "offload" approach to SSL, whereby the S3 nodes safely co-located in a data center may prefer to receive non-SSL HTTP S3 traffic. This may yield an overall performance improvement in terms of peak bandwidth per storage node, but it comes at the cost of security trade-offs.

Experimenting with S3 Protocol and Load Balancing

With all the elements in place to start understanding the behavior of S3 and spreading traffic across NetApp nodes, a quick test involved creating an S3 bucket and placing some objects in that new bucket. Buckets are logical collections of objects, conceptually not that different from folders or directories in file systems. In fact, an S3 bucket could even be mounted as a folder in an operating system such as Linux.
In their simplest and most common form, buckets serve as high-capacity, performant storage and retrieval targets for similarly themed structured or unstructured data.

In the first test, we created a new bucket ("audio-clip-bucket") and uploaded four sample files to it using S3Browser. We then zeroed the statistics for each pool member on the BIG-IP to see whether even this small upload would spread S3 traffic across more than a single NetApp device. Immediately after the upload, the counters reflected that two StorageGRID nodes had been selected to receive S3 transactions.

Richly detailed, per-transaction visibility can be obtained by leveraging the F5 SSL Orchestrator (SSLO) feature on the BIG-IP, whereby copies of the bidirectional S3 traffic decrypted within the load balancer can be sent to packet loggers, analytics tools, or even protocol analyzers such as Wireshark. The BIG-IP also has an onboard analytics tool, Application Visibility and Reporting (AVR), which can provide some detail on the nuances of the S3 traffic being proxied.

AVR demonstrates the following characteristics of the above traffic, a simple bucket creation and upload of four objects. With AVR, one can see the URL values used by S3, which include the bucket name itself as well as transactions incorporating the object names as URLs. The HTTP methods used included both GETs and PUTs; the use of HTTP PUT is expected when creating a new bucket.

S3 is not governed by a typical standards-body document, such as an IETF Request for Comments (RFC), but rather has evolved out of AWS and its use of S3 since 2006. For details around S3 API characteristics and nomenclature, this site can be referenced. For example, the expected syntax for creating a bucket is provided, including the fact that it should be an HTTP PUT to the root (/) URL target, with the bucket name carried in the Host header and optional configuration parameters provided within the HTTP transaction body.
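To make the REST nature of bucket creation concrete, the following is a small stdlib-only sketch of the pieces of a virtual-hosted-style CreateBucket call. Nothing is sent on the wire here, the endpoint name is a made-up placeholder, and a real request would additionally carry an Authorization header (AWS Signature Version 4); this is an illustration of the request shape, not the lab tooling.

```python
# Sketch of an S3 CreateBucket request (virtual-hosted style):
# an HTTP PUT to "/", with the bucket name carried in the Host header
# and an optional CreateBucketConfiguration XML document in the body.
# A real call would also need a SigV4 Authorization header.

def build_create_bucket(bucket: str, endpoint: str, region: str = "us-east-1"):
    """Return (method, path, headers, body) for a CreateBucket request."""
    body = (
        "<CreateBucketConfiguration>"
        f"<LocationConstraint>{region}</LocationConstraint>"
        "</CreateBucketConfiguration>"
    )
    headers = {
        "Host": f"{bucket}.{endpoint}",   # e.g. audio-clip-bucket.s3.example.lab
        "Content-Type": "application/xml",
        "Content-Length": str(len(body)),
    }
    return ("PUT", "/", headers, body)

# Hypothetical endpoint name; in the lab this would resolve to the
# BIG-IP virtual server fronting the StorageGRID pool.
method, path, headers, body = build_create_bucket("audio-clip-bucket", "s3.example.lab")
```

The PUT-to-root shape is exactly what AVR surfaces when it lists the URLs and HTTP methods of the proxied S3 transactions.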
Achieving High Performance S3 with BIG-IP and StorageGRID

A common concern with protocols such as HTTP is head-of-line blocking, where one large, lengthy transaction blocks subsequent, now queued, transactions. This is one of the reasons for parallelism in HTTP/1.x, where loading 30 or more objects to paint a web page will often utilize two, four, or even more concurrent TCP sessions.

Another performance issue with very large transactions is that, without parallelism, even the most performant networks will see an established TCP session reach its maximum congestion window (CWND), at which point no more segments may be put in flight until new TCP ACKs arrive back. Advanced TCP options such as window scaling or selective acknowledgment (SACK) can help, but regardless, the achievable bandwidth of any one TCP session is bounded and may also frequently task only one core in a multi-core CPU.

With the BIG-IP serving as the intermediary, large S3 transactions may default to "multipart" uploads and downloads. Larger objects become a series of smaller objects that can conveniently be load balanced by BIG-IP across the entire cluster of NetApp nodes. As displayed in the following diagram, we are asking for multipart uploads to kick in for objects larger than 5 megabytes.

After uploading a 20-megabyte file (technically, 20,000,000 bytes), the BIG-IP shows the traffic distributed across multiple NetApp nodes to the tune of 160.9 million bits. The incoming bits (incoming from the perspective of the origin pool members) confirm the delivery of the object with a small amount of protocol overhead (divide bits by eight to arrive at bytes).

The value of load balancing manageable chunks of very large objects pays dividends over time: faster overall transaction completion through the spreading of traffic across NetApp nodes, more TCP sessions reaching high congestion window values, and no single-core bottlenecks in multi-core equipment.
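As a back-of-the-envelope illustration of the numbers above: the 5 MB threshold and the observed byte counts come from the test, but the code itself is our sketch, not the lab tooling, and it treats the threshold as the part size for simplicity.

```python
# Sketch: how the 20,000,000-byte test object splits under the 5 MB
# multipart threshold configured in S3Browser, and a sanity check of
# the ~160.9 million bits AVR/BIG-IP counters reported across the pool.

PART_SIZE = 5_000_000        # 5 MB multipart threshold, used here as part size
OBJECT_BYTES = 20_000_000    # the uploaded test file

# Part boundaries (inclusive byte ranges); each part is a separate S3
# transaction that BIG-IP can steer to a different StorageGRID node.
parts = [
    (start, min(start + PART_SIZE, OBJECT_BYTES) - 1)
    for start in range(0, OBJECT_BYTES, PART_SIZE)
]

payload_bits = OBJECT_BYTES * 8                 # 160,000,000 bits of object data
observed_bits = 160_900_000                     # what the pool-member counters showed
overhead_pct = 100 * (observed_bits - payload_bits) / payload_bits
```

With these figures the upload becomes four parts, and the protocol overhead works out to roughly half a percent, consistent with the article's "small amount of protocol overhead" observation.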
Tuning BIG-IP for High Performance S3 Service Delivery

The F5 BIG-IP offers a set of profiles that govern how its Local Traffic Manager (LTM) module, the heart of the server load balancing function, handles traffic. The most performant profile in terms of attainable traffic load is "FastL4". This and other profiles, such as "OneConnect" or "FastHTTP", can be tied to a virtual server, and details on each profile can be found within the BIG-IP GUI.

The FastL4 profile can increase virtual server performance and throughput for supported platforms by using the embedded Packet Velocity Acceleration (ePVA) chip to accelerate traffic. The ePVA chip is a hardware-acceleration field-programmable gate array (FPGA) that delivers high-performance L4 throughput by offloading traffic processing to hardware. The BIG-IP makes flow acceleration decisions in software and then offloads eligible flows to the ePVA chip for that acceleration. For platforms that do not contain the ePVA chip, the system performs acceleration in software; software-only solutions can increase performance in direct relationship to the hardware of the underlying host. As examples of BIG-IP Virtual Edition (VE) software running on mid-grade hardware platforms, results on Dell servers can be found here, and similar experiences with HPE ProLiant platforms are here.

One thing to note about FastL4 as the profile underpinning a performance-mode BIG-IP virtual server is that it is layer 4 oriented. For features that involve layer 7 HTTP fields, such as using iRules to swap HTTP headers or performing HTTP authentication, a different profile may be more suitable.

A bonus of FastL4 is a set of interesting performance features specific to it. In the BIG-IP version 17 release train, there is a feature to quickly tear down, with no delay, TCP sessions that are no longer required.
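A custom FastL4 profile attached to the performance-mode virtual server might look roughly like the following in tmsh. This is an illustrative sketch only: the profile name is hypothetical, and the exact set of close/reset options available varies by TMOS release, so the product documentation should be consulted for your version.

```
ltm profile fastl4 fastl4_s3 {
    defaults-from fastl4
    loose-close enabled
    reset-on-timeout enabled
}
```

Deriving from the built-in fastl4 parent keeps the hardware/software acceleration behavior described above, while the close-handling options relate to how aggressively the BIG-IP releases completed flows.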
Most TCP stacks implement the TCP "2MSL" rule, where, upon receiving and sending TCP FIN messages, the socket enters a lengthy TCP TIME_WAIT state, often minutes long. This stems from the historically high packet loss of the very early Internet: the concern was that, with high latency and loss, packets might arrive at a target very late, and the TCP state machine would be confused if no record of the socket still existed. As such, the lengthy TIME_WAIT period was adopted, even though maintaining that state consumes on-board resources. With FastL4, a "fast" close with TCP reset option now exists, such that any incoming TCP FIN message observed by the BIG-IP results in TCP RSTs being sent to both endpoints, bypassing the TIME_WAIT penalty.

OneConnect and FastHTTP Profiles

As mentioned, other traffic profiles on BIG-IP are directed toward layer 7 and HTTP features. One interesting profile is F5's "OneConnect". The OneConnect feature set works with HTTP Keep-Alives, allowing the BIG-IP system to minimize the number of server-side TCP connections by making existing connections available for reuse by other clients. This reduces, among other things, excessive TCP three-way handshakes (SYN, SYN-ACK, ACK) and mitigates the small TCP congestion windows that new TCP sessions start with, which grow only with successful traffic delivery; persistent server-side TCP connections ameliorate this.

When a new connection is initiated to the virtual server, if an existing server-side flow to the pool member is idle, the BIG-IP system applies the OneConnect source mask to the IP address in the request to determine whether it is eligible to reuse the existing idle connection. If it is eligible, the BIG-IP system marks the connection as non-idle and sends the client request over it.
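The source-mask behavior can be sketched as a small tmsh fragment; the profile name here is hypothetical. A host mask of 255.255.255.255 restricts reuse of an idle server-side flow to requests from the same client IP, while a mask of 0.0.0.0 allows any client to reuse any idle flow, maximizing connection reuse at the cost of mingling clients on shared server-side connections.

```
ltm profile one-connect oneconnect_s3 {
    defaults-from oneconnect
    source-mask 255.255.255.255
}
```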
If the request is not eligible for reuse, or no idle server-side flow is found, the BIG-IP system creates a new server-side TCP connection and sends the client request over it.

The last profile considered is "Fast HTTP". The Fast HTTP profile is designed to speed up certain types of HTTP connections and, again, strives to reduce the number of connections opened to the back-end HTTP servers. This is accomplished by combining features from the TCP, HTTP, and OneConnect profiles into a single profile optimized for network performance. The resulting high-performance HTTP virtual server processes connections on a packet-by-packet basis and buffers only enough data to parse packet headers. Its TCP behavior operates as follows: the BIG-IP system establishes server-side flows by opening TCP connections to pool members. When a client connects to the performance HTTP virtual server, if an existing server-side flow to the pool member is idle, the BIG-IP LTM system marks the connection as non-idle and sends the client request over it.

Summary

The NetApp StorageGRID multi-node, S3-compatible object storage solution pairs well with a high-performance server load balancer, making the F5 BIG-IP a good fit. The S3 protocol itself can be adjusted to improve transaction response times, such as through the use of multipart uploads and downloads, amplifying the default load balancing by spreading even more traffic chunks over many NetApp nodes. BIG-IP offers numerous approaches to configuring virtual servers, from the highest-performance L4-focused profiles to similar offerings that retain L7 HTTP awareness. Lab testing was accomplished using the S3Browser utility, and traffic flows were confirmed with both the standard BIG-IP GUI and the additional AVR analytics module, which provides deeper protocol insight.