metrics
F5 Distributed Cloud Telemetry (Metrics) - ELK Stack
Before exporting metrics data to the ELK stack with a Python script, let's first get a high-level overview. Metrics are numerical values that provide actionable insight into the performance, health, and behavior of systems or applications over time, allowing teams to monitor and improve the reliability, stability, and performance of modern distributed systems. The ELK Stack (Elasticsearch, Logstash, and Kibana) is a powerful open-source platform that enables organizations to collect, process, store, and visualize telemetry data such as logs, metrics, and traces from remote systems in real time.
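As a rough illustration of the export pattern described above (not the article's actual script), the sketch below indexes a single metric document into Elasticsearch using the official Python client; the index name, field names, and endpoint are assumptions for illustration.

```python
# Simplified sketch: push one metric sample into Elasticsearch.
# Assumes the elasticsearch-py 8.x client; index name and fields are illustrative.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust for your ELK deployment

doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "metric_name": "http_request_rate",  # example metric name, not from the article
    "value": 42.0,
    "source": "f5-xc",
}

# Each metric sample becomes one document; Kibana can then visualize the index.
es.index(index="f5xc-metrics", document=doc)
```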
F5 Distributed Cloud Telemetry (Metrics) - Prometheus

Scope
This article walks through the process of collecting metrics from F5 Distributed Cloud's (XC) Service Graph API and exposing them in a format that Prometheus can scrape. Prometheus then scrapes these metrics, which can be visualized in Grafana.

Introduction
Metrics are essential for gaining real-time insight into service performance and behaviour. F5 Distributed Cloud (XC) provides a Service Graph API that captures service-to-service communication data across your infrastructure. Prometheus, a leading open-source monitoring system, can scrape and store time-series metrics, and when paired with Grafana it offers powerful visualization capabilities. This article shows how to integrate a custom Python-based exporter that transforms Service Graph API data into Prometheus-compatible metrics. These metrics are then scraped by Prometheus and visualized in Grafana, all running in Docker for easy deployment.

Prerequisites
- Access to an F5 Distributed Cloud (XC) SaaS tenant
- VM with Python 3 installed
- Running Prometheus instance (if not, see the "Configuring Prometheus" section below)
- Running Grafana instance (if not, see the "Configuring Grafana" section below)

Note: In this demo, a single AWS VM hosts everything: Python runs the exporter (port 8888), while Prometheus (host port 9090) and Grafana (port 3000) run as Docker instances.

Architecture Overview
F5 XC API → Python Exporter → Prometheus → Grafana

Building the Python Exporter
To collect metrics from the F5 Distributed Cloud (XC) Service Graph API and expose them in a format Prometheus understands, we created a lightweight Python exporter using Flask. This exporter acts as a transformation layer: it fetches service graph data, parses it, and exposes it through a /metrics endpoint that Prometheus can scrape.

Code Link -> exporter.py

Key Functions of the Exporter
- Uses the XC-provided .p12 file for authentication: To authenticate API requests to F5 Distributed Cloud (XC), the exporter uses a client certificate packaged in a .p12 file. This file must be manually downloaded from the F5 XC console (steps) and stored on the VM where the Python script runs. The script expects the full path to the .p12 file and its associated password to be specified in the configuration section.
- Fetches Service Graph metrics: The script pulls service-level metrics such as request rates, error rates, throughput, and latency from the XC API. It supports both aggregated and individual load balancer views.
- Processes and structures the data: The exporter parses the raw API response to extract the latest metric values and converts them into the Prometheus exposition format. Each metric is labelled (e.g., by vhost and direction) for flexibility in Grafana queries.
- Exposes a /metrics endpoint: A Flask web server runs on port 8888, serving the /metrics endpoint. Prometheus periodically scrapes this endpoint to ingest the latest metrics.
- Handles multiple metric types: Traffic metrics and health scores are handled and formatted individually. Each metric includes a descriptive name, type declaration, and optional labels for fine-grained monitoring and visualization.

Running the Exporter
python3 exporter.py > python.log 2>&1 &
This command runs exporter.py with Python 3 in the background and redirects all standard output and error messages to python.log for easier debugging.
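For readers who want to see the shape of such an exporter before opening exporter.py, here is a minimal sketch of the /metrics exposition pattern described above. It is not the actual exporter: the fetch step is a placeholder, and the metric name and value are illustrative (the real script authenticates with the .p12 certificate and parses the Service Graph API response).

```python
# Minimal sketch of the exporter pattern, not the article's exporter.py.
from flask import Flask, Response

app = Flask(__name__)

def fetch_service_graph_metrics():
    # Placeholder: the real exporter calls the F5 XC Service Graph API using
    # the .p12 client certificate and parses the latest values.
    return {"f5xc_downstream_http_request_rate": 12.5}

@app.route("/metrics")
def metrics():
    lines = []
    for name, value in fetch_service_graph_metrics().items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # Prometheus expects plain-text exposition format
    return Response("\n".join(lines) + "\n", mimetype="text/plain")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8888)
```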
Configuring Prometheus
docker run -d --name=prometheus --network=host -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus:latest
Prometheus runs as a Docker instance in host network mode (port 9090) with the configuration in prometheus.yml, scraping the /metrics endpoint exposed by the Python Flask exporter on port 8888 every 60 seconds (a minimal example configuration is sketched at the end of this article).

Configuring Grafana
docker run -d --name=grafana -p 3000:3000 grafana/grafana:latest
The private IP of the Prometheus Docker instance, along with its port (9090), is used as the data source in the Grafana configuration. Once Prometheus is configured under Grafana Data sources, follow these steps:
- Navigate to the Explore menu
- Select "Prometheus" in the data source picker
- Choose the appropriate metric, in this case "f5xc_downstream_http_request_rate"
- Select the desired time range and click "Run query"
- Observe that the metrics graph is displayed

Note: Some requests need to be generated for metrics to be visible in Grafana. A broader, high-level view of all metrics can be accessed by navigating to "Drilldown" and selecting "Metrics", providing a comprehensive snapshot across services.

Conclusion
F5 Distributed Cloud's (F5 XC) Service Graph API provides deep visibility into service-to-service communication, and when paired with Prometheus and Grafana, it enables powerful, real-time monitoring without vendor lock-in. This integration highlights F5 XC's alignment with open-source ecosystems, allowing users to build flexible and scalable observability pipelines. The custom Python exporter bridges the gap between the XC API and Prometheus, offering a lightweight and adaptable solution for transforming and exposing metrics. With Grafana dashboards on top, teams can gain instant insight into service health and performance. This open approach empowers operations teams to respond faster, optimize more effectively, and evolve their observability practices with confidence and control.
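Referring back to the Configuring Prometheus step, a minimal prometheus.yml matching the description above (scrape the exporter on port 8888 every 60 seconds) might look like the following. The job name and target address are assumptions; the article's own configuration file is not reproduced here.

```yaml
# Assumed minimal prometheus.yml for the setup described above.
global:
  scrape_interval: 60s

scrape_configs:
  - job_name: "f5xc-exporter"        # arbitrary job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8888"]  # exporter runs on the same VM in this demo
```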
Unlocking Insights: Enhancing Observability in F5 NGINXaaS for Azure for Optimal Operations

Introduction
To understand application performance, you need more than regular health checks. You need to look at the system's behavior and how users use it, and find possible slowdowns before they become big problems. By using F5 NGINXaaS for Azure, organizations can gain enhanced visibility into their backend applications through extensive metrics, API (access) logs, and operational logs within Azure environments. This proactive approach helps prevent minor issues from developing into major challenges while optimizing resource efficiency. This technical guide highlights advanced observability techniques and demonstrates how organizations can leverage F5 NGINXaaS to create robust, high-performing application delivery solutions that ensure seamless and responsive user experiences.

Benefits of F5 NGINX as a Service
F5 NGINXaaS for Azure provides robust integration with ecosystem tools designed to monitor and analyze application health and performance. It uses rich telemetry from granular metrics across various protocols, including HTTP, TLS, TCP, and UDP. For technical experts overseeing deployments in Azure, this service delivers valuable insights that facilitate more effective troubleshooting and optimize workflows for streamlined operations. Key advantages of F5 NGINXaaS include access to over 200 detailed health and performance metrics that are critical for ensuring application stability, scalability, and efficiency. Please refer to the documentation to learn more about the available metrics.

There are two ways to monitor metrics in F5 NGINXaaS for Azure, providing flexibility in how you track the health and performance of your applications:
- Azure Monitoring integration for F5 NGINXaaS: an Azure-native solution delivering detailed analytical reports and customizable alerts.
- Grafana dashboard support: a visualization tool designed to provide real-time, actionable insights into system health and performance.

Dive Deep with Azure Monitoring for F5 NGINXaaS
Azure Monitoring integration with F5 NGINXaaS provides a comprehensive observability solution tailored to dynamic cloud environments, equipping teams with the tools to enhance application performance and reliability. A crucial aspect of this solution is the integration of F5 NGINXaaS access and error logs, which offers insights essential for troubleshooting and resolving issues effectively. By combining these logs with deep application and performance metrics such as request throughput, latency, error rates, and resource utilization, technical teams can make informed decisions to optimize their applications.

Key features include:
- Advanced analytics: Explore detailed traffic patterns and usage trends to better understand application load dynamics. This allows teams to fine-tune configurations and improve performance based on actual user activity.
- Customizable alerts: Set specific thresholds for key performance indicators to receive immediate notifications about anomalies, such as unexpected spikes in 5xx error rates or latency. This proactive approach empowers teams to resolve incidents swiftly and minimize their impact.
- Detailed metrics: Utilize comprehensive metrics encompassing connection counts, active connections, and request processing times. These insights facilitate better resource allocation and more efficient traffic management.
- Logs integration: Access and analyze F5 NGINXaaS logs alongside performance metrics, providing a holistic view of application behavior. This integration is vital for troubleshooting, enabling teams to correlate log data with observability insights for effective issue identification and resolution.
- Scalability insights: Monitor real-time resource allocation and consumption. Predict growth challenges and optimize scaling decisions to ensure your F5 NGINXaaS deployments can handle variable client load effectively.

By integrating Azure Monitoring with F5 NGINXaaS, organizations can significantly enhance their resilience, swiftly tackle performance challenges, and ensure that their services consistently deliver outstanding user experiences. With actionable data at their fingertips, teams are well positioned to achieve operational excellence and foster greater user satisfaction.

Visualize Success with the Native Azure Grafana Dashboard
Enable the Grafana dashboard and import the F5 NGINXaaS metrics dashboard to take your monitoring capabilities to the next level. This integration provides a clear view of performance metrics, allowing teams to make informed decisions backed by insightful data. The Grafana interface allows real-time querying of performance metrics, offering intuitive visual tools like graphs and charts that simplify complex data interpretation. Together, Azure Monitoring and the Grafana dashboard form a comprehensive observability stack, supporting both proactive oversight and reactive diagnostics.

Getting Started with the NGINXaaS Azure Workshops
We have curated self-paced workshops designed to help you effectively leverage the enhanced observability features of F5 NGINXaaS. These workshops provide valuable insights and hands-on experience, empowering you to develop robust observability in a self-directed learning environment.

The Azure monitoring lab workshop will enhance your skills in creating and analyzing access logs with NGINX. You'll learn to develop a comprehensive log format that captures essential details from backend servers (a sample log format is sketched at the end of this article). By the end, you'll be equipped to use Azure's monitoring tools effectively.

In the Native Azure Grafana Dashboard workshop, you'll explore the integration of F5 NGINXaaS for Azure with Grafana for effective service monitoring. You'll create a dashboard to track essential metrics for your backend servers. This hands-on session will equip you with the skills to analyze real-time data and make informed decisions backed by valuable insights.

Upon completing these lab exercises, you will have gained practical expertise in the enhanced observability features of F5 NGINXaaS: creating and analyzing access logs that capture critical data from backend servers, and integrating F5 NGINXaaS with Grafana to build a dynamic dashboard that tracks essential metrics in real time. This hands-on experience will empower you to make informed decisions based on valuable insights, significantly strengthening your ability to monitor and maintain your applications.

Conclusion
By fully utilizing the observability features of F5 NGINXaaS, organizations can gain valuable insights that enhance performance and efficiency.
With Azure Monitoring and Grafana working together, teams can manage proactively and make informed, data-driven decisions. This approach leads to smoother web experiences and improved operational performance. Interested in getting started with F5 NGINXaaS for Azure? You can find us on the Azure Marketplace.
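As referenced in the workshop section above, the access log format is where most of the backend detail comes from. The snippet below is a hedged example of an NGINX log_format that captures upstream (backend) details; the format name and chosen fields are assumptions, not the workshop's exact format.

```nginx
# Fragment for the http {} context. The name "main_ext" and the selected
# fields are illustrative assumptions, not the workshop's exact format.
log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                    'rt=$request_time ua="$upstream_addr" us="$upstream_status" '
                    'urt="$upstream_response_time"';

# Apply it in a server or location block:
# access_log /var/log/nginx/access.log main_ext;
```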
sysHttpStatRespBucket1k SNMP metrics meaning

Hi, I would like to get information about a few of the exposed SNMP metrics whose descriptions are very unclear:
- sysHttpStatRespBucket1k with OID 1.3.6.1.4.1.3375.2.1.1.2.4.17
- sysHttpStatRespBucket4k with OID 1.3.6.1.4.1.3375.2.1.1.2.4.18
If we take sysHttpStatRespBucket1k, I found the following description, "The number of responses under 1k," but are we talking about an HTTP response size or a duration? Thanks for any light anyone can shed on this topic.
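One way to investigate empirically is to poll both counters with standard Net-SNMP tooling while generating responses of known sizes and durations, and see which traffic moves the counters. The community string and management address below are placeholders.

```sh
# Poll both bucket counters quoted in the question above.
snmpget -v2c -c <community> <bigip-mgmt-ip> \
    1.3.6.1.4.1.3375.2.1.1.2.4.17 \
    1.3.6.1.4.1.3375.2.1.1.2.4.18
```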
OIDs for virtual server stats

Hi everybody, I want to get some stats via SNMP like the stats displayed in the LTM BIG-IP GUI (Statistics ›› Module Statistics : Local Traffic >> Statistics Type = Virtual Servers). What are the OIDs for each of these stats?
- virtual server name
- virtual server IP
- virtual server bits (in/out) and packets (in/out)
- virtual server current connections
Thanks a lot
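A hedged pointer for the question above: per-virtual-server counters generally live in the LTM statistics tables of the F5-BIGIP-LOCAL-MIB and can be browsed with snmpwalk. The object names below are from memory and should be verified against the F5-BIGIP-LOCAL-MIB shipped with your TMOS version before relying on them.

```sh
# Walk the whole virtual server statistics table (verify object names against
# the F5-BIGIP-LOCAL-MIB for your TMOS version).
snmpwalk -v2c -c <community> <bigip-mgmt-ip> F5-BIGIP-LOCAL-MIB::ltmVirtualServStatTable

# Likely counters of interest (bytes/packets in and out, current connections):
snmpwalk -v2c -c <community> <bigip-mgmt-ip> F5-BIGIP-LOCAL-MIB::ltmVirtualServStatClientBytesIn
snmpwalk -v2c -c <community> <bigip-mgmt-ip> F5-BIGIP-LOCAL-MIB::ltmVirtualServStatClientBytesOut
snmpwalk -v2c -c <community> <bigip-mgmt-ip> F5-BIGIP-LOCAL-MIB::ltmVirtualServStatClientPktsIn
snmpwalk -v2c -c <community> <bigip-mgmt-ip> F5-BIGIP-LOCAL-MIB::ltmVirtualServStatClientPktsOut
snmpwalk -v2c -c <community> <bigip-mgmt-ip> F5-BIGIP-LOCAL-MIB::ltmVirtualServStatClientCurConns
```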
F5 Monitoring

Hi, I put together several technologies and prepared a full-fledged monitoring system for the F5 device overall and the LTM module. It can be extended with other modules such as ASM, GTM, etc. There is a link to a presentation with all the details; please check it out, and if interested you can contact me for details. Here: Presentation.
Use cases covered:
- Compare TMM CPU cycles with virtual servers and iRules
- Correlate interface/VLAN PPS and bandwidth values with virtual servers, showing what drives top usage
- Show HTTP compression values and correlate active bandwidth with compression
- Check iRule CPU cycles, correlate which iRule uses the most, and see the effect of changing an iRule to operate differently (savings/usage on iRules)
Note: Many more use cases like this can be added based on needs. Thanks
GTM: avoiding flapping DNS answers with RTT method

I need to understand how GTM metrics work for GTM LDNS probes.
1) How can the decision be logged? I am using 11.2 but moving soon to 11.4.1.. :)
2) Let's take an example: if our GTM chooses a VIP in the USA 100 consecutive times because the RTT to that USA VIP is lower, and then one time it gets a better value, for whatever reason, to another VIP, for example in Australia, will that last value, which differs from the 100 previous ones, be considered valid? Is there a cache variation value that can be configured to avoid these flapping choices? (We had this option in Cisco GSS.)
3) How long is the non-optimal Australian value kept in cache until a new value is reconsidered? Is it the Inactive timeout of 28 days?
Architecting Scalable Infrastructures: CPS versus DPS

#webperf As we continue to find new ways to make connections more efficient, capacity planning must look to other metrics to ensure scalability without compromising performance.

Infrastructure metrics have always been focused on speeds and feeds. Throughput, packets per second, connections per second, etc… These metrics have been used to evaluate and compare network infrastructure for years, ultimately being used as a critical component in data center design. This makes sense. After all, it's not rocket science to figure out that a firewall capable of handling 10,000 connections per second (CPS) will overwhelm a next-hop device (load balancer, A/V scanner, etc…) only capable of 5,000 CPS. Or will it? The problem with old skool performance metrics is that they focus on ingress, not egress, capacity. With SDN pushing a new focus on both northbound and southbound capabilities, it makes sense to revisit the metrics upon which we evaluate infrastructure and design data centers.

CONNECTIONS versus DECISIONS
As we've progressed from focusing on packets to sessions, from IP addresses to users, from servers to applications, we've necessarily seen an evolution in the intelligence of network components. It's not just application delivery that's gotten smarter, it's everything. Security, access control, bandwidth management, even routing (think NAC), has become much more intelligent. But that intelligence comes at a price: processing. That processing turns into latency as each device takes a certain amount of time to inspect, evaluate and ultimately decide what to do with the data. And therein lies the key to our conundrum: it makes a decision. That decision might be routing-based or security-based or even logging-based. What the decision is is not as important as the fact that it must be made. SDN necessarily brings this key differentiator between legacy and next-generation infrastructure to the fore, as it's not just software-defined but software-deciding networking. When a switch doesn't know what to do with a packet in SDN it asks the controller, which evaluates and makes a decision. The capacity of SDN – and of any modern infrastructure – is at least partially determined by how fast it can make decisions.

Examples of decisions:
- URI-based routing (load balancers, application delivery controllers)
- Virus scanning
- SPAM scanning
- Traffic anomaly scanning (IPS/IDS)
- SQLi / XSS inspection (web application firewalls)
- SYN flood protection (firewalls)
- BYOD policy enforcement (access control systems)
- Content scrubbing (web application firewalls)

The DPS capacity of a system is not the same as its connection capacity, which is merely the measure of how many new connections a second can be established (and in many cases how many connections can be simultaneously sustained). Such a measure merely determines how optimized the networking stack of any given solution might be, as connections – whether TCP or UDP or SMTP – are protocol-oriented and it is the networking stack that determines how well connections are managed. The CPS rate of any given device tells us nothing about how well it will actually perform its appointed tasks. That's what the decisions per second (DPS) metric tells us.

CONSIDERING BOTH CPS and DPS
The reality is that most systems will have a higher CPS than DPS. That's not necessarily bad, as evaluating data as it flows through a device requires processing, and processing necessarily takes time.
Using both CPS and DPS merely recognizes this truth and forces it to the fore, where it can be used to better design the network. A combined metric helps design the network by offering insight into the real capacity of a given device, rather than a marketing capacity. When we look only at CPS, for example, we might feel perfectly comfortable with a topological design with a flow of similar CPS capacities. But what we really want is to make sure that DPS and CPS capabilities (in both directions) are matched up correctly, lest we introduce more latency than is necessary into a given flow. What we don't want is to end up with a device with a high DPS rate feeding into a device with a lower CPS rate. We also don't want to design a flow in which DPS rates successively decline. Doing so means we're adding more and more latency into the equation.

The DPS rate is a much better indicator of capacity than CPS for designing high-performance networks because it is a realistic measure of performance, and yet a high DPS coupled with a low CPS would be disastrous. Luckily, it is almost always the case that a mismatch between CPS and DPS will favor CPS, with DPS being the lower of the two metrics in almost all cases. What we want to see is as close a CPS:DPS ratio as possible. The ideal is 1:1, of course, but given the nature of inspecting data it is unrealistic to expect such a tight ratio. Still, if the ratio becomes too high, it indicates a potential bottleneck in the network that must be addressed. For example, assume an extreme case of a CPS:DPS ratio of 2:1. The device can establish 10,000 CPS but only process at a rate of 5,000 DPS, leading to increasing latency or other undesirable performance issues as connections queue up waiting to be processed (a quick back-of-the-envelope sketch of this follows the related reading below). Obviously there's more at play than just new CPS and DPS (concurrent connection capability is also a factor), but the new CPS and DPS relationship is a good general indicator of potential issues.

Knowing the DPS of a device enables architects to properly scale out the infrastructure to remediate potential bottlenecks. This is particularly true when TCP multiplexing is in play, because it necessarily reduces CPS to the target systems but in no way impacts the DPS. On the ingress, too, are emerging protocols like SPDY that make more efficient use of TCP connections, making CPS an unreliable measure of capacity, especially if DPS is significantly lower than the CPS rating of the system. Relying upon CPS alone – particularly when using TCP connection management technologies – as a means to achieve scalability can negatively impact performance. Testing systems to understand their DPS rate is paramount to designing a scalable infrastructure with consistent performance.

Related reading: The Need for (HTML5) Speed | SPDY versus HTML5 WebSockets | Y U No Support SPDY Yet? | Curing the Cloud Performance Arrhythmia | F5 Friday: Performance, Throughput and DPS | Data Center Feng Shui: Architecting for Predictable Performance | On Cloud, Integration and Performance
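To put numbers on the 2:1 example above, here is a small, self-contained sketch (purely illustrative) of how a sustained CPS rate that exceeds the DPS rate causes work to queue up.

```python
# Back-of-the-envelope sketch of the 2:1 CPS:DPS example: connections arrive
# at 10,000/s but decisions complete at 5,000/s, so a backlog builds up.
def queued_connections(cps, dps, seconds):
    """Connections still waiting on a decision after `seconds` of sustained load."""
    backlog = 0
    for _ in range(seconds):
        backlog += cps                 # new connections accepted this second
        backlog -= min(backlog, dps)   # decisions completed this second
    return backlog

if __name__ == "__main__":
    for t in (1, 5, 10):
        print(f"after {t:2d}s: {queued_connections(10_000, 5_000, t):,} connections awaiting a decision")
```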
WILS: SSL TPS versus HTTP TPS over SSL

The difference between these two performance metrics is significant, so be sure you know which one you're measuring, and which one you wanted to be measuring.

It may be the case that you've decided that SSL is, in fact, a good idea for securing data in transit. Excellent. Now you're trying to figure out how to implement support and you're testing solutions, or perhaps trying to peruse reports someone else generated from testing. Excellent. I'm a huge testing fan and it really is one of the best ways to size a solution specifically for your environment. Some of the terminology used to describe specific performance metrics in application delivery, however, can be misleading. The difference between SSL TPS (transactions per second) and HTTP TPS over SSL, for example, is significant, and the terms therefore should not be used interchangeably when comparing performance and capacity of any solution – and that goes for software, hardware, or some yet-to-be-defined combination thereof.

The reason interpreting claims of SSL TPS is so difficult is the ambiguity that comes from SSL itself. An SSL "transaction" is, by general industry agreement (unenforceable, of course), a single transaction that is "wrapped" in an SSL session. Generally speaking, one SSL transaction is considered:
1. Session establishment (authentication, key exchange)
2. Exchange of data over SSL, often a 1KB file over HTTP
3. Session closure

Seems logical, but technically speaking a single SSL transaction could be interpreted as any single transaction conducted over an SSL-encrypted session, because the very act of transmitting data over the SSL session necessarily requires SSL-related operations. SSL session establishment requires a handshake and an exchange of keys, and the transfer of data within such a session requires the invocation of encryption and decryption operations (often referred to as bulk encryption). Therefore it is technically accurate for SSL capacity/performance metrics to use the term "SSL TPS" while referring to two completely different things. This means it is important that whoever is interested in such data do a little research to determine exactly what is meant by SSL TPS when presented with it. Based on the definition, the actual results mean different things. When used to refer to HTTP TPS over SSL, the constraint is actually the bulk encryption rate (related more to response time, latency, and throughput measurements), while SSL TPS measures the number of SSL sessions that can be created per second and is more related to capacity than to response-time metrics.

It can be difficult to determine which method was used, but if you see the term "SSL ID re-use" anywhere, you can be relatively certain the test results refer to HTTP TPS over SSL rather than SSL TPS. When SSL session IDs are reused, the handshaking and key exchange steps are skipped, which reduces the number of computationally expensive RSA operations that must be performed and artificially increases the results. As always, if you aren't sure what a performance metric really means, ask. If you don't get a straight answer, ask again, or take advantage of all that great social networking you're doing and find someone you trust to help you determine what was really tested. Basing architectural decisions on misleading or misunderstood data can cause grief and be expensive later when you have to purchase additional licenses or solutions to bring your capacity up to what was originally expected.
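If you want to see the difference between full handshakes and reused sessions for yourself, OpenSSL's s_time tool makes the distinction visible. The host and timing values below are placeholders for your own test target.

```sh
# Connections per second using full SSL/TLS handshakes (new session each time)
openssl s_time -connect www.example.com:443 -new -time 30

# Connections per second when the session ID is reused (handshake mostly skipped)
openssl s_time -connect www.example.com:443 -reuse -time 30

# Add a small HTTP GET per connection to approximate "HTTP TPS over SSL"
openssl s_time -connect www.example.com:443 -new -www / -time 30
```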
WILS: Write It Like Seth. Seth Godin always gets his point across with brevity and wit. WILS is an attempt to be concise about application delivery topics and just get straight to the point. No dilly-dallying around.

Related reading: The Anatomy of an SSL Handshake | When Did Specialized Hardware Become a Dirty Word? | WILS: Virtual Server versus Virtual IP Address | Following Google's Lead on Security? Don't Forget to Encrypt Cookies | WILS: What Does It Mean to Align IT with the Business | WILS: Three Ways To Better Utilize Resources In Any Data Center | WILS: Why Does Load Balancing Improve Application Performance? | WILS: Application Acceleration versus Optimization | All WILS Topics on DevCentral | What is server offload and why do I need it?
F5 Friday: Performance, Throughput and DPS

No, not World of Warcraft "damage per second" – infrastructure "decisions per second".

Metrics are tricky. Period. Comparing metrics is even trickier. The purpose of performance metrics is, of course, to measure performance. But like most tests, before you can administer such a test you really need to know what it is you're testing. Saying "performance" isn't enough and never has been, as the term has a wide variety of meanings that are highly dependent on a number of factors. The problem with measuring infrastructure performance today – and this will continue to be a major obstacle in metrics-based comparisons of cloud computing infrastructure services – is that we're still relying on fairly simple measurements as a means to determine performance. We still focus on speeds and feeds, on wires and protocol processing. We look at throughput, packets per second (PPS) and connections per second (CPS) for network and transport layer protocols. While these are generally accurate for what they're measuring, we start running into real problems when we evaluate the performance of any component – infrastructure or application – in which processing, i.e. decision making, must occur.

Consider the difference in performance metrics between a simple HTTP request/response in which the request is nothing more than a GET request paired with a 0-byte payload response, and an HTTP POST request filled with data that requires processing not only on the application server but also on the database, plus the serialization of a JSON response. The metrics that describe the performance of these two requests will almost certainly show that the former has a higher capacity and faster response time than the latter. Obviously those who wish to portray a high-performance solution are going to leverage the former test, knowing full well that those metrics are "best case" and will almost never be seen in a real environment, because a real environment must perform processing, as per the latter test. Suggestions of a standardized testing environment, similar to application performance comparisons using the Pet Shop Application, are generally met with a frown, because using a standardized application to induce real processing delays doesn't actually test the infrastructure component's processing capabilities; it merely adds latency on the back end and stresses the capacity of the infrastructure component. Too, such a yardstick would fail to really test what's important – the speed and capacity of an infrastructure component to perform processing itself, to make decisions and apply them on the component – whether it be security, application routing, or transformational in nature. It's an accepted fact that processing of any kind, at any point along the application delivery service chain, induces latency which impacts capacity. Performance numbers used in comparisons should reveal the capacity of a system including that processing impact. Complicating the matter is the fact that since there are no accepted standards for performance measurement, different vendors can use the same term to discuss metrics measured in totally different ways.

THROUGHPUT versus PERFORMANCE
Infrastructure components, especially those that operate at the higher layers of the networking stack, make decisions all the time. A firewall service may make a fairly simple decision: is this request for this port on this IP address allowed or denied at this time?
An identity and access management solution must make similar decisions, taking into account other factors and answering the question: is this user, coming from this location on this device, allowed to access this resource at this time? Application delivery controllers, a.k.a. load balancers, must also make decisions: which instance has the appropriate resources to respond to this user and this particular request within specified performance parameters at this time? We're not just passing packets anymore, and therefore performance tests that measure only the surface ability to pass packets or open and close connections are simply not enough. Infrastructure today is making decisions, and because those decisions often require interception, inspection, and processing of application data – not just individual packets – it becomes more important to compare solutions from the perspective of decisions per second rather than surface-layer protocols per second.

Decision-based performance metrics are a more accurate gauge of how a solution will perform in a "real" environment, to be sure, as they portray the component's ability to do what it was intended to do: make decisions and perform processing on data. Layer 4 or HTTP throughput metrics seldom come close to representing the performance impact that normal processing will have on a system and, while important, should only be used with caution when considering performance. Consider the metrics presented by Zeus Technologies in a recent performance test (Zeus Traffic Manager – VMware vSphere 4 Performance on Cisco UCS, 2010) and F5's performance results from 2010 (F5 2010 Performance Report). While showing impressive throughput in both cases, the results also show the performance impact that occurs when additional processing – decisions – is added into the mix. The ability of any infrastructure component to pass packets or manage connections (TCP capacity) is all well and good, but these metrics are always negatively impacted once the component begins actually doing something, i.e. making decisions. Being able to handle almost 20 Gbps of throughput is great, but if that measurement wasn't taken while decisions were being made at the same time, your mileage is not just likely to vary – it will vary wildly.

Throughput is important, don't get me wrong. It's part of – or should be part of – the equation used to determine what solution will best fit the business and operational needs of the organization. But it's only part of the equation, and probably a minor part at that. Decision-based metrics should also be one of the primary means of evaluating the performance of an infrastructure component today. "High performance" cannot be measured effectively based on merely passing packets or making connections – high performance means being able to push packets, manage connections and make decisions, all at the same time. This is increasingly a fact of data center life as infrastructure components continue to become more "intelligent", as they become first-class citizens in the enterprise infrastructure architecture and are more integrated and relied upon to assist in providing the services required to support today's highly motile data center models. Evaluating a simple load balancing service based on its ability to move HTTP packets from one interface to the other with no inspection or processing is nice, but if you're ultimately planning on using it to support persistence-based routing, a.k.a. sticky sessions, then the rate at which the service executes the decisions necessary to support that service should be as important – if not more so – to your decision-making processes.
DECISIONS per SECOND
There are very few pieces of infrastructure on which decisions are not made on a daily basis. Even the use of VLANs requires inspection and decision-making to occur on the simplest of switches. Identity and access management solutions must evaluate a broad spectrum of data in order to make a simple "deny" or "allow" decision, and application delivery services make a variety of decisions across the security, acceleration, and optimization demesne for every request they process. And because every solution is architected differently and comprised of different components internally, the speed and accuracy with which such decisions are made are variable and will certainly impact the ability of an architecture to meet or exceed business and operational service-level expectations. If you're not testing that aspect of the delivery chain before you make a decision, you're likely to be either pleasantly surprised or hopelessly disappointed in the decision-making performance of those solutions.

It's time to start talking about decisions per second and the performance of infrastructure in the context it's actually used in data center architectures, rather than as stand-alone, packet-processing, connection-oriented devices. And as we do, we need to remember that every network is different, carrying different amounts of traffic from different applications. That means any published performance numbers are simply guidelines and will not accurately represent the performance experienced in an actual implementation. However, the published numbers can be valuable tools in comparing products… as long as they are based on the same or very similar testing methodology. Before using any numbers from any vendor, understand how those numbers were generated, what they really mean, and how much additional processing they include (if any). When looking at published performance measurements for a device that will be making decisions and processing traffic, make sure you are using metrics based on performing that processing.

Related reading: 1024 Words: Ch-ch-chain of Fools | On Cloud, Integration and Performance | As Client-Server Style Applications Resurface Performance Metrics Must Include the API | F5 Friday: Speeds, Feeds and Boats | Data Center Feng Shui: Architecting for Predictable Performance | Operational Risk Comprises More Than Just Security | Challenging the Firewall Data Center Dogma | Dispelling the New SSL Myth
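To ground the "decisions per second" idea in something you can run, here is a toy Python benchmark (purely illustrative, with an assumed hash-based persistence function) that counts how many routing decisions a trivial implementation can make per second. Real infrastructure decisions are far heavier, which is exactly the article's point.

```python
# Toy benchmark: how many "decisions" per second can a trivial
# persistence-based routing function make on this machine?
import hashlib
import time

POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative pool members

def route(session_id: str) -> str:
    # Hash-based persistence: the same session always maps to the same member.
    digest = hashlib.sha256(session_id.encode()).digest()
    return POOL[digest[0] % len(POOL)]

def decisions_per_second(duration: float = 1.0) -> float:
    count, start = 0, time.perf_counter()
    while time.perf_counter() - start < duration:
        route(f"session-{count}")
        count += 1
    return count / duration

if __name__ == "__main__":
    print(f"~{decisions_per_second():,.0f} decisions/second for this toy routing function")
```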