02 - Visualization of F5 BIG-IP metrics on Grafana using Prometheus and Telemetry Streaming service
Configuration using the CLI of the F5 BIG-IP device

The following steps configure the Telemetry Streaming consumer target from the CLI of the F5 BIG-IP device. Once you have accessed the F5 BIG-IP CLI terminal, log in with either the default admin credentials or the new user you created in the section above. In the commands below, replace username:password with those credentials. The user must have the Administrator role.

First, run the following command to query the current Telemetry Streaming declaration:

curl -u username:password -k https://localhost/mgmt/shared/telemetry/declare

Note: -k (--insecure) makes curl skip verification of the server's TLS certificate. By default, curl checks the certificate against its installed CA certificate bundle. A connection to a device with a self-signed certificate therefore fails unless -k (--insecure) is used.

Change into the /tmp directory and create a file called ts-config.json. The vi editor is used here:

cd /tmp
vi ts-config.json

Paste the Telemetry Streaming declaration, then save the file and exit the vi editor:

{
    "class": "Telemetry",
    "My_Poller": {
        "class": "Telemetry_System_Poller",
        "interval": 0
    },
    "My_System": {
        "class": "Telemetry_System",
        "enable": "true",
        "systemPoller": [
            "My_Poller"
        ]
    },
    "metrics": {
        "class": "Telemetry_Pull_Consumer",
        "type": "Prometheus",
        "systemPoller": "My_Poller"
    }
}

Next, run the following command from the same /tmp directory. Replace username:password with the credentials of an F5 BIG-IP user that has the Administrator role:

curl -X POST -u username:password -k https://localhost/mgmt/shared/telemetry/declare -d @ts-config.json -H "content-type:application/json"

To verify the available metrics:

curl -u username:password -k https://localhost/mgmt/shared/telemetry/pullconsumer/metrics

Section III: Configuration of Prometheus

Once the Telemetry Streaming service has been successfully configured, the metrics are available on the path above. Prometheus must then be configured to scrape the metrics from that predefined path. The steps to configure Prometheus are:

Note: In this user-guide demonstration, Grafana and Prometheus are installed on the same host with different service ports, as mentioned earlier. The host runs CentOS 7, so the commands for the following status checks may differ on your OS.

First, check the status of the Prometheus service:

sudo systemctl status prometheus.service

Check the current working directory, change into /etc/prometheus and list its contents:

pwd
cd /etc/prometheus
ls -al

Edit the Prometheus configuration file in this directory (usually prometheus.yml) and add the scrape job:

global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'TelemetryStreaming'
    scrape_timeout: 30s
    scrape_interval: 30s
    scheme: https
    tls_config:
      insecure_skip_verify: true
    metrics_path: '/mgmt/shared/telemetry/pullconsumer/metrics'
    basic_auth:
      username: 'F5-BIG-IP-username'
      password: 'F5-BIG-IP-password'
    static_configs:
      - targets: ['BIGIP-managementIP:443']

Restart the Prometheus service and check its status:

sudo systemctl restart prometheus.service
sudo systemctl status prometheus.service

Note: If the configuration is correct, the Prometheus service shows as active (running). Otherwise the service fails to start and its status shows as failed.

To further verify that the target has been discovered by Prometheus:
- Go to http://<prometheus-ip>:<service-port>
- Click Status and select Targets

Section IV: Configuration on Grafana using Prometheus as a data source

In this section, Prometheus is connected as a data source in Grafana. Once the data source has been successfully configured in Grafana, create a new dashboard and select Prometheus as the data source. Then select the relevant metrics and change the refresh interval as required. Save and apply the panel. Finally, save the dashboard and view the metrics on the Grafana dashboard.
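The Status > Targets page is backed by the Prometheus HTTP API endpoint /api/v1/targets, so the same check can be scripted. The following is a minimal Python sketch that uses only the standard library. It assumes Prometheus is reachable at the URL you pass in (9090 is the Prometheus default port) and uses the job name TelemetryStreaming from the configuration above. The function names are hypothetical helpers.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_targets(prometheus_url: str) -> dict:
    """Query the Prometheus HTTP API endpoint behind the Status > Targets page."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=10) as response:
        return json.load(response)


def target_health(targets_json: dict, job: str) -> List[Tuple[str, str, str]]:
    """Return (scrapeUrl, health, lastError) for every active target of one job."""
    results = []
    for target in targets_json.get("data", {}).get("activeTargets", []):
        if target.get("labels", {}).get("job") == job:
            results.append((target.get("scrapeUrl", ""),
                            target.get("health", "unknown"),
                            target.get("lastError", "")))
    return results
```

Example usage: target_health(fetch_targets("http://<prometheus-ip>:9090"), "TelemetryStreaming"). A health value of "up" means the last scrape of the BIG-IP pull consumer path succeeded. For a "down" target, the lastError field usually explains why.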
The possible issue that can arise during the configuration

If you use the default TS declaration from the official Telemetry Streaming documentation website, you may not be able to view the available metrics on the mentioned link: https://<f5-management-ip>/mgmt/shared/telemetry/pullconsumer/metrics

Knowledge sharing: Velos and rSeries (F5OS) basic troubleshooting, logs and commands
This is another of my knowledge sharing articles. Here I take a deeper look at investigating Velos and rSeries issues, logs and commands.

1. Velos HA controller and blade issues.
Velos is the system with two controllers in active/standby mode, so only on Velos may you need to check for an issue with the controllers' HA. The controllers' HA order can be different for the system and for each partition. To check the HA for the system, use the /var/log_controller/cc-confd file. For a partition HA issue, look at the partition Velos log at /var/F5/partition<ID>/log/velos.log. You can also enable HA debug for the controllers with "system dbvars config debug confd ha-state-machine true".
Overview of HA: https://support.f5.com/csp/article/K19204400
Controller HA: https://support.f5.com/csp/article/K21130014
Partition HA: https://support.f5.com/csp/article/K58515297
List of Velos/rSeries services: Overview of F5 VELOS chassis controller services, Overview of F5 VELOS partition services, Overview of F5 rSeries system services

2. Entering into F5OS objects.
The rSeries and Velos tenants are like vCMP guests on VIPRION. If there are access issues with them, you may need to open their console. For this you can use the "virtctl" command, for example "/usr/share/omd/kubevirt/virtctl console <tenant_name>-<tenant_instance_ID>". Velos also uses blades and partitions, so you may need to SSH to a blade with "ssh slot<number>" or enter a partition with "docker exec -it partition<ID>_cli su admin". Sometimes you may need to enter the partition's GUI container, for example to see the GUI logs, but in most cases F5 support will ask you to do this. This may also become the way to enter the BIG-IP Next CLI.
Overview of VELOS system architecture: https://support.f5.com/csp/article/K73364432
Overview of rSeries system architecture: https://support.f5.com/csp/article/K49918625
rSeries tenant access: https://support.f5.com/csp/article/K33373310
Velos blade and tenant access: https://support.f5.com/csp/article/K65442484
Velos partition access: https://support.f5.com/csp/article/K11206563

3. Useful commands and logs.
Velos and rSeries are clustered systems, so the "show cluster" command is useful for spotting issues (look for "cluster is NOT ready."). The velos.log for the controller and for each partition is also a great place to start. Its debug level can be enabled under "SYSTEM SETTINGS > Log Settings", which is also where rSeries logging is set to debug. On Velos, /var/log/openshift.log is worth checking if there are cluster issues (k3s.log on rSeries). The confd logs are like the mcpd logs, so they are really useful for Velos or rSeries. Other helpful commands are docker ps, oc get pod --all-namespaces -o wide and kubectl get pod --all-namespaces -o wide, but in most cases F5 support will ask for them.
Velos cluster status: https://support.f5.com/csp/article/K27427444
Velos debug: https://support.f5.com/csp/article/K51486849
Velos openshift example issue: https://support.f5.com/csp/article/K01030619
Monitoring Velos: https://clouddocs.f5.com/training/community/velos-training/html/monitoring_velos.html
Monitoring rSeries: https://clouddocs.f5.com/training/community/rseries-training/html/monitoring_rseries.html

4. Velos and rSeries tcpdump packet captures, file utility and qkview files.
On Velos, qkviews can be created for the controller or for a partition, as they are separate qkviews. Tcpdumps of client traffic are taken with the tcpdump utility from the F5OS CLI (su - admin). Tcpdump in the Linux shell captures only the management IP addresses of the appliance, controller (floating or local), partition or tenant.
The file utility allows file transfers to remote servers. It can even download any log from the Velos/rSeries to your computer, which was not possible before with iSeries or VIPRION. The file utility also starts the outbound session to the remote server. This adds security, because no inbound sessions need to be allowed on the firewall/web proxy. It can even be triggered by an API call, and I may write a CodeShare article about this.
Velos tcpdump utility: https://support.f5.com/csp/article/K12313135
rSeries tcpdump utility: https://support.f5.com/csp/article/K80685750
Qkview Velos: https://support.f5.com/csp/article/K02521182
Qkview Velos CLI location: https://support.f5.com/csp/article/K79603072
Qkview rSeries: https://support.f5.com/csp/article/K04756153
SCP: https://support.f5.com/csp/article/K34776373
For rSeries 2000/4000, tcpdump is different, because SR-IOV rather than an FPGA (rSeries Networking (f5.com)) is used to attach interfaces directly to the tenant VM: Article Detail (f5.com)

5. A final quick check is to run "kubectl get pods -o wide --all-namespaces" to see that all pods are OK and running. On Velos, "oc get pods -o wide --all-namespaces" should also work. "docker ps" or "docker ps --format 'table {{.Names}}\t{{.RunningFor}}\t{{.Status}}'" are also useful for spotting a container that keeps going down and up. You can correlate this with issues seen in the "show cluster" output.

6. The new F5OS has much better hardware diagnostics than the older devices, so there is no longer a need to run EUD tests. All system hardware components and their health can be viewed from the GUI or CLI, and they are also shown in F5 iHealth! https://techdocs.f5.com/en-us/velos-1-5-0/velos-systems-administration-configuration/title-system-settings.html

7. For Velos and rSeries, always keep the software up to date. For example, Velos 1.5.1 makes the cluster rebuild much simpler; the rebuild is needed because the OpenShift SSL certificate is valid for only one year. Other examples are the F5 rSeries and Cisco Nexus issues, and the corrupt qkview generation when the GUI rather than the CLI is used. A Velos cluster rebuild with "touch /var/omd/CLUSTER_REINSTALL" can solve many issues, but it will cause some downtime: http://cdn.f5.com/product/bugtracker/ID1135853.html https://my.f5.com/manage/s/article/K000092905 https://support.f5.com/csp/article/K79603072
In the future the "docker" commands may not be available. In that case use "crictl", which replaces docker as the command-line tool for the Kubernetes container runtime.

Knowledge sharing: High CPU/Memory/Swap investigation/troubleshooting
I will share some basic knowledge about troubleshooting and resolving high data plane or control plane CPU. There is already a great article on this, so check it first: https://devcentral.f5.com/s/articles/Troubleshooting-High-CPU-Utilisation-on-F5-boxes

On hyper-threaded platforms (see the articles below), high CPU 0, 2, 4 (the even-numbered CPUs) indicates a data plane TMM issue. High CPU 1, 3, 5 (the odd-numbered CPUs) indicates a control plane CPU issue. Please read: https://support.f5.com/csp/article/K15468 https://support.f5.com/csp/article/K16739

If the control plane CPU is constantly high, run the Linux "top" command to see which process causes the issue. Then restart that process and check for known bugs in Google, AskF5, the bug tracker, the release notes and iHealth. Be careful: restarting critical processes may cause some impact. If the process is not critical, like bigd, just restart it. I have seen bugs where bigd or snmpd leak memory and need a restart. https://support.f5.com/csp/article/K67197865 https://support.f5.com/csp/article/K20060182

If the control plane CPU spikes only from time to time, the issue is harder to catch. If you see a pattern, check the logs, cron jobs and any REST API scripts that may run at the same time as the CPU spikes. For example, the F5 ASM datasyncd may cause periodic spikes, as mentioned in https://support.f5.com/csp/article/K02827102. Another cause is an enabled ASM Policy Builder that learns too many things, as mentioned in https://support.f5.com/csp/article/K58571155. Many "tmsh" processes in the top output mean that a REST API script is not closing its connections correctly. The tmsh sessions then hang and cause high CPU and memory. In this case, configure the tmsh timeout, as it is not configured by default: https://support.f5.com/csp/article/K9908
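Hanging tmsh sessions can be counted from a process listing instead of by eye in top. The following is a minimal Python sketch that parses the output of "ps -eo pid,etimes,comm" after it has been copied off the device. It assumes a ps version that supports the etimes field (procps-ng). The one-hour threshold and the helper names are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def _rows(ps_output: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (pid, elapsed_seconds, command) from 'ps -eo pid,etimes,comm' output."""
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # The header row ("PID ELAPSED COMMAND") fails the isdigit checks and is skipped
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            yield int(fields[0]), int(fields[1]), fields[2].strip()


def count_by_command(ps_output: str) -> Counter:
    """Number of running processes per command name."""
    return Counter(command for _, _, command in _rows(ps_output))


def old_tmsh_pids(ps_output: str, min_age_seconds: int = 3600) -> List[int]:
    """PIDs of tmsh processes that have been running for at least min_age_seconds."""
    return [pid for pid, elapsed, command in _rows(ps_output)
            if command == "tmsh" and elapsed >= min_age_seconds]
```

A growing count_by_command(...)["tmsh"] across several runs, together with many old PIDs, points to an automation that is not closing its sessions.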
To catch an issue that occurs at random times, you may need to follow the article below. Alternatively, run top with arguments like "top -b -n 10 -d 10 >> /var/tmp/top.txt", which runs top in batch mode 10 times at an interval of 10 seconds: https://support.f5.com/csp/article/K40472403

If CPU is high for the TMM process during peak working hours, your system may be overutilized. You may then need to increase the number of cores for a Virtual Edition (if the license allows it) or a vCMP guest (if there are free cores), or buy another device. Some log messages may also indicate TMM CPU issues: idle enforcer messages in /var/log/kern.log, or "clock advanced" messages in /var/log/ltm. https://support.f5.com/csp/article/K10337613 https://support.f5.com/csp/article/K10095 https://support.f5.com/csp/article/K24427880

You may also try some small optimizations:
- upgrade to the latest version
- check /var/log/ltm and turn off any forgotten iRule logging or TCP RST logging variables
- resolve SSL handshake failures
- remove orphaned configuration objects
- stop any other debugs or modified system logging variables. It is better to send logs to an external log server with HSL, which can also be done in an iRule with the "HSL::" commands.
https://support.f5.com/csp/article/K11058264 https://support.f5.com/csp/article/K15292 https://support.f5.com/csp/article/K13223 https://support.f5.com/csp/article/K55131641 https://devcentral.f5.com/s/articles/the101-irules-101-logging-amp-comments https://support.f5.com/csp/article/K15335

For memory issues, remember that the "top" command shows memory per Linux process, while "tmsh show sys memory" shows the memory of the TMM (data plane) subsystems. For example, a bad iRule can push the "tcl" subsystem memory high. Memory sweeper messages in /var/log/ltm are also a good indication: https://support.f5.com/csp/article/K13302777 and https://support.f5.com/csp/article/K15740. A DDoS attack may also cause high memory, so be careful.
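A top capture in batch mode can be long, so summarizing it helps to see which process was busiest in each interval. The following is a minimal Python sketch that reads the text written by "top -b -n 10 -d 10". It assumes the default top column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and that the file has been copied off the device. The function name is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def busiest_per_snapshot(top_output: str) -> List[Tuple[str, str, float]]:
    """For every 'top -b' snapshot, return (time, command, %CPU) of the busiest process."""
    results: List[Tuple[str, str, float]] = []
    timestamp: Optional[str] = None
    best: Optional[Tuple[str, float]] = None
    in_table = False

    def flush() -> None:
        # Record the busiest process of the snapshot that just ended
        if timestamp is not None and best is not None:
            results.append((timestamp, best[0], best[1]))

    for line in top_output.splitlines():
        if line.startswith("top - "):
            flush()
            # "top - 10:00:01 up ..." -> third field is the snapshot time
            timestamp, best, in_table = line.split()[2], None, False
        elif line.strip().startswith("PID"):
            in_table = True  # column header of the process list
        elif in_table and line.strip():
            fields = line.split()
            if len(fields) < 12:
                continue
            try:
                cpu = float(fields[8])  # %CPU column in the default layout
            except ValueError:
                continue
            command = " ".join(fields[11:])
            if best is None or cpu > best[1]:
                best = (command, cpu)
    flush()
    return results
```

Example usage: read /var/tmp/top.txt into a string and print each (time, command, %CPU) tuple. The same loop can track %MEM (fields[9]) if memory is the concern.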
For control plane memory, if top shows many open tmsh sessions, check your REST API scripts and automations and configure the tmsh timeout. I have seen this too many times to count. On vCMP, memory is increased by adding more cores if needed, and on Virtual Edition this is much easier.
Note: A high Other Used memory usage on the BIG-IP system Dashboard may not indicate an issue, as the Linux kernel allocates memory to buffers and disk caching that can be released as needed. https://support.f5.com/csp/article/K16419 https://support.f5.com/csp/article/K16562
Example of high control plane memory: https://support.f5.com/csp/article/K93325541
Examples of high TMM data plane memory: K02620345 K13889 K09336400 K15245 ID633402.html K44385170
For swap issues, you can now configure top to show the process causing the issue, or upload a qkview to iHealth and check there: https://support.f5.com/csp/article/K40027012 https://support.f5.com/csp/article/K55227819
Also check the hard disk: a full or faulty hard drive can cause high CPU if the logs can't be written. https://support.f5.com/csp/article/K93344414 https://support.f5.com/csp/article/K14403

F5 XC Distributed Cloud HTTP Header manipulations and matching of the client ip/user HTTP headers
1. F5 XC Distributed Cloud HTTP header manipulations
In F5 XC Distributed Cloud, some client information is saved to variables that can be inserted into HTTP headers. This is similar to how F5 BIG-IP saves data that can later be used in an iRule or Local Traffic Policy. By default, XC inserts an XFF (X-Forwarded-For) header with the client IP address. But what if the end servers want the real client IP in an HTTP header with another name? In the HTTP load balancer, the "Header Options" can be found under "Other Options" > "More Options". There the predefined variables can be used for this job, for example $[client_address].
A list of the predefined variables for F5 XC: https://docs.cloud.f5.com/docs/how-to/advanced-security/configure-http-header-processing
There is also a $[user] variable. In the future, if F5 XC authenticates users, this option may insert the user in a proxy chaining scenario. For now, I think it just manipulates data in the XAU (X-Authenticated-User) HTTP header.

2. Matching of the real client IP HTTP headers
You can also match an XFF header inserted by a proxy device in front of the F5 XC nodes, for security bypass/blocking or for logging in F5 XC.
For user logging from the XFF, create a "User Identification Policy" under "Common Security Controls". You can also match the IP address with a regex. This is useful when the XFF header contains multiple IP addresses, because there may have been many proxy devices in the data path, and we want to check whether one specific address is present.
For security bypass or blocking based on XFF, create "Trusted Client Rules" or "Client Blocking Rules" under "Common Security Controls". If you already have a "User Identification Policy", you can simply use the "User Identifier", but it can't use regex in this case.
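When the XFF header carries several addresses, an unanchored pattern can also match inside a longer address (1\.1\.1\.1 matches part of 11.1.1.10). The following is a minimal Python sketch that shows the pitfall and one way to match an address only as a whole list entry. It uses Python's re module as a stand-in. The exact regex engine that F5 XC uses is an assumption here, so test any pattern in XC itself. As the author notes, XC sometimes needs patterns without parentheses. The helper names are hypothetical.

```python
import re


def exact_ip_pattern(ip: str) -> str:
    """Regex matching ip only as a whole entry of a comma-separated XFF list."""
    # Boundaries are written as groups instead of lookarounds, since some
    # regex engines (for example RE2) do not support lookarounds.
    return r"(^|[ ,])" + re.escape(ip) + r"($|[ ,])"


def ip_in_xff(xff_value: str, ip: str) -> bool:
    """True if ip appears as one of the addresses in the header value."""
    return re.search(exact_ip_pattern(ip), xff_value) is not None


# The plain pattern also matches inside a longer address:
header = "198.51.100.7, 11.1.1.10, 203.0.113.5"
print(bool(re.search(r"1\.1\.1\.1", header)))          # True (false positive)
print(ip_in_xff(header, "1.1.1.1"))                    # False
print(ip_in_xff("198.51.100.7, 1.1.1.1", "1.1.1.1"))  # True
```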
To match a single IP address in the header with a regex, even when the header contains many IP addresses, use a regex such as (1\.1\.1\.1), which matches the address 1.1.1.1.
To use the client IP address as the source IP address of the TCP connections to the backend origin servers after going through F5 XC (similar to removing the SNAT pool or Automap in F5 BIG-IP), use the corresponding option in the XC configuration.
In the same way, the XAU (X-Authenticated-User) HTTP header can be used in a proxy chaining topology, when a proxy in front of F5 XC has added this header.
Edit: Keep in mind that in some cases XC needs the regex written without the parentheses, for example 1\.1\.1\.1 instead of (1\.1\.1\.1), so test it. This may be new behavior. I have seen it in service policy regex matches and when creating a new custom signature that was not in the WAAP WAF XC policy. I could write a separate article about this 🙂

F5 AFM/Edge Firewall and the difference between Edge Firewalls and Next-generation Firewalls (NGFW)
Next-generation Firewalls (NGFW) have a lot of features. These include policies based on AD users and AD groups, dynamic user quarantine, and default or custom Application/Service and Virus/Spyware/Vulnerability signatures. With them, only traffic from specific applications is allowed, and it is scanned for viruses and other malware types. A long time ago I also did not know the difference between F5 AFM and NGFWs (I even asked a question on the forum: https://community.f5.com/t5/technical-forum/to-make-the-f5-afm-like-a-full-ngfw-is-there-plans-the-f5-afm-to/td-p/207685). Over time I understood the difference, and I wrote this post to clear things up 😉

NGFWs truly provide a lot of nice options, but they fall short when deployed at Internet Service Providers, Mobile Operators, at the edge of big corporate networks or in private scrubbing centers, because they lack good DDoS protections and CG-NAT functions. NGFWs do have NAT capabilities, but in most cases those are limited to basic source PAT, destination NAT or static NAT.
At the edge of the network, the firewall device needs high throughput. It does not need to work with AD users/AD groups, redistribute users/groups between firewalls, or handle specific applications/services used by just one company. At an ISP or Mobile Operator, it must protect many customers with advanced DoS/DDoS options. It must also do NAT that is easily traceable in the logs, showing which public IP was allocated to which source IP. This is a great feature for mobile or Internet providers, combined with F5 PEM for user monetization and tracking. The edge firewall may also need to fail over to a scrubbing center if the DDoS attack becomes too big, so this function is nice to have. Another useful feature is an IP intelligence feed list that blocks attacks based only on the source or destination IP address, before any deep inspection.
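The idea of blocking on a feed list before any deep inspection is simple: only the source and destination addresses are compared against the listed prefixes. The following is a minimal Python sketch of that idea with the standard ipaddress module. It is a conceptual illustration only, not how AFM implements IP intelligence internally. The feed entries and function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical custom feed entries, e.g. exported by a threat intelligence platform
FEED = ["203.0.113.0/24", "198.51.100.25/32", "2001:db8:bad::/48"]


def load_feed(entries: Iterable[str]) -> List[Network]:
    """Parse feed lines (single IPs or CIDR prefixes) into network objects."""
    return [ipaddress.ip_network(entry.strip(), strict=False)
            for entry in entries if entry.strip()]


def is_blocked(src: str, dst: str, networks: List[Network]) -> bool:
    """Decide only from the source/destination address, before deeper inspection."""
    addresses = (ipaddress.ip_address(src), ipaddress.ip_address(dst))
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixing is safe
    return any(address in network for address in addresses for network in networks)
```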
This is where F5 AFM comes into the picture. It is not a replacement for NGFWs but a complementary device at the edge of the network that filters the traffic, after which the customer NGFWs do the more fine-grained checks. Sometimes AFM is deployed as a server firewall behind the NGFWs, together with F5 LTM/APM/Advanced WAF. For example, it can filter a DDoS attack that the scrubbing center did not block because the attack was too small and aimed at a specific destination. Most scrubbing centers block only really high-volume attacks that can bring down the entire data center, and most of them can't look into SSL data the way F5 Silverline can. AFM can now also work with subscriber data at the ISP/mobile operator level. From what I have seen, NGFWs are limited in this field, as they are made for internal enterprise use, where AD groups and AD users matter rather than subscriber data.

These are the F5 AFM capabilities that I have not seen in most NGFWs:
AFM DoS protections can be fully automatic and adjust their thresholds with Machine Learning (ML), so nobody has to keep modifying the DoS thresholds as with other DoS protection products. The DDoS protection also has dynamic signatures: a signature of the DDoS traffic is generated automatically, so only the attackers are blocked. By default, the DDoS protection thresholds under "Security > DoS Protection > Device Protection" are enforced unless a more specific DoS profile is attached to the virtual server.
F5 AFM can be combined with F5 Advanced WAF/ASM for full layer 3/4/7 DDoS protection. There is also a product named F5 DDoS Hybrid Defender that combines the layer 3/4 and layer 7 protections and is configured with a Guided Configuration wizard. F5 AFM has DDoS protections not only for TCP, UDP and ICMP traffic but also for the HTTP, DNS and SIP protocols.
Here are some great community articles about the DDoS features and their configuration: https://community.f5.com/t5/technical-articles/explanation-of-f5-ddos-threshold-modes/ta-p/286884 https://community.f5.com/t5/technical-articles/ddos-mitigation-with-big-ip-afm/ta-p/281234 This link is also helpful: https://support.f5.com/csp/article/K49869231
AFM can redirect the traffic to a scrubbing center if the attack becomes too big. This can save money, because the scrubbing center is used only when the DDoS is too big. If BGP is used, AFM uses the F5 ZebOS routing module, which is like a mini router inside the F5 device.
The former F5 Carrier Grade NAT product has been migrated into AFM. Besides source NAT, destination NAT and static NAT, you can therefore use NAT features like PBA, Deterministic NAT or PCP. AFM can also respond to ARP requests for translated source IP addresses (this is called Proxy ARP), or integrate with the ZebOS routing module to advertise the translated addresses.
Port block allocation (PBA) mode is a translation mode option that reduces CGNAT logging. When a subscriber first establishes a network connection, the BIG-IP® system reserves a block of ports on a single IP address for that subscriber. The system releases the block when no more connections are using it. This reduces the logging overhead, because CGNAT logs only the allocation and release of each block of ports.
Deterministic mode is an option used to assign translation addresses, and is port-based on the client address/port and destination address/port. It uses reversible mapping to reduce logging, while maintaining the ability to discover the translated IP address for troubleshooting and compliance with regulations.
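The point of a reversible mapping is that a public IP and port from a log or an abuse report can be traced back to the subscriber with arithmetic instead of per-connection NAT logs. The following is a minimal Python sketch of that idea. It gives each subscriber in a prefix a fixed port block on one public address. It is a simplified illustration, not F5's actual deterministic NAT algorithm (on BIG-IP, the dnatutil tool does the real lookup). The port range, prefix, pool and class name are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed translation port range for this illustration
PORT_MIN, PORT_MAX = 1024, 65535


@dataclass(frozen=True)
class DeterministicNat:
    """Hypothetical fixed mapping of a subscriber prefix onto a public pool."""
    subscribers: ipaddress.IPv4Network   # e.g. 10.0.0.0/24
    public_pool: Tuple[str, ...]         # public IPv4 addresses

    @property
    def per_ip(self) -> int:
        # Subscribers sharing one public address (ceiling division)
        return -(-self.subscribers.num_addresses // len(self.public_pool))

    @property
    def block_size(self) -> int:
        # Ports reserved for every subscriber on its public address
        return (PORT_MAX - PORT_MIN + 1) // self.per_ip

    def forward(self, client_ip: str) -> Tuple[str, int, int]:
        """Subscriber IP -> (public IP, first port, last port) of its fixed block."""
        index = int(ipaddress.ip_address(client_ip)) - int(self.subscribers.network_address)
        if not 0 <= index < self.subscribers.num_addresses:
            raise ValueError(f"{client_ip} is not in {self.subscribers}")
        public_ip = self.public_pool[index // self.per_ip]
        first = PORT_MIN + (index % self.per_ip) * self.block_size
        return public_ip, first, first + self.block_size - 1

    def reverse(self, public_ip: str, port: int) -> Optional[str]:
        """(public IP, port) from a log or abuse report -> subscriber IP, no NAT log needed."""
        if public_ip not in self.public_pool or not PORT_MIN <= port <= PORT_MAX:
            return None
        slot = (port - PORT_MIN) // self.block_size
        if slot >= self.per_ip:
            return None  # leftover ports at the end of the range are not assigned
        index = self.public_pool.index(public_ip) * self.per_ip + slot
        if index >= self.subscribers.num_addresses:
            return None
        return str(self.subscribers.network_address + index)
```

With 10.0.0.0/24 behind two public addresses, each public IP serves 128 subscribers with 504 ports each. For example, 10.0.0.200 always maps to ports 37312 to 37815 on the second public address.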
Deterministic mode also provides an option to configure backup members, and there is even a tool, dnatutil, to see the mapping of a client IP address.
Port Control Protocol (PCP) is a computer networking protocol that allows hosts on IPv4 or IPv6 networks to control how incoming IPv4 or IPv6 packets are translated and forwarded by an upstream router that performs network address translation (NAT) or packet filtering. Hosts can create explicit port forwarding rules. This makes it easy to configure how traffic is handled so that hosts behind NATs or firewalls are reachable from the rest of the Internet and can act as network servers, which many applications require.
Logging the subscriber NAT translations is mandatory, and this can generate a lot of logs for service providers. With deterministic NAT (DNAT) and PBA, the needed log space is reduced as much as possible while the required information is kept.
AFM now supports some F5 PEM options for Traffic Intelligence. These include application discovery (as in NGFWs), subscriber discovery, and security rules based on subscribers discovered by RADIUS or DHCP sniffing or by iRules. NGFWs work with AD users and AD groups, but service and mobile providers work with IMEI phone codes, not with AD groups/users. https://community.f5.com/t5/technical-articles/traffic-intelligence-in-afm-through-categories/ta-p/295310
Another really wonderful feature is IP intelligence, which protects you from bad source or destination IP addresses. With AFM you can also feed in custom lists generated by your threat intelligence platform. AFM and Advanced WAF/ASM can automatically place IP addresses in a shun list that IP intelligence then blocks, and the IP intelligence checks happen before ASM and even before AFM in the traffic path!
There is a nice community video about this feature: https://community.f5.com/t5/technical-articles/the-power-of-ip-intelligence-ipi/ta-p/300528
AFM also has port misuse policies and Protocol Inspection profiles. These are similar to NGFW applications/services, which allow only the correct protocol on a port rather than just checking the port number, and to IPS/antivirus signatures. F5 AFM Protocol Inspection is based on SNORT, so besides blocking attacks you can allow traffic based on the payload by writing custom signatures. For example, you can allow access to a certain server only if the Referer header has a certain value. By default it has many signatures and protocol RFC compliance checks. F5 AFM protocol inspection also offers more fine-grained custom application control than port misuse policies, for example through a custom signature that blocks a specific User-Agent HTTP header!
One of the best features of the F5 Protocol Inspection IPS, even compared to NGFW products, is signature staging. New signatures can be placed in staging for some time, for example after a new signature set is downloaded. You can then monitor how many times they trigger during the staging period before enforcing them, and that feature is really great. For more information, I suggest checking these links: https://support.f5.com/csp/article/K00322533 https://f5-agility-labs-firewall.readthedocs.io/en/latest/class2/module3/lab4.html https://support.f5.com/csp/article/K25265787
F5 AFM is also a great edge firewall for many protocols like DNS, SSH and SIP, not only HTTP. Like Advanced WAF/ASM, F5 AFM can work in a transparent bridged mode thanks to VLAN groups, wildcard virtual servers and Proxy ARP, where it is invisible to the end users (https://support.f5.com/csp/article/K15099). Remember that AFM processes traffic before any other module except IP intelligence. Also decide whether it will work in firewall mode or ADC mode (https://support.f5.com/csp/article/K92958047).
The order of the rules is also important (Global context policies/rules > Route Domain context > Virtual Server/Self IP > Management). You can even use DNS FQDN names in the security policy rules if needed. You can trace any issues related to security rules and DoS with the Packet Tester tool. With Timer policies, you can allow long-lived connections that do not generate traffic through the firewall if needed! In newer versions, the management IP can use AFM rules even without AFM being provisioned (https://support.f5.com/csp/article/K46122561). Isn't that nice 😀! F5 supports vWire and VLAN groups, so F5 AFM or F5 DHD (DDoS Hybrid Defender) can work as a layer 3 firewall, in transparent/invisible layer 2 mode, or with Virtual Wire in layer 1 mode. The F5 AFM operations guide is truly a nice resource to review: https://support.f5.com/csp/article/K38201755
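The following Python sketch shows why the context order matters. It is a simplified model, not AFM's implementation. Within each context the first matching rule wins. "drop" and "reject" stop processing. "accept-decisively" accepts and skips all later contexts. A plain "accept" hands the packet on to the next context. Management-port rules are left out because they apply only to traffic to the management interface. The assumed default action for packets with no decision in the last context stands in for the firewall/ADC mode choice. The rule names and packet fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Contexts in the order described above; management-port rules are left out
# because they apply only to traffic to the management interface.
CONTEXT_ORDER = ["global", "route_domain", "virtual_server_or_self_ip"]


@dataclass
class Rule:
    name: str
    action: str                      # "accept", "accept-decisively", "drop" or "reject"
    matches: Callable[[dict], bool]  # predicate on a simple packet description


def evaluate(packet: dict, policies: Dict[str, List[Rule]],
             default_action: str = "accept") -> Tuple[str, Optional[str], Optional[str]]:
    """Return (action, context, rule name), using first match within each context."""
    for context in CONTEXT_ORDER:
        rule = next((r for r in policies.get(context, []) if r.matches(packet)), None)
        if rule is None:
            continue
        if rule.action in ("drop", "reject"):
            return rule.action, context, rule.name
        if rule.action == "accept-decisively":
            return "accept", context, rule.name   # skip all later contexts
        if context == CONTEXT_ORDER[-1]:
            return "accept", context, rule.name   # plain accept in the last context
        # plain "accept" in an earlier context: continue with the next context
    # No decision in the last context: assumed default ("accept" for ADC mode,
    # "drop" or "reject" for firewall mode)
    return default_action, None, None
```

For example, a global "accept-decisively" rule for an admin host lets that host through even if a more specific virtual server rule would reject it, while a global "drop" rule stops traffic before the virtual server rules are even evaluated.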