rSeries and tenant upgrades
I manage four r5900 appliances hosting 16 tenants. The appliances run F5OS-A 1.5.0; the tenants all run BIG-IP 15.1.10.2. We all know, of course, that the most recent QSN was released today. Normally, I'd upgrade everything to the latest version. However, where F5OS-A 1.7.0 is concerned, there is a Known Issue in the release notes that raises a big red flag:

    1380705-1: BIG-IP tenant is stuck during boot up after doing tenant upgrade from 15.1.x to 17.1.x

The issue description says this will not happen with every tenant, but it only has to happen once to ruin a day, right? In the past, I have always upgraded the appliance OS before upgrading the tenants; that just seemed to make sense. Given this issue, though, I am considering upgrading all of the tenants to BIG-IP 17.1.1.3 first, then upgrading the appliances to F5OS-A 1.7.0. Does anybody know of any pitfalls to this approach?
Problems Overcome During a Major LTM Software/Hardware Upgrade

I recently completed a successful major LTM hardware and software migration which accomplished two high-level goals:

· Software upgrade from v9.3.1HF8 to v10.1.0HF1
· Hardware platform migration from 6400 to 6900

I encountered several problems during the migration event that would have stopped me in my tracks had I not (in most cases) already encountered them during my testing. This is a list of those issues and what I did to address them. While I may not have all the documentation about these problems or even fully understand all the details, the bottom line is that the fixes worked. My hope is that someone else will benefit from them when it counts the most (and you know what I mean).

Problem #1 – Unable to Access the Configuration Utility (admin GUI)

The first issue was apparent immediately after the upgrade finished. When I tried to access the Configuration utility, I was denied:

    Access forbidden!
    You don't have permission to access the requested object.
    Error 403

I found the resolution in SOL7448: Restricting access to the Configuration utility by source IP address. The SOL refers to bigpipe commands, which is what I used initially:

    bigpipe httpd allow all add
    bigpipe save

Since then, I've developed the corresponding TMSH commands, which is F5's long-term direction for managing the system:

    tmsh modify sys httpd allow replace-all-with { all }
    tmsh save /sys config

Problem #2 – Incompatible Profile

I encountered the second issue when the upgraded configuration was loaded for the first time:

    [root@bigip2:INOPERATIVE] config # BIGpipe unknown operation error:
    01070752:3: Virtual server vs_0_0_0_0_22 (forwarding type) has an incompatible profile.
By reviewing the /config/bigip.conf file, I found that my forwarding virtual servers had a TCP profile applied:

    virtual vs_0_0_0_0_22 {
       destination any:22
       ip forward
       ip protocol tcp
       translate service disable
       profile custom_tcp
    }

Apparently v9 did not care about this, but v10 would not load until I manually removed these TCP profile references from all of my forwarding virtual servers.

Problem #3 – BIGpipe parsing error

I then encountered another problem while attempting to load the configuration:

    BIGpipe parsing error (/config/bigip.conf Line 6870):
    012e0022:3: The requested value (x.x.x.x:3d-nfsd {) is invalid (show | <pool member list> | none) [add | delete]) for 'members' in 'pool'

While examining this error, I noticed that the port number had been translated into a service name – "3d-nfsd". Fortunately, during my initial v10 research I had come across SOL11293: The default /etc/services file in BIG-IP version 10.1.0 contains service names that may cause a configuration load failure. While I had added a step in my upgrade process to prevent the LTM from performing service translation, it was not scheduled until after the configuration had been successfully loaded on the new hardware. Instead, I had to move this step up in the overall process flow:

    bigpipe cli service number
    b save

The corresponding TMSH commands are:

    tmsh modify cli global-settings service number
    tmsh save /sys config

Problem #4 – Command is not valid in current event context

This was the final error we encountered when trying to load the upgraded configuration:

    BIGpipe rule creation error:
    01070151:3: Rule [www.mycompany.com] error: line 28: [command is not valid in current event context (HTTP_RESPONSE)] [HTTP::host]

While reviewing the iRule, it was obvious that we had a statement which didn't make any sense, since there is no Host header in an HTTP response.
Apparently it didn't bother v9, but v10 didn't like it:

    when HTTP_RESPONSE {
       switch -glob [string tolower [HTTP::host]] {
          <do some stuff>
       }
    }

We simply removed that event from the iRule.

Problem #5: Failed Log Rotation

After I finished my first migration, I found that none of the logs in the /var/log directory were being rotated. The /var/log/secure log file held the best clue about the underlying issue:

    warning crond[7634]: Deprecated pam_stack module called from service "crond"

I had to open a case with F5, who found that the PAM crond configuration file (/config/bigip/auth/pam.d/crond) had been pulled over from the old unit:

    #
    # The PAM configuration file for the cron daemon
    #
    auth       sufficient pam_rootok.so
    auth       required   pam_stack.so service=system-auth
    auth       required   pam_env.so
    account    required   pam_stack.so service=system-auth
    session    required   pam_limits.so
    #session   optional   pam_krb5.so

I updated the file from a clean unit (which I was fortunate enough to have at my disposal):

    #
    # The PAM configuration file for the cron daemon
    #
    auth       sufficient pam_rootok.so
    auth       required   pam_env.so
    auth       include    system-auth
    account    required   pam_access.so
    account    sufficient pam_permit.so
    account    include    system-auth
    session    required   pam_loginuid.so
    session    include    system-auth

and restarted crond:

    bigstart restart crond

or, in the v10 world:

    tmsh restart sys service crond

Problem #6: LTM/GTM SSL Communication Failure

This particular issue is the sole reason that my most recent migration took 10 hours instead of four. Even if you do have a GTM, you are not likely to encounter it, since it was a result of our own configuration. But I thought I'd include it since it isn't something you'll see documented by F5. One of the steps in my migration plan was to validate successful LTM/GTM communication with iqdump.
When I got to this point in the migration process, I found that iqdump was failing in both directions because of SSL certificate verification, despite my having installed the new Trusted Server Certificate on the GTM and Trusted Device Certificates on both the LTM and GTM. After several hours of troubleshooting, I decided to run a tcpdump capture to see whether I could gain any insight from what was happening on the wire. I didn't notice it at first, but when I looked at the trace again later, I saw that the hostname on the certificate the LTM was presenting was not correct. It was a very small detail that could easily have been missed, but it was the key to identifying the root cause.

Having dealt with Device Certificates in the past, I knew that the Device Certificate file was /config/httpd/conf/ssl.crt/server.crt. When I looked in that directory on the filesystem, I found a number of certificates (and, correspondingly, private keys in /config/httpd/conf/ssl.key) that should not have been there. These certificates and keys had been pulled over from the configuration on the old hardware. So I removed the extraneous certificates and keys from these directories and restarted the httpd service ("bigstart restart httpd", or "tmsh restart sys service httpd" in the v10 world). After I did that, the LTM presented the correct Device Certificate and LTM/GTM communication was restored. I'm still not sure to this day how those certificates got there in the first place...
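Problem #2 above is the kind of thing that can be caught before cutover by scanning the old bigip.conf for forwarding virtual servers that still carry a profile line. The sketch below is a generic text-processing pass, not an F5 tool: the sample config and the /tmp paths are invented for illustration, and a real check would point at your own /config/bigip.conf.

```shell
# Build an illustrative v9-style config fragment to scan (made-up content).
cat > /tmp/sample_bigip.conf <<'EOF'
virtual vs_0_0_0_0_22 {
   destination any:22
   ip forward
   ip protocol tcp
   translate service disable
   profile custom_tcp
}
virtual vs_web {
   destination 10.0.0.10:80
   ip protocol tcp
   profile http custom_tcp
}
EOF

# awk walks each "virtual ... { ... }" stanza; a stanza containing both
# "ip forward" and a "profile" line is reported as needing cleanup.
awk '
  /^virtual /       { name = $2; fwd = 0; prof = 0 }
  /ip forward/      { fwd = 1 }
  /^[ \t]*profile / { prof = 1 }
  /^}/              { if (fwd && prof) print name }
' /tmp/sample_bigip.conf
```

Run against the sample above, only vs_0_0_0_0_22 is flagged; vs_web has a profile but is not a forwarding virtual, so it passes.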
11.6.0 HF5 Error 010716bc:3: anybody?

I recently upgraded one of my AWS F5 instances from 11.6.0.4 to 11.6.0.5. I use some complex iRules to manage my sites, including recently added procs. Now, with every change I try on the virtual server using those iRules, I get the error:

    010716bc:3: HTTP::header command in a proc in rule (/Common/GTM) under HTTP_REQUEST event at virtual server (/Common/STAGE_MYSERVER_80) does not satisfy cmd/event/profile requirement.

I know I've not provided much detail, but I don't want a validation of my iRule; I just want to confirm whether a change that breaks existing code can be expected in a hotfix, or whether there was simply a problem with my upgrade. Does anybody know if the iRule runtime environment changed in HotFix 5? Where can I find info about that error code?

Thank you,
Angelo
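One way to triage an error like Angelo's is to inventory which iRule files call HTTP::* commands from inside a proc body, since those are the calls the 010716bc validator is complaining about. This is a rough grep-style sketch, not real TMOS validation: the file name and iRule content below are invented, and the brace tracking is deliberately naive (it assumes the proc's closing brace starts a line).

```shell
# Illustrative iRule saved to a scratch file (made-up content).
cat > /tmp/gtm_irule.tcl <<'EOF'
proc insert_debug_header {} {
    HTTP::header insert X-Debug "1"
}
when HTTP_REQUEST {
    call insert_debug_header
}
EOF

# awk flags any HTTP:: command that appears between a "proc" line and the
# next closing brace at column one.
awk '
  /^proc /           { inproc = 1 }
  inproc && /HTTP::/ { print FILENAME ": " $0 }
  /^}/               { inproc = 0 }
' /tmp/gtm_irule.tcl
```

Against the sample, the HTTP::header line inside the proc is reported; the call under HTTP_REQUEST is not, because it is outside any proc.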
Removing AAM/WAM for a successful upgrade

If you want to upgrade to version 16 or 17 of BIG-IP, one thing that can cause your config not to load is any element of AAM/WAM/WOM. As I discovered via a customer of mine, even removing all AAM/WAM items from traffic objects is not enough. While I know how to identify these things in the conf files and can see them in iHealth, that doesn't help admins in the field assess whether this is an issue for them and, if it is, how to document what needs to be changed for the necessary approvals. With some help, I wrote this knowledge article to meet those needs as well as provide a way to quickly make the changes: https://my.f5.com/manage/s/article/K000149084

I am sharing this in the forum not only to advertise the article, but to explain some of the commands and help the community understand how they might be used for other tasks. From spending time running a few BIG-IPs myself in a prior life and working with hundreds of customers, I knew that my solution needed to address partitions and even iApps. My coworker Fernando C. provided me the syntax to crawl every partition, and I quickly found ways to morph that into this document. Let's take a look at the syntax that reads the LAN-side TCP profiles in the Common partition, and then see the changes needed to read all partitions. In order to filter the results, we run these from bash so that we have access to a number of tools like grep, awk, and sed.

    # Return all virtual server names in Common that use a TCP profile from wam or wom (aka AAM).
    # grep finds the profile prefixes; awk grabs the third word of each matching line.
    tmsh list ltm virtual one-line | grep -E "(profiles.*(w(a|o)m-tcp-lan*))" | awk '{print $3}'

This simply returns the virtual server name without the partition name. To read all partitions, the tmsh portion of the command has to change. Specifically, we pass the -c option to tmsh to tell it to run multiple commands.
When you enter tmsh, by default you are in the Common partition, so we have to back out to the root. Because we are then in the root directory, we add the recursive option to read all subfolders, which in this case are the partitions.

    # Read all partitions and filter for virtual servers that use the wam/wom TCP profiles on the LAN (server) side.
    tmsh -c 'cd /; list ltm virtual recursive one-line' | grep -E "(profiles.*(w(a|o)m-tcp-lan*))" | awk '{print $3}'

Now the output is the partition name and virtual server name or, if iApps are involved, the appservice name as well. You can take that output and pass it to xargs, which substitutes it into a command to execute. CAUTION: the following command will attempt to make changes to your config.

    # Read all partitions, filter for virtual servers that use the wam/wom TCP profiles,
    # then insert the new profile and delete the original profile.
    # This will cause an error.
    tmsh -c 'cd /; list ltm virtual recursive one-line' | grep -E "(profiles.*(w(a|o)m-tcp-lan*))" | awk '{print $3}' | xargs -t -I vsName tmsh modify ltm virtual vsName profiles add { f5-tcp-lan { context serverside } } profiles delete { wam-tcp-lan-optimized }

If you run this command, it will error out: without the proper syntax, tmsh assumes you are referencing objects in the /Common partition, and as a result it "helps" you by implicitly prepending that to every object in your xargs command. I added the -t option to xargs so it prints each command before executing it. To correct the syntax error, you add a forward slash in the awk command, and tmsh then treats your command as if you had explicitly declared the partition name for every object. CAUTION: this will make changes to your configuration, very fast...
    # Read all partitions, filter for virtual servers that use the wam/wom TCP profiles,
    # then insert the new profile and delete the original profile.
    # CAUTION: This will make changes to your system.
    tmsh -c 'cd /; list ltm virtual recursive one-line' | grep -E "(profiles.*(w(a|o)m-tcp-lan*))" | awk '{print "/" $3}' | xargs -t -I vsName tmsh modify ltm virtual vsName profiles add { f5-tcp-lan { context serverside } } profiles delete { wam-tcp-lan-optimized }

When I first hit the wall with xargs beyond the /Common partition, I did not realize what the fix was. My OCD wanted to see a slash in front of the partition name, and I had modified the awk to add one, but I had given up on using xargs to modify things outside of /Common. It wasn't until I went to show the error to a peer, Chad T., that I discovered I had stumbled upon the proper syntax, and realized I could simplify the instructions quite a bit.

Where I would love some help from the community is on ways to crawl the iApps to quickly disable Strict Updates. The xargs commands to modify or delete objects associated with an iApp will fail if the default "Strict Updates" setting is enabled.

Hope this helps,
Carl
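To see the slash fix in isolation without touching a live system, you can replay the awk and xargs stages against canned text. The virtual server names below are invented, and echo is used so the tmsh commands are printed rather than executed; on a real BIG-IP you would feed the pipeline live tmsh output and drop the echo.

```shell
# Canned stand-in for the recursive one-line listing (made-up names).
printf 'ltm virtual Partition_A/app1_vs { ... }\nltm virtual Common/web_vs { ... }\n' > /tmp/vs_list.txt

# awk prepends the "/" so tmsh would treat each name as fully qualified;
# xargs substitutes each line into the modify command; echo keeps it a dry run.
awk '{ print "/" $3 }' /tmp/vs_list.txt |
  xargs -I vsName echo tmsh modify ltm virtual vsName profiles delete "{ wam-tcp-lan-optimized }"
```

The dry run prints one fully qualified tmsh command per virtual server, which also makes a handy change record for approvals before you run the real thing.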
Tech Tip: BIG-IP as Upgrade Facilitator

If you have machines behind your BIG-IP that are not load balanced, you still get a ton of benefit from their location. Virtualization is a big one, as is the number of metrics that are available in the BIG-IP. This Tech Tip focuses on a simple way to get more out of your BIG-IP by simply putting servers behind it. Yep, we said that: just put the servers behind the BIG-IP and get more.

The idea is simple. Upgrades cost downtime – generally a not-insignificant amount of downtime. But what if you could reduce that downtime to zero? What if we said the chances were minuscule that even one customer would be affected? Well, you can, if you have extra hardware. Assuming you've got the server to be upgraded behind a BIG-IP, here are some simple steps to upgrade:

1. If you've got another box that you can use for your upgrade, place it behind your BIG-IP.
2. Give the new box any old open IP address and hostname.
3. Install everything the machine needs.
4. Place the machine in a pool of its own – we'll call it testPool.
5. Create a new Virtual Node (we'll call this testVIP) and assign testPool to be its default pool.
6. Copy all of the relevant data from your current (production) server.
7. Give your testers rights to testVIP, and run acceptance testing.
8. Copy all of the relevant data again to pick up the changes made in production since testing started.
9. Add your new server to the current production pool.
10. Change the state on the old server to "only active connections allowed". This stops it accepting new connections and breaks persistent connections; in short, all new traffic is routed to the new node. Because we did not allow persistent connections to stay with the old server, when clients immediately try to reconnect, the BIG-IP directs them to the new server.
11. After all connections to the old server have ended, remove it from the current production pool.
12. Decommission your old server in whatever manner is normal for your company.

The benefit here is the virtualization the BIG-IP provides.
The Virtual Node address of your production server never changed, so users saw no issues. You did not have to take down the existing server, change a bunch of IP settings on the new server, and then bring it online. You just added the new server to the pool and changed the state of the old one. Regular end users will not experience any downtime in this process, and persistent connections will simply reset and attach to the new server. You look like a hero for such a smooth upgrade, and your boss will get congratulations on "pulling it off". Everyone is happy, and the extra work involved is simply a setting change in your BIG-IP.
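On a modern TMOS box, the cutover portion of these steps (9 through 11) can be scripted with tmsh. This is a hedged sketch, not the article's own method: the pool name and member addresses are invented, this Tech Tip predates tmsh, and the member-state keywords (forced offline, which corresponds to the GUI's "only active connections allowed") should be verified against your version. Each line is prefixed with echo so it prints the command instead of executing it; drop the echo on a real system.

```shell
# Hypothetical pool and members; adjust to your environment.
POOL=prod_http_pool
NEW_MEMBER=10.1.1.20:80
OLD_MEMBER=10.1.1.10:80

# Step 9: add the freshly built server to the production pool.
echo tmsh modify ltm pool "$POOL" members add "{ $NEW_MEMBER }"

# Step 10: force the old member offline so only active connections remain.
echo tmsh modify ltm pool "$POOL" members modify "{ $OLD_MEMBER { state user-down session user-disabled } }"

# Step 11: once connections have drained, remove the old member and save.
echo tmsh modify ltm pool "$POOL" members delete "{ $OLD_MEMBER }"
echo tmsh save /sys config
```

Printing the commands first doubles as a change plan you can paste into a maintenance ticket before executing them for real.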