
BigIP failure

CraigM_17826
Altostratus

Hi all,

Not sure if this is the correct group, so here goes 🙂

We just experienced a very odd BigIP issue on our provider's Viprion BigIPs. To cut a long story short, it seems the configuration file became corrupted and was causing our iRules to fail with bogus errors. Because we don't have access to the log files, this was not immediately evident; the only symptom was that we could not connect to the VIPs. The errors being logged in the ltm log file looked like this:

Feb 12 02:44:20 slot1/PSYD3FFILTM001-02 info tmm1[7030]: 01220009:6: Pending rule event HTTP_RESPONSE aborted for 10.12.33.16:8001->10.12.33.4:50868 (listener: //)

Feb 12 02:44:23 slot1/PSYD3FFILTM001-02 info tmm1[7030]: 01220009:6: Pending rule event HTTP_RESPONSE aborted for 10.12.33.17:8001->10.12.33.4:18530 (listener: //)
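
Without direct access to the logs, a quick way to spot this symptom is to count these messages. The sketch below is a hypothetical helper, not an F5 tool; its regular expression is an assumption based only on the two log lines quoted above, and other TMOS versions may format the message differently. It tallies aborted rule events per event name and the first address:port in each message:

```python
import re
from collections import Counter

# Regex for the tmm message ID 01220009 as it appears in the two ltm log lines
# quoted above. The layout is assumed from that excerpt only; other TMOS
# versions may word or format the message differently.
ABORT_RE = re.compile(
    r"01220009:\d+: Pending rule event (?P<event>\w+) aborted for "
    r"(?P<src>[\d.]+:\d+)->(?P<dst>[\d.]+:\d+)"
)


def count_aborted_events(lines):
    """Count aborted iRule events per (event name, first address:port) pair."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[(match.group("event"), match.group("src"))] += 1
    return counts


if __name__ == "__main__":
    # The two sample lines from the post, used as input.
    sample = [
        "Feb 12 02:44:20 slot1/PSYD3FFILTM001-02 info tmm1[7030]: 01220009:6: "
        "Pending rule event HTTP_RESPONSE aborted for "
        "10.12.33.16:8001->10.12.33.4:50868 (listener: //)",
        "Feb 12 02:44:23 slot1/PSYD3FFILTM001-02 info tmm1[7030]: 01220009:6: "
        "Pending rule event HTTP_RESPONSE aborted for "
        "10.12.33.17:8001->10.12.33.4:18530 (listener: //)",
    ]
    for (event, src), n in sorted(count_aborted_events(sample).items()):
        print(f"{event} {src}: {n}")
```

For the two lines above it reports one HTTP_RESPONSE abort each for 10.12.33.16:8001 and 10.12.33.17:8001; run over a copy of the full ltm log, it would show at a glance whether the aborts cluster on particular addresses.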

There were lots of these errors being logged. Never having seen this before, I checked the HTTP_RESPONSE section of the iRule and could not see anything obvious. I assumed the error referred to some issue with the code in that section of the iRule. As it transpired, the only code in that section persisted the JSESSIONID cookie. That is not critical for us, so I decided to comment out the code to see if the errors went away. When I tried to save the change, two things happened.

1. The iRule editor displayed a message in a status field saying the XML was improperly formed, and it wouldn't save the changes. So I exited the editor and restarted it.

2. Upon reconnecting with the iRule editor, I tried the same change again, and this time the editor generated a stack dump when I tried to save. I didn't write down the entire message, but two things stood out:

a) It reported, or seemed to imply, that the configuration file was corrupted.

b) It said our partition account did not have privileges to make config changes. This was odd because we had been using the Windows iRule editor with the Viprions and our partition username since day one.

In the end, our hoster failed over to the standby unit and, lo and behold, everything started to work again. That was good, but it took us a while to work out. Maybe we are just slow and more experienced users would have tried this first, but in our defence the BigIP looked OK: all the VIPs were green, the pool statuses were green, and we couldn't see anything that indicated an issue.

I suppose what I am leading up to is that I am a little worried that the BigIP could get itself into this state and not detect that something is wrong. I realise that's an easy statement to make as an end user, but it seems to me to be an issue that the unit can get into this state and not fail over. I know the BigIPs will not load corrupted config files at startup, but it seems to me the unit does not do any live checks on the state of the configuration file. Would it be possible to have the unit do such a check and raise some alarm condition, or fail over to the standby if one is present? Once again, I realise it won't be as simple as that, because there is always the possibility that the config on the standby could also be corrupted. I just feel something needs to be considered for this sort of situation, because it can leave a unit inoperative.
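
A minimal sketch of the kind of live check described above, under several assumptions: it needs shell access on the unit (which, on a hosted and partitioned Viprion, only the provider would have), and it assumes that `tmsh load sys config verify` validates the saved configuration without applying it and exits non-zero on errors. Both points should be confirmed for the TMOS version in use. The `raise_alarm` hook is hypothetical. Note that this only checks the saved files, not the running configuration held by mcpd, so it would not necessarily catch a failure like the one described here.

```python
import subprocess
import sys

# Assumption: "tmsh load sys config verify" parses the saved configuration
# files without applying them and exits non-zero when it finds errors.
# Confirm both points for the TMOS version in use before relying on this.
VERIFY_CMD = ("tmsh", "load", "sys", "config", "verify")


def config_verifies(cmd=VERIFY_CMD, timeout=300):
    """Run the verify command; return (ok, output), ok meaning exit status 0."""
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hung check is treated as a failed check.
        return False, str(exc)
    return result.returncode == 0, result.stdout + result.stderr


def raise_alarm(details):
    """Hypothetical alarm hook; a real one might send an SNMP trap or trigger failover."""
    print("CONFIG VERIFY FAILED:", details, file=sys.stderr)


if __name__ == "__main__":
    ok, output = config_verifies()
    if ok:
        print("saved configuration verified")
    else:
        raise_alarm(output)
```

How often to run it, and whether a failure should only alert or also trigger failover, are deployment choices; as noted above, failing over blindly would not help if the standby's config were corrupted too.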

To rub salt in the wound, although we have holding pages for outages, they didn't kick in: they are controlled by the iRule, and the iRule was failing. 😞

I should also add that other VIPs on the unit were also misbehaving, so even if I had had a standby VIP, there was no guarantee it would have worked either.

Anyway, I would be interested to hear people's thoughts on this and whether anyone has experienced something similar. Our provider has raised the issue with F5 technical support, so I'll have to wait and see what they identify as the root cause.

Regards,

Craig

nitass
F5 Employee
"Would it be possible to have the unit do such a check and raise some alarm condition or fail over to the standby if one is present?"

I think the problem is what we would check, and how we would know the system is in that condition.

By the way, I usually force mcpd to reload the configuration if I suspect the running and saved configurations are not consistent.

sol13030: Forcing the mcpd process to reload configuration

http://support.f5.com/kb/en-us/solutions/public/13000/000/sol13030.html

Just my 2 cents.

hooleylist
Cirrostratus
Hi Craig,

I'd work with F5 Support on this issue as it's going to require a review of the logs and any core files to troubleshoot.

Aaron

CraigM_17826
Altostratus

Hi Nitass and Hoolio,

Thanks for your comments. Because it is a hosted BigIP, I have very little control over it. The cloud provider who supplies the Viprion to us has indeed raised a case with F5 technical support and sent off logs and a copy of what we suspect is a corrupted config file. The latter is just my gut feeling on the matter, though.

Also, apologies for all the typos in the original post; not enough sleep and coffee. It was a long morning. 😞

Regards,

Craig

afedden_1985
Cirrus

While waiting for F5 Support, you could run a QKView through iHealth to see if it finds anything. Hopefully your provider will give you one, but they may not if this device is shared with several customers.