Forum Discussion
Website Unavailable
Hello, we have an F5 BIG-IP 11.2.1 Build 862.0 Hotfix HF2.
I need help troubleshooting a single website that periodically becomes unavailable.
We have a VS object with a 2-node pool, and seemingly at random (though I'm sure it's not) the site suddenly becomes unavailable, most recently about 30 minutes ago. The site does not go to a maintenance page, as it should if the servers were down or failing the monitor. Instead, internally IE returns "This page can't be displayed", Chrome returns "This web page is not available", and Firefox returns "The connection was reset. The connection to the server was reset while the page was loading."
The in-house developer insists it's not the IIS servers...and logically I would tend to agree. But nothing has changed on our LB and the pool still shows up/green. And the developer did just upload some content (or something) to another site hosted on the same servers right before this current outage. The different site is also load balanced with a separate VS object, pool, monitor, etc.
We have many other VS objects on this LB and none of them are affected, just this one site/VS object.
I don't know how to look at logs on our F5 to analyze the traffic going to/through this VS object/site. Would anybody be able to provide me some direction on how to troubleshoot this?
Thanks in advance!!
Diane
6 Replies
Hi!
Have you checked the /var/log/ltm log file? It could be that no pool members were available at the time, or that an iRule failed during execution.
You can also check reset causes with:
show net rst-cause
/Patrik
I see. :)
You can view the LTM log by going to System -> Logs -> System.
- Do you have any iRules assigned to the virtual server?
- What kind of monitors are you using on the pool?
To view the logs as I first suggested, you need to log in to the device with SSH (and have advanced shell privileges). If you're using Windows I'd recommend PuTTY as the client; on Linux or Mac the ssh command is built in and available from the terminal application.
/Patrik
- afedden_1985
Cirrus
You can enable TCP reset debugging and see the reset reason either in the log or in the payload of the reset packet captured with tcpdump. Details on how to enable it are here: http://support.f5.com/kb/en-us/solutions/public/13000/200/sol13223.html
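As a rough sketch of what enabling this could look like from the advanced shell: the two db variables below (tm.rstcause.log and tm.rstcause.pkt) are the ones I'd expect the linked article to describe, so please confirm them against it for your version. The function name and the TMSH override are made up for illustration; setting TMSH=echo only prints the commands instead of running them.

```shell
# Sketch only: turn TCP reset-cause reporting on or off.
# TMSH can be overridden (e.g. TMSH=echo) to preview the commands without running them.
TMSH="${TMSH:-tmsh}"

set_rst_cause_debug() {
    # $1 must be "enable" or "disable"
    state="$1"
    case "$state" in
        enable|disable) ;;
        *) echo "usage: set_rst_cause_debug enable|disable" >&2; return 1 ;;
    esac
    # Write the reset reason to /var/log/ltm
    $TMSH modify sys db tm.rstcause.log value "$state" || return 1
    # Embed the reset reason in the payload of the RST packet itself
    $TMSH modify sys db tm.rstcause.pkt value "$state"
}
```

Run `set_rst_cause_debug enable` while troubleshooting and `set_rst_cause_debug disable` afterwards, since the extra logging is only meant to stay on temporarily.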
Hi Diane!
Afedden had an excellent suggestion above with the TCP reset debug. Meanwhile, the ltm log in the web interface is rotated every day. To see issues from, e.g., yesterday you must log in with SSH and check the old logs (ltm.1.gz, ltm.2.gz, etc.).
You can check the content by running:
zcat /var/log/ltm.1.gz | less
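If you'd rather not page through each file by hand, something like the following could search the current log and all rotated ones in one go. It is only a sketch: the function name is hypothetical, and the example pattern assumes the log lines contain wording such as "monitor status" or "TCL error", which may differ on your version.

```shell
# Sketch only: search the current and rotated LTM logs for a pattern.
search_ltm_logs() {
    # $1: extended regex to look for; $2: log directory (defaults to /var/log)
    pattern="$1"
    log_dir="${2:-/var/log}"
    for f in "$log_dir"/ltm "$log_dir"/ltm.*; do
        [ -f "$f" ] || continue
        # Rotated logs are gzip-compressed; the current log is plain text
        case "$f" in
            *.gz) gzip -cd "$f" ;;
            *)    cat "$f" ;;
        esac | grep -E -i -- "$pattern" | sed "s|^|${f##*/}: |"
    done
}
```

For example, `search_ltm_logs 'monitor status|TCL error'` would print every matching line prefixed with the file it came from. Files are visited in glob order, not strictly oldest to newest.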
When the site is down, is the whole site down or just parts of it? What happens if you attempt to browse the monitor file from your client when the problem occurs?
http://stage.cardiosource.org/lbtest.aspx
Running this command will show you the reasons for resets since the counter was reset the last time (from the SSH terminal):
tmsh show net rst-cause
The next time the problem occurs, I'd advise you to reset the counter and then show the causes again (from the SSH terminal):
tmsh reset-stats net rst-cause
tmsh show net rst-cause
Explanations for the causes can be found in the article afedden linked to:
http://support.f5.com/kb/en-us/solutions/public/13000/200/sol13223.html
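Since the counters are lost once they are reset, it may be worth saving them to a file first. A minimal sketch, assuming `reset-stats` is the tmsh verb that clears these counters on your version; the function name, the default /var/tmp directory and the TMSH override are hypothetical.

```shell
# Sketch only: save the current reset-cause counters, then clear them.
# TMSH can be overridden (e.g. TMSH=echo) for a dry run.
TMSH="${TMSH:-tmsh}"

snapshot_and_reset_rst_causes() {
    # $1: directory to keep the snapshot in (defaults to /var/tmp)
    out_dir="${1:-/var/tmp}"
    snapshot="$out_dir/rst-cause.$(date +%Y%m%d-%H%M%S).txt"
    # Save the counters first so nothing is lost by the reset
    $TMSH show net rst-cause > "$snapshot" || return 1
    # Clear the counters so the next "show" covers only the new outage window
    $TMSH reset-stats net rst-cause > /dev/null || return 1
    # Print where the snapshot was written
    echo "$snapshot"
}
```

Call `snapshot_and_reset_rst_causes` right when the outage starts, reproduce the failure a few times, then run `tmsh show net rst-cause` and compare against the saved file.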
Summary:
- If you can't load the monitor page when the problem occurs and the pool members are green, that would point to a problem somewhere along the path rather than on the servers.
- Check the ltm logs as instructed above to verify that nothing actually went down.
- Run the rst-cause commands above.
Some additional questions:
- Could you please post the result of the rst-cause before you reset the counter?
- Could you please let me know the monitor interval and timeout?
- Do you use the default tcp profile on the virtual server?
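The configuration questions above could be answered from the SSH terminal with something like the sketch below. The virtual server, pool and monitor names are placeholders you'd replace with your own, the function name is hypothetical, and the last command assumes the pool uses an HTTP-type monitor.

```shell
# Sketch only: list the settings asked about above.
# TMSH can be overridden (e.g. TMSH=echo) to preview the commands.
TMSH="${TMSH:-tmsh}"

show_vs_details() {
    # $1: virtual server name, $2: pool name, $3: monitor name (placeholders)
    vs="$1"; pool="$2"; monitor="$3"
    if [ -z "$vs" ] || [ -z "$pool" ] || [ -z "$monitor" ]; then
        echo "usage: show_vs_details <virtual> <pool> <monitor>" >&2
        return 1
    fi
    # iRules and profiles (including the TCP profile) attached to the virtual server
    $TMSH list ltm virtual "$vs" rules profiles
    # Monitors assigned to the pool
    $TMSH list ltm pool "$pool" monitor
    # Interval and timeout of the monitor (assumes an HTTP-type monitor)
    $TMSH list ltm monitor http "$monitor" interval timeout
}
```

The output can be pasted straight into a reply, after removing anything sensitive.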
Please note that I'm in Sweden so my answers might be a bit late. 🙂
/Patrik
- dlogsdonmd
Nimbostratus
Thanks again Patrik and Afedden. I think I'm now going to have to see about opening a ticket. Earlier today I could SSH into the LB, but when I do now I don't get the right command prompt, so I'm not sure what changed. A colleague was working with me to expand the LTM logs, but then gibberish started displaying so we quit, and now when we log in the prompt is different and we get different command options.
But to some of the questions...the site remains down, and we don't know how to bring it back up. Any sub/micro site of stage.cardiosource.org is down, and browsing to the monitor page also fails through the LB. The site is reachable and works properly via a host entry pointing directly to either server in the pool, though.
We also want to see logs before issuing any reset commands, as you also indicated would be ideal. But now I'm not sure how to get to them since my login is now not working as it should be. What I see after login is: admin@(RDLB1) (cfg-sync Changes Pending) (Active) (/Common) (tmos) I don't know what I saw before, but this is different.
Thank you both for your help, I think I'm out of my realm of expertise here and will need official tech support help.
Diane
Hi Diane
Looks like you don't have advanced shell configured for your user. See if "run util bash" lets you open a shell or if you can configure your user.
The gibberish you saw could be due to using cat instead of zcat while viewing the ltm files in the terminal. Try again with zcat and it should work.
The site remains down? I got the impression that the problem was not permanent. Do you know if the load balancer has self IPs on the same VLAN as the servers? Do the servers use the load balancer as their default gateway? One thing you could try, if only to rule out routing, is to enable SNAT Automap (source address translation) in the virtual server config.
/Patrik