Forum Discussion

Davidfisher_345
Jul 18, 2018

Bad gateway error 502 on statistics pages

Hey Guys,

We are getting this error on one of our boxes. For now it only shows up on the statistics tabs for virtual servers and pools.

I tried restarting httpd and tomcat, but that did not help (roughly what I ran is below).
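
For reference, these are the standard bigstart commands I used for the restarts (tmsh restart sys service does the same thing):

bigstart restart httpd
bigstart restart tomcat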

This is the ltm log:

tail -f /var/log/ltm
Jul 18 11:52:51 hostname err tmm1[26088]: 01010221:3: Per-invocation log rate exceeded; throttling.
Jul 18 11:52:51 hostname err tmm2[26088]: 01010221:3: Per-invocation log rate exceeded; throttling.
Jul 18 11:52:51 hostname err tmm5[26088]: 01010221:3: Per-invocation log rate exceeded; throttling.
Jul 18 11:52:51 hostname err tmm4[26088]: 01010221:3: Per-invocation log rate exceeded; throttling.
Jul 18 11:52:51 hostname err tmm3[26088]: 01010221:3: Per-invocation log rate exceeded; throttling.
Jul 18 11:52:52 hostname notice mcpd[7982]: 01070727:5: Pool /Common/tibco-preproduction_9257_pool member /Common/dx930:9257 monitor status up. [ /Common/tcp: up ]  [ was down for 0hr:0min:28sec ]
Jul 18 11:52:53 hostname notice mcpd[7982]: 01070727:5: Pool /Common/tibco-preproduction_9362_pool member /Common/dx930:9362 monitor status up. [ /Common/tcp: up ]  [ was down for 0hr:0min:29sec ]
Jul 18 11:52:57 hostname notice mcpd[7982]: 01070727:5: Pool /Common/tibco-preproduction_9059_pool member /Common/dx930:9059 monitor status up. [ /Common/tcp: up ]  [ was down for 0hr:0min:28sec ]
Jul 18 11:52:57 hostname notice logger: /usr/bin/syscalld  ==> /usr/bin/bigstart restart tomcat
Jul 18 11:53:08 hostname warning tmm5[26088]: 01260009:4: Connection error: hud_ssl_handler:1199: codec alert (20)
Jul 18 11:53:24 hostname notice mcpd[7982]: 01070638:5: Pool /Common/arcsight-f5_tcp_514_pool member /Common/AS1285AUFAL02-Sec:515 monitor status down. [ /Common/tcp: down; last error: /Common/tcp: No successful responses received before deadline.; Could not connect. @2018/07/18 11:53:24.  ]  [ was up for 26hrs:43mins:27sec ]
Jul 18 11:53:56 hostname notice mcpd[7982]: 01070727:5: Pool /Common/arcsight-f5_tcp_514_pool member /Common/AS1285AUFAL02-Sec:515 monitor status up. [ /Common/tcp: up ]  [ was down for 0hr:0min:32sec ]
Jul 18 11:54:56 hostname warning tmm5[26088]: 01260009:4: Connection error: ssl_passthru:4003: not SSL (40)
Jul 18 11:55:28 hostname notice logger: /usr/bin/syscalld  ==> /usr/bin/bigstart restart tomcat

I also took a pcap on the management interface and it was all clean (capture command below).
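
In case it helps, this is roughly the capture I took (assuming eth0 maps to the management interface on this platform and 443 is the GUI port):

tcpdump -nni eth0 -w /var/tmp/mgmt-gui.pcap port 443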

The box is on version 12.1.2, and we see this on both the active and standby units.

So, what do you think? Thanks.

  • Jul 18 11:52:51 hostname err tmm1[26088]: 01010221:3: Per-invocation log rate exceeded; throttling.

    This means more than five log messages with the same message ID were logged within a one-second interval. It is informational only, indicating that the system is throttling those log messages.

    https://support.f5.com/csp/article/K10524
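
    If you want a quick count of how often messages with that ID are hitting the log, something like this works:

    grep -c '01010221' /var/log/ltm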

    If you look up the log ID, you'll see it indicates the pool now has available members:

    01010221 : Pool %s now has available members
    Location:
    /var/log/ltm
    
    Conditions:
    A pool with no available members now has available members. The pool may have had no available members due to administrative action, monitors, connection limits, or other constraints on pool member selection.
    
    Impact:
    This indicates that traffic is now load-balanced to the available member as desired.
    
    Recommended Action:
    None.
    

    source: https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/related/log-messages.html

    That said, it does not completely explain the 503 error when accessing the statistics page. TMM is the traffic management microkernel that handles the health monitors and your LTM traffic in general, while the Linux host side is responsible for the web GUI. I would perhaps open a ticket with F5.
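
    If you do open a case, generating a qkview up front usually speeds things up; it runs from the bash prompt and writes the file to /var/tmp by default:

    qkview
    ls -lh /var/tmp/*.qkview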

  • As far as I know, the error page and that log entry are not related. I suspect two or more users logged in at the same time and ran their tasks. Restarting the httpd service will usually resolve this, but I can see it is the active box, so take that action accordingly.

    Check the tmm and httpd service logs for more information (standard locations are sketched below the note). If you still face the issue, open a support case.

    Note: A full box restart should resolve the issue.
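
    For the log checks mentioned above, these are the standard locations on BIG-IP (adjust if your paths differ):

    tail -f /var/log/ltm                     # tmm / LTM events
    tail -f /var/log/httpd/httpd_errors      # httpd (GUI) errors
    tail -f /var/log/tomcat/catalina.out     # tomcat / Java side of the GUI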

     

    • Davidfisher_345

      As I mentioned, the first thing we did was restart HTTPD and TOMCAT, but that did not fix it.

      Now F5 support has answered:

      The issue with the 503 errors observed while trying to view stats in the GUI might be related to the java process out of memory errors seen in the tomcat logs:
      
      /var/log/tomcat/catalina.out
      Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
      Jul 15, 2018 2:55:33 PM org.apache.tomcat.util.threads.ThreadPool$ControlRunnable run
      SEVERE: Caught exception (java.lang.OutOfMemoryError: Java heap space) executing org.apache.jk.common.ChannelSocket$SocketAcceptor@189f044, terminating thread
          at java.util.Arrays.copyOf(Arrays.java:2367)
          at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
          at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
          at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
          at java.lang.StringBuilder.append(StringBuilder.java:132)
          at com.f5.mcp.io.ConnectionManager.run(ConnectionManager.java:267)
          at java.lang.Thread.run(Thread.java:744)
      com.f5.mcp.io.McpQueryException: Empty reply message
      
      Please follow the steps in the solution links below to provision additional memory to the tomcat process. I'd suggest you start with an extra 20MB and verify if the issue persists.
      
      K25554628: The tomcat process may experience an out of memory exception, become unresponsive, and fail to automatically restart
      https://support.f5.com/csp/article/K25554628
      
      K9719: Error Message: java.lang.OutOfMemoryError
      https://support.f5.com/csp/article/K9719
      
      If the issue persists after increasing the memory, we can continue troubleshooting from there.
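
      If I'm reading those articles right, the change boils down to raising the tomcat extra-memory db variable. A rough sketch of what I plan to run, assuming provision.tomcat.extramb is the right key on 12.1.2 (I'll verify against the linked articles first):

      tmsh list sys db provision.tomcat.extramb             # check the current value (0 = default heap)
      tmsh modify sys db provision.tomcat.extramb value 20  # the extra 20MB suggested above
      tmsh save sys config
      bigstart restart tomcat                               # restart tomcat so the new setting takes effect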
      

      This is a BIG-IP 10000 box with 2000+ virtual servers. Is the load causing this problem?

      Can this cause any problems for production traffic at all?