I am running Galera MySQL behind an F5 virtual server of type Performance (Layer 4). The pool has three MySQL nodes with priority groups, so only one node is used and the other two are on standby. Everything was fine until today, when I shut down the primary node, which was the active one, and my application broke. When I checked the logs I found:

(2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))")

The only fix was to restart the application. It looks like the active member's MySQL connections are not bleeding off to the other pool members. What is wrong with my setup?

MySQL active connection never bleed off to other pool member

KeesvandenBos
MVP
Aug 17, 2018
Can you share your virtual server and pool configuration (from tmsh)?

Cheers,

Kees

satish_txt_2254

Cirrus

Aug 17, 2018

Virtual Server

create ltm virtual /OSTACK/OSTACK_VS_GALERA { destination 172.28.0.9:3306 ip-protocol tcp mask 255.255.255.255 pool /OSTACK/OSTACK_POOL_GALERA profiles replace-all-with { /Common/fastL4 { } }  mirror enabled source-address-translation { pool /OSTACK/OSTACK_SNATPOOL type snat } }

POOL

create ltm pool /OSTACK/OSTACK_POOL_GALERA { load-balancing-mode least-connections-node members replace-all-with { OSTACK_NODE_ostack-infra-02_galera_container-fa5d9e98:3306 { priority-group 100 } OSTACK_NODE_ostack-infra-03_galera_container-eaacd880:3306 { priority-group 95 } OSTACK_NODE_ostack-infra-01_galera_container-6c126d29:3306 { priority-group 90 } } min-active-members 1 service-down-action reset slow-ramp-time 0 monitor /OSTACK/OSTACK_MON_GALERA }

KeesvandenBos
MVP
Aug 17, 2018
The BIG-IP sends a TCP reset the moment the pool member with priority 100 goes offline/down, because the pool is configured with service-down-action reset (see article K15095). The article describes that setting as follows:

The BIG-IP system sends RST or ICMP messages to reset active connections and removes them from the BIG-IP connection table.

Note: This selection is named "reset" instead of "reject" when using the TMOS Shell (tmsh).

It seems your application does not try to reconnect when it receives this reset.
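
A minimal sketch of the client-side behaviour this reply is asking for: when a query fails because the connection was reset, drop the connection, open a new one through the virtual server, and retry. The names `run_with_reconnect`, `connect`, `query`, and `FakeConn` are hypothetical. A real application would catch its driver's own error type (the pymysql OperationalError 2006 quoted in this thread) rather than the built-in `ConnectionResetError` used here.

```python
import time


def run_with_reconnect(connect, query, retries=3, delay=0.0):
    """Run query(conn); on a reset, reconnect via connect() and try again.

    connect and query are hypothetical callables supplied by the application.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return query(conn)
        except ConnectionResetError:
            if attempt == retries:
                raise  # give up after the last allowed retry
            time.sleep(delay)  # optional pause while the pool fails over
            conn = connect()   # a new TCP connection can land on a live member


class FakeConn:
    """Stand-in for a DB connection; the first one created behaves as reset."""
    made = 0

    def __init__(self):
        FakeConn.made += 1
        self.broken = FakeConn.made == 1

    def ping(self):
        if self.broken:
            raise ConnectionResetError(104, "Connection reset by peer")
        return 1


result = run_with_reconnect(FakeConn, lambda c: c.ping())
```

The retry only helps for work that is safe to repeat; a half-finished transaction needs more care than a single read.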
Hamish
Cirrocumulus
Aug 17, 2018
Were you expecting the BigIP to automatically connect an existing TCP connection to another host without the client having to participate?

Even for the most basic of protocols (e.g. DNS, SNMP), this won't happen. When the pool member goes down you will (by default) receive a RST to indicate that the connection is no longer valid. You have the option to send the mid-stream connection to another host, but as that host has no idea of the connection, you will (again) receive a RST. Now, you can (in theory) write an iRule to migrate MySQL connections from one host to another, triggered when the pool member goes down (this is where you'd normally receive a RST back to the client). I have done it in the dim dark past for LDAP, but in practice it isn't trivial (and possibly impractical, but I'd love to see someone do it). You'd have to implement a protocol-specific proxy to track what was sent and what was received. For a SQL database you'd have to track transactions and be ready to replay the whole transaction to the second server if you had to migrate it for any reason...

Given that it's usually a pretty application-specific thing, I'd suggest altering the app rather than the BigIP to handle migration of MySQL connections. Or run MySQL Cluster, which does (apparently) guarantee uninterrupted access from clients... but I've never tried it, and I suspect it's not cheap either.
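
To make the "track transactions and replay them" idea concrete, here is a rough sketch, assuming the tracking is done in the application rather than in an iRule. It journals every statement of the open transaction and replays the journal on a second node after a failover. `ReplayingSession` and `new_node` are hypothetical names, and in-memory SQLite connections stand in for the two database nodes.

```python
import sqlite3


def new_node():
    """Hypothetical stand-in for one database node (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    return conn


class ReplayingSession:
    """Journals each statement of the open transaction so it can be replayed."""

    def __init__(self, conn):
        self.conn = conn
        self.journal = []

    def begin(self):
        self.journal = []
        self.conn.execute("BEGIN")

    def execute(self, sql, params=()):
        self.journal.append((sql, params))
        return self.conn.execute(sql, params)

    def failover(self, new_conn):
        """Move to another node and replay the uncommitted transaction there."""
        self.conn = new_conn
        self.conn.execute("BEGIN")
        for sql, params in self.journal:
            self.conn.execute(sql, params)

    def commit(self):
        self.conn.execute("COMMIT")
        self.journal = []


node_a, node_b = new_node(), new_node()
session = ReplayingSession(node_a)
session.begin()
session.execute("INSERT INTO t (v) VALUES (?)", ("first",))
session.execute("INSERT INTO t (v) VALUES (?)", ("second",))
node_a.close()            # simulate the active member going down mid-transaction
session.failover(node_b)  # replay both statements on the standby
session.commit()
rows = node_b.execute("SELECT v FROM t ORDER BY id").fetchall()
```

Even this toy version shows why the approach is fragile: the replay is only correct if every statement gives the same result the second time, which does not hold when later statements depend on values read earlier in the transaction.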

satish_txt_2254

Cirrus

Aug 19, 2018

After a lot of debugging I found the following. If I point my application at the F5-based LB, I see this error every minute:

(2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))")

Here is the full error output:

2018-08-19 09:19:50.789 11159 ERROR oslo_db.sqlalchemy.engines [req-aa221914-d720-490c-a8e8-f9d7b780a353 8ec61b0530b94a699c4dcf164115f365 328fc75d4f944a64ad1b8699c02350ca - default default] Database connection was found disconnected; reconnecting: DBConnectionError: (pymysql.err.OperationalError) (2006, "MySQL server has gone away (error(104, 'Connection reset by peer'))") [SQL: u'SELECT 1'] (Background on this error at: http://sqlalche.me/e/e3q8)
2018-08-19 09:19:50.789 11159 ERROR oslo_db.sqlalchemy.engines Traceback (most recent call last):
2018-08-19 09:19:50.789 11159 ERROR oslo_db.sqlalchemy.engines   File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2018-08-19 09:19:50.789 11159 ERROR oslo_db.sqlalchemy.engines     connection.scalar(select([1]))
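
The traceback shows oslo.db probing the connection with `SELECT 1` and reconnecting when the probe fails. A minimal sketch of that check-before-use pattern follows, with the standard library's sqlite3 standing in for the MySQL driver. `PingingPool` is a hypothetical class for illustration, not oslo.db's or SQLAlchemy's API.

```python
import sqlite3


class PingingPool:
    """Tiny one-connection pool that tests the connection before handing it out."""

    def __init__(self, connect):
        self._connect = connect
        self._conn = connect()
        self.reconnects = 0

    def checkout(self):
        try:
            self._conn.execute("SELECT 1").fetchone()  # cheap liveness probe
        except sqlite3.Error:
            # The connection is dead: discard it and open a fresh one.
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn


pool = PingingPool(lambda: sqlite3.connect(":memory:"))
pool.checkout().close()   # simulate the connection dying while it sat idle
conn = pool.checkout()    # the probe fails, so the pool reconnects
value = conn.execute("SELECT 1").fetchone()[0]
```

With a probe like this in place, a reset on an idle connection shows up as a logged reconnect instead of a failed request, which matches the "reconnecting" message in the log above.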

Interestingly, when I point my application at the HAProxy (MySQL LB) VIP, the error disappears. It also disappears if I point the application directly at a Galera MySQL node. It looks like something is going on with the F5-based VIP.

Do you think I should create a "Standard" VIP instead of "Performance (Layer 4)"? I did try source-address persistence, but I still get the same error.

satish_txt_2254
Cirrus
Aug 19, 2018
In tcpdump I am seeing the F5 send RST packets to both the client and the server, terminating the connection every minute. This is a new installation and there is no customer traffic or any high-volume traffic yet. Very odd; why is the F5 sending RSTs?
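
One way to check the "every minute" cadence from the application side is to measure the gaps between the disconnect errors in the service log. This is a small sketch, assuming log lines start with a timestamp in the format shown earlier in the thread. `disconnect_gaps` is a hypothetical helper, and the second timestamp in `sample` is invented purely for illustration.

```python
from datetime import datetime

MARKER = "Database connection was found disconnected"


def disconnect_gaps(lines):
    """Return seconds between consecutive disconnect errors in the log lines."""
    stamps = []
    for line in lines:
        if MARKER in line:
            # The first 23 characters look like '2018-08-19 09:19:50.789'.
            stamps.append(datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
    "2018-08-19 09:19:50.789 11159 ERROR oslo_db.sqlalchemy.engines "
    "Database connection was found disconnected; reconnecting",
    "2018-08-19 09:19:50.789 11159 ERROR oslo_db.sqlalchemy.engines "
    "Traceback (most recent call last):",
    # Invented second event, one minute later, for illustration only.
    "2018-08-19 09:20:50.801 11159 ERROR oslo_db.sqlalchemy.engines "
    "Database connection was found disconnected; reconnecting",
]
gaps = disconnect_gaps(sample)
```

A steady gap points at a timer somewhere in the path; comparing it with the RST timestamps in tcpdump shows whether the two line up.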
satish_txt_2254
Cirrus
Aug 20, 2018
Ready for some fun? Switching from SNAT to automap fixed all my issues. Now I am really curious and would like to know what the difference is.

After switching to automap, all my MySQL connection errors disappeared and I no longer see any TCP RST packets from the F5.
quantiti_170569
Nimbostratus
Aug 20, 2018
You can read about SNAT here https://support.f5.com/csp/article/K7820
amintej
Cirrus
Aug 20, 2018
Hello satish.txt, maybe the problem is related to PVA acceleration. You can try configuring SNAT and attaching a new FastL4 profile with PVA Acceleration set to None, or with Offload State set to EST.
satish_txt_2254
Cirrus
Aug 20, 2018
@amintej,

I will try that, but I'm curious: what is the relation to PVA acceleration?

I have other SNATs running on the same F5 and they all work great, except this MySQL one.

The current setting is PVA Acceleration = Full.
