Forum Discussion
iRule to test tcp availability
I have a group of servers (currently six of them) that each host some number of Java applications. There are around 100 total applications spread out among these servers. Each application only runs on one server at a time, and each application listens on a predefined port or set of ports. An application running on server1 today may be on server3 tomorrow but there is no way to predict which server it will be on. The whole setup is essentially a high availability scheme. The number of applications will increase as time goes on, as will the number of ports listened on and the number of servers in the group.
I've been tasked with load balancing this mess. Clients need to be able to send messages to a single VIP on a specific port and get their request delivered to the node that currently has an application listening on the specified port.
I have two ideas to make this work:
1) Send each request to each node. Only one node will ever accept the connection so the refused connections from the rest can be ignored. This actually creates an amplification attack upon each request so it is not optimal, but it may be acceptable if no better solutions exist.
2) An iRule that intercepts the request as it comes in, then attempts a TCP connection to each node in succession until it finds one that accepts the connection on the port in question. It would then assign that node for this and future requests on $port. This also creates an amplification attack, but not necessarily on every request, since future requests for $port would keep going to the same node. When a request is sent to a node that is no longer listening on $port, it should trigger a new "scan" to find the new listening node. The client applications are configured to retry some number of times, so it's ok if a request is sent once to a $node:$port that is no longer listening.
Example:
request1 comes in for port 9500
|
|- test node1:9500 (connection refused)
|
|- test node2:9500 (connection accepted)
|
|- assign node "node2" to the request
|
|- send request to node2:9500
|
`- node2:9500 accepts the request
request 2 comes in for port 9500
|
|- assign node "node2" to the request
|
|- send request to node2:9500
|
`- node2:9500 accepts the request
request 3 comes in for port 9500
|
|- assign node "node2" to the request
|
|- send request to node2:9500
|
|- node2:9500 refuses the connection
|
|- test node1:9500 (connection refused)
|
|- test node3:9500 (no need to test node2 again) (connection accepted)
|
|- assign node "node3" to the request
|
|- send request to node3:9500
|
`- node3:9500 accepts the request
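For what it's worth, idea 2 could be sketched as an iRule along these lines (untested; the pool name and table keys are hypothetical, and it assumes a wildcard-port virtual server in front of a wildcard-port pool):

```tcl
# Hedged sketch: cache the node that last accepted a connection on each
# port, and rescan via LB::reselect when that node refuses the connection.
when CLIENT_ACCEPTED {
    set tries 0
    set cached [table lookup "port_[TCP::local_port]"]
    if { $cached ne "" } {
        # reuse the node that accepted this port last time
        node $cached [TCP::local_port]
    }
}
when LB_FAILED {
    # connection refused or failed: try the next pool member
    if { $tries < [active_members wildcard_pool] } {
        incr tries
        LB::reselect pool wildcard_pool
    }
}
when SERVER_CONNECTED {
    # remember which node accepted this port
    table set "port_[TCP::local_port]" [IP::remote_addr]
}
```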
Is my idea plausible or is there a better way?
Thanks for taking a look!
--
Kris Dames
9 Replies
- Kevin_Stewart
Employee
Is this a scenario where an application runs on a specified port and you have pools defined for each application based on that port?
If so, I think a TCP monitor applied to each pool might be a safe bet. With a given number of servers in a pool, the monitor would keep track of which server was alive at any given moment (within the monitor's window).
- krisdames
Cirrus
Kevin,
Thanks for the response! Yes, each application runs on a specified port. I am trying to avoid a pool per application because that would be around 100 pools and that number will grow with time. I'm really looking for something dynamic so I don't have to create a new pool each time an application is added.
--
Kris Dames
- Christopher_Boo
Cirrostratus
Posted By krisdames on 03/11/2013 04:25 PM: The whole setup is essentially a high availability scheme.
Can you provide more detail as to why it is set up this way? This does sound like a mess.
Chris
- Kevin_Stewart
Employee
I still tend to believe the pool option is your best approach, and you can certainly automate the management and creation of these pools with some pretty simple TMSH scripts.
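(For reference, creating one such per-port pool and virtual server from tmsh might look roughly like this; names and addresses are hypothetical:)

```
create ltm pool app_9500_pool members add { 10.0.0.1:9500 10.0.0.2:9500 10.0.0.3:9500 } monitor tcp
create ltm virtual app_9500_vs destination 10.0.0.100:9500 ip-protocol tcp pool app_9500_pool
```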
That said, and this is a little on the wacky side, but you might also be able to do something like this:
1. Maintain a data group of all of the applications by port number (essentially a list of ports - assuming all of the servers will serve every application).
2. Create an internally-accessible "service" virtual server that maintains a table. It would also have an iRule that both populates the table and accepts simple HTTP requests to update the table entries. The table consists of one subtable per application (port), and each subtable defines the status of each IP for that port.
Ex. subtable "9500":
IP1: down
IP2: down
IP3: up
IP4: down
IP5: down
IP6: down
3. A monitor script attached to the service VIP that cycles through the data group (or subtables) and updates the status of each value (up or down) in the subtables - via a simple HTTP call to the VIP.
4. Your application virtual servers would then check for the one "up" value in a given subtable before forwarding to the node.
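A rough sketch of that lookup (step 4), assuming the monitor script stores per-IP status under a subtable named for the port (all names hypothetical):

```tcl
when CLIENT_ACCEPTED {
    set app_port [TCP::local_port]
    # find the one member currently marked "up" for this port
    foreach ip [table keys -subtable $app_port] {
        if { [table lookup -subtable $app_port $ip] eq "up" } {
            node $ip $app_port
            return
        }
    }
    # nothing is marked up for this port; refuse the connection
    reject
}
```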
It's absolutely imperative that the service VIP not be accessible to users. Also, depending on the frequency of the VIP's monitor and how many applications it has to cycle through, it may not respond as quickly as having a monitor attached to individual pools.
- nitass
Employee
what about this one?

[root@ve10:Active] config b virtual bar list
virtual bar {
   snat automap
   pool foo
   destination 172.28.19.252:any
   ip protocol 6
   rules myrule
}
[root@ve10:Active] config b pool foo list
pool foo {
   members {
      200.200.200.101:any {}
      200.200.200.102:any {}
      200.200.200.111:any {}
      200.200.200.112:any {}
   }
}
[root@ve10:Active] config b rule myrule list
rule myrule {
   when CLIENT_ACCEPTED {
      set c 1
      log local0. "--"
      log local0. "\[table lookup tbl[TCP::local_port]\] [table lookup tbl[TCP::local_port]]"
      if { [table lookup tbl[TCP::local_port]] ne "" } {
         node [table lookup tbl[TCP::local_port]]
         log local0. "node [table lookup tbl[TCP::local_port]]"
      }
   }
   when LB_SELECTED {
      log local0. "\[LB::server addr\] [LB::server addr]"
   }
   when LB_FAILED {
      if { $c < [active_members foo] } {
         LB::reselect pool foo
         log local0. "LB::reselect pool foo"
         incr c
      }
   }
   when SERVER_CONNECTED {
      table set "tbl[TCP::remote_port]" [IP::remote_addr]
      log local0. "table set \"tbl[TCP::remote_port]\" [IP::remote_addr]"
      log local0. "connection [IP::client_addr]:[TCP::client_port] > [clientside {IP::local_addr}]:[clientside {TCP::local_port}] > [IP::remote_addr]:[TCP::remote_port]"
   }
}

1st request: load balance and save successful pool member

Mar 12 22:17:36 local/tmm info tmm[4950]: Rule myrule : --
Mar 12 22:17:36 local/tmm info tmm[4950]: Rule myrule : [table lookup tbl80]
Mar 12 22:17:36 local/tmm info tmm[4950]: Rule myrule : [LB::server addr] 200.200.200.101
Mar 12 22:17:36 local/tmm info tmm[4950]: Rule myrule : table set "tbl80" 200.200.200.101
Mar 12 22:17:36 local/tmm info tmm[4950]: Rule myrule : connection 172.28.19.251:43695 > 172.28.19.252:80 > 200.200.200.101:80

2nd request: send request to the pool member previously saved

Mar 12 22:17:44 local/tmm info tmm[4950]: Rule myrule : --
Mar 12 22:17:44 local/tmm info tmm[4950]: Rule myrule : [table lookup tbl80] 200.200.200.101
Mar 12 22:17:44 local/tmm info tmm[4950]: Rule myrule : node 200.200.200.101
Mar 12 22:17:44 local/tmm info tmm[4950]: Rule myrule : [LB::server addr] 200.200.200.101
Mar 12 22:17:44 local/tmm info tmm[4950]: Rule myrule : table set "tbl80" 200.200.200.101
Mar 12 22:17:44 local/tmm info tmm[4950]: Rule myrule : connection 172.28.19.251:43696 > 172.28.19.252:80 > 200.200.200.101:80

3rd request: if the server is changed to another pool member (i.e. 200.200.200.101:80 is no longer available), load balance and save the new successful pool member

Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : --
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : [table lookup tbl80] 200.200.200.101
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : node 200.200.200.101
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : [LB::server addr] 200.200.200.101
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : LB::reselect pool foo
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : [LB::server addr] 200.200.200.111
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : table set "tbl80" 200.200.200.111
Mar 12 22:18:02 local/tmm info tmm[4950]: Rule myrule : connection 172.28.19.251:43697 > 172.28.19.252:80 > 200.200.200.111:80
- krisdames
Cirrus
Kevin,
Thank you kindly for the response. I think this is a good idea and I believe it would work. The only issue I have with it is that I would have to maintain the data group and I am willing to sacrifice some overhead on the F5 in order to make this as dynamic as possible.
--
Kris
- Kevin_Stewart
Employee
A 100-application data group would only need 100 entries. But then again, 100 individual pools wouldn't be a big deal either - and would make your monitoring efforts MUCH more robust. The big issue seems to be the management of each. Enter iControl and TMSH management scripts. ;)
- krisdames
Cirrus
nitass,
I had to set the increment variable to 0 instead of 1 because the test was stopping before the last pool member was tested. Other than that, the logic is great and it seems to be working (our developers are performing more rigorous testing as I speak). This is exactly what I was looking for because it requires no maintenance. Well done!
--
Kris
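(For anyone reusing nitass's rule: the change Kris describes is just the counter's initial value in CLIENT_ACCEPTED; the rest of the rule is unchanged.)

```tcl
when CLIENT_ACCEPTED {
    # was "set c 1"; starting from 0 lets LB_FAILED reselect enough
    # times to reach the last pool member
    set c 0
}
```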
- nitass
Employee
I had to set the increment variable to 0 instead of 1 because the test was stopping before the last pool member was tested.
Oh yes, thanks!