Forum Discussion

frankcheong_304
Aug 02, 2013

High Packet Drop and connection failure

We have a pair of LTM 1600s (LTM1 & LTM2) and a pair of Cisco 2960s (2960-1, 2960-2). The detailed connections are as below:

LTM1 internal-trunk = interfaces 1.3 + 1.4
LTM1 internal-trunk (LACP Enabled, LACP Mode = Active, LACP Timeout = Long, Link Selection Policy = Auto, Frame Distribution Hash = Src/Dst IP)
LTM1 Fibre = interfaces 2.1 + 2.2
LTM1 VLAN External (Tag=10, Untagged Interface=1.1)
LTM1 VLAN Internal (Tag=4093, Untagged Interface=internal-trunk)
LTM1 VLAN pri-failover (Tag=4092, Untagged Interface=Fibre)

LTM1 interface 1.1 -> uplink Cisco
LTM1 internal-trunk -> 2960-1 port-channel 3
LTM1 Fibre -> LTM2 Fibre
LTM2 has exactly the same configuration.

2960-1 port-channel 5 -> 2960-2 port-channel 5
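
In tmsh terms (v11.x syntax; the object names simply mirror the description above, and the frame distribution hash / link selection policy are left at the values listed), that trunk and internal VLAN would be built roughly like this:

# LACP trunk over interfaces 1.3 and 1.4 (active mode, long timeout)
tmsh create net trunk internal-trunk interfaces add { 1.3 1.4 } lacp enabled lacp-mode active lacp-timeout long
# internal VLAN (ID 4093) carried untagged over the trunk
tmsh create net vlan Internal tag 4093 interfaces add { internal-trunk { untagged } }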

 

Please find below the relevant portion of the show run output:

 

2960-1# show run

Building configuration...

Current configuration : 6188 bytes
!
version 12.2
hostname 2960-1
no ip source-route
!
no ip domain-lookup
vtp domain f5-private
vtp mode transparent
!
spanning-tree mode pvst
spanning-tree extend system-id
!
port-channel load-balance src-dst-ip
!
vlan internal allocation policy ascending
!
vlan 4093
 name f5-private-vlan
!
interface Port-channel3
 switchport access vlan 4093
 switchport mode access
 no keepalive
 flowcontrol receive desired
!
interface Port-channel5
 switchport access vlan 4093
 switchport mode access
!
interface GigabitEthernet1/0/1
 switchport access vlan 4093
 switchport mode access
 no keepalive
 flowcontrol receive desired
 no cdp enable
 no cdp tlv server-location
 no cdp tlv app
 spanning-tree portfast disable
 channel-group 3 mode active
!
interface GigabitEthernet1/0/2
 switchport access vlan 4093
 switchport mode access
 no keepalive
 flowcontrol receive desired
 no cdp enable
 no cdp tlv server-location
 no cdp tlv app
 spanning-tree portfast disable
 channel-group 3 mode active
!
interface GigabitEthernet1/0/3
 switchport access vlan 4093
 switchport mode access
 spanning-tree portfast disable
 channel-group 5 mode desirable non-silent
!
interface GigabitEthernet1/0/4
 switchport access vlan 4093
 switchport mode access
 spanning-tree portfast disable
 channel-group 5 mode desirable non-silent
!
interface Vlan1
 no ip address
 shutdown
!
interface Vlan4093
 ip address 192.168.1.1 255.255.255.0
!
ip sla enable reaction-alerts
no cdp run
!
end

 

 

 

2960-2 has exactly the same configuration. The problem is that there seems to be a high connection failure rate from the external subnet to the virtual server. A flood ping from 2960-1 to LTM1 (and vice versa) shows no loss, but I see around 10% packet drop when I ping from LTM1 to LTM2 using either the internal or the external IP. I get the same result (roughly 10% packet drop) when I ping from any host on the LTM internal subnet to LTM1/LTM2, again using either the internal or external IP. However, there is zero packet loss when I ping from those hosts to 2960-1/2960-2 or vice versa. Is this caused by a misconfiguration? How can I troubleshoot it?
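
One quick data point worth collecting on the LTM side, before going to packet captures, is the error/drop counters on the trunk and its member interfaces, for example (v11.x tmsh syntax):

# trunk-level counters (drops, errors, collisions) for the LACP trunk
tmsh show net trunk internal-trunk
# per-interface counters for the trunk members and the fibre pair
tmsh show net interface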

 

 

 

 

  • These types of problems are always like going down the rabbit hole, especially over a medium like this. It does look like we are starting to narrow it down.

    The LTM1 <> LTM2 ping loss is worth looking into, but not yet. I would be more concerned about losing pings from LTM1/LTM2 to any node under the 2960. I want you to find a node that is actually plugged into 2960-1. This is important: we don't want that traffic going across the PAgP link. I then want you to perform two ping tests, one with 1,000 pings and one with 10,000 pings, issued from the LTM to the internal/inside node that is physically plugged into 2960-1 (see the sketch at the end of this reply). Please report back.

    If you are getting TCP RSTs from the client, then you need to dig further into this. This is the clue we have been looking for. Please verify the RST is from the client first. Make sure there isn't one from the server beforehand, and also verify what communication took place right before the RST. Sample a couple of RSTs to see if it is the same call causing this. Narrow things down using the tcp.port or udp.port filter in Wireshark (an example capture filter is included in the sketch below).

    The client is probably sending TCP RSTs because you're not doing anything and you're ending the session (QUIT command). Sounds like normal behavior to me.

    BTW, what version are you running again? 10.2?
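
    A rough sketch of the two ping runs from the LTM's bash prompt (the node address is a placeholder; -i 0.2 just shortens the interval so the runs finish faster):

    ping -c 1000 -i 0.2 <node-plugged-into-2960-1>
    ping -c 10000 -i 0.2 <node-plugged-into-2960-1>

    The summary at the end of each run gives the packet-loss percentage. And when you get to the RSTs, a capture can be pre-filtered down to just reset packets, for example:

    tcpdump -nni 0.0 'host <virtual-server-ip> and port <service-port> and tcp[tcpflags] & tcp-rst != 0'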
  • "The client is probably sending TCP RSTs because you're not doing anything and you're ending the session (QUIT command). Sounds like normal behavior to me." It does send a FIN here. Since QUIT is sent right after connect, I do not think the client should send an RST.

    Just my 2 cents.
  • Nitass, in my experience of going through app traces for years, I have seen that at least half of all apps don't end sessions cleanly. You don't get the four-way TCP closure very often, not like you would expect. I see resets being very common and it's often messy for both sides of the connection.

    If this particular conversation of the trace could be posted here, that would help. You could export it to text and substitute the IP addresses with anonymous ones (a quick way to do that from the shell is sketched below).
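
    For example, something along these lines from the LTM shell (the file name and address prefix here are just placeholders):

    tcpdump -nnr /var/tmp/trace.pcap > /var/tmp/trace.txt
    sed -i 's/192\.0\.2\./10\.0\.0\./g' /var/tmp/trace.txt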
  • "In my experience of going through app traces for years, I have seen that at least half of all apps don't end sessions cleanly. You don't get the four-way TCP closure very often, not like you would expect. I see resets being very common and it's often messy for both sides of the connection." I see, thanks 🙂

    "If this particular conversation of the trace could be posted here, that would help. You could export it to text and substitute the IP addresses with anonymous ones." This is what I tested here.

     virtual server
    
    [root@ve11a:Active:Changes Pending] config  tmsh list ltm virtual bar
    ltm virtual bar {
        destination 172.28.20.111:25
        ip-protocol tcp
        mask 255.255.255.255
        pool foo
        profiles {
            tcp { }
        }
        source 0.0.0.0/0
        source-address-translation {
            type automap
        }
        vs-index 6
    }
    
     client
    
    [root@centos17 ~] telnet 172.28.20.111 25
    Trying 172.28.20.111...
    Connected to 172.28.20.111 (172.28.20.111).
    Escape character is '^]'.
    220 ESMTP
    quit
    221 
    Connection closed by foreign host.
    
     packet trace
    
    [root@ve11a:Active:Changes Pending] config  tcpdump -nni 0.0 host 172.28.20.111 and port 25
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on 0.0, link-type EN10MB (Ethernet), capture size 96 bytes
    23:36:36.489354 IP 172.28.20.17.55215 > 172.28.20.111.25: S 2510838242:2510838242(0) win 5840 
    23:36:36.489397 IP 172.28.20.111.25 > 172.28.20.17.55215: S 485573986:485573986(0) ack 2510838243 win 4380 
    23:36:36.490487 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 1 win 5840 
    23:36:36.874546 IP 172.28.20.111.25 > 172.28.20.17.55215: P 1:24(23) ack 1 win 4380 
    23:36:36.876531 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 24 win 5840 
    23:36:37.921334 IP 172.28.20.17.55215 > 172.28.20.111.25: P 1:7(6) ack 24 win 5840 
    23:36:37.921360 IP 172.28.20.111.25 > 172.28.20.17.55215: . ack 7 win 4386 
    23:36:38.111502 IP 172.28.20.111.25 > 172.28.20.17.55215: P 24:41(17) ack 7 win 4386 
    23:36:38.111515 IP 172.28.20.111.25 > 172.28.20.17.55215: F 41:41(0) ack 7 win 4386 
    23:36:38.112569 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 41 win 5840 
    23:36:38.112573 IP 172.28.20.17.55215 > 172.28.20.111.25: F 7:7(0) ack 42 win 5840 
    23:36:38.112586 IP 172.28.20.111.25 > 172.28.20.17.55215: . ack 8 win 4386 
    
  • Frank,

    I think we are at a stalemate here. Yes, retransmissions are normal. Resets are common but not a good thing (most of the time). Nitass is right also; you shouldn't get a reset, of course. I just don't want you to get hung up on it. It's just a clue (symptom) to uncovering the issue.

    Items:

    1) These simple tests may not be enough; they are not the same as your production traffic. Just sending a hello and/or a quit isn't much. Where's the MAIL FROM and RCPT TO? SMTP doesn't really have that many commands, but it's the ones you're missing in your tests that could be causing the problem. Perhaps there is an authentication issue (if required)? Perhaps a username isn't being recognized? Maybe an unsupported command is coming across? (A fuller test session is sketched at the end of this reply.)

    2) I don't know why telnet causes the resets for you. It is probably due to the buffer and the EOF all being pushed down the pipe. I know EOF is just to let the shell know the input has ended, but it all gets put into the buffer and it makes a difference in the trace. I'm wondering if the QUIT is getting there before the 220 comes back. It works itself out, but it makes the order of the packets look a bit strange. I have attached two Wireshark images (taken from LTM tcpdumps). The first, smtp_without_eof.png, is a simple connect and QUIT without using EOF. The second, smtpeof.png, is when I used EOF like you did. Notice how it throws off the decodes in Wireshark. I get weird results sometimes when replicating HTTP monitors using telnet. There's no reason to put more cycles into this at this time.

    3) Did you compare your telnet session to a legitimate connection using tcpdump/Wireshark? Let's focus on getting the right captures at this point. Just take captures from the front and back and then focus on the resets or long delta times for starters (see the capture sketch at the end of this reply). Let's see whether the same types of connections/users are causing the resets or whether it is load. You can also open a case and they will help you parse through the captures.

    4) Did you test both/all of your SMTP servers directly, or were you hitting the VIP every time? Are you using persistence? I also assume you are running these tests/scripts from the LTM.

    When you reply back regarding 3, we can examine it further.
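
    Regarding 1), a slightly fuller manual SMTP exchange would be more representative than a bare QUIT; something along these lines (the VIP, host names and mail addresses are placeholders, and a line with a single "." ends the DATA section):

    telnet <virtual-server-ip> 25
    EHLO client.example.com
    MAIL FROM:<test@example.com>
    RCPT TO:<user@example.com>
    DATA
    Subject: test via the vip
    .
    QUIT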
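
    For 3), both captures can be taken straight from the LTM at the same time, one on the client-side VLAN and one on the server-side VLAN; roughly (the VLAN, port and file names below are assumptions, adjust to your setup):

    tcpdump -nni External -s0 -w /var/tmp/front.pcap host <virtual-server-ip> and port 25
    tcpdump -nni Internal -s0 -w /var/tmp/back.pcap port 25

    Then open the files in Wireshark and start from the resets (display filter tcp.flags.reset == 1) and from the long delta times.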

     

  • Finally, we found what happened. It was because the pool did not have session persistence set. We changed the persistence profile to Source Address and the issue is solved. Thanks everyone for the help.
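
    The equivalent change in tmsh (v11.x syntax; the virtual server name is a placeholder) is roughly:

    tmsh modify ltm virtual <virtual-server-name> persist replace-all-with { source_addr }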