Forum Discussion
High Packet Drop and connection failure
I have a pair of LTM 1600s (named LTM1 and LTM2) and a pair of Cisco 2960s (2960-1 and 2960-2). The connections are as follows:
LTM1 internal-trunk = interface 1.3 + 1.4
LTM1 internal-trunk (LACP Enabled, LACP Mode=Active, LACP Timeout = Long, Link Selection Policy = Auto, Frame Distribution Hash=Src/Dst IP)
LTM1 fibre = interface 2.1 + 2.2
LTM1 VLAN External (Tag=10, Untagged Interface=1.1)
LTM1 VLAN Internal (Tag=4093, Untagged Interface=internal-trunk)
LTM1 VLAN pri-failover (tag=4092, Untagged Interface=Fibre)
LTM1 interface 1.1 -> uplink cisco
LTM1 internal-trunk -> 2960-1 port channel 3
LTM1 Fibre -> LTM2 Fibre
LTM2 with exactly the same configuration
2960-1 port channel 5 -> 2960-2 port channel 5
Below is the relevant part of the show run output:
2960-1#show run
Building configuration...
Current configuration : 6188 bytes
!
version 12.2
hostname 2960-1
no ip source-route
!
no ip domain-lookup
vtp domain f5-private
vtp mode transparent
!
!
spanning-tree mode pvst
spanning-tree extend system-id
!
port-channel load-balance src-dst-ip
!
vlan internal allocation policy ascending
!
vlan 4093
name f5-private-vlan
!
!
!
interface Port-channel3
switchport access vlan 4093
switchport mode access
no keepalive
flowcontrol receive desired
!
interface Port-channel5
switchport access vlan 4093
switchport mode access
!
interface GigabitEthernet1/0/1
switchport access vlan 4093
switchport mode access
no keepalive
flowcontrol receive desired
no cdp enable
no cdp tlv server-location
no cdp tlv app
spanning-tree portfast disable
channel-group 3 mode active
!
interface GigabitEthernet1/0/2
switchport access vlan 4093
switchport mode access
no keepalive
flowcontrol receive desired
no cdp enable
no cdp tlv server-location
no cdp tlv app
spanning-tree portfast disable
channel-group 3 mode active
!
interface GigabitEthernet1/0/3
switchport access vlan 4093
switchport mode access
spanning-tree portfast disable
channel-group 5 mode desirable non-silent
!
interface GigabitEthernet1/0/4
switchport access vlan 4093
switchport mode access
spanning-tree portfast disable
channel-group 5 mode desirable non-silent
!
interface Vlan1
no ip address
shutdown
!
interface Vlan4093
ip address 192.168.1.1 255.255.255.0
!
ip sla enable reaction-alerts
no cdp run
!
end
2960-2 has exactly the same configuration.
The problem is a high connection failure rate from the external subnet to the virtual server. A flood ping from 2960-1 to LTM1, and vice versa, shows no problem. However, I see around 10% packet drop when I ping from LTM1 to LTM2 using either the internal or the external IP. I get the same result (10% packet drop) when I ping from any host in the LTM internal subnet to LTM1/LTM2, again using either the internal or the external IP. Pinging from a host to 2960-1/2960-2, or vice versa, gives 0 packet drop.
Is this caused by a misconfiguration? How can I troubleshoot this?
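To compare the paths described above side by side, one option is to run the same ping on each path and tabulate the loss. The snippet below is only a minimal sketch: it assumes the Linux-style ping summary line ("N packets transmitted, M received"), and the sample outputs are made up for illustration, not taken from this setup.

```python
import re

# Assumption: Linux-style ping summary, e.g.
# "1000 packets transmitted, 900 received, 10% packet loss, time 2950ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output):
    """Return packet loss in percent, or None if no summary line is found."""
    m = SUMMARY_RE.search(ping_output)
    if not m:
        return None
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent


# Hypothetical sample outputs, keyed by the path that was tested.
results = {
    "host -> 2960-1": "1000 packets transmitted, 1000 received, 0% packet loss, time 2010ms",
    "host -> LTM1": "1000 packets transmitted, 900 received, 10% packet loss, time 2950ms",
}

for path, output in results.items():
    print(f"{path}: {loss_percent(output):.1f}% loss")
```

Running it per path (host to switch, host to LTM, LTM to LTM) makes it easier to see at which hop the loss first appears.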
- marco_octavian_Nimbostratus
These types of problems are always like going down the rabbit hole, especially over a medium like this. It does look like we are starting to narrow it down.
- nitassEmployee
Quoted: "The client is probably sending TCP RSTs because you're not doing anything and you ending the session (QUIT command). Sounds like normal behavior to me."
It does send FIN here. Since QUIT is sent right after connect, I do not think the client should send RST.
- marco_octavian_Nimbostratus
Nitass, in my experience of going through app traces for years, I have seen that at least half of all apps don't end sessions cleanly. You don't get the 4-way TCP closure very often, not like you would expect. I see resets being very common and it's often messy for both sides of the connection.
- nitassEmployee
Quoted: "In my experience of going through app traces for years, I have seen that at least half of all apps don't end sessions cleanly. You don't get the 4-way tcp closure very often, not like you would expect. I see Resets being very common and it's often messy for both sides of the connection."
I see. Thanks 🙂
virtual server
[root@ve11a:Active:Changes Pending] config # tmsh list ltm virtual bar
ltm virtual bar {
    destination 172.28.20.111:25
    ip-protocol tcp
    mask 255.255.255.255
    pool foo
    profiles {
        tcp { }
    }
    source 0.0.0.0/0
    source-address-translation {
        type automap
    }
    vs-index 6
}
client
[root@centos17 ~]# telnet 172.28.20.111 25
Trying 172.28.20.111...
Connected to 172.28.20.111 (172.28.20.111).
Escape character is '^]'.
220 ESMTP
quit
221
Connection closed by foreign host.
packet trace
[root@ve11a:Active:Changes Pending] config # tcpdump -nni 0.0 host 172.28.20.111 and port 25
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on 0.0, link-type EN10MB (Ethernet), capture size 96 bytes
23:36:36.489354 IP 172.28.20.17.55215 > 172.28.20.111.25: S 2510838242:2510838242(0) win 5840
23:36:36.489397 IP 172.28.20.111.25 > 172.28.20.17.55215: S 485573986:485573986(0) ack 2510838243 win 4380
23:36:36.490487 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 1 win 5840
23:36:36.874546 IP 172.28.20.111.25 > 172.28.20.17.55215: P 1:24(23) ack 1 win 4380
23:36:36.876531 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 24 win 5840
23:36:37.921334 IP 172.28.20.17.55215 > 172.28.20.111.25: P 1:7(6) ack 24 win 5840
23:36:37.921360 IP 172.28.20.111.25 > 172.28.20.17.55215: . ack 7 win 4386
23:36:38.111502 IP 172.28.20.111.25 > 172.28.20.17.55215: P 24:41(17) ack 7 win 4386
23:36:38.111515 IP 172.28.20.111.25 > 172.28.20.17.55215: F 41:41(0) ack 7 win 4386
23:36:38.112569 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 41 win 5840
23:36:38.112573 IP 172.28.20.17.55215 > 172.28.20.111.25: F 7:7(0) ack 42 win 5840
23:36:38.112586 IP 172.28.20.111.25 > 172.28.20.17.55215: . ack 8 win 4386
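For longer captures it can help to check mechanically whether a connection ended with FINs from both sides or with a reset. The following is a rough sketch, assuming the older tcpdump text format shown in the trace above (flags printed as single letters such as S, F, P, R); newer tcpdump versions print "Flags [F.]" instead and would need a different pattern. The function name is made up for illustration.

```python
import re

# Matches the older tcpdump text format used in the trace above, e.g.
# "23:36:38.111515 IP 172.28.20.111.25 > 172.28.20.17.55215: F 41:41(0) ack 7 win 4386"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): (?P<flags>[SFPRWE.]+) "
)


def classify_close(lines):
    """Return 'reset', 'clean', or 'open' for the packets of ONE connection."""
    fin_senders = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip tcpdump banner lines and anything unparsable
        flags = m.group("flags")
        if "R" in flags:
            return "reset"
        if "F" in flags:
            fin_senders.add(m.group("src"))
    # A clean close needs a FIN from each side of the connection.
    return "clean" if len(fin_senders) == 2 else "open"


# The last five packets of the trace above.
trace = """\
23:36:38.111502 IP 172.28.20.111.25 > 172.28.20.17.55215: P 24:41(17) ack 7 win 4386
23:36:38.111515 IP 172.28.20.111.25 > 172.28.20.17.55215: F 41:41(0) ack 7 win 4386
23:36:38.112569 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 41 win 5840
23:36:38.112573 IP 172.28.20.17.55215 > 172.28.20.111.25: F 7:7(0) ack 42 win 5840
23:36:38.112586 IP 172.28.20.111.25 > 172.28.20.17.55215: . ack 8 win 4386
"""

print(classify_close(trace.splitlines()))
```

The sketch handles a single connection at a time; a capture with many clients would first have to be grouped by address and port pair.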
- marco_octavian_Nimbostratus
Frank,
I think we are at a stalemate here. Yes, retransmissions are normal. Resets are common but not a good thing (most of the time). Nitass is right also. You shouldn't get a reset, of course. I just don't want you to get hung up on it. It's just a clue (symptom) to uncovering the issue.
Items:
1) These simple tests may not be enough. They are not the same as your production traffic. Just sending a hello and/or just a quit isn't much. Where's the mail to and from? SMTP doesn't really have that many commands, but it's the ones you're missing in your tests that could be causing the problem. Perhaps there is an authentication issue (if required)? Perhaps a username isn't being recognized? Maybe an unsupported command is coming across?
2) I don't know why telnet causes the resets for you. It is probably due to the buffer and the EOF all being pushed down the pipe. I know EOF is just to let the shell know the input has ended, but it all gets put into the buffer and it makes a difference in the trace. I'm wondering if the QUIT is getting there before the 220 comes back. It works itself out, but it makes the order of the packets look a bit strange. I have attached two Wireshark images (taken from LTM tcpdumps). The first, smtp_without_eof.png, is a simple connect and QUIT without using EOF. The second, smtpeof.png, is when I used EOF like you did. Notice how it throws off the decodes in Wireshark. I get weird results sometimes when replicating HTTP monitors using telnet. There's no reason to put more cycles into this at this time.
3) Did you compare your telnet session to a legitimate connection using tcpdump/Wireshark? Let's focus on getting the right captures at this point. Just take captures from the front and back and then focus on the resets or long delta times for starters. Let's see if the same type of connections/users are causing the resets or if it is load. You can also open a case and they will help you parse through the captures.
4) Did you test to both/all of your SMTP servers directly, or were you hitting the VIP every time? Are you using persistence? I also assume you are running these tests/scripts from the LTM.
When you reply back regarding 3, we can examine it further.
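For the "long delta times" part of item 3, a small script can flag the gaps in a text capture before anyone opens Wireshark. This is only a sketch: it assumes each packet line starts with an HH:MM:SS.ffffff timestamp, as in the tcpdump output earlier in the thread, and the one-second threshold is an arbitrary starting point, not a recommendation.

```python
from datetime import datetime


def long_gaps(lines, threshold_s=1.0):
    """Return (gap_seconds, line) for each packet that arrives more than
    threshold_s after the previous packet in the capture."""
    gaps = []
    prev = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            now = datetime.strptime(stamp, "%H:%M:%S.%f")
        except ValueError:
            continue  # not a packet line (banner, blank line, etc.)
        if prev is not None:
            gap = (now - prev).total_seconds()
            if gap < 0:
                gap += 86400  # capture crossed midnight
            if gap > threshold_s:
                gaps.append((gap, line))
        prev = now
    return gaps


# Three consecutive packets from the trace posted earlier in the thread.
sample = [
    "23:36:36.874546 IP 172.28.20.111.25 > 172.28.20.17.55215: P 1:24(23) ack 1 win 4380",
    "23:36:36.876531 IP 172.28.20.17.55215 > 172.28.20.111.25: . ack 24 win 5840",
    "23:36:37.921334 IP 172.28.20.17.55215 > 172.28.20.111.25: P 1:7(6) ack 24 win 5840",
]

for gap, line in long_gaps(sample):
    print(f"{gap:.6f}s before: {line}")
```

In this sample the only gap over a second is the pause before the client's QUIT, which is just typing time in a manual telnet session. In a production capture the interesting gaps are the ones that line up with resets.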
- frankcheong_304Nimbostratus
We finally found out what happened. The pool did not have session persistence set. We changed the persistence profile to source address and the issue is solved. Thanks everyone for the help.
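To illustrate what the change does in general terms, here is a toy model of source address persistence on top of round-robin selection. It is a simplified sketch, not how BIG-IP implements persistence: the class name and member addresses are hypothetical, and real persistence records also have timeouts and masks that are left out here.

```python
from itertools import cycle


class ToyPool:
    """Round-robin pool with optional source-address persistence."""

    def __init__(self, members, persist_by_source=False):
        self._rr = cycle(members)
        self._persist = persist_by_source
        self._records = {}  # client IP -> member, like a persistence record

    def pick(self, client_ip):
        # With persistence on, a known client goes back to the same member.
        if self._persist and client_ip in self._records:
            return self._records[client_ip]
        member = next(self._rr)
        if self._persist:
            self._records[client_ip] = member
        return member


members = ["10.0.0.1:25", "10.0.0.2:25"]  # hypothetical pool members

plain = ToyPool(members)
sticky = ToyPool(members, persist_by_source=True)

# The same client connects three times in a row.
print([plain.pick("172.28.20.17") for _ in range(3)])
print([sticky.pick("172.28.20.17") for _ in range(3)])
```

Without persistence the same client is spread across the members on successive connections. With it, every connection from that client lands on the member chosen the first time.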