Troubleshooting OSPF on BIG-IP systems
This is more of a "what to look for" guide than a step-by-step procedure, as I believe that troubleshooting is not an exact science.
Here's a rough road map of what we need to know and what I'm gonna show you in this article:
- Know what OSPF is and how to enable it on BIG-IP
- Know what RHI is
- Determine if OSPF problem is about Route Health Injection (RHI) or pure OSPF
- RHI Troubleshooting
- Check if K route is present
- Given that K route is present, check if it is injected into OSPF
- Debug NSM (BIG-IP's Network Services Module)
- Pure OSPF Troubleshooting
- OSPF neighbour/adjacency not established
- DBD MTU Mismatch
- Route is present in OSPF database but not in Routing Table
- Make sure TMM is forwarding data to BIG-IP's Control plane and vice-versa
- OSPF Debug
Know what OSPF is and how to enable it on BIG-IP
OSPF is a routing protocol that, roughly speaking, tells each router in the OSPF domain where to send packets. Without OSPF (or another routing protocol) we'd have to work out the paths ourselves and configure static routes.
There is much more to it and in order to know more about OSPF I'd recommend this friendly illustrated book: Bryant's advantage ROUTE book.
Apart from that, in order to enable OSPF we need to do it via Route Domain configuration:
And here is how we access the CLI configuration:
Here's where ZebOS/OSPF configuration is stored (per routing domain) because ZebOS is not aware of Routing Domains:
If we had enabled it on Route Domain 1 (for example), then the path would be /config/zebos/rd1 and so on.
Know what RHI is
K14267 explains what it is, and even though the article is about BGP, it also applies to OSPF.
Roughly speaking, RHI allows us to inject a virtual-address into ZebOS' routing table and advertise it into our OSPF/BGP domain, and it is very easy to do.
Here's my routing table when I have no RHI:
Now I use one of my VIPs as example:
Instead of clicking on the VIP itself I go to the virtual-address list:
I picked 10.199.3.143 and enabled Route Advertisement:
- Disabled is the default
- Enabled means route will always be advertised regardless
- Selective means route will only be advertised if virtual-address is available, i.e. Availability field above is Green.
- Availability of virtual-address is based on Availability Calculation field above
- Any means when ANY virtual server using this virtual address is Available
- All means ONLY advertise route when ALL virtual servers using this virtual address are Available
PS: Selective is the usual choice but in older versions of BIG-IP we might find only two options (Enabled and Disabled). When this is the case, Enabled behaves like Selective does in newer versions. Please check this AskF5 article about that.
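To make the options above concrete, here is a minimal Python sketch of the decision they describe. It only models the behaviour listed in this article (Disabled/Enabled/Selective and Any/All); the names RouteAdvertisement and should_advertise are made up for illustration and are not part of any BIG-IP API. Treating a virtual address with no virtual servers as unavailable is my own assumption.

```python
from enum import Enum


class RouteAdvertisement(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    SELECTIVE = "selective"


def should_advertise(mode, calculation, virtual_servers_up):
    """Decide whether a virtual address should show up as a K route.

    mode: RouteAdvertisement setting of the virtual address.
    calculation: "any" or "all" (the Availability Calculation field).
    virtual_servers_up: one boolean per virtual server using the address.
    """
    if mode is RouteAdvertisement.DISABLED:
        return False
    if mode is RouteAdvertisement.ENABLED:
        return True  # advertised regardless of availability
    # Selective: advertise only while the virtual address is available.
    if not virtual_servers_up:
        return False  # assumption: no virtual servers means not available
    if calculation == "any":
        return any(virtual_servers_up)
    if calculation == "all":
        return all(virtual_servers_up)
    raise ValueError(f"unknown availability calculation: {calculation!r}")


# One virtual server up, one down.
print(should_advertise(RouteAdvertisement.SELECTIVE, "any", [True, False]))  # True
print(should_advertise(RouteAdvertisement.SELECTIVE, "all", [True, False]))  # False
```

So with Selective and All, a single unavailable virtual server is enough to withdraw the route.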
Now look at my routing table and notice that a K route that matches my virtual address magically appeared:
Now we just need to redistribute this route into OSPF with the redistribute kernel command, similar to what K14267 describes.
Determine if OSPF problem is about Route Health Injection (RHI) or pure OSPF
- If it is RHI, we're talking about the K routes above
- K route is not being advertised via OSPF (e.g. old bug - ID529977 OSPF may not process updates to redistributed routes)
- K route doesn't even appear at all
- Pure OSPF is everything else, e.g:
- OSPF neighbour relationship/adjacency not established
- DBD MTU mismatch
- Route is present in OSPF database but not in routing table (i.e. route is seen in 'show ip ospf database' but not in 'show ip route').
Check if K route is present
- Enable NSM debug and learn how to interpret it
- If we suspect BIG-IP's control plane is not receiving the route from tmm (or mcpd in 10.x/11.x), enable tmrouted debug (tmrouted is BIG-IP's routing control plane daemon)
Given that K route is present, check if it is injected into OSPF
- Check 'show ip ospf database' in local peer or 'show ip route' and 'show ip ospf database' in remote peer
- Check the 'redistribute' command in the BIG-IP (ZebOS) configuration. The K route is only injected into OSPF when 'redistribute kernel' is present.
- Check for the presence of route-maps in the redistribute kernel command. E.g. redistribute kernel route-map my-filter
- Sometimes a route-map applied to the redistribution filters out routes by mistake
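To illustrate how a route-map can drop a K route by mistake, here is a simplified Python model. It assumes the usual first-match-wins behaviour with an implicit deny at the end, and it reduces each route-map entry to "permit/deny everything inside this network". Real route-maps match through access lists or prefix lists, so treat route_map_permits and the my_filter contents as hypothetical.

```python
import ipaddress


def route_map_permits(route_map, prefix):
    """Return True if the prefix survives the route-map.

    route_map: ordered list of (action, network) tuples, where action is
    "permit" or "deny". The first matching entry wins; if nothing matches,
    the route is dropped (implicit deny).
    """
    net = ipaddress.ip_network(prefix)
    for action, match in route_map:
        if net.subnet_of(ipaddress.ip_network(match)):
            return action == "permit"
    return False  # implicit deny: no entry matched


# Hypothetical filter that was meant to cover the virtual address
# 10.199.3.143 but was written for the wrong subnet.
my_filter = [("permit", "10.199.4.0/24")]
print(route_map_permits(my_filter, "10.199.3.143/32"))  # False

fixed_filter = [("permit", "10.199.3.0/24")]
print(route_map_permits(fixed_filter, "10.199.3.143/32"))  # True
```

The point is that the K route can sit happily in the routing table and still never reach OSPF, because nothing in the route-map permits it.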
We can type this on BIG-IP:
All RHI relevant information should be on /var/tmp/my-box.log. We can even issue tail -f command to follow along in real time.
Pure OSPF Troubleshooting
OSPF neighbour/adjacency not established
- check 'show ip ospf neighbour'
- 2-way means the neighbour relationship is established but we have not exchanged (or will not exchange) routes with this particular neighbour
- Full means adjacency is established and this means we did exchange routes with this particular neighbour
- DR/BDR are only present on multi-access network types such as Ethernet (broadcast). For efficiency, a DR (designated router) and a BDR (backup designated router) are elected; all the other routers form a Full adjacency with both DR and BDR but stay in 2-way with each other.
- Any changes in the network are supposed to be advertised to the DR/BDR and only then it's spread to all the other routers.
- We can imagine how inefficient it would be in a network with hundreds of routers sharing the same network if any topology change was supposed to be advertised to all routers hence the idea of DR/BDR.
- Our job here is to make sure OSPF is not stuck in any of the intermediate states due to misconfiguration; typically one of the options below does not match:
- Brief explanation of above highlighted fields for reference:
- Area ID: This is always 32 bit and area 0 is 0.0.0.0. Keep in mind that virtual-links would also appear to be originating from area 0 too.
- Auth Type: 0 is Null, 1 is simple (clear-text) password, 2 is cryptographic (MD5).
- Auth Data: Nothing if set to Null, MD5 hash if set to MD5 or clear-text password if set to Plain Text.
- Hello Interval [sec]: how often Hello packet is sent. This must match on both sides.
- N: When enabled (1), this means area is not-so-stubby¹, i.e. does not accept type 5 External LSAs (O E1 and O E2 from show ip route command) but converts type 7 LSAs into type 5 in order to advertise route to other areas
- MC: When enabled (1), this means BIG-IP also supports multicast routing (MOSPF) apart from unicast routing.
- E: Disabled (0) means area is stub², i.e. does not accept type 5 External LSAs (O E1 and O E2 route types from show ip route command)
- Router Dead Interval [sec]: number of seconds without Hello packets after which the neighbour is declared down. Every Hello packet received resets the RouterDeadInterval counter. This must also match on both sides.
¹ area <area number> nssa command under router ospf mode
² area <area number> stub command under router ospf mode
- Take a packet capture:
- # tcpdump -nvi <VLAN name>:nnn -s0 -w /var/tmp/ospf_neighbor-tmm-net.pcap ip proto ospf -v
- In the pcap we should see the OSPF router ID of peer in the Active Neighbor list and this indicates that they're either neighbours or adjacent:
- Then, if we look at a packet capture and we see that they're both active neighbours for a while and suddenly they're not, then this means the side that removed the neighbour from Active Neighbor list likely flapped, restarted or disconnected.
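If we want to check these fields without clicking through the capture by hand, the sketch below decodes an OSPFv2 Hello following the RFC 2328 packet layout and compares the fields discussed above. It works on the raw OSPF payload (starting at the OSPF header, after the IP header). The helper names and the router IDs 10.0.0.1 and 10.0.0.2 are hypothetical, and build_hello only exists to produce demo packets (it leaves the checksum at zero).

```python
import socket
import struct

# version, type, length, router ID, area ID, checksum, AuType, auth data
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")
# mask, HelloInterval, options, priority, RouterDeadInterval, DR, BDR
HELLO_BODY = struct.Struct("!4sHBBI4s4s")
MUST_MATCH = ("area_id", "auth_type", "hello_interval", "dead_interval", "e_bit", "n_bit")


def parse_hello(payload):
    """Decode an OSPFv2 Hello packet given as bytes."""
    (version, ptype, length, router_id, area_id,
     _checksum, auth_type, _auth_data) = OSPF_HEADER.unpack_from(payload, 0)
    if version != 2 or ptype != 1:
        raise ValueError("not an OSPFv2 Hello packet")
    (_mask, hello_interval, options, _priority, dead_interval,
     _dr, _bdr) = HELLO_BODY.unpack_from(payload, OSPF_HEADER.size)
    start = OSPF_HEADER.size + HELLO_BODY.size
    end = min(length, len(payload))
    return {
        "router_id": socket.inet_ntoa(router_id),
        "area_id": socket.inet_ntoa(area_id),
        "auth_type": auth_type,
        "hello_interval": hello_interval,
        "dead_interval": dead_interval,
        "e_bit": bool(options & 0x02),
        "n_bit": bool(options & 0x08),
        # Active Neighbor list: one 4-byte router ID per neighbour seen.
        "neighbors": [socket.inet_ntoa(payload[i:i + 4])
                      for i in range(start, end - 3, 4)],
    }


def hello_mismatches(local, peer):
    """Return the names of the fields that differ between two Hellos."""
    return [field for field in MUST_MATCH if local[field] != peer[field]]


def build_hello(router_id, hello_interval, dead_interval, neighbors, options=0x02):
    """Demo helper: craft a Hello in area 0.0.0.0 with Null authentication."""
    zero = socket.inet_aton("0.0.0.0")
    body = HELLO_BODY.pack(socket.inet_aton("255.255.255.0"), hello_interval,
                           options, 1, dead_interval, zero, zero)
    body += b"".join(socket.inet_aton(n) for n in neighbors)
    header = OSPF_HEADER.pack(2, 1, OSPF_HEADER.size + len(body),
                              socket.inet_aton(router_id), zero, 0, 0, bytes(8))
    return header + body


local = parse_hello(build_hello("10.0.0.1", 10, 40, ["10.0.0.2"]))
peer = parse_hello(build_hello("10.0.0.2", 30, 120, []))
print(hello_mismatches(local, peer))          # ['hello_interval', 'dead_interval']
print(local["router_id"] in peer["neighbors"])  # False
```

In this made-up example the peer never lists 10.0.0.1 as an active neighbour, which is what we would expect when the timers do not match.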
DBD MTU Mismatch
- After BIG-IP sees its own router ID in the neighbour's Hello packet it enters the 2-way state, and then moves to ExStart, where it exchanges DBD packets. At this stage the MTU size is checked, and the adjacency won't go any further if the values do not match.
- To confirm this we can check the MTU configuration in the VLAN on BIG-IP and compare it to the one on the peer.
- Take a packet capture and look for the DBD's Interface MTU field:
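As a rough sketch of where that field sits, the Python below pulls the Interface MTU out of a raw OSPFv2 Database Description packet (RFC 2328 layout: 24-byte OSPF header, then a 2-byte Interface MTU). The function names and the sample values (router ID 10.0.0.2, MTU 9000 versus 1500) are made up for the demo. Note that mtu_mismatch flags any difference, as described above; strictly speaking RFC 2328 only requires rejecting a DBD whose MTU is larger than the receiving interface can accept.

```python
import struct

OSPF_HEADER_LEN = 24
# Interface MTU, options, flags (I/M/MS bits), DD sequence number
DBD_FIXED = struct.Struct("!HBBI")


def dbd_interface_mtu(payload):
    """Return the Interface MTU advertised in an OSPFv2 DBD packet (bytes)."""
    version, ptype = struct.unpack_from("!BB", payload, 0)
    if version != 2 or ptype != 2:
        raise ValueError("not an OSPFv2 Database Description packet")
    mtu, _options, _flags, _sequence = DBD_FIXED.unpack_from(payload, OSPF_HEADER_LEN)
    return mtu


def mtu_mismatch(local_mtu, peer_dbd):
    """True if the peer's advertised MTU differs from our VLAN MTU."""
    return dbd_interface_mtu(peer_dbd) != local_mtu


# Demo packet: a peer advertising MTU 9000 with the I, M and MS flags set.
header = struct.pack("!BBH4s4sHH8s", 2, 2, 32,
                     bytes([10, 0, 0, 2]), bytes(4), 0, 0, bytes(8))
peer_dbd = header + DBD_FIXED.pack(9000, 0x02, 0x07, 1)

print(dbd_interface_mtu(peer_dbd))   # 9000
print(mtu_mismatch(1500, peer_dbd))  # True
```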
Route is present in OSPF database but not in Routing table
- Yes, OSPF has its own routing table where it decides what to add to the regular routing table that we all know and love
- To troubleshoot this we need to check the details of the LSA with show ip ospf database command.
- There are times when a route with a better Administrative Distance might be in place (e.g. static route)
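The administrative distance point can be shown with a tiny Python sketch. I'm assuming the commonly used default distances (static 1, OSPF 110); check the actual values on your own box, since they can be changed. The best_route helper and the next hops are invented for the example.

```python
def best_route(candidates):
    """Pick the route that gets installed for a single prefix.

    candidates: list of (source, administrative_distance, next_hop) tuples.
    The lowest administrative distance wins.
    """
    return min(candidates, key=lambda candidate: candidate[1])


# Same prefix learned from OSPF and also configured as a static route.
candidates = [
    ("ospf", 110, "10.199.0.1"),
    ("static", 1, "10.199.0.254"),
]
print(best_route(candidates))  # ('static', 1, '10.199.0.254')
```

The OSPF route is still in the OSPF database, but the static route is the one that lands in the routing table.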
Make sure TMM is forwarding data to BIG-IP's Control Plane and vice-versa
TMM <-> Network:
TMM <-> Control Plane:
The first packet capture will record the communication externally between BIG-IP and the external device.
The second one will record the communication (internally) between BIG-IP's forwarding plane (tmm) and BIG-IP's control plane daemon, which is responsible for processing the routes and installing them in the routing table.
The above commands will print debug information about OSPF networking as well as control plane.