troubleshooting
2 TopicsPacket Analysis with Scapy and tcpdump: Checking Compatibility with F5 SSL Orchestrator
In this guide I want to demonstrate how you can use Scapy (https://scapy.net/) and tcpdump for reproducing and troubleshooting layer 2 issues with F5 BIG-IP devices. Just in case you get into a finger-pointing situation... Starting situation This is a quite recent story from the trenches: My customer uses a Bypass Tap to forward or mirror data traffic to inline tools such as IDS/IPS, WAF or threat intelligence systems. This ByPass Tap offers a feature called Network Failsafe (also known as Fail-to-Wire). This is a fault tolerance feature that protects the flow of data in the event of a power outage and/or system failure. It allows traffic to be rerouted while the inline tools (IDS/IPS, WAF or threat intelligence systems) are shutting down, restarting, or unexpectedly losing power (see red line namedFallbackin the picture below). Since the ByPass Tap itself does not have support for SSL decryption and re-encryption, an F5 BIG-IP SSL Orchestrator shall be introduced as an inline tool in a Layer 2 inbound topology. Tools directly connected to the Bypass Tap will be connected to the SSL Orchestrator for better visibility. To check the status of the inline tools, the Bypass Tap sends health checks through the inline tools. What is sent on one interface must be seen on the other interface and vice versa. So if all is OK (health check is green), traffic will be forwarded to the SSL Orchestrator, decrypted and sent to the IDS/IPS and the TAP, and then re-encrypted and sent back to the Bypass Tap. If the Bypass Tap detects that the SSL Orchestrator is in a failure state, it will just forward the traffic to the switch. This is the traffic flow of the health checks: Target topology This results in the following topology: Problem description During commissioning of the new topology, it turned out that the health check packets are not forwarded through the vWire configured on the BIG-IP. A packet analysis with Wireshark revealed that the manufacturer uses ARP-like packets with opcode 512 (HEX 02 00). This opcode is not defined in the RFC that describes ARP (https://datatracker.ietf.org/doc/html/rfc826), the RFC only describes the opcodes Request (1 or HEX 00 01) and Reply (2 or HEX 00 02). NOTE:Don't get confused that you see ARP packets on port 1.1 and 1.2. They are not passing through, the Bypass Tap is just send those packets from both sides of the vWire, as explained above. The source MAC on port 1.1 and 1.2 are different. Since the Bypass Tap is located right behind the customer's edge firewall, lengthy and time-consuming tests on the live system are not an option, since it would result in a massive service interruption. Therefore, a BIG-IP i5800 (the same model as the customer's) was set up as SSL Orchestrator and a vWire configuration was build in my employers lab. The vWire configuration can be found in this guide (https://clouddocs.f5.com/sslo-deployment-guide/chapter2/page2.7.html). INFO:For those not familiar with vWire: "Virtual wire … creates a layer 2 bridge across the defined interfaces. Any traffic that does not match a topology listener will pass across this bridge." Lab Topology The following topology was used for the lab: I build a vWire configuration on the SSL Orchestrator, as in the customer's environment. A Linux system with Scapy installed was connected to Interface 1.1. With Scapy TCP, UDP and ARP packets can be crafted or sent like a replay from a Wireshark capture. Interface 1.3 was connected to another Linux system that should receive the ARP packets. All tcpdumps were captured on the F5 and analyzed on the admin system (not plotted). Validating vWire Configuration To check the functionality of the F5 and the vWire configuration, two tests were performed. A replay of the Healthcheck packets from the Bypass Tap and a test with RFC-compliant ARP requests. Use Scapy to resend the faulty packets First, I used Wireshark to extract a single packet from packet analysis we took in the customer environment and saved it to a pcap file. I replayed this pcap file to the F5 with Scapy. The sendp() function will work at layer 2, it requires the parametersrdpcap(location of the pcap file for replay) andiface(which interface it shall use for sending). webserverdude@tux480:~$ sudo scapy -H WARNING: IPython not available. Using standard Python shell instead. AutoCompletion, History are disabled. Welcome to Scapy (2.5.0) >>> sendp(rdpcap("/home/webserverdude/cusomter-case/bad-example.pcap"),iface="enp0s31f6") . Sent 1 packets. This test confirmed the behavior that was observed in the customer's environment. The F5 BIG-IP does not forward this packet. Use PING and Scapy to send RFC-compliant ARP packets To create RFC-compliant ARP requests, I first sent an ARP request (opcode 1) through the vWire via PING command. As expected, this was sent through the vWire. To ensure that this also works with Scapy, I also resent this packet with Scapy. >>> sendp(rdpcap("/home/webserverdude/cusomter-case/good-example.pcap"),iface="enp0s31f6") . Sent 1 packets. In the Wireshark analysis it can be seen that this packet is incoming on port 1.1 and then forwarded to port 1.3 through the vWire. Solving the issue with the help of the vendor It became evident that the BIG-IP was dropping ARP packets that failed to meet RFC compliance, rendering the Bypass Tap from this particular vendor seemingly incompatible with the BIG-IP. Following my analysis, the vendor was able to develop and provide a new firmware release addressing this issue. To verify that the issue was resolved in this firmware release, my customer's setup, the exact same model of the Bypass Tap and a BIG-IP i5800, were deployed in my lab, where the new firmware underwent thorough testing. With this approach I could test the functionality and compatibility of the systems under controlled conditions. In this Wireshark analysis it can be seen that the Healthcheck packets are incoming on port 1.1 and then forwarded to port 1.3 through the vWire (marked in green) and also the other way round, coming in on port 1.3 and then forwarded to port 1.1 (marked in pink). Also now you can see that the packet is a proper gratuitous ARP reply (https://wiki.wireshark.org/Gratuitous_ARP). Because the Healthcheck packets were not longer dropped by the BIG-IP, but were forwarded through the vWire the Bypass Tap subsequently marked the BIG-IP as healthy and available. The new firmware resolved the issue. Consequently, my customer could confidently proceed with this project, free from the constraints imposed by the compatibility issue.435Views2likes2Commentsringdump
Problem this snippet solves: loop a tcpdump until a log message is seen Code : # Updated: 10/16/06 #!/usr/bin/perl ## VERSION v0.9b use strict; ################ # tcpdump settings ########## my %SETTINGS = ( external => { filter => "port 443" }, internal => { filter => "port 80" }, lo0 => { filter => "port 80" }, ); my $SNAPLEN = 4352; ################ # script settings ###### # free space checking my $FREE_SPACE_CHECK_INTERVAL = 1; # check free space every this number of seconds my $MIN_FREE_SPACE = 5; # minimum percent space left on parition my $CAPTURE_LOCATION = $ARGV[0]; # file rotation settings my $CAPTURES_TO_ROTATE = 4; # tcpdump capture files to rotate my $DESIRED_CAPTURE_SIZE = 15; # megabytes per capture file before rotating my $OVERLAP_DURING_ROTATE = 5; # seconds to overlap previous capture while starting a new one my $CAPTURE_CHECK_INTERVAL = 1; # how often (seconds) to check the size of capture files for rotating # trigger settings - time (run tcpdumps for x seconds) #my $TRIGGER = "time-based"; my $TIME_TO_CAPTURE = 300; # trigger settings - log-message (stop tcpdump when log message is received) my $TRIGGER = "log-message based"; my $LOG_FILE = "/var/log/messages"; my $LOG_MESSAGE = "no space in response line"; my $FOUND_MESSAGE_WAIT = 5; # how many seconds to gather tcpdumps after we match the log message # misc my $IDLE_TIMER = 5; # if ! receiving log entries, how long before checking if log is rotated my $MAX_ROTATED_LINES = 10000; # max lines to read from file we're re-reading because it's been rotated my $PID_FILE = "/var/run/ring_dump.pid"; my $DEBUG = 0; # 0/1 #################################################### # END OF THINGS THAT SHOULD NEED TO BE CONFIGURED #################################################### ######## # set defaults ### $SNAPLEN ||= 4352; $TRIGGER ||= "time"; $CAPTURE_LOCATION ||= "/var/tmp"; $TIME_TO_CAPTURE ||= 60; $FREE_SPACE_CHECK_INTERVAL ||= 5; $CAPTURES_TO_ROTATE ||= 3; $DESIRED_CAPTURE_SIZE ||= 10; $OVERLAP_DURING_ROTATE ||= 5; $CAPTURE_CHECK_INTERVAL ||= 5; $MIN_FREE_SPACE ||= 5; $LOG_FILE ||= "/var/log/messages"; $LOG_MESSAGE ||= "FAILED"; $FOUND_MESSAGE_WAIT ||= 5; $IDLE_TIMER ||= 5; $PID_FILE ||= "/var/run/ring_dump.pid"; $DEBUG ||= 0; unless (-d $CAPTURE_LOCATION) { print "$CAPTURE_LOCATION isn't a directory, using /mnt instead\n\n"; $CAPTURE_LOCATION = "/mnt"; } if (! -r $LOG_FILE) { die "Can't read \"$LOG_FILE\", EXIT\n"; } # insert code to find tcpdump instead of relying on path HERE: my $tcpdump = "/usr/sbin/tcpdump"; ###### # misc global variable declaration ########## my($answer, $interface, $pid, $tail_child, $F_LOG); my($current_size, $current_inode, $last_size, $last_inode); my @child_pids; my $ppid = $$; my $min_megabytes = $CAPTURES_TO_ROTATE * $DESIRED_CAPTURE_SIZE; $current_size = $current_inode = $last_size = $last_inode = 0; $|++; ########### # functions ####### # exit function that does does necessary child handling sub finish { $_ = shift(); if (defined($_) && $_ ne "") { print; } foreach $interface (keys( %SETTINGS )) { push(@child_pids, $SETTINGS{$interface}{pid}); } $DEBUG && print "INTERRUPT: sending SIGINT and SIGTERM to: ", join(" ", @child_pids), "\n"; kill(2, @child_pids); sleep(1); kill(15, @child_pids); $DEBUG && print "INTERRUPT: done, unlink pidfile and exit\n"; unlink($PID_FILE); exit(0); } $SIG{INT} = sub { finish(); }; # report usage on CAPTURE_LOCATION's MB free from df sub free_megabytes { my $partition = shift(); $partition ||= $CAPTURE_LOCATION; my $free_megabytes; $DEBUG && print "free_megabytes(): capture partition is $partition\n"; open(DF, "df $partition|"); # discard the first line; $_ = ; # parse the usage out of the second line $_ = ; $free_megabytes = (split)[3]; $free_megabytes = int($free_megabytes / 1024); close(DF); $DEBUG && print "free_megabytes(): finished reading df, output is: $free_megabytes\n"; $free_megabytes; } # report usage on CAPTURE_LOCATION's % usage from df sub free_percent { my $partition = shift(); $partition ||= $CAPTURE_LOCATION; my $free_percent; $DEBUG && print "free_percent(): capture partition is $partition\n"; open(DF, "df $partition|"); # discard the first line; $_ = ; # parse the usage out of the second line $_ = ; $free_percent = (split)[4]; chop($free_percent); ## chop off '%' $free_percent = (100 - $free_percent); close(DF); $DEBUG && print "free_percent(): finished reading df, output is: $free_percent\n"; $free_percent; } # simple sub to send SIGHUP to syslogd sub restart_syslogd () { if (-f "/var/run/syslog.pid") { open(PIDFILE, "; chomp; kill(1, ($_)); 1; } # simple wrapper to start tcpdumps, assuming obvious globals sub start_tcpdump { my $interface = shift(); my $capture_file = shift(); my $filter = shift(); my @cmd = ("$tcpdump", "-s$SNAPLEN", "-i$interface", "-w$capture_file", "$filter"); $DEBUG || open(STDERR, ">/dev/null"); $DEBUG && print "start_tcpdump(): about to start: ", join(" ", @cmd), "\n"; exec($cmd[0], @cmd[1..$#cmd]) || print "start_tcpdump(): FAILED to start: ", join(" ", @cmd), ", command not found\n"; $DEBUG || close(STDERR); exit(1); } # sub to see how much space a given capture file is using (to decide to rotate or not) sub capture_space ($) { my $capture_file = shift(); my $size = ( stat($capture_file) )[7]; $DEBUG && print "capture_space(): size of $capture_file is $size\n"; # return size of argument in megabytes, but don't divide by zero if ($size == 0) { return 0; } else { return ($size / 1048576); } } # gives user the option to create a MFS sub create_mfs () { if (-d $CAPTURE_LOCATION) { $DEBUG && print "create_mfs(): directory $CAPTURE_LOCATION exists\n"; } else { mkdir($CAPTURE_LOCATION, oct(0755)) || die "FAILED to create $CAPTURE_LOCATION\n"; print "Capture directory ($CAPTURE_LOCATION) did not exist, so it was created\n"; } # figure out the partition CAPTURE_LOCATION is on. This is cheap... fixme my $partition = $CAPTURE_LOCATION; $partition =~ s!(/[A-z0-9]*)/{0,1}.*!$1!g; open(MOUNT, "mount|") || die "FAILED to run \"mount\": !$\n"; while ( ) { next unless ((split())[2] =~ /^$partition$/); $DEBUG && print "create_mfs(): partition: $partition is already mounted, return\n"; # return 1 if it's already mounted return 1; } close(MOUNT); print "Mount a Memory File System (MFS) on ${CAPTURE_LOCATION}? [y/n]: "; my $answer = ; if (lc($answer) =~ "y") { print "Enter size of MFS in blocks (200000 = 100M), or just press enter for 100M: "; chomp (my $mfs_size = ); $mfs_size = 200000 if ($mfs_size eq ""); print "Allocating $mfs_size blocks to $CAPTURE_LOCATION for MFS\n"; system("mount_mfs -s $mfs_size $CAPTURE_LOCATION"); if (($? >> 8) != 0) { print "an error occurring trying to mount the MFS filesystem, exit status: $?\n"; 0; } else { print "MFS file system established\n\n"; 1; } } } sub fork_to_background ($) { my $cmd = shift(); my $pid = fork(); if ($pid == 0) { exec($cmd) || die "exec() failed: $!\n"; } else { return($pid); } } sub popen_read ($) { my $cmd = shift(); my $child; $DEBUG && print "Background: \"$cmd\"\n"; pipe(READLOG, WRITELOG); select(READLOG); $|++; select(WRITELOG); $|++; select(STDOUT); ## dup STDOUT and STDERR open(T_STDOUT, ">&STDOUT"); open(T_STDERR, ">&STDERR"); ## redir STDOUT to pipe for child open(STDOUT, ">&WRITELOG"); open(STDERR, ">&WRITELOG"); $child = fork_to_background($cmd); ## close STDOUT, STDERR and FILE close(STDOUT); close(STDERR); ## re-open STDOUT as normal and close dup open(STDOUT, ">&T_STDOUT"); close(T_STDOUT); open(STDERR, ">&T_STDERR"); close(T_STDERR); return($child, \*READLOG); } sub open_log ($$) { my $LOG_FILE = shift(); my $lines = shift(); if (defined($F_LOG) && defined(fileno($F_LOG)) ) { $DEBUG && print "Killing child before closing LOG\n"; kill(15, $tail_child); waitpid($tail_child, 0); $DEBUG && print "Closing LOG\n"; close($F_LOG); } $DEBUG && print "Opening \"$LOG_FILE\"\n"; ($tail_child, $F_LOG) = popen_read("tail -n $lines -f $LOG_FILE"); push(@child_pids, $tail_child); 1; } ## check to see if log is rotated, returns true if rotated sub is_rotated ($) { my $LOG_FILE = shift(); $DEBUG && print "enter is_rotated()\n"; ($current_inode, $current_size) = (stat($LOG_FILE))[1,7]; if (($last_size != 0) && ($last_size > $current_size)) { $DEBUG && print "File is now smaller. File must have been rotated\n"; $last_size = $current_size; $last_inode = $current_inode; open_log($LOG_FILE, $MAX_ROTATED_LINES) || die "open_log $LOG_FILE failed: $!\n"; return(1); } elsif (($last_inode != 0) && ($last_inode != $current_inode)) { $DEBUG && print "Inode changed. File must have been rotated\n"; $last_inode = $current_inode; $last_size = $current_size; open_log($LOG_FILE, $MAX_ROTATED_LINES) || die "open_log $LOG_FILE failed: $!\n"; return(1); } ($last_inode, $last_size) = ($current_inode, $current_size); 0; } ########### # MAIN ######## if (free_megabytes() < $min_megabytes) { print "free space on $CAPTURE_LOCATION is below ${min_megabytes}MB, you must create a Memory File System or choose another location to gather tcpdumps\n"; goto MUST_MFS; } ######### GET USER INPUT ############### if (free_percent() < $MIN_FREE_SPACE) { print "free space on $CAPTURE_LOCATION is below ${MIN_FREE_SPACE}%, you must create a Memory File System or choose another location to gather tcpdumps\n"; MUST_MFS: # require the user to create a MFS if they don't have enough free space exit(1) unless (create_mfs()); } else { create_mfs(); } if (free_percent() < $MIN_FREE_SPACE || free_megabytes() < $min_megabytes) { print "it appears the Memory File System is in place, but there is still insufficient space, exiting\n"; exit(1); } print "capturing to $CAPTURE_LOCATION using the following interfaces and filters:\n"; foreach $interface (keys( %SETTINGS )) { system("ifconfig $interface >/dev/null 2>&1"); if ( ($? >> 8) != 0) { $DEBUG && print "couldn't ifconfig $interface, removing from list\n"; delete( $SETTINGS{$interface} ); } else { print " $interface: $SETTINGS{$interface}{filter}\n"; } } print "does this look right? [y/n]: "; $answer = ; exit unless lc($answer) =~ "y"; ####### DAEMONIZE ############# chdir("/"); exit unless (fork() == 0); # kill old self, write pid file if (-f $PID_FILE) { open(PIDFILE, "<$PID_FILE"); kill(15, ); close(PIDFILE); } open(PIDFILE, ">$PID_FILE"); syswrite(PIDFILE, $$); close(PIDFILE); ########### START PROCESSING ############### foreach $interface (keys( %SETTINGS )) { my $filter = $SETTINGS{$interface}{filter}; $pid = fork(); $SETTINGS{$interface}{rotate_number} = 1; if (!defined($pid)) { print "fork() failed! exiting\n"; exit 1; } if ($pid == 0) { start_tcpdump( $interface, "$CAPTURE_LOCATION/${interface}.dump.$SETTINGS{$interface}{rotate_number}", $filter ); exit 1; } else { $SETTINGS{$interface}{pid} = $pid; print "started tcpdump as pid $pid on \"$interface\" filtered as \"$filter\"\n"; } } ###### # fork off a process to keep an eye on free space ######## $pid = fork(); if ($pid == 0) { while (1) { my $sleep_return = sleep($FREE_SPACE_CHECK_INTERVAL); $DEBUG && ($sleep_return != $FREE_SPACE_CHECK_INTERVAL) && print "WARN: free_percent() loop: sleep returned $sleep_return instead of $FREE_SPACE_CHECK_INTERVAL !\n"; if (free_percent() < $MIN_FREE_SPACE) { print "WARN: free space is below ${MIN_FREE_SPACE}%, killing main script\n"; kill(2, $ppid); sleep(1); kill(15, $ppid); print "WARN: sent SIGTERM to $ppid (main script), exiting\n"; exit 1; } else { $DEBUG && print "free_percent(): space is fine, continue\n"; } } } else { push(@child_pids, $pid); $DEBUG && print "started free_percent watcher as: $pid\n"; } ###### # fork off a process to rotate capture files as necessary ######## $pid = fork(); if ($pid == 0) { my $capture_file; while (1) { my $sleep_return = sleep($CAPTURE_CHECK_INTERVAL); $DEBUG && ($sleep_return != $CAPTURE_CHECK_INTERVAL) && print "WARN: start_tcpdump() loop: sleep returned $sleep_return instead of $CAPTURE_CHECK_INTERVAL !\n"; foreach $interface (keys( %SETTINGS )) { if (capture_space("$CAPTURE_LOCATION/${interface}.dump.$SETTINGS{$interface}{rotate_number}") >= $DESIRED_CAPTURE_SIZE) { if ($SETTINGS{$interface}{rotate_number} == $CAPTURES_TO_ROTATE) { print "reached maximum number of captures to rotate: $CAPTURES_TO_ROTATE, starting over at 1\n"; $SETTINGS{$interface}{rotate_number} = 1; } else { $SETTINGS{$interface}{rotate_number}++; } print "rotating capture file: ${interface}.dump, new extension .$SETTINGS{$interface}{rotate_number}\n"; $pid = fork(); if ($pid == 0) { start_tcpdump( $interface, "$CAPTURE_LOCATION/${interface}.dump.$SETTINGS{$interface}{rotate_number}", $SETTINGS{$interface}{filter}, ); exit 0; } push(@child_pids, $pid); # get some overlap in the two files sleep($OVERLAP_DURING_ROTATE); # kill the old tcpdump kill(2, $SETTINGS{$interface}{pid}); $DEBUG && print "sent SIGINT to $interface: $SETTINGS{$interface}{pid}, new pid $pid\n"; # record the new pid $SETTINGS{$interface}{pid} = $pid; } else { $DEBUG && print "capture file doesn't need to be rotated yet: ${interface}.dump\n"; } } # Reap any zombies from old tcpdumps $DEBUG && print "start_tcpdump() loop: \@child_pids = (", join(' ', @child_pids), ")\n"; while (1) { use POSIX ":sys_wait_h"; my $child = waitpid(-1, WNOHANG); if (defined $child and $child > 0) { # remove PID from @child_pids @child_pids = grep {$_ != $child} @child_pids; $DEBUG && print "start_tcpdump() loop: reaped child PID $child\n"; } else { # no one to reap last; } } } } else { push(@child_pids, $pid); $DEBUG && print "started capture file watcher as: $pid\n"; } ################ # watch triggers (time or log based) #################### $SIG{TERM} = sub { finish(); }; if (lc($TRIGGER) =~ /time/) { print "time-based trigger, will capture for $TIME_TO_CAPTURE seconds\n"; sleep($TIME_TO_CAPTURE); print "captured for $TIME_TO_CAPTURE seconds, stopping tcpdumps\n"; } elsif (lc($TRIGGER) =~ /log/) { print "log-based trigger, waiting for \"$LOG_MESSAGE\" in \"$LOG_FILE\"\n"; # creates global $F_LOG filehandle of $LOG_FILE open_log($LOG_FILE, 0) || finish("open_log $LOG_FILE failed: $!\n"); # flush syslogd's buffers (avoid never getting the message due to "last message repeated....") restart_syslogd() || finish("Restarting syslogd failed, EXIT\n"); # tail -f the log and wait for message while (1) { # reap any zombies during each loop my $return; while (1) { use POSIX ":sys_wait_h"; my $child = waitpid(-1, WNOHANG); if (defined $child and $child > 0) { $DEBUG && print "log trigger loop: reaped child PID $child\n"; } else { # no one to reap last; } } eval { $SIG{ALRM} = sub { die("ALRM\n"); }; alarm($IDLE_TIMER); $_ = <$F_LOG>; alarm(0); }; if ($@) { # this only occurs if we're idle for $IDLE_TIMER seconds because no new log entries are occuring $@ = undef; is_rotated($LOG_FILE); next; } $DEBUG && print "in LOG reading loop, current line: \"$_\"\n"; if (/$LOG_MESSAGE/) { $DEBUG && print "Current line matches: \"$LOG_MESSAGE\"\n"; last; } $DEBUG && print "no match, next\n"; } print "received log message, sleeping $FOUND_MESSAGE_WAIT seconds then stopping tcpdumps\n"; sleep($FOUND_MESSAGE_WAIT); } # figure out current tcpdump child_pids and push them onto the list foreach $interface (keys( %SETTINGS )) { push(@child_pids, $SETTINGS{$interface}{pid}); } # kill all tcpdumps + free space watcher + capture file rotator -- doesn't return finish(); 0;718Views0likes1Comment