routing
10 Topics
Understanding the BGP Peering details your CCNA didn't teach you
1. Quick Facts
- BGP uses TCP port 179 and is a path-vector protocol, rather than distance-vector or link-state.
- Keepalive is 60s by default and hold time is 180s.
- Only the hold time is negotiated in the OPEN message; the lowest hold time is used.
- Routes are only installed in the routing table and advertised to a peer if they are deemed valid (*) and best (>).
- For BGP to advertise a route using the network command, that route has to be in the routing table first.
- BGP transport and route advertisement (known as NLRI in BGP) are independent.
- Transport can use either IPv4 or IPv6.
- IPv6 NLRI can be advertised over an IPv4 BGP connection and vice versa.

2. The BGP peering for IPv4
BIG-IP establishes a TCP connection on port 179 with its peer and each side sends an OPEN message:
Here's BIG-IP's OPEN message:
Here's Cisco's OPEN message:
Marker: always the same value; it just signals the beginning of a BGP message, so don't worry about it.
Length: the total length of this BGP message.
Type: could be UPDATE (2) or KEEPALIVE (4), but in this case it is an OPEN message (1).
Version: always the same, as this is BGPv4.
My AS: the same on both sides because this is iBGP; if it were eBGP, My AS would differ between the two sides.
Hold Time: the value picked is the lowest one, and this is negotiated. Since the lowest value, on BIG-IP's side, is 90 seconds, both peers will send KEEPALIVE messages at 30-second intervals (one third of the hold time).
BGP Identifier: the bgp router-id that we typed in.
Optional Parameters Length: the combined length of all BGP extensions.
Optional Parameters: the BGP extensions themselves.
At this point, if something goes wrong with peering, we should see BGP sending a NOTIFICATION message to terminate the peering process:
The above one was issued because I shut down the neighbour relationship manually, but other types of error would reveal a different message.
In a running packet capture, we should see KEEPALIVE messages and corresponding TCP ACKs at the agreed interval based on the hold time:
Other than that, UPDATE messages are also common and I will show them right now.

3. The UPDATE message for IPv4
It's kind of built-in. For IPv4 we will see a Type 2 message (UPDATE) and the NLRI (routes). The 2 prefixes below share the same attributes and this is why they're grouped together:

4. The BGP peering for IPv6
It's the same thing as IPv4, but an additional capability is negotiated, called the Multiprotocol extensions capability, with IPv6 as the AFI value:
In case you never heard of AFI/SAFI, here's what they mean:
AFI (Address Family Indicator): This is the kind of route BGP is capable of handling, normally IPv4 or IPv6, but it can also be VPNv4 for MP-BGP (out of scope here).
SAFI (Subsequent Address Family Indicator): This is whether the route is UNICAST or MULTICAST, normally UNICAST.
If you ever configured BGP and typed in address-family commands then you've touched AFI/SAFI already.

5. BONUS! Carrying IPv4 prefixes over an IPv6 BGP connection (WHAT?)
Yes, it's possible, but the next-hop of an IPv4 route is going to be IPv4, so make sure there is connectivity or create the relevant route-map. Notice that we're transporting BGP packets over IPv6 but the routes advertised are IPv4 prefixes:
You can also carry IPv6 prefixes over IPv4 if we do it the other way round.

Bidirectional Forwarding Detection (BFD) Protocol Cheat Sheet
Definition
This is a protocol initially described in RFC5880, with IPv4/IPv6 specifics in RFC5881. I would say this is an aggressive 'hello-like' protocol with shorter timers, but very lightweight on the wire and requiring very little processing, as it is designed to be implemented in the forwarding plane (although the RFC does not forbid implementing it in the control plane). It also contains a feature called Echo that cuts CPU processing further, to roughly ZERO: the receiving system literally just 'loops' BFD packets sent from its peer back to it without even 'touching' (processing) them.
BFD helps routing protocols detect peer failures at sub-second level, and BIG-IP supports it on all its routing protocols. On BIG-IP it is control-plane independent: TMM takes care of sending/receiving BFD unicast probes (yes, no multicast!), while BIG-IP's Advanced Routing Module® is responsible only for its configuration (of course!) and for receiving state information from TMM that is displayed in show commands. The BIG-IP control plane daemon that communicates with TMM is oamd. This daemon starts when BFD is enabled in the route domain, like any other routing protocol.

BFD Handshake Explained
218: BFD was configured on the Cisco router but not on BIG-IP, so the neighbour signals that the BIG-IP session state is Down, with no flags set
219: I had just enabled BFD on BIG-IP; session state is now Init and only the Control Plane Independent flag is set¹
220: Poll flag is set to validate initial bidirectional connectivity (expecting the Final flag set in response)
221: BIG-IP sets the Final flag and the handshake is complete²
¹ The Control Plane Independent flag is set because BFD is not actively performed by BIG-IP's control plane. BIG-IP's BFD control plane daemon (oamd) just signals to TMM which BFD sessions are required; TMM takes care of sending/receiving all BFD control traffic and reports session state back to the Advanced Routing Module's daemon.
² Packets 222-223 are just to show that after the handshake is finished all flags are cleared unless another event triggers them. Packet 218 is how the Cisco router sees BIG-IP when BFD is not enabled. The Control Plane Independent flag on BIG-IP remains set, though, for the reasons already explained above.

Protocol fields

Diagnostic codes
0 (No Diagnostic): Typically seen when the BFD session is Up and there are no errors.
1 (Control Detection Time Expired): The BFD detect timer expired and the session was marked down.
2 (Echo Function Failed): The BFD Echo packet loop verification failed and the session was marked down.
3 (Neighbor Signaled Session Down): Seen if either the neighbour's advertised state or our local state is Down.
4 (Forwarding Plane Reset): The forwarding plane (TMM) was reset and the peer should not rely on BFD, so the session is marked down.
5 (Path Down): In Demand mode, an external application can signal BFD that the path is down, so the session can be set down using this code.
6 (Concatenated Path Down):
7 (Administratively Down): The session was marked down by an administrator.
8 (Reverse Concatenated Path Down):
9-31: Reserved for future use.

BFD verification 'show' commands
³ Type an IP address to see a specific session.

Modes
Asynchronous (default): hello-like mode where BIG-IP periodically sends (and receives) BFD control packets; if the control detection timer expires, the session is marked as down. It uses UDP port 3784.
Demand: The BFD handshake is completed but no periodic BFD control packets are exchanged, as this mode assumes the system has its own way of verifying connectivity with its peer and may trigger BFD verification on demand, i.e. when it needs to according to its implementation. BIG-IP currently does not support this mode.
Asynchronous + Echo Function: When enabled, TMM literally loops BFD echo-specific control packets on UDP port 3785 sent from peers back to them without processing them, as if it wasn't enough that this protocol is already lightweight.
In this mode, a less aggressive timer (> 1 second) should be used for regular BFD control packets over port 3784, and a more aggressive timer is used by the echo function. BIG-IP currently does not support this mode.

Header Fields
Protocol Version: BFD version used. The latest one is v1 (RFC5880).
Diagnostic Code: BFD error code for diagnostic purposes.
Session State: How the transmitting system sees the session state, which can be AdminDown, Down, Init or Up.
Message Flags: Additional session configuration or functionality (e.g. a flag that says authentication is enabled).
Detect Time Multiplier: Informs the remote peer that the BFD session is supposed to be marked down once Desired Min TX Interval multiplied by this value is reached.
Message Length (bytes): Length of the BFD control packet.
My Discriminator: For each BFD session, each peer uses a unique discriminator to differentiate multiple sessions.
Your Discriminator: When BIG-IP receives a BFD control message back from its peer, we copy the peer's My Discriminator into Your Discriminator in our header.
Desired Min TX Interval (microseconds): Fastest we can send BFD control packets to the remote peer (no less than the value configured here).
Required Min RX Interval (microseconds): Fastest we can receive BFD control packets from the remote peer (no less than the value configured here).
Required Min Echo Interval (microseconds): Fastest we can loop BFD echo packets back to the remote system (0 means the Echo function is disabled).

Session State
AdminDown: Administratively forced down by command.
Down: Either the control detection time expired in an already established BFD session or the session never came up. If the probing time (min_tx) is set to 100ms, for example, and the multiplier is 3, then no response after 300ms makes the session go down.
Init: Signals a desire to bring the session up at the beginning of the BFD handshake.
Up: Indicates the session is Up.

Message Flags
Poll: The Poll flag is just a 'ping' that requires the peer box to respond with the Final flag.
In the BFD handshake as well as in Demand mode, a Poll message is a request to validate bidirectional connectivity.
Final: Sent in response to a packet with the Poll bit set.
Control Plane Independent: Set if BFD can continue to function if the control plane is disrupted¹
Authentication Present: Only set if authentication is being used.
Demand: If set, it is implied that periodic BFD control packets are no longer sent and another mechanism (on demand) is used instead.
Multipoint: Reserved for future use by point-to-multipoint extensions. Should be 0 on both sides.
¹ This is the case for BIG-IP, as BFD is implemented in the forwarding plane (TMM).

BFD Configuration
Configure the desired transmit and receive intervals as well as the multiplier on BIG-IP.
And Cisco Router:
You will typically configure the above regardless of the routing protocol used.

BFD BGP Configuration
And Cisco Router:

BFD OSPFv2/v3 Configuration

BFD ISIS Configuration

BFD RIPv1/v2 Configuration

BFD Static Configuration
All interfaces no matter what:
Specific interface only:
Tie BFD configuration to static route:

Tear Down This Wall! (Or at least punch a hole in it)
Back in 1987, President Reagan stood at Brandenburg Gate in Berlin and issued these iconic words: "Mr. Gorbachev, tear down this wall!" Sure, there was a path from east to west during the cold war at Checkpoint Charlie, but the existence of said path and the ability to traverse that path were not one and the same, at least among those valuing their life. At this point you might be asking what the heck this has to do with BIG-IP? Well, let me tell you!

One of the more common misconceptions about BIG-IP concerns routing. A route (n) is (as defined in the Google) "a way or course taken in getting from a starting point to a destination." So far so good. Routes are easy to add and easy to follow. Static routes anyway. I believe the confusion lies in the fact that the BIG-IP is a default deny box. That means, like Checkpoint Charlie, just because there is a path doesn't mean that traffic is going to be allowed on that path. This is not an "If you build it, they will come" situation, as much as it pains me to break it to Ray Kinsella there to the right.

In order for traffic to flow through the BIG-IP, it has to be explicitly allowed, and this is done with virtual servers. Virtual servers are the tearing down (or punching holes) of the BIG-IP default deny wall. Whether you need to route all protocols for specific networks, or deliver a variety of applications, you can configure specific or wildcard virtual servers of many different types, meticulous details of which can be found in Solution 14163.

For a specific example, consider this question raised in Q&A yesterday. The original poster wanted to be able to route some traffic between vlans with no translation, while allowing translation on other traffic. Again, all this work is done in the forwarding configuration, not the routing configuration.
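The default-deny idea can be pictured with a small Python model. This is a conceptual sketch only, not BIG-IP code: the `Listener` class, the `is_forwarded` helper, and the addresses are hypothetical stand-ins for virtual servers, and a real BIG-IP matches on more than destination address and port.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Listener:
    """Hypothetical stand-in for a virtual server: a destination network plus a port."""
    name: str
    destination: str            # network in CIDR form; "0.0.0.0/0" is a wildcard
    port: Optional[int] = None  # None means any port


def is_forwarded(dst_ip: str, dst_port: int, listeners: List[Listener]) -> Optional[str]:
    """Return the name of the most specific matching listener, or None (dropped)."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for listener in listeners:
        net = ipaddress.ip_network(listener.destination)
        if addr in net and listener.port in (None, dst_port):
            if best is None or net.prefixlen > ipaddress.ip_network(best.destination).prefixlen:
                best = listener
    return best.name if best else None


# With no listeners configured, a route may exist but nothing is allowed through.
assert is_forwarded("192.168.2.10", 22, []) is None

listeners = [
    Listener("forward_inside", "192.168.2.0/24"),       # any port to one subnet
    Listener("web_app", "172.16.100.50/32", port=443),  # one specific application
]
print(is_forwarded("192.168.2.10", 22, listeners))   # matches forward_inside
print(is_forwarded("172.16.100.50", 80, listeners))  # no listener on port 80
```

The point of the model is that the lookup consults only the listener table; whether a route to the destination exists never enters into the decision to allow the traffic.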
Assumptions:
- outside vlan: 172.16.100.0/24 (nat outbound traffic to this subnet from inside vlans)
- inside vlans: 192.168.1.0/24, 192.168.2.0/24 (do not nat traffic between these or to the remote servers)
- ipsec link vlan: 192.168.99.0/24
- remote servers: 192.168.100.0/24 (do not nat internal server traffic to remote servers)

The diagram (remote servers on the other side of an ipsec tunnel):

Alexander came up with a working configuration after seeding the discussion with some of the above concepts, go check it out!

Managing BIG-IP Routes with the Python Bigsuds Library
Perusing the Q&A section of DevCentral a couple days ago, I noticed a question on adding routes from community member Richard:

Is there a smart way to add a bunch of routes that doesn't mandate manual entry... (The JRahm paraphrase)

There were a couple great answers on using merge or copy/paste, but this jogged the memory that "hey, I've done that before!" A few years ago I was just learning python and how to use the iControl API and wrote a three part series on this very thing:

pyControl Apps #1 - BIG-IP Routing Table
pyControl Apps #2 - Deleting BIG-IP Routes
pyControl Apps #3 - Adding BIG-IP Routes

That works, but pycontrol v1 (these scripts) and even pycontrol v2 are no longer supported/developed since the bigsuds library was introduced late last year. So I decided to update the script to use the bigsuds library, the python argparse module in place of the originally used (and now deprecated) optparse module, and the RouteTableV2 iControl interface instead of the deprecated RouteTable interface.

The Main Loop

First, the code:

    if __name__ == "__main__":
        import bigsuds as pc
        import getpass
        import argparse

        # Command line arguments: system and username are required,
        # the add/delete route files are optional
        parser = argparse.ArgumentParser()
        parser.add_argument("-s", "--system", required=True)
        parser.add_argument("-u", "--username", required=True)
        parser.add_argument("-d", "--delRoute")
        parser.add_argument("-a", "--addRoute")
        args = vars(parser.parse_args())

        # Prompt for the password and connect to the BIG-IP
        print "%s, enter your " % args['username'],
        upass = getpass.getpass()
        b = pc.BIGIP(args['system'], args['username'], upass)

        # Delete still uses the old RouteTable interface (see below)
        if args['delRoute']:
            del_tmmRoutes(b.Networking.RouteTable, args['delRoute'])
        if args['addRoute']:
            add_tmmRoutes(b.Networking.RouteTableV2, args['addRoute'], b.LocalLB.Pool, b.Networking.VLAN)
        get_tmmRoutes(b.Networking.RouteTableV2)

The main loop remains unchanged in structure. First, I update the iControl reference for the bigsuds library instead of the pycontrol one and update the instantiation arguments.
Next, I needed to swap out optparse in favor of argparse to handle the command line arguments for the script. Finally, I updated the iControl references from RouteTable to RouteTableV2, except in the reference to the delete route function. More on that below.

The Delete Route Function

Beginning in TMOS version 11, the RouteTable iControl interface was deprecated and RouteTableV2 introduced. With that change, the parameters for the delete_static_route method went from an address/mask structure to a string, expecting the route name instead of the route address/mask. This is fine if you know the route names, but I haven't yet found an easy way to remove routes solely based on the routing information in the new interface, which is why the delete function still uses the old RouteTable interface.

    def del_tmmRoutes(obj, filename):
        routefile = open(filename, 'r')
        # First line of the file holds the column headers
        headers = routefile.readline().strip().split(',')
        stored_rows = []
        for line in routefile:
            route = line.strip().split(',')
            stored_rows.append(dict(zip(headers, route)))
        for route in stored_rows:
            obj.delete_static_route(routes = [route])
            print "Deleting Route %s/%s" % (route['destination'], route['netmask'])

The Add Route Function

This function just needed a little updating for the change from the add_static_route method to the create_static_route method. Now that routes require names, I had to account for the slice of headers/data I took from the routes in the text file. Structurally there were no changes. One thing I don't yet have working in the new interface is the reject route, so I've removed that code from the function displayed below, though the get routes function below will still display any reject routes in the system.
    def add_tmmRoutes(obj, filename, pool, vlan):
        pools = pool.get_list()
        vlans = vlan.get_list()
        routefile = open(filename, 'r')
        # First line of the file holds the column headers
        headers = routefile.readline().strip().split(',')
        rt_hdrs = headers[:1]
        dest_hdrs = headers[1:3]
        atr_hdrs = ['gateway','pool_name','vlan_name']
        rts = []
        dests = []
        atrs = []
        for line in routefile:
            ln = line.strip().split(',')
            rt_name = ln[:1]
            dest = ln[1:3]
            atr = ln[-2:]
            if atr[0] == 'pool':
                if atr[1] in pools:
                    atrs.append(dict(zip(atr_hdrs, ['',atr[1],''])))
                    dests.append(dict(zip(dest_hdrs, dest)))
                    rts.append(rt_name)
                else:
                    print "Pool ", atr[1], " does not exist"
            elif atr[0] == 'vlan':
                if atr[1] in vlans:
                    atrs.append(dict(zip(atr_hdrs, ['','',atr[1]])))
                    dests.append(dict(zip(dest_hdrs, dest)))
                    rts.append(rt_name)
                else:
                    print "Vlan ", atr[1], " does not exist"
            elif atr[0] == 'gateway':
                atrs.append(dict(zip(atr_hdrs, [atr[1],'',''])))
                dests.append(dict(zip(dest_hdrs, dest)))
                rts.append(rt_name)
        combined = zip(rts, dests, atrs)
        for x in combined:
            xl = list(x)
            obj.create_static_route(routes = xl[0], destinations = [xl[1]], attributes = [xl[2]])

The IP Sorting Function

This function isn't necessary and can be removed if desired; I just included it to sort all the routes properly. The only update here was to change the data reference (i[2]['address']) in the list comprehension.

    def sort_ip_dict(ip_list):
        from IPy import IP
        ipl = [ (IP(i[2]['address']).int(), i) for i in ip_list]
        ipl.sort()
        return [ip[1] for ip in ipl]

The Get Routes Function

In this function I had to update a couple of the methods specific to the RouteTableV2 interface and work with some changing data types between the ZSI (pycontrol) and suds (bigsuds) libraries. I updated the output a little as well.
    def get_tmmRoutes(obj):
        try:
            tmmStatic = obj.get_static_route_list()
            tmmRtType = obj.get_static_route_type(routes = tmmStatic)
            tmmRtDest = obj.get_static_route_destination(routes = tmmStatic)
        except:
            # Report the failure and stop; nothing below can run without the route data
            print "Unable to fetch route information - check trace log"
            return

        combined = zip(tmmStatic, tmmRtType, tmmRtDest)
        combined = [list(a) for a in combined]

        ldict_gw_ip = []
        ldict_gw_pool = []
        ldict_gw_vlan = []
        ldict_gw_reject = []

        # Append the gateway/pool/vlan to each route and bucket the routes by type
        for x in combined:
            if x[1] == 'ROUTE_TYPE_GATEWAY':
                x.append(obj.get_static_route_gateway(routes = [x[0]])[0])
                ldict_gw_ip.append(x)
            if x[1] == 'ROUTE_TYPE_POOL':
                x.append(obj.get_static_route_pool(routes = [x[0]])[0])
                ldict_gw_pool.append(x)
            if x[1] == 'ROUTE_TYPE_INTERFACE':
                x.append(obj.get_static_route_vlan(routes = [x[0]])[0])
                ldict_gw_vlan.append(x)
            if x[1] == 'ROUTE_TYPE_REJECT':
                ldict_gw_reject.append(x)

        gw_ip = sort_ip_dict(ldict_gw_ip)
        gw_pool = sort_ip_dict(ldict_gw_pool)
        gw_vlan = sort_ip_dict(ldict_gw_vlan)
        gw_reject = sort_ip_dict(ldict_gw_reject)

        print "\n"*2
        print "TMM IP Routes: (Name: Net/Mask -> Gateway IP)"
        for x in gw_ip:
            print "\t%s: %s/%s -> %s" % (x[0], x[2]['address'], x[2]['netmask'], x[3])
        print "\n"*2
        print "TMM Pool Routes: (Name: Net/Mask -> Gateway Pool)"
        for x in gw_pool:
            print "\t%s: %s/%s -> %s" % (x[0], x[2]['address'], x[2]['netmask'], x[3])
        print "\n"*2
        print "TMM Vlan Routes: (Name: Net/Mask -> Gateway Vlan)"
        for x in gw_vlan:
            print "\t%s: %s/%s -> %s" % (x[0], x[2]['address'], x[2]['netmask'], x[3])
        print "\n"*2
        print "TMM Rejected Routes: (Name: Net/Mask)"
        for x in gw_reject:
            print "\t%s: %s/%s" % (x[0], x[2]['address'], x[2]['netmask'])

The Route File Formats

When adding/removing routes, the following file formats are necessary to work with the script.
    ### Add Routes File Format ###
    name,address,netmask,route_type,gateway
    /Common/r5,172.16.1.0,255.255.255.0,pool,/Common/testpool
    /Common/r6,172.16.2.0,255.255.255.0,vlan,/Common/vmnet3
    /Common/r7,172.16.3.0,255.255.255.0,gateway,10.10.10.1
    /Common/r8,172.16.4.0,255.255.255.0,gateway,10.10.10.1

    ### Delete Routes File Format ###
    destination,netmask
    172.16.1.0,255.255.255.0
    172.16.2.0,255.255.255.0
    172.16.3.0,255.255.255.0
    172.16.4.0,255.255.255.0

The Test

Now that the script and the test files are prepared, let's take this for a spin! First, I'll run this without file arguments.

    C:\>python getRoutes.py -s 192.168.6.5 -u admin
    admin, enter your Password:

    TMM IP Routes: (Name: Net/Mask -> Gateway IP)
        /Common/r.default: 0.0.0.0/0.0.0.0 -> 10.10.10.1
        /Common/r1: 65.23.5.88/255.255.255.248 -> 10.10.10.1
        /Common/r2: 192.32.32.0/255.255.255.0 -> 10.10.10.1

    TMM Pool Routes: (Name: Net/Mask -> Gateway Pool)

    TMM Vlan Routes: (Name: Net/Mask -> Gateway Vlan)

    TMM Rejected Routes: (Name: Net/Mask)

Now, I'll add some routes (same as shown above in the formats section).

    C:\>python getRoutes.py -s 192.168.6.5 -u admin -a routes
    admin, enter your Password:

    TMM IP Routes: (Name: Net/Mask -> Gateway IP)
        /Common/r.default: 0.0.0.0/0.0.0.0 -> 10.10.10.1
        /Common/r1: 65.23.5.88/255.255.255.248 -> 10.10.10.1
        /Common/r7: 172.16.3.0/255.255.255.0 -> 10.10.10.1
        /Common/r8: 172.16.4.0/255.255.255.0 -> 10.10.10.1
        /Common/r2: 192.32.32.0/255.255.255.0 -> 10.10.10.1

    TMM Pool Routes: (Name: Net/Mask -> Gateway Pool)
        /Common/r5: 172.16.1.0/255.255.255.0 -> /Common/testpool

    TMM Vlan Routes: (Name: Net/Mask -> Gateway Vlan)
        /Common/r6: 172.16.2.0/255.255.255.0 -> /Common/vmnet3

    TMM Rejected Routes: (Name: Net/Mask)

You can see that routes r5-r8 were added to the system. Now, I'll delete them.
    C:\>python getRoutes.py -s 192.168.6.5 -u admin -d routedel
    admin, enter your Password:
    Deleting Route 172.16.1.0/255.255.255.0
    Deleting Route 172.16.2.0/255.255.255.0
    Deleting Route 172.16.3.0/255.255.255.0
    Deleting Route 172.16.4.0/255.255.255.0

    TMM IP Routes: (Name: Net/Mask -> Gateway IP)
        /Common/r.default: 0.0.0.0/0.0.0.0 -> 10.10.10.1
        /Common/r1: 65.23.5.88/255.255.255.248 -> 10.10.10.1
        /Common/r2: 192.32.32.0/255.255.255.0 -> 10.10.10.1

    TMM Pool Routes: (Name: Net/Mask -> Gateway Pool)

    TMM Vlan Routes: (Name: Net/Mask -> Gateway Vlan)

    TMM Rejected Routes: (Name: Net/Mask)

Conclusion

Hopefully this was a useful exercise in converting pycontrol code to bigsuds. Looks like I have my work cut out for me in converting the rest of the codeshare! This example in full is in the iControl codeshare.

F5 Friday: Creating a DNS Blackhole. On Purpose
#infosec #DNS #v11 DNS is like your mom, remember? Sometimes she knows better.

Generally speaking, blackhole routing is a problem, not a solution. A route to nowhere is not exactly a good thing, after all. But in some cases it’s an approved and even recommended solution, usually implemented as a means to filter out bad packets at the routing level that might be malformed or are otherwise dangerous to pass around inside the data center. This technique is also used at the DNS layer as a means to prevent responding to queries with known infected or otherwise malicious sites.

Generally speaking, DNS does nothing more than act like a phone book; you ask for an address, it gives it to you. That may have been acceptable through the last decade, but it is increasingly undesirable as it often unwittingly serves as part of the distribution network for malware and other malicious intent.

In networking, black holes refer to places in the network where incoming traffic is silently discarded (or "dropped"), without informing the source that the data did not reach its intended recipient. When examining the topology of the network, the black holes themselves are invisible, and can only be detected by monitoring the lost traffic; hence the name. (http://en.wikipedia.org/wiki/Black_hole_(networking))

What we’d like to do is prevent DNS servers from returning addresses for sites which we know – or are at least pretty darn sure – are infected. While we can’t provide such safeguards for everyone (unless you’re the authoritative server for such sites) we can at least better protect the corporate network and users from such sites by ensuring such queries are not answered with the infected addresses. Such a solution requires the implementation of a DNS blackhole: a filtering of queries at the DNS level. This can be done using F5 iRules to inspect queries against a list of known bad sites and return an internal address for those that match.
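The decision described above can be sketched in a few lines of Python. This is not the iRule from the tech tip: `BLACKLIST`, `SINKHOLE_IP`, `resolve`, and the domain names are made up for illustration, and the real upstream lookup is replaced by a stub.

```python
from typing import Callable

# Hypothetical values: a real deployment would load the list from an external source.
BLACKLIST = {"malware.example", "phish.example"}
SINKHOLE_IP = "10.1.1.1"  # internal address returned instead of the real one


def is_blackholed(qname: str) -> bool:
    """True if the query name is a listed domain or a subdomain of one."""
    labels = qname.rstrip(".").lower().split(".")
    # Check "a.b.c", then "b.c", then "c" against the list.
    return any(".".join(labels[i:]) in BLACKLIST for i in range(len(labels)))


def resolve(qname: str, upstream: Callable[[str], str]) -> str:
    """Answer listed names with the sinkhole address; pass everything else upstream."""
    if is_blackholed(qname):
        return SINKHOLE_IP
    return upstream(qname)


def fake_upstream(qname: str) -> str:
    """Stub standing in for a real DNS lookup."""
    return "203.0.113.10"


print(resolve("www.malware.example.", fake_upstream))  # answered with the sinkhole
print(resolve("www.example.org", fake_upstream))       # passed to the stub resolver
```

Matching on parent domains as well as exact names is a design choice in this sketch, so that subdomains of a listed site are caught without listing each one.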
What’s cool about using iRules to perform this function is the ability to leverage external lookups to perform the inspection. Sideband connections were introduced in BIG-IP v11 and these connections allow external, i.e. off device, lookups for solutions like this. Such a solution is similar to the way in which you’d want to look up the IP address and/or domain of the sender during an e-mail exchange, to validate the sender is not on the “bad spammer” lists maintained by a variety of organizations and offered as a service.

Jason Rahm recently detailed this solution as architected by Hugh O’Donnel, complete with iRules, in a DevCentral Tech Tip. You can find a more comprehensive description of the solution as well as the iRules to implement in the tech tip.

v11.1: DNS Blackhole with iRules

Happy (DNS) Routing!

WILS: Content (Application) Switching is like VLANs for HTTP
We focus a lot on encouraging developers to get more “ops” oriented, but seem to have forgotten networking pros also need to get more “apps” oriented.

Most networking professionals know their relevant protocols, the ones they work with day in and day out, so well that many of them are able to read a live packet capture without requiring a protocol translation to “plain English”. These folks can narrow down a packet as having come from a specific component from its MAC address because they’ve spent a lot of time analyzing and troubleshooting network issues. And while these same pros understand load balancing from a traffic routing decision making point of view, because in many ways it is similar to trunking and link aggregation (LAG) – teaming and bonding – things get a bit less clear as we move up the stack.

Sure, TCP (layer 4) load balancing makes sense: it’s port and IP based, and there are plenty of ways in which networking protocols can be manipulated and routed based on a combination of the two. But let’s move up to HTTP and Layer 7 load balancing, beyond the simple traffic in –> traffic out decision making that’s associated with simple load balancing algorithms like round robin or its cousins least connections and fastest response time.

Content (or application) switching is the use of application protocols or data in making a load balancing (application routing) decision. Instead of letting an algorithm decide which pool of servers will service a request, the decision is made by inspecting the HTTP headers and data in the exchange. The simplest, and most common, case involves using the URI as the basis for a sharding-style scalability domain in which content is sorted out at the load balancing device and directed to appropriate pools of compute resources.
CONTENT SWITCHING = VLANs for HTTP

Examining a simple diagram, it’s a fairly trivial configuration and architecture that requires only that the URIs upon which decisions will be made are known and simplified to a common factor. You wouldn’t want to specify every single possible URI in the configuration; that would be like configuring static routing tables for every IP address in your network. Ugly – and not of the Shrek ugly kind, but the “made for SyFy” horror-flick kind, ugly and painful.

Networking pros would likely never architect a solution that requires that level of routing granularity, as it would negatively impact performance as well as make any changes behind the switch horribly disruptive. Instead, they’d likely leverage VLANs and VLAN routing to provide the kind of separation of traffic necessary to implement the desired network architecture. When a packet arrives at the switch in question, it has (or may have) a VLAN tag. The switch intercepts the packet, inspects it, and upon finding the VLAN tag routes the packet out the appropriate egress port to the next hop. In this way, traffic and users and applications can be segregated, bandwidth utilization more evenly distributed across a network, and routing tables simplified because they can be based on VLAN ID rather than individual IP addresses, making adds and removals non-disruptive from a network configuration viewpoint.

The use of VLAN tagging enables network virtualization in much the same way server virtualization is used: to divvy up physical resources into discrete, virtual packages that can be constrained and more easily managed. Content switching moves into the realm of application virtualization, in which an application is divvied up and distributed across resources as a means to achieve higher efficiency and better performance.
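For readers coming from the application side, the VLAN half of the analogy looks roughly like this in Python. It is a simplified sketch: it reads the 802.1Q tag from a raw Ethernet frame and looks up an egress port in a hypothetical `EGRESS_PORT` table, ignoring everything else a real switch does (MAC learning, trunk and access port rules, and so on).

```python
import struct
from typing import Dict, Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame

# Hypothetical VLAN-to-port table for illustration.
EGRESS_PORT: Dict[int, str] = {10: "port1", 20: "port2"}


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-13 follow the destination and source MAC addresses.
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF  # low 12 bits of the tag control field


def egress_port(frame: bytes) -> Optional[str]:
    """Look up the egress port from the VLAN tag alone."""
    vid = vlan_id(frame)
    return EGRESS_PORT.get(vid) if vid is not None else None


# Build a minimal tagged frame: two MACs, the 802.1Q tag, then an IPv4 EtherType.
macs = bytes(6) + bytes(6)
tagged = macs + struct.pack("!HH", TPID_8021Q, 20) + struct.pack("!H", 0x0800)
print(vlan_id(tagged), egress_port(tagged))
```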
Content (application or layer 7) switching utilizes the same concepts: an HTTP request arrives, the load balancing service intercepts it, inspects the HTTP header (instead of the IP headers) for the URI “tag”, and then routes the request to the appropriate pool (next hop) of resources. Basically, if you treat content switching as though it were VLANs for HTTP, with the “tag” being the HTTP header URI, you’d be right on the money.

WILS: Write It Like Seth. Seth Godin always gets his point across with brevity and wit. WILS is an ATTEMPT TO BE concise about application delivery TOPICS AND just get straight to the point. NO DILLY DALLYING AROUND.

F5 Friday: ‘IPv4 and IPv6 Can Coexist’ or ‘How to eat your cake and have it too’
Migration is not going to happen overnight and it’s going to require simultaneous support for both IPv4 and IPv6 until both sides of the equation are ready.

Making the switch from IPv4 to IPv6 is not a task anyone with any significant investment in infrastructure wants to undertake. The reliance on IP addresses of infrastructure to control, secure, route, and track everything from simple network housekeeping to complying with complex governmental regulations makes it difficult to simply “flick a switch” and move from the old form of addressing (IPv4) to the new (IPv6). This reliance is spread up and down the network stack, and spans not only infrastructure but the very processes that keep data centers running smoothly. Firewall rules, ACLs, scripts that automate mundane tasks, routing from layer 2 to layer 7, and application architecture are likely to communicate using IPv4 addresses. Clients, too, may not be ready depending on their age and operating system, which makes a simple “cut over” strategy impossible or, at best, fraught with the potential for technical support and business challenges.

IT’S NOT JUST SIZE THAT MATTERS

The differences between IPv4 and IPv6 in addressing are probably the most visible and oft referenced change, as it is the length of the IPv6 address that dramatically expands the available pool of IP addresses and thus is of the most interest. IPv4 addresses are 32 bits long while IPv6 addresses are 128 bits long. But IPv6 addresses can (and do) interoperate with IPv4 addresses, through a variety of methods that allow IPv6 to carry along IPv4 addresses. This is achieved through the use of IPv4-mapped IPv6 addresses and IPv4-compatible IPv6 addresses, both of which allow IPv4 addresses to be represented in IPv6 addresses.

WILS: Client IP or Not Client IP, SNAT is the Question
Ever wonder why requests coming through proxy-based solutions, particularly load balancers, end up with an IP address other than the real client? It’s not just a network administrator having fun at your expense. SNAT is the question – and the answer.

SNAT is the common abbreviation for Secure NAT, so-called because the configured address will not accept inbound connections and is, therefore, supposed to be secure. It is also sometimes (more accurately in the opinion of many) referred to as Source NAT, because it acts on the source IP address instead of the destination IP address, as is the case for ordinary NAT usage. In load balancing scenarios SNAT is used to change the source IP of incoming requests to that of the load balancer. Now you’re probably thinking this is the reason we end up having to jump through hoops like X-FORWARDED-FOR to get the real client IP address and you’d be right. But the use of SNAT for this purpose isn’t intentionally malevolent. Really.

In most cases it’s used to force the return path for responses through the load balancer, which is important when network routing from the server (virtual or physical) to the client would bypass the load balancer. This is often true because servers need a way to access the Internet for various reasons including automated updates and when the application hosted on the server needs to call out to a third-party application, such as integrating with a Web 2.0 site via an API call. In these situations it is desirable for the server to bypass the load balancer because the traffic is initiated by the server, and is not usually being managed by the load balancer. In the case of a request coming from a client, the response needs to return through the load balancer because incoming requests are usually destination NAT’d in most load balancing configurations, so the traffic has to traverse the same path, in reverse, in order to undo that translation and ensure the response is delivered to the client.
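The translation round trip described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: the `Packet` type, the addresses, and the `inbound`/`outbound` helpers are all hypothetical, and a real load balancer would track connections by port as well as address.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


VIP = "203.0.113.10"   # assumed virtual server address that clients connect to
SELF_IP = "10.0.0.2"   # assumed load balancer address on the server network
SERVER = "10.0.0.21"   # assumed pool member chosen for this connection


def inbound(pkt: Packet) -> Packet:
    """Client to server: destination NAT to the pool member, SNAT to the self IP."""
    if pkt.dst != VIP:
        raise ValueError("packet is not addressed to the virtual server")
    return replace(pkt, src=SELF_IP, dst=SERVER)


def outbound(pkt: Packet, client_ip: str) -> Packet:
    """Server to client: undo both translations using the remembered client IP."""
    if pkt.dst != SELF_IP:
        raise ValueError("reply did not come back through the load balancer")
    return replace(pkt, src=VIP, dst=client_ip)


if __name__ == "__main__":
    request = Packet(src="198.51.100.7", dst=VIP)
    to_server = inbound(request)
    # The server only ever sees SELF_IP as the source, so that is what it logs
    # and where it sends the reply.
    reply = Packet(src=to_server.dst, dst=to_server.src)
    print(to_server)
    print(outbound(reply, client_ip=request.src))
```

Because the server replies to the self IP, the response is guaranteed to pass back through the load balancer, which is exactly where the destination NAT can be undone.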
Most load balancing solutions offer the ability to specify, on a per-IP address basis, the SNAT mappings as well as providing an “auto map” feature which uses the IP addresses assigned to the load balancer (often called “self-ip” addresses) to perform the SNAT mappings. Advanced load balancers have additional methods of assigning SNAT mappings including assigning a “pool” of addresses to a virtual (network) server to be used automatically as well as intelligent SNAT capabilities that allow the use of network-side scripting to manipulate on a case-by-case basis the SNAT mappings. Most configurations can comfortably use the auto map feature to manage SNAT, by far the least complex of the available configurations.

WILS: Write It Like Seth. Seth Godin always gets his point across with brevity and wit. WILS is an ATTEMPT TO BE concise about application delivery TOPICS AND just get straight to the point. NO DILLY DALLYING AROUND.

Using "X-Forwarded-For" in Apache or PHP
SNAT Translation Overflow
Working around client-side limitations on custom HTTP headers
WILS: Why Does Load Balancing Improve Application Performance?
WILS: The Concise Guide to *-Load Balancing
WILS: Network Load Balancing versus Application Load Balancing
All WILS Topics on DevCentral
If Load Balancers Are Dead Why Do We Keep Talking About Them?

Dire (Load Balancing) Straits: I Want My Client IP
Load balancing fu for developers to avoid losing what is vital business data

I want my client IP
Now read the manuals that's the way you do it
Give the IP to the SLB
That ain't workin' that’s the way to do it
Set the gateway on the server to your SLB
(many apologies to Dire Straits)

My brother called me last week with a load balancing emergency. See, for most retailers the “big day” of the year is the Friday after Thanksgiving. In Wisconsin, particularly for retailers that focus on water sports, the “big day” of the year is Memorial Day because it’s the first long weekend of the summer. My brother was up late trying to help one of those retailers prepare for the weekend and the inevitable spike in traffic to the websites of people looking for boats and water-skis and other water-related sports “things”. They’d just installed a pair of load balancers and though they worked just fine they had a huge problem on their hands: the application wasn’t getting the client IP but instead was logging the IP address of the load balancers.

This is a common problem that crops up the first time an application is horizontally scaled out using a load balancer. The business (sales and marketing) analysts need the IP address of the client in order to properly slice and dice data collected. The application must log the correct IP address or the data is virtually useless to the business. There are a couple of solutions to this problem; which one you choose will depend on the human resources you have available and whether or not you can change your network architecture. The latter would be particularly challenging for applications deployed in a cloud computing environment that are taking advantage of a load balancing solution deployed as a virtual network appliance or “softADC” along with the applications, because you only have control over the configuration of your virtual images and the virtual network appliance-hosted load balancer.
THE PROBLEM

The problem is that load balancers in small-medium sized datacenters and cloud computing environments are often deployed in a flat network architecture, with the load balancer essentially in a one-armed (or side-armed) configuration. This means the load balancer needs only one interface, and there’s no complex network routing necessary because the load balancer and the servers are on the same network. It’s also often used as a transparent option for a new load balancing implementation when applications are already live and in use and disrupting service is not an option. But herein lies the source of the problem: because the load balancer and the servers are on the same network the load balancer must pretend to be the client in order to force the responses back through the load balancer (this is also true of larger and more complex configurations in which the load balancer acts as a full proxy).

What happens generally is the load balancer replaces the client IP address in the network layer with its own IP address to force responses to come back to it, then rewrites the Ethernet header on the way back out. When the web server logs the transaction, the client IP address is the IP address of the load balancer, and that’s what gets written into the logs. This is also the case in larger deployments when the load balancer is a full proxy; the requests appear to come from the load balancer in order to apply the appropriate optimization and security policies on the responses, e.g. content scrubbing and TCP multiplexing. If the requests do not appear to come from the load balancer, the load balancer/ADC cannot provide any kind of TCP connection optimization (such as is used to increase VM density) or application acceleration capabilities because it never sees the responses. The result is a DSR (Direct Server Return) mode of operation, meaning that while requests traverse the load balancer, the responses don’t.
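Why the source address decides the return path comes down to the server's own next-hop choice. The following Python sketch models that choice in the simplest possible way: reply directly if the destination is on the local subnet, otherwise hand the reply to the default gateway. The subnet, the addresses, and the `next_hop` helper are assumptions made up for illustration.

```python
import ipaddress

SERVER_NET = ipaddress.ip_network("10.0.0.0/24")  # assumed flat server network
ROUTER = "10.0.0.1"                               # assumed default gateway (the router)
LOAD_BALANCER = "10.0.0.2"                        # assumed one-armed load balancer


def next_hop(destination: str, default_gateway: str = ROUTER) -> str:
    """Return where a server sends a reply addressed to `destination`."""
    if ipaddress.ip_address(destination) in SERVER_NET:
        return destination          # on-link: deliver directly
    return default_gateway          # off-link: hand to the default gateway


if __name__ == "__main__":
    # Source was rewritten to the load balancer: the reply goes straight back to it.
    print(next_hop(LOAD_BALANCER))
    # Client IP was preserved: the reply goes to the router and bypasses the load balancer.
    print(next_hop("198.51.100.7"))
    # Same preserved client IP, but with the load balancer as the server's gateway.
    print(next_hop("198.51.100.7", default_gateway=LOAD_BALANCER))
```

In this model the only two ways to get a reply back through the load balancer are to make the request look like it came from the load balancer, or to make the load balancer the server's default gateway.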
This is the situation my brother found himself in (on his birthday, no less, what a dedicated guy) when he called me. It was Friday afternoon of Memorial Day weekend and the site absolutely had to be up and logging client IP addresses accurately or the client – a retailer of water-related goods - was going to be very, very, very angry.

THE SOLUTION(s)

You might be thinking the obvious solution to this problem is to turn on the spoofing option on the load balancer. It is, but it isn’t. Or at least it isn’t without a configuration change to the servers. See, the real problem in these scenarios is that the servers are still configured with a default gateway other than the load balancer. Enabling spoofing, i.e. the ability of the load balancer to pretend to be the client IP address, in this situation would have resulted in the web server’s responses still being routed around the load balancer because the servers, having no static route to the client IP, would automatically send them to the default gateway (the router).

The solution is to change the network configuration of the web servers to use a default gateway that is the load balancer’s IP address, thus ensuring that responses are routed back through the load balancer regardless of the client IP. Once that’s accomplished then you can enable SNAT and/or spoofing or “preserve client IP” or whatever the vendor refers to the functionality as and the web server logs will show the actual IP address of the client instead of the load balancer.

Another option is to enable the load balancer to insert the “X-Forwarded-For” custom HTTP header. Most modern load balancers offer this as a checkbox configuration feature. Every HTTP request is modified by the load balancer to include the custom HTTP header with a value that is the client’s true IP address. It is then the application’s responsibility to extract that value instead of the default for logging and per-client personalization/authorization.
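On the application side, extracting the client address from the header might look something like the Python sketch below. It is a hedged example rather than a drop-in implementation: the `client_ip` function, the `trusted_proxies` set, and the sample addresses are hypothetical, and the header is only honored when the connection actually comes from a known load balancer, since any client can send its own X-Forwarded-For value.

```python
import ipaddress
from typing import FrozenSet, Mapping


def client_ip(headers: Mapping[str, str], peer_ip: str,
              trusted_proxies: FrozenSet[str]) -> str:
    """Return the address to log: the forwarded client IP if the peer is a
    trusted load balancer, otherwise the TCP peer address itself."""
    if peer_ip not in trusted_proxies:
        return peer_ip  # not from our load balancer, so ignore the header

    forwarded = ""
    for name, value in headers.items():
        if name.lower() == "x-forwarded-for":  # header names are case-insensitive
            forwarded = value

    # Walk the list right to left: the rightmost entry that is not one of our
    # own proxies is the address the load balancer actually saw.
    for candidate in reversed([part.strip() for part in forwarded.split(",")]):
        try:
            address = str(ipaddress.ip_address(candidate))
        except ValueError:
            return peer_ip  # missing or malformed entry: fall back to the peer
        if address not in trusted_proxies:
            return address
    return peer_ip


if __name__ == "__main__":
    lbs = frozenset({"10.0.0.2"})
    print(client_ip({"X-Forwarded-For": "198.51.100.7"}, "10.0.0.2", lbs))
```

Reading from the right rather than the left is a deliberate choice here: entries a client supplies itself end up on the left of the list, so they are never preferred over what the load balancer added.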
The extent of the modification of the application depends on how reliant the application is on the client IP address. If it’s only for logging then a web/app server configuration change is likely the only change necessary. If the application makes use of that IP to apply policies or make application-specific decisions, then modification to the application itself will be necessary.

THIS is WHY DEVOPS is NECESSARY

As more developers take to the cloud as a deployment option for applications this very basic – but very frustrating – scenario is likely to occur more often than not as developers become more familiar with load balancers and deploy them in an IaaS environment. IaaS providers offering “auto-scaling” or “load balancing” services as a capability will have taken this problem into consideration when they architected their underlying infrastructure and thus the customer is relieved of responsibility for the nitty-gritty details. Customers deploying a separate virtualized load balancing solution – to take advantage of more robust load balancing algorithms and application delivery features like network-side scripting – will need to be aware of the impact of network configuration on the application’s ability to “see” the client IP address. Even if the customer is not concerned about the ability to extract the real client IP address it is still important to understand the impact to the return data path as well as the loss of functionality (optimization, security, acceleration) from routing around the load balancing solution on the application response.

This particular knowledge falls into the responsibility demesne of what folks like James Urquhart and Damon Edwards and Shlomo Swidler (among many others) call “devops”: developers whose skill set and knowledge expands into operations in order to successfully take advantage of cloud computing.
Developers must be involved with and understand the implications of what has before been primarily an operational concern because it impacts their application and potentially impacts the ability of their applications to function properly. Development and operations must collaborate on just such concerns because it impacts the business and its ability to succeed in its functions, such as understanding the demographics of its customer base by applying GeoLocation technologies to client IP addresses. Without the real client IP address such demographic data mining fails and the business cannot make the adjustments necessary to its products/services. Business, operations, and development must collaborate to understand the requirements and then implement them in the most operationally efficient manner possible. Devops promises to enable that collaboration by tearing down the wall that has long existed between developers and operations.

Windows Vista Performance Issue Illustrates Importance of Context
Decisions about routing at every layer require context

A friend forwarded a blog post to me last week mainly because it contained a reference to F5, but upon reading it (a couple of times) I realized that this particular post contained some very interesting information that needed to be examined further. The details of the problems being experienced by the poster (which revolve around a globally load-balanced site that was for some reason not being distributed very equally) point to an interesting conundrum: just how much control over site decisions should a client have? Given the scenario described, and the conclusion that it is primarily the result of an over-eager client implementation in Windows Vista of a fairly obscure DNS-focused RFC, the answer to how much control a client should have over site decisions seems obvious: none.

The problem (which you can read about in its full detail here) described is that Windows Vista, when presented with multiple A records from a DNS query, applies a rule under which the address “which shares the most prefix bits with the source address is selected, presumably on the basis that it's in some sense "closer" in the network.” This is not a bad thing. This implementation was obviously intended to aid in the choice of a site closer to the user, which is one of the many ways in which application network architects attempt to improve end-user performance: reducing the impact of physical distance on the transfer of application data.

The problem is, however, that despite the best intentions of those who designed IP, it is not guaranteed that having an IP address that is numerically close to yours means the site is physically close to you. Which kind of defeats the purpose of implementing the RFC in the first place. Now neither solution (choosing random addresses versus one potentially physically closer) is optimal primarily because neither option assures the client that the chosen site is actually (a) available and (b) physically closer.
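The selection behavior in the quote can be modeled in a few lines of Python. This is a simplified, IPv4-only sketch of a "most shared prefix bits wins" comparison, not the actual Windows Vista code; the helper names and the documentation-range addresses are assumptions chosen for illustration.

```python
import ipaddress
from typing import Sequence


def common_prefix_bits(a: str, b: str) -> int:
    """Count how many leading bits two IPv4 addresses have in common."""
    difference = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first position where they differ.
    return 32 - difference.bit_length()


def pick_by_prefix(source: str, candidates: Sequence[str]) -> str:
    """Choose the candidate sharing the most leading bits with the source address."""
    return max(candidates, key=lambda c: common_prefix_bits(source, c))


if __name__ == "__main__":
    source = "198.51.100.20"
    a_records = ["203.0.113.5", "192.0.2.5"]
    for record in a_records:
        print(record, common_prefix_bits(source, record))
    print(pick_by_prefix(source, a_records))
```

Here 192.0.2.5 wins with five shared bits against four, and it will win every time for this client. Nothing in the comparison knows where either site is, whether it is up, or how busy it is, which is why a deterministic rule like this can skew the distribution across a globally load-balanced site.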
Ostensibly the way this should work is that the DNS resolution process would return a single address (the author’s solution) based on the context in which the request was made. That means the DNS resolver needs to take into consideration the potential (in)accuracy of the physical location when derived from an IP address, the speed of the link over which the client is making the request (which presumably will not change between DNS resolution and application request) and any other information it can glean from the client. The DNS resolver needs to return the IP address of the site that at the time the request is made appears best able to serve the user’s request quickly. That means the DNS resolver (usually a global load balancer) needs to be contextually aware of not only the client but the sites as well. It needs to know (a) which sites are currently available to serve the request and (b) how well each is performing and (c) where they are physically located. That requires collaboration between the global load balancer and the local application delivery mechanisms that serve as an intermediary between the data center and the clients that interact with it. Yes, I know. A DNS request doesn’t carry information regarding which service will be accessed. A DNS lookup could be querying for an IP address for Skype, or FTP, or HTTP. Therein lies part of the problem, doesn’t it? DNS is a fairly old, in technical terms, standard. It is service agnostic and unlikely to change. But providing even basic context would help – if the DNS resolver knows a site is unreachable, likely due to routing outages, then it shouldn’t return that IP address to the client if another is available. Given the ability to do so, a DNS resolution solution could infer service based on host name – as long as the site were architected in such a way as to remain consistent with such conventions. 
For example, ensuring that www.example.com is used only for HTTP, and ftp.example.com is only used for FTP would enable many DNS resolvers to make better decisions. Host-based service mappings, inferred or codified, would aid in adding the context necessary to make better decisions regarding which IP address is returned during a DNS lookup – without changing a core standard and potentially breaking teh Internets.

The problem with giving the client control over which site it accesses when trying to use an application is that it lacks the context necessary to make an intelligent decision. It doesn’t know whether a site is up or down or whether it is performing well or whether it is near or at capacity. It doesn’t know where the site is physically located and it certainly can’t ascertain the performance of those sites because it doesn’t even know where they are yet, that’s why it’s performing a DNS lookup.

A well-performing infrastructure is important to the success of any web-based initiative, whether that’s cloud-based applications or locally hosted web sites. Part of a well-performing infrastructure is having the ability to route requests intelligently, based on the context in which those requests are made. Simply returning IP addresses – and choosing which one to use – in a vacuum based on little or no information about the state of those sites is asking for poor performance and availability problems. Context is critical.
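As a rough illustration of what a context-aware answer could look like, the Python sketch below has a resolver return a single address after filtering on availability and an inferred service, then ranking on measured response time and estimated distance. Everything in it is an assumption for the sake of the example: the `Site` fields, the host-name-to-service mapping, the ranking order, and the sample data are hypothetical, and a real global load balancer would gather these values from health monitors and the local application delivery tier.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Site:
    address: str
    services: FrozenSet[str]   # services the site hosts, e.g. {"http", "ftp"}
    available: bool            # result of the latest health check
    response_ms: float         # recently measured response time
    distance_km: float         # estimated distance to the client (may be inaccurate)


def infer_service(hostname: str) -> Optional[str]:
    """Infer the service from the first host label, by naming convention only."""
    label = hostname.split(".")[0].lower()
    return {"www": "http", "ftp": "ftp"}.get(label)


def resolve(hostname: str, sites: Sequence[Site]) -> Optional[str]:
    """Return one address: the best available site for the inferred service."""
    service = infer_service(hostname)
    usable = [s for s in sites
              if s.available and (service is None or service in s.services)]
    if not usable:
        return None
    # Assumed ranking: fastest site first, distance as the tie-breaker.
    best = min(usable, key=lambda s: (s.response_ms, s.distance_km))
    return best.address


if __name__ == "__main__":
    sites = [
        Site("192.0.2.10", frozenset({"http"}), False, 20.0, 100.0),
        Site("198.51.100.10", frozenset({"http", "ftp"}), True, 80.0, 4000.0),
        Site("203.0.113.10", frozenset({"http"}), True, 45.0, 9000.0),
    ]
    print(resolve("www.example.com", sites))
    print(resolve("ftp.example.com", sites))
```

Note that the nearest site in the sample data is never returned because it is down, which is the kind of decision a client comparing address prefixes has no way to make.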