disaster recovery
36 Topics

Multiple Certs, One VIP: TLS Server Name Indication via iRules
An age-old question that we’ve seen time and time again in the iRules forums here on DevCentral is “How can I use iRules to manage multiple SSL certs on one VIP?” The answer has always historically been “I’m sorry, you can’t.” The reasoning is sound. One VIP, one cert, that’s how it’s always been. You can’t do anything with the connection until the handshake is established and decryption is done on the LTM. We’d like to help, but we just really can’t. That is…until now.

The TLS protocol has somewhat recently provided the ability to pass a “desired servername” as a value in the originating SSL handshake. Finally we have what we’ve been looking for: a way to add contextual server info during the handshake, thereby allowing us to say “cert x is for domain x” and “cert y is for domain y”. Known to us mortals as "Server Name Indication" or SNI (hence the title), this functionality is paramount for a device like the LTM that can regularly benefit from hosting multiple certs on a single IP. We should be able to pull out this information and choose an appropriate SSL profile, with a cert that corresponds to the servername value that was sent. Now all we need is some logic to make this happen.

Lucky for us, one of the many bright minds in the DevCentral community has whipped up an iRule to show how you can finally tackle this challenge head on. Because Joel Moses, the shrewd mind and DevCentral MVP behind this example, has already done a solid write-up, I’ll quote liberally from his fine work and add some additional context where fitting. Now on to the geekery:

First things first, you’ll need to create a mapping of which servernames correlate to which certs (client SSL profiles in LTM’s case). This could be done in any manner, really, but the most efficient, both from a resource and a management perspective, is to use a class. Classes, also known as DataGroups, are name->value pairs that will allow you to easily retrieve the data later in the iRule. Quoting Joel:

Create a string-type datagroup to be called "tls_servername". Each hostname that needs to be supported on the VIP must be input along with its matching clientssl profile. For example, for the site "testsite.site.com" with a ClientSSL profile named "clientssl_testsite", you should add the following values to the datagroup.
String: testsite.site.com
Value: clientssl_testsite

Once you’ve finished inputting the different server->profile pairs, you’re ready to move on to pools. It’s very likely that, since you’re now managing multiple domains on this VIP, you'll also want to be able to handle multiple pools to match those domains. To do that you'll need a second mapping that ties each servername to the desired pool. This could again be done in any format you like, but since it's the most efficient option and we're already using it, classes make the most sense here. Quoting from Joel:

If you wish to switch pool context at the time the servername is detected in TLS, then you need to create a string-type datagroup called "tls_servername_pool". You will input each hostname to be supported by the VIP and the pool to direct the traffic towards. For the site "testsite.site.com" to be directed to the pool "testsite_pool_80", add the following to the datagroup:
String: testsite.site.com
Value: testsite_pool_80

If you don't, that's fine, but realize all traffic from each of these hosts will be routed to the default pool, which is very likely not what you want.
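For those who would rather script this than click through the GUI, the same two datagroups can be created from the command line. This is just a minimal sketch using the example names from Joel's write-up and assuming TMOS v11-style tmsh syntax (older releases used bigpipe classes instead), so adjust names, values, and syntax to your environment:

    # servername -> clientssl profile mapping
    tmsh create ltm data-group internal tls_servername type string \
        records add { testsite.site.com { data clientssl_testsite } }

    # servername -> pool mapping (optional; only needed if you want pool switching too)
    tmsh create ltm data-group internal tls_servername_pool type string \
        records add { testsite.site.com { data testsite_pool_80 } }

Additional hostnames just become additional records in each datagroup.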
Now then, with two classes set up to manage the mappings of servername->SSL profile and servername->pool, all we need is some app logic in line to do the management and provide each inbound request with the appropriate profile and cert. This is done, of course, via iRules. Joel has written up one heck of an iRule, which is available in the codeshare (here) in its entirety along with his solid write-up, but I'll also include it here in-line, as is my habit. Effectively what's happening is the iRule is parsing through the data sent during the SSL handshake process and searching for the specific TLS servername extension, which is the bit that will allow us to do the profile switching magic. He's written it up to fall back to the default client SSL profile and pool, so it's very important that both of these things exist on your VIP, or you may well find yourself with unhappy users.

One last caveat before the code: not all browsers support Server Name Indication, so be careful not to implement this unless you are very confident that most, if not all, users connecting to this VIP will support SNI. For more info on testing for SNI compatibility and a list of browsers that do and don't support it, click through to Joel's awesome CodeShare entry; I've already plagiarized enough.

So finally, the code. Again, my hat is off to Joel Moses for this outstanding example of the power of iRules. Keep at it Joel, and thanks for sharing!

when CLIENT_ACCEPTED {
    if { [PROFILE::exists clientssl] } {

        # We have a clientssl profile attached to this VIP but we need
        # to find an SNI record in the client handshake. To do so, we'll
        # disable SSL processing and collect the initial TCP payload.

        set default_tls_pool [LB::server pool]
        set detect_handshake 1
        SSL::disable
        TCP::collect

    } else {

        # No clientssl profile means we're not going to work.

        log local0. "This iRule is applied to a VS that has no clientssl profile."
        set detect_handshake 0

    }
}

when CLIENT_DATA {

    if { ($detect_handshake) } {

        # If we're in a handshake detection, look for an SSL/TLS header.

        binary scan [TCP::payload] cSS tls_xacttype tls_version tls_recordlen

        # TLS is the only thing we want to process because it's the only
        # version that allows the servername extension to be present. When we
        # find a supported TLS version, we'll check to make sure we're getting
        # only a Client Hello transaction -- those are the only ones we can pull
        # the servername from prior to connection establishment.

        switch $tls_version {
            "769" -
            "770" -
            "771" {
                if { ($tls_xacttype == 22) } {
                    binary scan [TCP::payload] @5c tls_action
                    if { not (($tls_action == 1) && ([TCP::payload length] > $tls_recordlen)) } {
                        set detect_handshake 0
                    }
                }
            }
            default {
                set detect_handshake 0
            }
        }

        if { ($detect_handshake) } {

            # If we made it this far, we're still processing a TLS client hello.
            #
            # Skip the TLS header (43 bytes in) and process the record body. For TLS/1.0 we
            # expect this to contain only the session ID, cipher list, and compression
            # list. All but the cipher list will be null since we're handling a new transaction
            # (client hello) here. We have to determine how far out to parse the initial record
            # so we can find the TLS extensions if they exist.
            set record_offset 43
            binary scan [TCP::payload] @${record_offset}c tls_sessidlen
            set record_offset [expr {$record_offset + 1 + $tls_sessidlen}]
            binary scan [TCP::payload] @${record_offset}S tls_ciphlen
            set record_offset [expr {$record_offset + 2 + $tls_ciphlen}]
            binary scan [TCP::payload] @${record_offset}c tls_complen
            set record_offset [expr {$record_offset + 1 + $tls_complen}]

            # If we're in TLS and we've not parsed all the payload in the record
            # at this point, then we have TLS extensions to process. We will detect
            # the TLS extension package and parse each record individually.

            if { ([TCP::payload length] >= $record_offset) } {
                binary scan [TCP::payload] @${record_offset}S tls_extenlen
                set record_offset [expr {$record_offset + 2}]
                binary scan [TCP::payload] @${record_offset}a* tls_extensions

                # Loop through the TLS extension data looking for a type 00 extension
                # record. This is the IANA code for server_name in the TLS transaction.

                for { set x 0 } { $x < $tls_extenlen } { incr x 4 } {
                    set start [expr {$x}]
                    binary scan $tls_extensions @${start}SS etype elen
                    if { ($etype == "00") } {

                        # A servername record is present. Pull this value out of the packet data
                        # and save it for later use. We start 9 bytes into the record to bypass
                        # type, length, and SNI encoding header (which is itself 5 bytes long), and
                        # capture the servername text (minus the header).

                        set grabstart [expr {$start + 9}]
                        set grabend [expr {$elen - 5}]
                        binary scan $tls_extensions @${grabstart}A${grabend} tls_servername
                        set start [expr {$start + $elen}]
                    } else {

                        # Bypass all other TLS extensions.

                        set start [expr {$start + $elen}]
                    }
                    set x $start
                }

                # Check to see whether we got a servername indication from TLS. If so,
                # make the appropriate changes.

                if { ([info exists tls_servername] ) } {

                    # Look for a matching servername in the Data Group and pool.

                    set ssl_profile [class match -value [string tolower $tls_servername] equals tls_servername]
                    set tls_pool [class match -value [string tolower $tls_servername] equals tls_servername_pool]

                    if { $ssl_profile == "" } {

                        # No match, so we allow this to fall through to the "default"
                        # clientssl profile.

                        SSL::enable
                    } else {

                        # A match was found in the Data Group, so we will change the SSL
                        # profile to the one we found. Hide this activity from the iRules
                        # parser.

                        set ssl_profile_enable "SSL::profile $ssl_profile"
                        catch { eval $ssl_profile_enable }
                        if { not ($tls_pool == "") } {
                            pool $tls_pool
                        } else {
                            pool $default_tls_pool
                        }
                        SSL::enable
                    }
                } else {

                    # No match because no SNI field was present. Fall through to the
                    # "default" SSL profile.

                    SSL::enable
                }

            } else {

                # We're not in a handshake. Keep on using the currently set SSL profile
                # for this transaction.

                SSL::enable
            }

            # Hold down any further processing and release the TCP session further
            # down the event loop.

            set detect_handshake 0
            TCP::release
        } else {

            # We've not been able to match an SNI field to an SSL profile. We will
            # fall back to the "default" SSL profile selected (this might lead to
            # certificate validation errors on non SNI-capable browsers).
            set detect_handshake 0
            SSL::enable
            TCP::release

        }
    }
}
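Once the iRule and datagroups are in place, a quick sanity check from any SNI-capable client will confirm that the right certificate is being handed out. This is a simple sketch using OpenSSL's s_client; the VIP address 192.0.2.10 and the hostname are placeholders for your own values, and the -servername flag requires a reasonably modern OpenSSL build:

    # Connect with an SNI value and print the subject of the certificate returned
    openssl s_client -connect 192.0.2.10:443 -servername testsite.site.com < /dev/null 2>/dev/null \
        | openssl x509 -noout -subject

    # Connect without SNI to see which certificate the default clientssl profile serves
    openssl s_client -connect 192.0.2.10:443 < /dev/null 2>/dev/null \
        | openssl x509 -noout -subject

If both commands return the same default certificate, double-check the datagroup entries and that the iRule is actually attached to the VIP.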
Virtual server address space on disaster recovery F5 instance

I am working on setting up a disaster recovery instance of an existing HA BIG-IP pair. I would like to ConfigSync (sync only) my local devices to the new device, which is located in a different data center. The issue is that if/when I need to "fail over" (manually) to the new disaster recovery device, the IP space of the virtual servers is different. So, for example, if my virtual server "A" has a destination address of 192.168.1.10 in my existing data center, the new destination address might be 10.1.1.10. (Note: I am less worried about pool member IP space, because I can use priority groups and have the disaster recovery pool member IPs pre-configured but disabled.) My first thought was a "DR go live" script which would search & replace the config file and reload it, but is there a more "elegant" way to handle this dilemma, without having them share the same IP space?
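For what it's worth, the "DR go live" idea can be sketched as a small shell script run on the DR unit. Everything below is hypothetical (the address ranges and the clean one-to-one prefix mapping), and editing bigip.conf by hand carries real risk, so treat it as a starting point rather than a recommendation, and always keep a backup:

    # DR go-live sketch -- run on the standby DR unit only, during a controlled cut-over
    cp /config/bigip.conf /var/tmp/bigip.conf.pre-dr

    # Re-map the production VIP space (192.168.1.x) to the DR VIP space (10.1.1.x)
    sed -i 's/192\.168\.1\./10.1.1./g' /config/bigip.conf

    # Verify before loading, then load the modified configuration
    tmsh load sys config verify
    tmsh load sys config

A more elegant long-term answer is usually to stop fighting the address change at all: keep the DR virtual servers defined with their own addresses and let GTM (or a DNS change) steer users to whichever data center is active.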
Multi-Site Redundancy

We have an active site and a disaster recovery site, and have recently incorporated NSX-T, which stretches the L2/L3 domain to both sites. My question is: in a DR test, how can our VIP have the same IP address in our active site as it does in our DR site? I've read up on here, and disabling ARP on the WPO virtual server may be a solution. Is there another way to accomplish this without using F5 DNS?
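Disabling ARP is done on the virtual address object rather than on the virtual server itself. A minimal sketch, with 192.0.2.10 standing in for your real (duplicated) VIP address; verify the behavior in a test window before relying on it:

    # On the DR BIG-IP: keep the virtual server configured but silent on the stretched segment
    tmsh modify ltm virtual-address 192.0.2.10 arp disabled icmp-echo disabled

    # During the DR test (or a real failover): let the DR unit start answering for the address
    tmsh modify ltm virtual-address 192.0.2.10 arp enabled icmp-echo enabled

Since both sites now share one L2/L3 domain, only one side should ever have ARP enabled for the address at a time, or you will see the usual duplicate-IP symptoms.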
Deploying BIG-IP VE in VMware vCloud Director

Beginning with BIG-IP version 11.2, you may have noticed a new package in the Virtual Edition downloads folder for vCloud Director 1.5. VMware’s vCloud Director is a software solution enabling enterprises to build multi-tenant private clouds. Each virtual datacenter has its own resource set of CPU, memory, and disk that the vDC owner can allocate as necessary. F5 DevCentral is now running in these virtual datacenter configurations (as announced June 13th, 2012), with full BIG-IP VE infrastructure in place. This article will describe the deployment process to get BIG-IP VE installed and running in the vCloud Director environment.

Uploading the vCloud Image

The upload process is fairly simple, but it does take a while. First, after logging in to the vCloud interface, click catalogs, then select your private catalog. Once in the private catalog, click the upload button highlighted below. This will launch a pop-up. Make sure the vCloud zip file has been extracted. When the .ovf is selected in this screen, it will grab that as well as the disk file after clicking upload. Now get a cup of coffee. Or a lot of them, this takes a while.

Deploying the BIG-IP VE OVF Template

Now that the image is in place, click on my cloud at the top navigation, select vApps, then select the plus sign, which will create a new vApp. (Or, the BIG-IP can be deployed into an existing vApp as well.) Select the BIG-IP VE template (bigip11_2 in the screenshot below) and click next. Give the vApp a name and click next. Accept the F5 EULA and click next. At this point, give the VM a full name and a computer name and click finish. I checked the network adapter box to show the network adapter type. It is not configurable at this point, and the flexible NIC is not the right one. After clicking finish, the system will create the vApp and build the VM, so maybe it’s time for another cup of coffee.

Once the build is complete, click into the vapp_test vApp. Right-click on the testbigip-11-2 VM and select properties. Do NOT power on the VM yet! CPU and memory should not be altered. More CPU won’t help TMM, there is no CMP yet in the virtual edition, and one extra CPU for system stuff is sufficient. TMM can’t schedule more than 4G of RAM either. Click the “Show network adapter type” box and again you’ll notice the NICs are not correct. Delete all the network interfaces, then re-add, one at a time, as many NICs as are necessary for your infrastructure (up to 10 in vCloud Director).

To add a NIC, just click the add button, then select the network dropdown and select Add Network. At this point, you’ll need to already have a plan for your networking infrastructure. Organizational networks are usable in and between all vApps, whereas vApp networks are isolated to just that instance. I’ll show organizational network configuration in this article. Click Organization network and then click next. Select the appropriate network and click next. I’ve selected the Management network. For the management NIC I’ll leave the adapter type as E1000. The IP Mode is useful for systems where guest customization is enabled, but is still a required setting. I set it to Static-Manual and enter the self IP addresses assigned to those interfaces. This step is still required within the F5; it will not auto-configure the VLANs and self IPs for you.

For the remaining NICs that you add, make sure to set the adapter type to VMXNET 3. Then click OK to apply the new NIC configurations.
*Note that adding more than 5 NICs in VE might cause the interfaces to re-order internally. If this happens, you’ll need to map the MAC addresses in vCloud to the MAC addresses reported in tmsh and adjust your VLANs accordingly.

Powering Up!

After the configuration is updated, right-click on the testbigip-11-2 VM and select power on. After the VM powers on, BIG-IP VE will boot. Log in with root/default credentials and type config at the prompt to set the management IP and netmask. Select No on auto-configuration. Set the IP address, then set the netmask. I selected no on the default route, but it might be necessary depending on the infrastructure you have in place. Finally, accept the settings. At this point, the system should be available on the management network. I have a linux box on that network as well so I can ssh into the BIG-IP VE to perform the licensing steps, as the vCloud Director console does not support copy/paste.
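If you would rather skip the config utility, the same settings can be applied from the VE console with tmsh. The addresses below are placeholders, and this assumes v11-era tmsh syntax, so adjust to your environment:

    # Equivalent of the config utility, run from the console or an SSH session
    tmsh create sys management-ip 10.128.1.245/24
    tmsh create sys management-route default gateway 10.128.1.1
    tmsh save sys config

    # Handy when more than 5 NICs re-order: map the MACs vCloud shows to TMOS interfaces
    tmsh list net interface all-properties | grep -i -e "^net interface" -e mac-address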
DR Site Failover with GTM

Hi, we are a company and we have one data center. We want to build another data center which will act as a DR site for our current data center. Can anybody please share some information on how GTM can help us with site failover when one site goes down and the other becomes active? How will user traffic from the internet be processed? Any help will really be appreciated. Thanks.
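At a high level, GTM (BIG-IP DNS) answers the DNS queries for your application names and hands out the address of whichever site is currently healthy, so failover happens at resolution time rather than by re-routing packets. A rough sketch of the pieces involved is below; all of the object names are hypothetical, it assumes the data center, server, and virtual server objects are already defined and the zone is delegated to the GTM listener, and exact syntax varies by TMOS version (v12+ expects a record type, e.g. "gtm wideip a"):

    # Listener that answers the DNS queries delegated to the GTM
    tmsh create gtm listener dr_listener address 203.0.113.53

    # One pool per data center, each pointing at that site's LTM virtual server
    tmsh create gtm pool pool_dc_primary members add { bigip_dc1:vs_app }
    tmsh create gtm pool pool_dc_dr members add { bigip_dc2:vs_app_dr }

    # Global Availability: answer with the primary site while it is up, the DR site when it is not
    tmsh create gtm wideip www.example.com pool-lb-mode global-availability \
        pools add { pool_dc_primary { order 0 } pool_dc_dr { order 1 } }

With health monitors on the virtual servers, GTM detects a site outage within seconds and starts answering with the DR address; the main operational caveat is DNS TTL, since clients and resolvers cache the old answer until it expires.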
That Other Single Point of Failure

When you’re a kid at the beach, you spend a lot of time and effort building a sand castle. It’s cool, a lot of fun, and doomed to destruction. When high tide, or random kids, or hot sun come along, the castle is going to fall apart. It doesn’t matter; kids build them every year by the thousands, probably by the millions across the globe. Each is special and unique, each took time and effort, and each will fall apart. The thing is, they’re all over the globe, and seasons are different all over the globe, so it is conceivable that there is a sand castle built or being built every minute of every day. Not easily provable, but it doesn’t need to be for this discussion. When it is night and the middle of winter in the northern reaches of North America, it is summer and daytime in Australia. The opportunity for continuation of sand castles is amazing.

Unless you’re in publishing or high-tech, it is likely that your entire organization is a single point of failure. Distributed applications make sense so that you can minimize risk and maximize uptime, right? The cloud is often billed as more resistant to downtime precisely because it is distributed. And your organization? Is it distributed? Really spread out, so that it can’t be impacted by something like Sandy?

There are a good number of organizations that are nearly 100% off-line right now because there is no power in the Northeast. That was not a possibility, it was an inevitability. Power outages happen, and they sometimes happen on a grand scale (remember the cascading midwest/northeast/Canada outage a couple years back – that was not a natural disaster, it was design and operator error). And yet, even companies with a presence in the cloud clustered their employees in one geographic area. There is a tendency amongst some to want face-to-face meetings, assuming those are more productive, which leads to desiring everyone be on-site. With increasing globalization and meetings held around the world – long before I became a remote worker, I held meetings with staff in Africa, Russia, and California, all on the same (very long) day, and all from my home in Green Bay – one would think this tendency would be minimized, but it does not seem to be. The result is predictable.

I once worked as a Strategic Architect for a life insurance company. They had a complete replica of the datacenter in a different geographic region, on the grounds that a disaster so horrible as to take out the datacenter would be exactly the scenario in which that backup would be needed. But guess where the staff was? Yeah, at the primary. The systems would have been running fine, but the IT knowledge, business knowledge, and claims adjustment would all have been in the middle of a disaster. Don’t make that mistake.

Today, most organizations with multiple datacenters have DR plans that cover shifting all the load away from one of them should there be a problem, but organizations with a single datacenter don’t have that luxury, and neither group necessarily has a plan for continuation of actual work. Consider your options; consider how you will get actual business up to speed as quickly as possible. Losing their jobs because the business was not viable for weeks is not a great plan for helping people recover from disaster. Even with the cloud, there is critical corporate knowledge out there that makes your organization tick. It needs to be geographically distributed.
It matters not what systems are in the cloud if all of the personnel to make them work are in the middle of a blackout zone. In short, think sand castles. If you have multiple datacenters, make certain your IT and business knowledge is split between them well enough to continue operations in a bad scenario. If you don’t have multiple locations, consider remote workers. Some people are just not cut out for telecommuting (I hate that phrase, since telecomm has little to do with the daily work, but it’s what we have), others do fine at it. Find some fine ones that have, or can be trained to have, the knowledge required to keep the organization’s doors open. It could save the company a lot of money, and people a lot of angst. And your customers will be pleased too. The key is putting the right people and the right skills out there. Spread them across datacenters or geographies, so you’re distributed as well as your apps. And while you’re at it, broadening the pool of available talent means you can get some hires you might never have gotten if relocation was required. And all of that is a good thing. Like sand castles.

Meanwhile, keep America’s northern east coast in your thoughts; that’s a lot of people in a little space without the amenities they’re accustomed to.

Related Articles and Blogs:
When Planning Disaster Recovery, Don't Forget the Small Stuff
When The Walls Come Tumbling Down.
First American Improves Performance and Mobility with F5 and ...
Quick! The Data Center Just Burned Down, What Do You Do?
First American: an F5 customer
Fast DNS - DevCentral Wiki
Let me tell you Where To Go.
Migrating DevCentral to the Cloud
Virtualize Absolutely Everything! Part II Deploying Viprion 2400 with ...
DevCentral Top5 05/29/2012

In the Cloud, It's the Little Things That Get You. Here are nine of them.
#F5 Eight things you need to consider very carefully when moving apps to the cloud.

Moving to a model that utilizes the cloud is a huge proposition. You can throw some applications out there without looking back – if they have no ties to the corporate datacenter and light security requirements, for example – but most applications require quite a bit of work to make them both mobile and stable. Just connections to the database raise all sorts of questions, and most enterprise-level applications require connections to DC databases. But these are all problems people are talking about. There are ways to resolve them, ugly though some may be. The problems that will get you are the ones no one is talking about. So of course, I’m happy to dive into the conversation with some things that would be keeping me awake were I still running a datacenter with a lot of interconnections and getting beat up with demands for cloudy applications.

1. The last year has proven that cloud services WILL go down; you can’t plan like it won’t, regardless of the hype. When they do, your databases must be 100% in synch, or business will be lost. 100%.
2. Your DNS infrastructure will need attention, possibly for the first time since you installed it. Serving up addresses from both local and cloud providers isn’t so simple. Particularly during downtimes.
3. Security – both network and app – will have to be centralized. You can implement separate security procedures for each deployment environment, but you are only as strong as your weakest link, and your staff will have to remember which policies apply where if you go that route.
4. Failure plans will have to be flexible. What if part of your app goes down? What if the database is down, but the web pages are fine – except for that “failed to connect to database” error? No matter what the hype says, the more places you deploy, the more likelihood that you’ll have an outage. The IT Manager’s role is to minimize that increase.
5. After a failure, recovery plans will also need to be flexible. What if part of your app comes up before the rest? What if the database spins up, but is now out of synch with your backup or alternate database?
6. When (not if) a security breach occurs on a cloud-hosted server, how much responsibility does the cloud provider have to help you clean up? Sometimes it takes more than spinning down your server to clean up a mess, after all.
7. If you move mission-critical data to the cloud, how are you protecting it? Contrary to the wild claims of the clouderati, your data is in a location you do not have 100% visibility into; you’re going to have to take extra steps to protect it.
8. If you’re opening connections back to the datacenter from the cloud, how are you protecting those connections? They’re trusted server to trusted server, but “trusted” is now relative.

Of course there are solutions brewing for most of these problems. Here are the ones I am aware of. I guarantee that, since I do not “read all of the Internets” each day (Lori does), I’m missing some, but it can get you started.

1. Just include cloud in your DR plans: what will you do if service X disappears? Is the information on X available somewhere else? Can you move the app elsewhere and update DNS quickly enough? Global Server Load Balancing (GSLB) will help with this problem and others on the list – it will eliminate the DNS propagation lag at least. But beware, for many cloud vendors it is harder to do DR. Check what capabilities your provider supports.
2. There are tools available that just don’t get their fair share of thunder, IMO – like Oracle GoldenGate – that replicate each SQL command to a remote database. These systems create a backup that exactly mirrors the original. As long as you don’t get a database-modifying attack that looks valid to your security systems, these architectures and products are amazing.
3. People generally don’t care where you host apps, as long as when they type in the URL or click on the URL, it takes them to the correct location. Global DNS and GSLB will take care of this problem for you.
4. Get policy-based security that can be deployed anywhere, including the cloud, or less attractively (and sometimes impractically), code security into the app so the security moves with it.
5. Application availability will have to go through another round like it did when we went distributed and then SOA. Apps will have to be developed with an eye to “is critical service X up?” where service X might well be in a completely different location from the app. If not, remedial steps will have to occur before the app can claim to be up. Or local load balancing can buffer you by making service X several different servers/virtuals (see the monitor sketch at the end of this post).
6. What goes down (hopefully) must come back up. But the same safety steps implemented in #5 will cover #6 nicely, for the most part. Database consistency checks are the big exception; do those on recovery.
7. Negotiate this point if you can. Lots of cloud providers don’t feel the need to negotiate anything, but asking the questions will give you more information. Perhaps take your business to someone who will guarantee full cooperation in fixing your problems.
8. If you actually move critical databases to the cloud, encrypt them. Yeah, I do know it’s expensive in processing power, but they’re outside the area you can 100% protect. So take the necessary step.
9. Secure tunnels are your friend. Really. Don’t just open a hole in your firewall and let “trusted” servers in, because it is possible to masquerade as a trusted server. Create secure tunnels, and protect the keys.

That’s it for now. The cloud has a lot of promise, but like everything else in mid hype cycle, you need to approach the soaring commentary with realistic expectations. Protect your data as if it is your personal charge, because it is. The cloud provider is not the one (or not the only one) who will be held accountable when things go awry. So use it to keep doing what you do – making your organization hum with daily business – and avoid the pitfalls wherever possible. In my next installment I’ll be trying out the new footer Lori is using, looking forward to your feedback. And yes, I did put nine in the title to test the “put an odd number list in, people love that” theory. I think y’all read my stuff because I’m hitting relatively close to the mark, but we’ll see now, won’t we?
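To put a little flesh on point 5: a local load balancer only knows whether "service X" is really up if something actively checks it. This is a minimal sketch of an LTM HTTPS monitor attached to a pool; the monitor name, pool name, and /health URI are all hypothetical, so swap in whatever your application actually exposes:

    # Probe a health URI so the pool only receives traffic while service X genuinely answers
    tmsh create ltm monitor https svcx_health send "GET /health HTTP/1.1\r\nHost: svcx.example.com\r\nConnection: close\r\n\r\n" recv "200 OK" interval 5 timeout 16

    # Attach the monitor to the pool fronting service X
    tmsh modify ltm pool svcx_pool monitor svcx_health

The same idea extends to GSLB: the wide IP only answers with a site whose monitors are passing.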
Let me tell you Where To Go.

One thing in life, whether you are using a Garmin to get to a friend’s party or planning your career, is that you need to know where you’re going. Failure to have a destination in mind makes it very difficult to get directions. Even when you know where you’re going, you will have a terrible time getting there if your directions are bad. Take, for example, using a GPS to navigate in the window between major road construction and the next time you update your GPS device’s maps. On a road by my house, I can actually drive down the road and be told that I’m on the highway 100 feet (30 meters) distant. Because I haven’t updated my device since they built this new road, it maps to the nearest one it can find going in the same direction. It is misinformed. And, much like the accuracy of a GPS, we take DNS for granted until it goes horribly wrong. Unfortunately, with both you can be completely lost in the wild before you figure out that something is wrong. The number of ways that DNS can go wrong is limited – it is a pretty simple system – but when it does, there is no way to get where you need to go. Just like when construction dead-ends a road. Like a road not too far from my house. Notice in the attached screenshot taken from Google Maps how the satellite data doesn’t match the road data. The roads pictured by the satellite actually intersect. The ones pictured in roadway data do not. That is because they did intersect until about eight months ago. Now the roadway data is accurate, and one road has a roundabout, while the other passes over it. As you can plainly see, a GPS is going to tell you “go up here and turn right on road X”, when in reality it is not possible to do that any more. You don’t want your DNS doing the same thing. Really don’t.

There are a couple of issues that could make your DNS either fail to respond or misdirect people. I’ll probably talk about them off and on over the next few months, because that’s where my head is at the moment, but we’ll discuss the two obvious ones today, just to keep this blog to blog length.

First is failure to respond – either because it is overloaded, or down, or whatever. This one is easy to resolve with redundancy and load balancing. Add in Global Server Load Balancing, and you can distribute traffic between datacenters, internal clouds, external clouds, whatever, assuming you have the right gear for the job. But if you’re a single-datacenter shop, simple redundancy is pretty straightforward, and the only problem that might compel you to greater measures is a DDoS attack. While a risk, as a single-datacenter shop you’re not likely to attract the attention of crowds that want to participate in DDoS unless you’re in a very controversial market space. So make sure you have redundancy in DNS servers, and test them. Amazing the amount of backup/disaster recovery infrastructure that doesn’t have a regular, formalized testing plan. It does you no good to have it in place if it doesn’t work when you need it.

The other is misdirection. The whole point of DNS cache poisoning is to allow someone to masquerade as you. wget can copy the look-n-feel of your website, and cache poisoning (or some other as-yet-unutilized DNS vector) can redirect your users to the attacker. They typed in your name, they got a page that looks like your page, but any information they enter goes to someone else. Including passwords and credit card numbers. Scary stuff. So DNSSEC is pretty much required.
It protects DNS against known attacks, and against a ton of unexplored vectors, by utilizing signed, authenticated responses. Yeah, that’s a horrible overstatement, but it works for a blog aimed at IT staff as opposed to DNS uber-specialists. So implement DNSSEC, but understand that it takes CPU cycles on DNS servers – security is never free – so if your DNS system is anywhere near capacity, it’s time to upgrade that 80286 to something with a little more zing. It is a tribute to DNS that many BIND servers are running on ancient hardware, because they can, but it doesn’t hurt any to refresh the hardware and get some more cycles out of DNS.

In the real world, you would not use a GPS system that might send you to the wrong place (I shut mine down when in downtown Cincinnati because it is inaccurate, for example), and you wouldn’t use one whose signal a crook could intercept to send you to a location of his choosing for a mugging rather than to your chosen destination… So don’t use a DNS for which both of these things are possible. Reports indicate that there are still many, many out-of-date DNS systems running out there. Upgrade, implement DNSSEC, and implement redundancy (if you haven’t already; most DNS servers seem to be set up in pairs pretty well) or DNS load balancing. Let your customers know that you’re doing it to make things more reliable and secure – for them. And worry about one less thing while you’re grilling out over the weekend. After all, while all of our systems rely on DNS, you have to admit it gets very little of our attention… unless it breaks. Make yours more resilient, so you can continue to give it very little attention.
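If you want a quick way to see whether any of this is actually working, dig can show you. A small sketch, assuming a DNSSEC-validating resolver and using a placeholder name plus a public test zone:

    # A signed zone returns RRSIG records along with the answer,
    # and a validating resolver sets the "ad" (authenticated data) flag
    dig +dnssec www.example.com A

    # A deliberately broken test zone; a validating resolver should return SERVFAIL
    dig +dnssec dnssec-failed.org A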
Cloud vs Cloud

The Battle of the Clouds

Aloha! Welcome, ladies and gentlemen, to the face-off of the decade: The Battle of the Clouds. In this corner, the up-and-comer, the phenom that has changed the way IT works, wearing the light shorts – The Cloud! And in this corner, your reigning champ, born and bred of Mother Nature with unstoppable power, wearing the dark trunks – Storm Clouds!

You’ve either read about or lived through the massive storm that hit the Mid-Atlantic coast last week. And, by the way, if you are going through a loss, damage or worse, I do hope you can recover quickly and wish you the best. The weather took out power for millions, including a Virginia ‘cloud’ datacenter which hosts a number of entertainment and social media sites. Many folks looking to get through the candle-lit evenings were without their fix. While there has been confusion and growing pains over the years as to just what ‘cloud computing’ is, this instance highlights the fact that even The Cloud is still housed in a data center, with four walls, with power pulls, air conditioning, generators and many of the features we’ve become familiar with ever since the early days of the dot com boom (and bubble). They are physical structures, like our homes, that are susceptible to natural disasters, among other things. Data centers have outages all the time, but a single traditional data center outage might not get attention since it may only involve a couple companies – when a ‘cloud’ data center crashes, it can impact many companies, and like last week, it grabs headlines.

Business continuity and disaster recovery are among the main concerns for organizations, since they rely on their systems’ information to run their operations. Many companies use multiple data centers for DR, and most cloud providers offer multiple cloud ‘locations’ as a service to protect against the occasional failure. But it is still a data center, and most IT professionals have come to accept that a data center will have an outage – it’s just a question of how long and what impact or risk is introduced. In addition, you need the technology in place to be able to swing users to other resources when an outage occurs. A good number of companies don’t have a disaster recovery plan, however, especially when backing up their virtual infrastructure in multiple locations. This can be understandable for smaller start-ups if backing up data means doubling their infrastructure (storage) costs, but can be doubly disastrous for a large multi-national corporation. While most of the data center services have been restored and the various organizations are sifting through the ‘what went wrong’ documents, it is an important lesson in redundancy… or the risk of the lack of it. It might be an acceptable risk and a conscious decision, since redundancy comes with a cost – dollars and complexity. A good read about this situation is Ben Coe’s My Friday Night With AWS.

The Cloud has been promoting (and proven to some extent) its resilience, DR capabilities and its ability to technologically recover quickly, yet Storm Clouds have proven time and again that their power is unmatched… especially when you need power to turn on a data center.
ps

Resources
Virginia Storm Knocks Out Popular Websites
Millions without power as heat wave hammers eastern US
Amazon Power Outage Exposes Risks Of Cloud Computing
My Friday Night With AWS
Modern life halted as Netflix, Pinterest, Instagram go down
Storm Blamed for Instagram, Netflix, and Foursquare Outages
(Real) Storm Crushes Amazon Cloud, Knocks out Netflix, Pinterest, Instagram

WILS: Virtualization, Clustering, and Disaster Recovery
#virtualization Clustering is local. Disaster recovery is global.

There are two levels of reliability for an application. There’s local and there’s global. We might want to consider it more simply as “inside” and “outside” reliability. Virtualization enables local reliability – the inside kind of reliability. Whether you’re relying upon clustering or load balancing (each has advantages and disadvantages, but for purposes of reliability and this discussion we’ll assume equal capabilities) to provide the abstraction isn’t as important as recognizing that in terms of reliability you’re acting at the local, i.e. inside, level. A cluster or pool, in load balancing parlance, is able to maintain local reliability by distributing load across multiple instances of the application. We can transparently add or remove instances to achieve the elasticity necessary to meet demand, thus ensuring reliability. In the event of a local disaster, such as the failure of a virtual machine, we can take the failed instance out of the rotation and even provision another to replace it.

What clustering (load balancing) can’t do is address global reliability, i.e. outside reliability. Global reliability must be addressed using a different technology, normally referred to as Global Server Load Balancing (GSLB). The terminology grew out of the days when global reliability was achieved by load balancing individual servers across the globe to ensure a failure in the network or at a specific location could not interrupt the service. As demand grew, GSLB performed the same functions, but did so at a site level, essentially load balancing sites instead of individual servers. The name remains, however confusing that may be to the uninitiated.

To achieve global reliability you need GSLB. To avoid the detrimental effects of a disaster in the network or at the site level, you must be able to direct users to an active location. This is realized in most implementations through simple DNS load balancing techniques; i.e. when a user makes a request, the GSLB service responds with the IP address of an appropriate, active site. GSLB is capable of much more complex decision making, however, and decisions can be based on a variety of business and operational parameters, at the discretion of the organization. The GSLB service monitors each of the local sites, and is able to detect an outage within seconds and begin directing users elsewhere. At the local level, clustering and load balancing also monitor the “health” of individual instances and can react similarly in the event of a failure, but do so only at the local level. If the site fails, as might be the case in the event of a disaster, the local service is unable to do anything about it. It can’t redirect globally, it can’t notify other components. It’s just gone.

For disaster recovery purposes, this is important stuff. When cloud first drifted onto the scene it was postulated that the cheaper compute would make implementing secondary data centers specifically for disaster recovery purposes more financially feasible for a wider variety of organizations. While that’s true in the sense that it’s way cheaper than building a secondary data center, many of the technological foundations remain the same: GSLB and a replicated environment. Some folks balk at the replication and point to transparent migration as a solution. After all, why pay even pennies an hour for instances that may never be put into commission?
The problem is that transparent migration of virtual machines is only useful while the VMs are live and running. If they aren’t, such as might be the case in the event of a disaster, the site can’t be replicated and global reliability fails. A cluster-to-cluster failover via a bridged network to the cloud might sound like a good idea, but it isn’t practical when applied to a disaster recovery scenario. Too much depends on the availability of the site, of the network, and of the clustering/load balancing mechanism itself. If any one of the components has failed, global reliability is unrealizable. To achieve true global reliability, regardless of the involvement of cloud computing, you’re going to need to implement a good old-fashioned GSLB architecture, complete with the network components and replicated application infrastructure. Local reliability (inside) may be achievable with virtual clustering solutions, but global reliability requires a very different architecture and set of technologies. Disaster recovery strategies cannot rely on local reliability; they must be based on global reliability.

WILS: Write It Like Seth. Seth Godin always gets his point across with brevity and wit. WILS is an attempt to be concise about application delivery topics and just get straight to the point. No dilly-dallying around.

Back to Basics: Load balancing Virtualized Applications
The Cost of Ignoring ‘Non-Human’ Visitors
Cloud Bursting: Gateway Drug for Hybrid Cloud
The HTTP 2.0 War has Just Begun
Why Layer 7 Load Balancing Doesn’t Suck
Network versus Application Layer Prioritization
WILS: The Many Faces of TCP
WILS: WPO versus FEO