The Challenges of SQL Load Balancing
#infosec #iam load balancing databases is fraught with many operational and business challenges. While cloud computing has brought to the forefront of our attention the ability to scale through duplication, i.e. horizontal scaling or “scale out” strategies, this strategy tends to run into challenges the deeper into the application architecture you go. Working well at the web and application tiers, a duplicative strategy tends to fall on its face when applied to the database tier. Concerns over consistency abound, with many simply choosing to throw out the concept of consistency and adopting instead an “eventually consistent” stance in which it is assumed that data in a distributed database system will eventually become consistent and cause minimal disruption to application and business processes. Some argue that eventual consistency is not “good enough” and cite additional concerns with respect to the failure of such strategies to adequately address failures. Thus there are a number of vendors, open source groups, and pundits who spend time attempting to address both components. The result is database load balancing solutions. For the most part such solutions are effective. They leverage master-slave deployments – typically used to address failure and which can automatically replicate data between instances (with varying levels of success when distributed across the Internet) – and attempt to intelligently distribute SQL-bound queries across two or more database systems. The most successful of these architectures is the read-write separation strategy, in which all SQL transactions deemed “read-only” are routed to one database while all “write” focused transactions are distributed to another. Such foundational separation allows for higher-layer architectures to be implemented, such as geographic based read distribution, in which read-only transactions are further distributed by geographically dispersed database instances, all of which act ultimately as “slaves” to the single, master database which processes all write-focused transactions. This results in an eventually consistent architecture, but one which manages to mitigate the disruptive aspects of eventually consistent architectures by ensuring the most important transactions – write operations – are, in fact, consistent. Even so, there are issues, particularly with respect to security. MEDIATION inside the APPLICATION TIERS Generally speaking mediating solutions are a good thing – when they’re external to the application infrastructure itself, i.e. the traditional three tiers of an application. The problem with mediation inside the application tiers, particularly at the data layer, is the same for infrastructure as it is for software solutions: credential management. See, databases maintain their own set of users, roles, and permissions. Even as applications have been able to move toward a more shared set of identity stores, databases have not. This is in part due to the nature of data security and the need for granular permission structures down to the cell, in some cases, and including transactional security that allows some to update, delete, or insert while others may be granted a different subset of permissions. But more difficult to overcome is the tight-coupling of identity to connection for databases. With web protocols like HTTP, identity is carried along at the protocol level. This means it can be transient across connections because it is often stuffed into an HTTP header via a cookie or stored server-side in a session – again, not tied to connection but to identifying information. At the database layer, identity is tightly-coupled to the connection. The connection itself carries along the credentials with which it was opened. This gives rise to problems for mediating solutions. Not just load balancers but software solutions such as ESB (enterprise service bus) and EII (enterprise information integration) styled solutions. Any device or software which attempts to aggregate database access for any purpose eventually runs into the same problem: credential management. This is particularly challenging for load balancing when applied to databases. LOAD BALANCING SQL To understand the challenges with load balancing SQL you need to remember that there are essentially two models of load balancing: transport and application layer. At the transport layer, i.e. TCP, connections are only temporarily managed by the load balancing device. The initial connection is “caught” by the Load balancer and a decision is made based on transport layer variables where it should be directed. Thereafter, for the most part, there is no interaction at the load balancer with the connection, other than to forward it on to the previously selected node. At the application layer the load balancing device terminates the connection and interacts with every exchange. This affords the load balancing device the opportunity to inspect the actual data or application layer protocol metadata in order to determine where the request should be sent. Load balancing SQL at the transport layer is less problematic than at the application layer, yet it is at the application layer that the most value is derived from database load balancing implementations. That’s because it is at the application layer where distribution based on “read” or “write” operations can be made. But to accomplish this requires that the SQL be inline, that is that the SQL being executed is actually included in the code and then executed via a connection to the database. If your application uses stored procedures, then this method will not work for you. It is important to note that many packaged enterprise applications rely upon stored procedures, and are thus not able to leverage load balancing as a scaling option. Depending on your app or how your organization has agreed to protect your data will determine which of these methods are used to access your databases. The use of inline SQL affords the developer greater freedom at the cost of security, increased programming(to prevent the inherent security risks), difficulty in optimizing data and indices to adapt to changes in volume of data, and deployment burdens. However there is lively debate on the values of both access methods and how to overcome the inherent risks. The OWASP group has identified the injection attacks as the easiest exploitation with the most damaging impact. This also requires that the load balancing service parse MySQL or T-SQL (the Microsoft Transact Structured Query Language). Databases, of course, are designed to parse these string-based commands and are optimized to do so. Load balancing services are generally not designed to parse these languages and depending on the implementation of their underlying parsing capabilities, may actually incur significant performance penalties to do so. Regardless of those issues, still there are an increasing number of organizations who view SQL load balancing as a means to achieve a more scalable data tier. Which brings us back to the challenge of managing credentials. MANAGING CREDENTIALS Many solutions attempt to address the issue of credential management by simply duplicating credentials locally; that is, they create a local identity store that can be used to authenticate requests against the database. Ostensibly the credentials match those in the database (or identity store used by the database such as can be configured for MSSQL) and are kept in sync. This obviously poses an operational challenge similar to that of any distributed system: synchronization and replication. Such processes are not easily (if at all) automated, and rarely is the same level of security and permissions available on the local identity store as are available in the database. What you generally end up with is a very loose “allow/deny” set of permissions on the load balancing device that actually open the door for exploitation as well as caching of credentials that can lead to unauthorized access to the data source. This also leads to potential security risks from attempting to apply some of the same optimization techniques to SQL connections as is offered by application delivery solutions for TCP connections. For example, TCP multiplexing (sharing connections) is a common means of reusing web and application server connections to reduce latency (by eliminating the overhead associated with opening and closing TCP connections). Similar techniques at the database layer have been used by application servers for many years; connection pooling is not uncommon and is essentially duplicated at the application delivery tier through features like SQL multiplexing. Both connection pooling and SQL multiplexing incur security risks, as shared connections require shared credentials. So either every access to the database uses the same credentials (a significant negative when considering the loss of an audit trail) or we return to managing duplicate sets of credentials – one set at the application delivery tier and another at the database, which as noted earlier incurs additional management and security risks. YOU CAN’T WIN FOR LOSING Ultimately the decision to load balance SQL must be a combination of business and operational requirements. Many organizations successfully leverage load balancing of SQL as a means to achieve very high scale. Generally speaking the resulting solutions – such as those often touted by e-Bay - are based on sound architectural principles such as sharding and are designed as a strategic solution, not a tactical response to operational failures and they rarely involve inspection of inline SQL commands. Rather they are based on the ability to discern which database should be accessed given the function being invoked or type of data being accessed and then use a traditional database connection to connect to the appropriate database. This does not preclude the use of application delivery solutions as part of such an architecture, but rather indicates a need to collaborate across the various application delivery and infrastructure tiers to determine a strategy most likely to maintain high-availability, scalability, and security across the entire architecture. Load balancing SQL can be an effective means of addressing database scalability, but it should be approached with an eye toward its potential impact on security and operational management. What are the pros and cons to keeping SQL in Stored Procs versus Code Mission Impossible: Stateful Cloud Failover Infrastructure Scalability Pattern: Sharding Streams The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month… SQL injection – past, present and future True DDoS Stories: SSL Connection Flood Why Layer 7 Load Balancing Doesn’t Suck Web App Performance: Think 1990s.2.2KViews0likes1CommentF5 Friday: Load Balancing MySQL with F5 BIG-IP
Scaling MySQL just got a whole lot easier load balancing MySQL – any database, really – is not a trivial task. Generally speaking one does not simply round robin your way through a cluster of MySQL databases as a means to achieve scalability. It is databases, in fact, that have driven a wide variety of scalability patterns such as sharding and partitioning to achieve the ultimate goal of high-performance and scalability simultaneously. Unfortunately, most folks don’t architect their applications with scalability in mind. A single database is all that’s necessary at first, and because of the way in which the application interacts with the database, it doesn’t make sense to code in support for multiple database instances, such as is often implemented with a MySQL master-slave cluster. That’s because the application has to actually open a connection to the database in question. If you’re only starting with one database, you really can’t code in a connection to a separate instance. Eventually that application’s usage grows and the demands upon the database require a more scalable approach. Enter the MySQL master/slave relationship. A typical configuration is to maintain the master as the “write” database, i.e. all updates and/or inserts must use the master, while the slave instance is used as a “read only” instance. Obviously this means the application code must be changed to support this kind of functional sharding. Unless you leverage network server virtualization from a load balancing service capable of acting as a full-proxy at layer 7 (application) like BIG-IP. This solution leverages iRules to implement database load balancing. While this specific example is designed to perform the common functional sharding pattern of read-write separation for a master-slave MySQL cluster, the flexibility of iRules is such that other architectural solutions can easily be designed using the same basic functions. Location based sharding is another popular means of scaling databases, and using the GeoLocation capabilities of BIG-IP along with iRules to inspect and route database requests, it should be a fairly trivial architectural task to implement. The ability to further extend sharding or other distribution methodologies for scaling databases without modifying the application itself is a huge bonus for both developers and operations. By decoupling the application from the database, it provides a more flexibility set of scalability domains in which technology targeted scalability strategies can be leveraged independent of the other layers. This is an important facet of agile infrastructure architecture and should not be underestimated as a benefit of network server virtualization. MySQL Load Balancing Resources: MySQL Proxy iRule MySQL Proxy iApp (deployment package for BIG-IP v11) The Full-Proxy Data Center Architecture Infrastructure Scalability Pattern: Sharding Streams Infrastructure Scalability Pattern: Sharding Sessions Infrastructure Scalability Pattern: Partition by Function or Type IT as a Service: A Stateless Infrastructure Architecture Model F5 Friday: Platform versus Product At the Intersection of Cloud and Control… What is a Strategic Point of Control Anyway? All F5 Friday Posts on DevCentral Why Single-Stack Infrastructure Sucks2.6KViews0likes0CommentsDatabases in the Cloud Revisited
A few of us were talking on Facebook about high speed rail (HSR) and where/when it makes sense the other day, and I finally said that it almost never does. Trains lost out to automobiles precisely because they are rigid and inflexible, while population densities and travel requirements are highly flexible. That hasn’t changed since the early 1900s, and isn’t likely to in the future, so we should be looking at different technologies to answer the problems that HSR tries to address. And since everything in my universe is inspiration for either blogging or gaming, this lead me to reconsider the state of cloud and the state of cloud databases in light of synergistic technologies (did I just use “synergistic technologies in a blog? Arrrggghhh…). There are several reasons why your organization might be looking to move out of a physical datacenter, or to have a backup datacenter that is completely virtual. Think of the disaster in Japan or hurricane Katrina. In both cases, having even the mission critical portions of your datacenter replicated to the cloud would keep your organization online while you recovered from all of the other very real issues such a disaster creates. In other cases, if you are a global organization, the cost of maintaining your own global infrastructure might well be more than utilizing a global cloud provider for many services… Though I’ve not checked, if I were CIO of a global organization today, I would be looking into it pretty closely, particularly since this option should continue to get more appealing as technology continues to catch up with hype. Today though, I’m going to revisit databases, because like trains, they are in one place, and are rigid. If you’ve ever played with database Continuous Data Protection or near-real-time replication, you know this particular technology area has issues that are only now starting to see technological resolution. Over the last year, I have talked about cloud and remote databases a few times, talking about early options for cloud databases, and mentioning Oracle Goldengate – or praising Goldengate is probably more accurate. Going to the west in the US? HSR is not an option. The thing is that the options get a lot more interesting if you have Goldengate available. There are a ton of tools, both integral to database systems and third-party that allow you to encrypt data at rest these days, and while it is not the most efficient access method, it does make your data more protected. Add to this capability the functionality of Oracle Goldengate – or if you don’t need heterogeneous support, any of the various database replication technologies available from Oracle, Microsoft, and IBM, you can seamlessly move data to the cloud behind the scenes, without interfering with your existing database. Yes, initial configuration of database replication will generally require work on the database server, but once configured, most of them run without interfering with the functionality of the primary database in any way – though if it is one that runs inside the RDBMS, remember that it will use up CPU cycles at the least, and most will work inside of a transaction so that they can insure transaction integrity on the target database, so know your solution. Running inside the primary transaction is not necessary, and for many uses may not even be desirable, so if you want your commits to happen rapidly, something like Goldengate that spawns a separate transaction for the replica are a good option… Just remember that you then need to pay attention to alerts from the replication tool so that you don’t end up with successful transactions on the primary not getting replicated because something goes wrong with the transaction on the secondary. But for DBAs, this is just an extension of their daily work, as long as someone is watching the logs. With the advent of Goldengate, advanced database encryption technology, and products like our own BIG-IPWOM, you now have the ability to drive a replica of your database into the cloud. This is certainly a boon for backup purposes, but it also adds an interesting perspective to application mobility. You can turn on replication from your data center to the cloud or from cloud provider A to cloud provider B, then use VMotion to move your application VMS… And you’re off to a new location. If you think you’ll be moving frequently, this can all be configured ahead of time, so you can flick a switch and move applications at will. You will, of course, have to weigh the impact of complete or near-complete database encryption against the benefits of cloud usage. Even if you use the adaptability of the cloud to speed encryption and decryption operations by distributing them over several instances, you’ll still have to pay for that CPU time, so there is a balancing act that needs some exploration before you’ll be certain this solution is a fit for you. And at this juncture, I don’t believe putting unencrypted corporate data of any kind into the cloud is a good idea. Every time I say that, it angers some cloud providers, but frankly, cloud being new and by definition shared resources, it is up to the provider to prove it is safe, not up to us to take their word for it. Until then, encryption is your friend, both going to/from the cloud and at rest in the cloud. I say the same thing about Cloud Storage Gateways, it is just a function of the current state of cloud technology, not some kind of unreasoning bias. So the key then is to make sure your applications are ready to be moved. This is actually pretty easy in the world of portable VMs, since the entire VM will pick up and move. The only catch is that you need to make sure users can get to the application at the new location. There are a ton of Global DNS solutions like F5’s BIG-IP Global Traffic Manager that can get your users where they need to be, since your public-facing IPs will be changing when moving from organization to organization. Everything else should be set, since you can use internal IP addresses to communicate between your application VMs and database VMs. Utilizing a some form of in-flight encryption and some form of acceleration for your database replication will round out the solution architecture, and leave you with a road map that looks more like a highway map than an HSR map. More flexible, more pervasive.367Views0likes0CommentsDefense in Depth in Context
In the days of yore, a military technique called Defense-in-Depth was used to protect kingdoms, castles, and other locations where you might be vulnerable to attack. It's a layered defense strategy where the attacker would have to breach several layers of protection to finally reach the intended target. It allows the defender to spread their resources and not put all of the protection in one location. It's also a multifaceted approach to protection in that there are other mechanisms in place to help; and it's redundant so if a component failed or is compromised, there are others that are ready to step in to keep the protection in tack. Information technology also recognizes this technique as one of the 'best practices' when protecting systems. The infrastructure and systems they support are fortified with a layered security approach. There are firewalls at the edge and often, security mechanisms at every segment of the network. Circumvent one, the next layer should net them. There is one little flaw with the Defense-in-Depth strategy - it is designed to slow down attacks, not necessarily stop them. It gives you time to mobilize a counter-offensive and it's an expensive and complex proposition if you are an attacker. It's more of a deterrent than anything and ultimately, the attacker could decide that the benefits of continuing the attack outweigh the additional costs. In the digital world, it is also interpreted as redundancy. Place multiple iterations of a defensive mechanism within the path of the attacker. The problem is that the only way to increase the cost and complexity for the attacker is to raise the cost and complexity of your own defenses. Complexity is the kryptonite of good security and what you really need is security based on context. Context takes into account the environment or conditions surrounding an event to make an informed decision about how to apply security. This is especially true when protecting a database. Database firewalls are critical components to protecting your valuable data and can stop a SQL Injection attack, for instance, in an instant. What they lack is the ability to decipher contextual data like userid, session, cookie, browser type, IP address, location and other meta-data of who or what actually performed the attack. While it can see that a particular SQL query is invalid, it cannot decipher who made the request. Web Application Firewalls on the other hand can gather user side information since many of its policy decisions are based on the user's context. A WAF monitors every request and response from the browser to the web application and consults a policy to determine if the action and data are allowed. It uses such information as user, session, cookie and other contextual data to decide if it is a valid request. Independent technologies that protect against web attacks or database attacks are available, but they have not been linked to provide unified notification and reporting. Now imagine if your database was protected by a layered, defense-in-depth architecture along with the contextual information to make informed, intelligent decisions about database security incidents. The integration of BIG-IP ASM with Oracle's Database Firewall offers the database protection that Oracle is known for and the contextual intelligence that is baked into every F5 solution. Unified reporting for both the application firewall and database firewall provides more convenient and comprehensive security monitoring. Integration between the two security solutions offers a holistic approach to protecting web and database tiers from SQL injection type of attacks. The integration gives you the layered protection many security professionals recognize as a best practice, plus the contextual information needed to make intelligent decisions about what action to take. This solution provides improved SQL injection protection to F5 customers and correlated reporting for richer forensic information on SQL injection attacks to Oracle database customers. It’s an end-to-end web application and database security solution to protect data, customers, and their businesses. ps Resources: F5 Joins with Oracle to Offer Enhanced Security for Web-Based Database Applications Security for Web-Based Database Applications Enhanced With F5 and Oracle Product Integration Using Oracle Database Firewall with BIG-IP ASM F5 Networks Adds To Oracle Database Oracle Database Firewall BIG-IP Application Security Manager The “True Security Company” Red Herring F5 Friday: Two Heads are Better Than One263Views0likes0CommentsWhat Do Database Connectivity Standards and the Pirate’s Code Have in Common?
A: They’re both more what you’d call “guidelines” than actual rules. An almost irrefutable fact of application design today is the need for a database, or at a minimum a data store – i.e. a place to store the data generated and manipulated by the application. A second reality is that despite the existence of database access “standards”, no two database solutions support exactly the same syntax and protocols. Connectivity standards like JDBC and ODBC exist, yes, but like SQL they are variable, resulting in just slightly different enough implementations to effectively cause vendor lock-in at the database layer. You simply can’t take an application developed to use an Oracle database and point it at a Microsoft or IBM database and expect it to work. Life’s like that in the development world. Database connectivity “standards” are a lot like the pirate’s Code, described well by Captain Barbossa in Pirates of the Carribbean as “more what you’d call ‘guidelines’ than actual rules.” It shouldn’t be a surprise, then, to see the rise of solutions that address this problem, especially in light of an increasing awareness of (in)compatibility at the database layer and its impact on interoperability, particularly as it relates to cloud computing . Forrester Analyst Noel Yuhanna recently penned a report on what is being called Database Compatibility Layers (DCL). The focus of DCL at the moment is on migration across database platforms because, as pointed out by Noel, they’re expensive, time consuming and very costly. Database migrations have always been complex, time-consuming, and costly due to proprietary data structures and data types, SQL extensions, and procedural languages. It can take up to several months to migrate a database, depending on database size, complexity, and usage of these proprietary features. A new technology has recently emerged for solving this problem: the database compatibility layer, a database access layer that supports another database management system’s (DBMS’s) proprietary extensions natively, allowing existing applications to access the new database transparently. -- Simpler Database Migrations Have Arrived (Forrester Research Report) Anecdotally, having been on the implementation end of such a migration I can’t disagree with the assessment. Whether the right answer is to sit down and force some common standards on database connectivity or build a compatibility layer is a debate for another day. Suffice to say that right now the former is unlikely given the penetration and pervasiveness of existing database connectivity, so the latter is probably the most efficient and cost-effective solution. After all, any changes in the core connectivity would require the same level of application modification as a migration; not an inexpensive proposition at all. According to Forrester a Database Compatibility Layer (DCL) is a “database layer that supports another DBMS’s proprietary SQL extensions, data types, and data structures natively. Existing applications can transparently access the newly migrated database with zero or minimal changes.” By extension, this should also mean that an application could easily access one database and a completely different one using the same code base (assuming zero changes, of course). For the sake of discussion let’s assume that a DCL exists that exhibits just that characteristic – complete interoperability at the connectivity layer. Not just for migration, which is of course the desired use, but for day to day use. What would that mean for cloud computing providers – both internal and external? ENABLING IT as a SERVICE Based on our assumption that a DCL exists and is implemented by multiple database solution vendors, a veritable cornucopia of options becomes a lot more available for moving enterprise architectures toward IT as a Service than might be at first obvious. Consider that applications have variable needs in terms of performance, redundancy, disaster recovery, and scalability. Some applications require higher performance, others just need a nightly or even weekly backup and some, well, some are just not that important that you can’t use other IT operations backups to restore if something goes wrong. In some cases the applications might have varying needs based on the business unit deploying them. The same application used by finance, for example, might have different requirements than the same one used by developers. How could that be? Because the developers may only be using that application for integration or testing while finance is using it for realz. It happens. What’s more interesting, however, is how a DCL could enable a more flexible service-oriented style buffet of database choices, especially if the organization used different database solutions to support different transactional, availability, and performance goals. If a universal DCL (or near universal at least) existed, business stakeholders – together with their IT counterparts – could pick and choose the database “service” they wished to employ based on not only the technical characteristics and operational support but also the costs and business requirements. It would also allow them to “migrate” over time as applications became more critical, without requiring a massive investment in upgrading or modifying the application to support a different back-end database. Obviously I’m picking just a few examples that may or may not be applicable to every organization. The bigger thing here, I think, is the flexibility in architecture and design that is afforded by such a model that balances costs with operational characteristics. Monitoring of database resource availability, too, could be greatly simplified from such a layer, providing solutions that are natively supported by upstream devices responsible for availability at the application layer, which ultimately depends on the database but is often an ignored component because of the complexity currently inherent in supporting such a varied set of connectivity standards. It should also be obvious that this model would work for a PaaS-style provider who is not tied to any given database technology. A PaaS-style vendor today must either invest effort in developing and maintaining a services layer for database connectivity or restrict customers to a single database service. The latter is fine if you’re creating a single-stack environment such as Microsoft Azure but not so fine if you’re trying to build a more flexible set of offerings to attract a wider customer base. Again, same note as above. Providers would have a much more flexible set of options if they could rely upon what is effectively a single database interface regardless of the specific database implementation. More importantly for providers, perhaps, is the migration capability noted by the Forrester report in the first place, as one of the inhibitors of moving existing applications to a cloud computing provider is support for the same database across both enterprise and cloud computing environments. While services layers are certainly a means to the same end, such layers are not universally supported. There’s no “standard” for them, not even a set of best practice guidelines, and the resulting application code suffers exactly the same issues as with the use of proprietary database connectivity: lock in. You can’t pick one up and move it to the cloud, or another database without changing some code. Granted, a services layer is more efficient in this sense as it serves as an architectural strategic point of control at which connectivity is aggregated and thus database implementation and specifics are abstracted from the application. That means the database can be changed without impacting end-user applications, only the services layer need be modified. But even that approach is problematic for packaged applications that rely upon database connectivity directly and do not support such service layers. A DCL, ostensibly, would support packaged and custom applications if it were implemented properly in all commercial database offerings. CONNECTIVITY CARTEL And therein lies the problem – if it were implemented properly in all commercial database offerings. There is a risk here of a connectivity cartel arising, where database vendors form alliances with other database vendors to support a DCL while “locking out” vendors whom they have decided do not belong. Because the DCL depends on supporting “proprietary SQL extensions, data types, and data structures natively” there may be a need for database vendors to collaborate as a means to properly support those proprietary features. If collaboration is required, it is possible to deny that collaboration as a means to control who plays in the market. It’s also possible for a vendor to slightly change some proprietary feature in order to “break” the others’ support. And of course the sheer volume of work necessary for a database vendor to support all other database vendors could overwhelm smaller database vendors, leaving them with no real way to support everyone else. The idea of a DCL is an interesting one, and it has its appeal as a means to forward compatibility for migration – both temporary and permanent. Will it gain in popularity? For the latter, perhaps, but for the former? Less likely. The inherent difficulties and scope of supporting such a wide variety of databases natively will certainly inhibit any such efforts. Solutions such as a REST-ful interface, a la PHP REST SQL or a JSON-HTTP based solution like DBSlayer may be more appropriate in the long run if they were to be standardized. And by standardized I mean standardized with industry-wide and agreed upon specifications. Not more of the “more what you’d call ‘guidelines’ than actual rules” that we already have. Database Migrations are Finally Becoming Simpler Enterprise Information Integration | Data Without Borders Review: EII Suites | Don't Fear the Data The Database Tier is Not Elastic Infrastructure Scalability Pattern: Sharding Sessions F5 Friday: THE Database Gets Some Love The Impossibility of CAP and Cloud Sessions, Sessions Everywhere Cloud-Tiered Architectural Models are Bad Except When They Aren’t285Views0likes1CommentThe Database Tier is Not Elastic
It is the database tier and its unique characteristics that ultimate determine where an application will be deployed. cloud computing is mostly about “elasticity.” The extraction and contraction of resources based on demand. It is the contraction of resources which is oft times forgotten but without it, cloud computing and highly dynamic, virtualized infrastructures are little more than seamless capacity growth engines. For web and application architectural tiers, the contraction of resources is as much a requirement to realize the benefits of shared, dynamic capacity as the ability to rapidly expand. But in the database tier, the application data layer, contraction is more a contradiction than anything else. WHAT COMES UP USUALLY COMES DOWN Elasticity in applications is a good thing. It is important to the overall success rate of cloud computing and dynamic infrastructure initiatives to remember that “what comes up, must come down” – especially in relation to provisioned compute resources. Applications should expand their resource consumption to meet demand, but when demand wanes, so too should their resource consumption rates. By spreading compute resources around the various applications that need them in a dynamic way, based on demand, we achieve peak efficiency and make the most of our capital expenditures. Such architectural approaches allow us to allocate “temporary” compute resources when necessary from cloud computing environments external to the organization, and release them when not necessary. This is all well and good, except when we’re talking about the database. Databases employ a number of techniques by which they can improve their performance, and most of them involve complex caching and pooling strategies that make use of lots and lots of RAM. At the database tier, RAM may increase, but it rarely decreases. It’s a different kind of workload than web and application servers, which can easily be scaled out using parallel processing strategies. Many, many copies of the same code can execute in isolated chunks around the data center because they do not need access to a centralized store of information about all the sessions that may be occurring at the same time. In order to maintain consistency, databases use indexes and locks and other computational techniques to manage access to data, especially in the case of modification. This means that even though the code to perform such tasks can be ostensibly executed on multiple copies of a database, the especial data required to ensure consistent operations is contained in a single, contiguous data structure. That data cannot be easily transferred or replicated in real-time to other copies. There is a single data overlord that must maintain a holistic view of the data and therefore must (today) run on a single machine (virtual or iron). That means all access is through a single gateway, and scaling that gateway is generally only possible through the expansion of resources available to the database application. Scale up is the traditional strategy, and until we learn how to share memory blocks across the network in a way that assures consistency we can either bow to the belief that eventual consistency is good enough or that there will be one, ginormous system that continually expands along with data growth. YOU CAN SCALE OUT READ but NOT WRITE It is the unique characteristics of data that result in a quirky architecture that allows us to scale out read but only scale up write. This makes the database tier a lot more complex than perhaps it once was. In the past, a single ginormous server housed a database and it was the only path to data. Today, however, the need for better performance and support for hyperscaling of applications has led to a functional partitioning scheme that separates reads from writes and assumes that eventual consistency is better than non-availability. This does not mean it’s impossible to put a database into an external cloud computing environment. It just means that it’s going to run, 24x7, and scalability cannot necessarily be achieved by scaling out – the traditional means by which a cloud computing environment enables scale. It means that scaling up will require migration, if you haven’t adjusted for future growth to begin with, and that there may be, depending on the cloud computing environment you choose, an upper bounds to your data growth. If you’ve only got X amount of disk and memory available, at some point your database will hit that upper bound and either it will begin to drag down performance or availability or simply be unable to continue growing. Or you’ll need to consider the use of distributed database systems which can scale out by distributing data across multiple database nodes (local or remote) either using replication or duplication. When used over a LAN – low latency, high performing, high bandwidth – the replication and/or duplication required for the master database to manage and maintain its minion databases can be successful. One would assume, then, that the use of distributed database systems in a cloud computing environment would be the appropriate marriage of the two architectural approaches to scalability. However, most enterprise applications existing today – both developed in-house and packaged – do not take advantage of such technology and there exist no standardized means by which a traditional DBMS can be morphed into a DDBMS. Additionally, the replication/duplication of database systems over a WAN – high latency, lower performing, low bandwidth – is problematic for maintaining consistency. Which often means a closed-system, LAN connected only approach to application architecture is the only feasible option. Which puts us right back where we were – with the database tier being upward-bound only, not elastic, and potentially outgrowing the ability of a provider to offer an appropriate level of compute resources to maintain performance and capacity, effectively limiting data growth. Which is not a good thing, because limiting data growth means limiting business growth. DATA GROWTH is AN INDICATOR of BUSINESS SUCCESS It is almost universally true that the growth of data is an indicator of business success. As business grows, so does the customer data. As business grows, so does the user-generated content. As business grows, so do the financial and employee records and e-mail. And, of course, the gigabytes of Power Point presentations and standard operating procedure documents that grow, morph, and are ultimately discarded – but maintained for posterity/reference in the future grow along with the business. Data grows, it doesn’t shrink. There is nothing that so accurately lives up to the “pack rat” mentality as a business. And much of it is stored in databases, which live in the data tier and are increasingly web (and mobile client) enabled. So when we talk about elastic applications we’re really talking just about the applications, not necessarily the data tier. Unless you have employed a sharded architectural approach to enabling long-term growth, you have “THE database” and it’s going to grow and grow and grow and never shrink. It isn’t elastic; the parts of an application that are are the applications that access THE database. It is this “nut” that needs to be cracked for cloud computing to truly become “the” standard for data center architectures. Until we either see DDBMS become the standard for database systems or figure out how to really share compute resources across the LAN such that RAM from multiple machines appears to be a contiguous, locally accessible chunk of memory, the database tier will be the limiting – and deciding – factor in determining how an application is architected and where it might end up residing. Related blogs & articles: Data Center Feng Shui: Normalizing Phased Deployment with Virtualized Network Appliances The Multi-Generational Datacenter: From Toddlers to Teenagers Infrastructure Scalability Pattern: Sharding Sessions Infrastructure Scalability Pattern: Partition by Function or Type Applying Scalability Patterns to Infrastructure Architecture Sessions, Sessions Everywhere I, Cloud Infrastructure 2.0 + Cloud + IT as a Service = An Architectural Parfait The Battle of Economy of Scale versus Control and Flexibility199Views0likes1CommentOracle RMAN Replication with F5's BIG-IP WOM
In this side by side demo, you will see an Oracle database duplicated using the RMAN process, both with and without Wan acceleration from F5. How to accelerate Oracle Recovery Manager database duplication using F5's Wan Optimization technology.166Views0likes1CommentF5 Friday: THE Database Gets Some Love
The database has long been the black sheep of application infrastructure; oft dismissed with a casual hand-wave in discussions involving acceleration and scalability. Finally, the database gets some much deserved application delivery love. THE database. We don’t really capitalize it but when we talk about it we do use an implied emphasis on “the” because the database is, regardless of how you look at it, the core of business and datacenter architectures. Mess with the database and, well, you mess with everything. The database is the gatekeeper, the record-keeper, the storage solution for critical data without which applications and the users that rely upon them would simply stop being productive. Without it, apps are really not all that useful because the point of an application (from a technical perspective) is to provide an interface to the data. Yet we have traditionally excluded THE database from general discussions of application delivery because it’s such a different beast from other applications and it’s really not a good area for experimentation because, after all, it’s THE database. Indeed the primary method of scaling a database has been either vertical (scale up) or distribution – managed by the database itself. With the exception of load balancing read versus writes to different instances, there’s been very little interaction between databases and load balancing solutions to date. It’s generally just too risky. Of late, cloud computing has raised awareness of the problem of data and in particular the problem of transferring “big data” across bandwidth-constrained networks, namely THE WAN. That and the more general synchronization of data across disparate database instances (the “consistency” in Brewer’s CAP Theorem) has subsequently reawakened awareness and interest in the problems associated with database replication and synchronization across less-than-optimal network connections. That’s the Internet, in case you were wondering. THE PROBLEM The problem, interestingly enough, is one shared by other plus-sized applications such as virtual machine images. VMware’s VMotion, for example, often fails to transfer virtual machine images across long-distance WAN links in the required timeframe because there’s simply too much latency. Whether that’s caused by congestion or a constrained amount of bandwidth or just inefficient protocols isn’t nearly as important as the fact that, well, it fails. Which makes it very hard to get excited about the ability to migrate virtual machines across data centers or clouds. After all, if it’s going to fail more often than not, it’s just not reliable enough to form the basis for an IT strategy around scalability. Similarly, the inability to perform database replication and synchronization reliably has continued to be a source of frustration for many attempting to formulate a strategy that includes applications distributed across clouds and data centers. Applications need their data, and users need consistent data. An application that spans multiple sites either has to distribute THE database at both sites to provide the requisite performance and risk consistency or configure all application instances to leverage a central database and risk performance and availability. Neither is really an acceptable solution. THE SOLUTION Assuming that databases aren’t going to get smaller and the ability to reliably perform replication and synchronization across a WAN the only option left is to improve the WAN conditions such that it can be made to reliably perform such transfers. Well, okay, that’s not the only option – organizations could probably choose a solution that includes a direct link to the Internet backbone and thus eliminate the entire WAN problem for every application in their datacenter. But the costs associated with that make it an unlikely option. Improving the performance characteristics and reliability of THE WAN is the best option we have because we can control that, we can impact that, we can do something about it. Being an F5 Friday post you’ve been waiting for the kool-aid, so here it comes – we have a solution that delivers a simplified, optimized WAN connection solution that allows reliable, secure transfer of “big data” on what are traditionally unreliable WAN connections. We’ve integrated BIG-IP® Local Traffic Manager™ and WAN Optimization Module™ with Oracle Database, providing optimized performance for joint customers. In much the same way as we integrated with VMware to provide the reliable, speedier transfer of virtual images across unreliable WAN connections, now we’re providing the same reliable exchange of data across the WAN for Oracle database. Integration, she is a beautiful thing, is she not?165Views0likes0CommentsSessions, Sessions Everywhere
If you’re replicating session state across application servers you probably need to rethink your strategy. There’s other options – more efficient options – than wasting RAM and, ultimately, money. Although the discussion of Oracle’s “cloud in a box” announcement at OpenWorld dominated much of the tweet-stream this week there were other discussions going on that proved to not only interesting but a good reminder of how cloud computing has brought to the fore the importance of architecture. Foremost in my mind was what started as a lamentation on the fact that Amazon EC2 does not support multicasting that evolved into a discussion on why that would cause grief for those deploying applications in the environment. Remember that multicast is essentially spraying the same data to a group of endpoints and is usually leveraged for streaming media topologies: In computer networking, multicast is the delivery of a message or information to a group of destination computers simultaneously in a single transmission from the source creating copies automatically in other network elements, such as routers, only when the topology of the network requires it. -- Wikipedia, multicast As it turns out, a primary reason behind the need for multicasting in the application architecture revolves around the mirroring of session state across a pool of application servers. Yeah, you heard that right – mirroring session state across a pool of application servers. The first question has to be: why? What is it about an application that requires this level of duplication? MULTICASTING for SESSIONS There are three reasons why someone would want to use multicasting to mirror session state across a pool of application servers. There may be additional reasons that aren’t as common and if so, feel free to share. The application relies on session state and, when deployed in a load balanced environment, broke because the tight-coupling between user and session state was not respected by the Load balancer. This is a common problem when moving from dev/qa to production and is generally caused by using a load balancing algorithm without enabling persistence, a.k.a. sticky sessions. The application requires high-availability that necessitates architecting a stateful-failover architecture. By mirroring sessions to all application servers if one fails (or is decommissioned in an elastic environment) another can easily re-establish the coupling between the user and their session. This is not peculiar to application architecture – load balancers and application delivery controllers mirror their own “session” state across redundant pairs to achieve a stateful failover architecture as well. Some applications, particularly those that are collaborative in nature (think white-boarding and online conferences) “spray” data across a number of sessions in order to enable the sharing in real time aspect of the application. There are other architectural choices that can achieve this functionality, but there are tradeoffs to all of them and in this case it is simply one of several options. THE COST of REPLICATING SESSIONS With the exception of addressing the needs of collaborative applications (and even then there are better options from an architectural point of view) there are much more efficient ways to handle the tight-coupling of user and session state in an elastic or scaled-out environment. The arguments against multicasting session state are primarily around resource consumption, which is particularly important in a cloud computing environment. Consider that the typical session state is 3-200 KB in size (Session State: Beyond Soft State ). Remember that if you’re mirroring every session across an entire cluster (pool) of application servers, that each server must use memory to store that session. Each mirrored session, then, is going to consume resources on every application server. Every application server has, of course, a limited amount of memory it can utilize. It needs that memory for more than just storing session state – it must also store connection tables, its own configuration data, and of course it needs memory in which to execute application logic. If you consume a lot of the available memory storing the session state from every other application server, you are necessarily reducing the amount of memory available to perform other important tasks. This reduces the capacity of the server in terms of users and connections, it reduces the speed with which it can execute application logic (which translates into reduced response times for users), and it operates on a diminishing returns principle. The more application servers you need to scale – and you’ll need more, more frequently, using this technique – the less efficient each added application server becomes because a good portion of its memory is required simply to maintain session state of all the other servers in the pool. It is exceedingly inefficient and, when leveraging a public cloud computing environment, more expensive. It’s a very good example of the diseconomy of scale associated with traditional architectures – it results in a “throw more ‘hardware’ at the problem, faster” approach to scalability. BETTER ARCHITECTURAL SOLUTIONS There are better architectural solutions to maintaining session state for every user. SHARED DATABASE Storing session state in a shared database is a much more efficient means of mirroring session state and allows for the same guarantees of consistency when experiencing a failure. If session state is stored in a database then regardless of which application server instance a user is directed to that application server has access to its session state. The interaction between the user and application becomes: User sends request Clustering/load balancing solution routes to application server Application server receives request, looks up session in database Application server processes request, creates response Application server stores updated session in database Application server returns response If a single database is problematic (because it is a single point of failure) then multicasting or other replication techniques can be used to implement a dual-database architecture. This is somewhat inefficient, but far less so than doing the same at the application server layer. PERSISTENCE-BASED LOAD BALANCING It is often the case that the replication of session state is implemented in response to wonky application behavior occurring only when the application is deployed in a scalable environment, a.k.a a load balancing solution is introduced into the architecture. This is almost always because the application requires tight-coupling between user and session and the load balancing is incorrectly configured to support this requirement. Almost every load balancing solution – hardware, software, virtual network appliance, infrastructure service – is capable of supporting persistence, a.k.a. sticky sessions. This solution requires, however, that the load balancing solution of choice be configured to support the persistence. Persistence (also sometimes referred to as “server affinity” when implemented by a clustering solution) can be configured in a number of ways. The most common configuration is to leverage the automated session IDs generated by application servers, e.g. PHPSESSIONID, ASPSESSIONID. These ids are contained in the HTTP headers and are, as a matter of fact, how the application server “finds” the appropriate session for any given user’s request. The load balancer intercepts every request (it does anyway) and performs the same type of lookup on its own session table (which is much, much higher capacity than an application server and leverages the same high-performance lookups used to store connection and network session tables) and routes the user to the appropriate application server based on the session ID. The interaction between the user and application becomes: User sends request Clustering/load balancing solution finds, if existing, the session-app server mapping. If it does not, it chooses the application server based on the load balancing algorithm and configured parameters Application server receives request, Application server processes request, creates response Application server returns response Clustering/load balancing solution creates the session-app server mapping if it did not already exist Persistence can generally be based on any data in the HTTP header or payload, but using the automatically generated session ids tends to be the most common implementation. YOUR INFRASTRUCTURE, GIVE IT TO ME Now, it may be the case when the multicasting architecture is the right one. It is impossible to say it’s never the right solution because there are always applications and specific scenarios in which an architecture that may not be a good idea in general is, in fact, the right solution. It is likely the case, however, in most situations that it is not the right solution and has more than likely been implemented as a workaround in response to problems with application behavior when moving through a staged development environment. This is one of the best reasons why the use of a virtual edition of your production load balancing solution should be encouraged in development environments. The earlier a holistic strategy to application design and architecture can be employed the fewer complications will be experienced when the application moves into the production environment. Leveraging a virtual version of your load balancing solution during the early stages of the development lifecycle can also enable developers to become familiar with production-level infrastructure services such that they can employ a holistic, architectural approach to solving application issues. See, it’s not always because developers don’t have the know how, it’s because they don’t have access to the tools during development and therefore can’t architect a complete solution. I recall a developer’s plaintive query after a keynote at [the now defunct] SD West conference a few years ago that clearly indicated a reluctance to even ask the network team for access to their load balancing solution to learn how to leverage its services in application development because he knew he would likely be denied. Network and application delivery network pros should encourage the use of and tinkering with virtual versions of application delivery controllers/load balancers in the application development environment as much as possible if they want to reduce infrastructure and application architectural-related issues from cropping up during production deployment. A greater understanding of application-infrastructure interaction will enable more efficient, higher performing applications in general and reduce the operational expenses associated with deploying applications that use inefficient methods such as replication of session state to address application architectural constraints. Related blogs & articles: Applying Scalability Patterns to Infrastructure Architecture Scalability Only One Half the Reliability Equation Service Virtualization Helps Localize Impact of Elastic Scalability Web 2.0: Integration, APIs, and Scalability Automating scalability and high availability services To Take Advantage of Cloud Computing You Must Unlearn, Luke. Scalability with multiple networks for Virtual Servers ... Cloud Lets You Throw More Hardware at the Problem Faster And That, Young Cloudwalker, Is Why You Fail594Views0likes0Comments