database

15 Topics

F5 Friday: Load Balancing MySQL with F5 BIG-IP
Scaling MySQL just got a whole lot easier load balancing MySQL – any database, really – is not a trivial task. Generally speaking one does not simply round robin your way through a cluster of MySQL databases as a means to achieve scalability. It is databases, in fact, that have driven a wide variety of scalability patterns such as sharding and partitioning to achieve the ultimate goal of high-performance and scalability simultaneously. Unfortunately, most folks don’t architect their applications with scalability in mind. A single database is all that’s necessary at first, and because of the way in which the application interacts with the database, it doesn’t make sense to code in support for multiple database instances, such as is often implemented with a MySQL master-slave cluster. That’s because the application has to actually open a connection to the database in question. If you’re only starting with one database, you really can’t code in a connection to a separate instance. Eventually that application’s usage grows and the demands upon the database require a more scalable approach. Enter the MySQL master/slave relationship. A typical configuration is to maintain the master as the “write” database, i.e. all updates and/or inserts must use the master, while the slave instance is used as a “read only” instance. Obviously this means the application code must be changed to support this kind of functional sharding. Unless you leverage network server virtualization from a load balancing service capable of acting as a full-proxy at layer 7 (application) like BIG-IP. This solution leverages iRules to implement database load balancing. While this specific example is designed to perform the common functional sharding pattern of read-write separation for a master-slave MySQL cluster, the flexibility of iRules is such that other architectural solutions can easily be designed using the same basic functions. Location based sharding is another popular means of scaling databases, and using the GeoLocation capabilities of BIG-IP along with iRules to inspect and route database requests, it should be a fairly trivial architectural task to implement. The ability to further extend sharding or other distribution methodologies for scaling databases without modifying the application itself is a huge bonus for both developers and operations. By decoupling the application from the database, it provides a more flexibility set of scalability domains in which technology targeted scalability strategies can be leveraged independent of the other layers. This is an important facet of agile infrastructure architecture and should not be underestimated as a benefit of network server virtualization. MySQL Load Balancing Resources: MySQL Proxy iRule MySQL Proxy iApp (deployment package for BIG-IP v11) The Full-Proxy Data Center Architecture Infrastructure Scalability Pattern: Sharding Streams Infrastructure Scalability Pattern: Sharding Sessions Infrastructure Scalability Pattern: Partition by Function or Type IT as a Service: A Stateless Infrastructure Architecture Model F5 Friday: Platform versus Product At the Intersection of Cloud and Control… What is a Strategic Point of Control Anyway? All F5 Friday Posts on DevCentral Why Single-Stack Infrastructure Sucks
Lori_MacVittie
Dec 09, 2011 Place Technical Articles
3KViews
0likes
0Comments
The Challenges of SQL Load Balancing
#infosec #iam load balancing databases is fraught with many operational and business challenges. While cloud computing has brought to the forefront of our attention the ability to scale through duplication, i.e. horizontal scaling or “scale out” strategies, this strategy tends to run into challenges the deeper into the application architecture you go. Working well at the web and application tiers, a duplicative strategy tends to fall on its face when applied to the database tier. Concerns over consistency abound, with many simply choosing to throw out the concept of consistency and adopting instead an “eventually consistent” stance in which it is assumed that data in a distributed database system will eventually become consistent and cause minimal disruption to application and business processes. Some argue that eventual consistency is not “good enough” and cite additional concerns with respect to the failure of such strategies to adequately address failures. Thus there are a number of vendors, open source groups, and pundits who spend time attempting to address both components. The result is database load balancing solutions. For the most part such solutions are effective. They leverage master-slave deployments – typically used to address failure and which can automatically replicate data between instances (with varying levels of success when distributed across the Internet) – and attempt to intelligently distribute SQL-bound queries across two or more database systems. The most successful of these architectures is the read-write separation strategy, in which all SQL transactions deemed “read-only” are routed to one database while all “write” focused transactions are distributed to another. Such foundational separation allows for higher-layer architectures to be implemented, such as geographic based read distribution, in which read-only transactions are further distributed by geographically dispersed database instances, all of which act ultimately as “slaves” to the single, master database which processes all write-focused transactions. This results in an eventually consistent architecture, but one which manages to mitigate the disruptive aspects of eventually consistent architectures by ensuring the most important transactions – write operations – are, in fact, consistent. Even so, there are issues, particularly with respect to security. MEDIATION inside the APPLICATION TIERS Generally speaking mediating solutions are a good thing – when they’re external to the application infrastructure itself, i.e. the traditional three tiers of an application. The problem with mediation inside the application tiers, particularly at the data layer, is the same for infrastructure as it is for software solutions: credential management. See, databases maintain their own set of users, roles, and permissions. Even as applications have been able to move toward a more shared set of identity stores, databases have not. This is in part due to the nature of data security and the need for granular permission structures down to the cell, in some cases, and including transactional security that allows some to update, delete, or insert while others may be granted a different subset of permissions. But more difficult to overcome is the tight-coupling of identity to connection for databases. With web protocols like HTTP, identity is carried along at the protocol level. This means it can be transient across connections because it is often stuffed into an HTTP header via a cookie or stored server-side in a session – again, not tied to connection but to identifying information. At the database layer, identity is tightly-coupled to the connection. The connection itself carries along the credentials with which it was opened. This gives rise to problems for mediating solutions. Not just load balancers but software solutions such as ESB (enterprise service bus) and EII (enterprise information integration) styled solutions. Any device or software which attempts to aggregate database access for any purpose eventually runs into the same problem: credential management. This is particularly challenging for load balancing when applied to databases. LOAD BALANCING SQL To understand the challenges with load balancing SQL you need to remember that there are essentially two models of load balancing: transport and application layer. At the transport layer, i.e. TCP, connections are only temporarily managed by the load balancing device. The initial connection is “caught” by the Load balancer and a decision is made based on transport layer variables where it should be directed. Thereafter, for the most part, there is no interaction at the load balancer with the connection, other than to forward it on to the previously selected node. At the application layer the load balancing device terminates the connection and interacts with every exchange. This affords the load balancing device the opportunity to inspect the actual data or application layer protocol metadata in order to determine where the request should be sent. Load balancing SQL at the transport layer is less problematic than at the application layer, yet it is at the application layer that the most value is derived from database load balancing implementations. That’s because it is at the application layer where distribution based on “read” or “write” operations can be made. But to accomplish this requires that the SQL be inline, that is that the SQL being executed is actually included in the code and then executed via a connection to the database. If your application uses stored procedures, then this method will not work for you. It is important to note that many packaged enterprise applications rely upon stored procedures, and are thus not able to leverage load balancing as a scaling option. Depending on your app or how your organization has agreed to protect your data will determine which of these methods are used to access your databases. The use of inline SQL affords the developer greater freedom at the cost of security, increased programming(to prevent the inherent security risks), difficulty in optimizing data and indices to adapt to changes in volume of data, and deployment burdens. However there is lively debate on the values of both access methods and how to overcome the inherent risks. The OWASP group has identified the injection attacks as the easiest exploitation with the most damaging impact. This also requires that the load balancing service parse MySQL or T-SQL (the Microsoft Transact Structured Query Language). Databases, of course, are designed to parse these string-based commands and are optimized to do so. Load balancing services are generally not designed to parse these languages and depending on the implementation of their underlying parsing capabilities, may actually incur significant performance penalties to do so. Regardless of those issues, still there are an increasing number of organizations who view SQL load balancing as a means to achieve a more scalable data tier. Which brings us back to the challenge of managing credentials. MANAGING CREDENTIALS Many solutions attempt to address the issue of credential management by simply duplicating credentials locally; that is, they create a local identity store that can be used to authenticate requests against the database. Ostensibly the credentials match those in the database (or identity store used by the database such as can be configured for MSSQL) and are kept in sync. This obviously poses an operational challenge similar to that of any distributed system: synchronization and replication. Such processes are not easily (if at all) automated, and rarely is the same level of security and permissions available on the local identity store as are available in the database. What you generally end up with is a very loose “allow/deny” set of permissions on the load balancing device that actually open the door for exploitation as well as caching of credentials that can lead to unauthorized access to the data source. This also leads to potential security risks from attempting to apply some of the same optimization techniques to SQL connections as is offered by application delivery solutions for TCP connections. For example, TCP multiplexing (sharing connections) is a common means of reusing web and application server connections to reduce latency (by eliminating the overhead associated with opening and closing TCP connections). Similar techniques at the database layer have been used by application servers for many years; connection pooling is not uncommon and is essentially duplicated at the application delivery tier through features like SQL multiplexing. Both connection pooling and SQL multiplexing incur security risks, as shared connections require shared credentials. So either every access to the database uses the same credentials (a significant negative when considering the loss of an audit trail) or we return to managing duplicate sets of credentials – one set at the application delivery tier and another at the database, which as noted earlier incurs additional management and security risks. YOU CAN’T WIN FOR LOSING Ultimately the decision to load balance SQL must be a combination of business and operational requirements. Many organizations successfully leverage load balancing of SQL as a means to achieve very high scale. Generally speaking the resulting solutions – such as those often touted by e-Bay - are based on sound architectural principles such as sharding and are designed as a strategic solution, not a tactical response to operational failures and they rarely involve inspection of inline SQL commands. Rather they are based on the ability to discern which database should be accessed given the function being invoked or type of data being accessed and then use a traditional database connection to connect to the appropriate database. This does not preclude the use of application delivery solutions as part of such an architecture, but rather indicates a need to collaborate across the various application delivery and infrastructure tiers to determine a strategy most likely to maintain high-availability, scalability, and security across the entire architecture. Load balancing SQL can be an effective means of addressing database scalability, but it should be approached with an eye toward its potential impact on security and operational management. What are the pros and cons to keeping SQL in Stored Procs versus Code Mission Impossible: Stateful Cloud Failover Infrastructure Scalability Pattern: Sharding Streams The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month… SQL injection – past, present and future True DDoS Stories: SSL Connection Flood Why Layer 7 Load Balancing Doesn’t Suck Web App Performance: Think 1990s.
Lori_MacVittie
Apr 02, 2012 Place Technical Articles
2.6KViews
0likes
1Comment
Sessions, Sessions Everywhere
If you’re replicating session state across application servers you probably need to rethink your strategy. There’s other options – more efficient options – than wasting RAM and, ultimately, money. Although the discussion of Oracle’s “cloud in a box” announcement at OpenWorld dominated much of the tweet-stream this week there were other discussions going on that proved to not only interesting but a good reminder of how cloud computing has brought to the fore the importance of architecture. Foremost in my mind was what started as a lamentation on the fact that Amazon EC2 does not support multicasting that evolved into a discussion on why that would cause grief for those deploying applications in the environment. Remember that multicast is essentially spraying the same data to a group of endpoints and is usually leveraged for streaming media topologies: In computer networking, multicast is the delivery of a message or information to a group of destination computers simultaneously in a single transmission from the source creating copies automatically in other network elements, such as routers, only when the topology of the network requires it. -- Wikipedia, multicast As it turns out, a primary reason behind the need for multicasting in the application architecture revolves around the mirroring of session state across a pool of application servers. Yeah, you heard that right – mirroring session state across a pool of application servers. The first question has to be: why? What is it about an application that requires this level of duplication? MULTICASTING for SESSIONS There are three reasons why someone would want to use multicasting to mirror session state across a pool of application servers. There may be additional reasons that aren’t as common and if so, feel free to share. The application relies on session state and, when deployed in a load balanced environment, broke because the tight-coupling between user and session state was not respected by the Load balancer. This is a common problem when moving from dev/qa to production and is generally caused by using a load balancing algorithm without enabling persistence, a.k.a. sticky sessions. The application requires high-availability that necessitates architecting a stateful-failover architecture. By mirroring sessions to all application servers if one fails (or is decommissioned in an elastic environment) another can easily re-establish the coupling between the user and their session. This is not peculiar to application architecture – load balancers and application delivery controllers mirror their own “session” state across redundant pairs to achieve a stateful failover architecture as well. Some applications, particularly those that are collaborative in nature (think white-boarding and online conferences) “spray” data across a number of sessions in order to enable the sharing in real time aspect of the application. There are other architectural choices that can achieve this functionality, but there are tradeoffs to all of them and in this case it is simply one of several options. THE COST of REPLICATING SESSIONS With the exception of addressing the needs of collaborative applications (and even then there are better options from an architectural point of view) there are much more efficient ways to handle the tight-coupling of user and session state in an elastic or scaled-out environment. The arguments against multicasting session state are primarily around resource consumption, which is particularly important in a cloud computing environment. Consider that the typical session state is 3-200 KB in size (Session State: Beyond Soft State ). Remember that if you’re mirroring every session across an entire cluster (pool) of application servers, that each server must use memory to store that session. Each mirrored session, then, is going to consume resources on every application server. Every application server has, of course, a limited amount of memory it can utilize. It needs that memory for more than just storing session state – it must also store connection tables, its own configuration data, and of course it needs memory in which to execute application logic. If you consume a lot of the available memory storing the session state from every other application server, you are necessarily reducing the amount of memory available to perform other important tasks. This reduces the capacity of the server in terms of users and connections, it reduces the speed with which it can execute application logic (which translates into reduced response times for users), and it operates on a diminishing returns principle. The more application servers you need to scale – and you’ll need more, more frequently, using this technique – the less efficient each added application server becomes because a good portion of its memory is required simply to maintain session state of all the other servers in the pool. It is exceedingly inefficient and, when leveraging a public cloud computing environment, more expensive. It’s a very good example of the diseconomy of scale associated with traditional architectures – it results in a “throw more ‘hardware’ at the problem, faster” approach to scalability. BETTER ARCHITECTURAL SOLUTIONS There are better architectural solutions to maintaining session state for every user. SHARED DATABASE Storing session state in a shared database is a much more efficient means of mirroring session state and allows for the same guarantees of consistency when experiencing a failure. If session state is stored in a database then regardless of which application server instance a user is directed to that application server has access to its session state. The interaction between the user and application becomes: User sends request Clustering/load balancing solution routes to application server Application server receives request, looks up session in database Application server processes request, creates response Application server stores updated session in database Application server returns response If a single database is problematic (because it is a single point of failure) then multicasting or other replication techniques can be used to implement a dual-database architecture. This is somewhat inefficient, but far less so than doing the same at the application server layer. PERSISTENCE-BASED LOAD BALANCING It is often the case that the replication of session state is implemented in response to wonky application behavior occurring only when the application is deployed in a scalable environment, a.k.a a load balancing solution is introduced into the architecture. This is almost always because the application requires tight-coupling between user and session and the load balancing is incorrectly configured to support this requirement. Almost every load balancing solution – hardware, software, virtual network appliance, infrastructure service – is capable of supporting persistence, a.k.a. sticky sessions. This solution requires, however, that the load balancing solution of choice be configured to support the persistence. Persistence (also sometimes referred to as “server affinity” when implemented by a clustering solution) can be configured in a number of ways. The most common configuration is to leverage the automated session IDs generated by application servers, e.g. PHPSESSIONID, ASPSESSIONID. These ids are contained in the HTTP headers and are, as a matter of fact, how the application server “finds” the appropriate session for any given user’s request. The load balancer intercepts every request (it does anyway) and performs the same type of lookup on its own session table (which is much, much higher capacity than an application server and leverages the same high-performance lookups used to store connection and network session tables) and routes the user to the appropriate application server based on the session ID. The interaction between the user and application becomes: User sends request Clustering/load balancing solution finds, if existing, the session-app server mapping. If it does not, it chooses the application server based on the load balancing algorithm and configured parameters Application server receives request, Application server processes request, creates response Application server returns response Clustering/load balancing solution creates the session-app server mapping if it did not already exist Persistence can generally be based on any data in the HTTP header or payload, but using the automatically generated session ids tends to be the most common implementation. YOUR INFRASTRUCTURE, GIVE IT TO ME Now, it may be the case when the multicasting architecture is the right one. It is impossible to say it’s never the right solution because there are always applications and specific scenarios in which an architecture that may not be a good idea in general is, in fact, the right solution. It is likely the case, however, in most situations that it is not the right solution and has more than likely been implemented as a workaround in response to problems with application behavior when moving through a staged development environment. This is one of the best reasons why the use of a virtual edition of your production load balancing solution should be encouraged in development environments. The earlier a holistic strategy to application design and architecture can be employed the fewer complications will be experienced when the application moves into the production environment. Leveraging a virtual version of your load balancing solution during the early stages of the development lifecycle can also enable developers to become familiar with production-level infrastructure services such that they can employ a holistic, architectural approach to solving application issues. See, it’s not always because developers don’t have the know how, it’s because they don’t have access to the tools during development and therefore can’t architect a complete solution. I recall a developer’s plaintive query after a keynote at [the now defunct] SD West conference a few years ago that clearly indicated a reluctance to even ask the network team for access to their load balancing solution to learn how to leverage its services in application development because he knew he would likely be denied. Network and application delivery network pros should encourage the use of and tinkering with virtual versions of application delivery controllers/load balancers in the application development environment as much as possible if they want to reduce infrastructure and application architectural-related issues from cropping up during production deployment. A greater understanding of application-infrastructure interaction will enable more efficient, higher performing applications in general and reduce the operational expenses associated with deploying applications that use inefficient methods such as replication of session state to address application architectural constraints. Related blogs & articles: Applying Scalability Patterns to Infrastructure Architecture Scalability Only One Half the Reliability Equation Service Virtualization Helps Localize Impact of Elastic Scalability Web 2.0: Integration, APIs, and Scalability Automating scalability and high availability services To Take Advantage of Cloud Computing You Must Unlearn, Luke. Scalability with multiple networks for Virtual Servers ... Cloud Lets You Throw More Hardware at the Problem Faster And That, Young Cloudwalker, Is Why You Fail
Lori_MacVittie
Sep 22, 2010 Place Technical Articles
692Views
0likes
0Comments
Databases in the Cloud Revisited
A few of us were talking on Facebook about high speed rail (HSR) and where/when it makes sense the other day, and I finally said that it almost never does. Trains lost out to automobiles precisely because they are rigid and inflexible, while population densities and travel requirements are highly flexible. That hasn’t changed since the early 1900s, and isn’t likely to in the future, so we should be looking at different technologies to answer the problems that HSR tries to address. And since everything in my universe is inspiration for either blogging or gaming, this lead me to reconsider the state of cloud and the state of cloud databases in light of synergistic technologies (did I just use “synergistic technologies in a blog? Arrrggghhh…). There are several reasons why your organization might be looking to move out of a physical datacenter, or to have a backup datacenter that is completely virtual. Think of the disaster in Japan or hurricane Katrina. In both cases, having even the mission critical portions of your datacenter replicated to the cloud would keep your organization online while you recovered from all of the other very real issues such a disaster creates. In other cases, if you are a global organization, the cost of maintaining your own global infrastructure might well be more than utilizing a global cloud provider for many services… Though I’ve not checked, if I were CIO of a global organization today, I would be looking into it pretty closely, particularly since this option should continue to get more appealing as technology continues to catch up with hype. Today though, I’m going to revisit databases, because like trains, they are in one place, and are rigid. If you’ve ever played with database Continuous Data Protection or near-real-time replication, you know this particular technology area has issues that are only now starting to see technological resolution. Over the last year, I have talked about cloud and remote databases a few times, talking about early options for cloud databases, and mentioning Oracle Goldengate – or praising Goldengate is probably more accurate. Going to the west in the US? HSR is not an option. The thing is that the options get a lot more interesting if you have Goldengate available. There are a ton of tools, both integral to database systems and third-party that allow you to encrypt data at rest these days, and while it is not the most efficient access method, it does make your data more protected. Add to this capability the functionality of Oracle Goldengate – or if you don’t need heterogeneous support, any of the various database replication technologies available from Oracle, Microsoft, and IBM, you can seamlessly move data to the cloud behind the scenes, without interfering with your existing database. Yes, initial configuration of database replication will generally require work on the database server, but once configured, most of them run without interfering with the functionality of the primary database in any way – though if it is one that runs inside the RDBMS, remember that it will use up CPU cycles at the least, and most will work inside of a transaction so that they can insure transaction integrity on the target database, so know your solution. Running inside the primary transaction is not necessary, and for many uses may not even be desirable, so if you want your commits to happen rapidly, something like Goldengate that spawns a separate transaction for the replica are a good option… Just remember that you then need to pay attention to alerts from the replication tool so that you don’t end up with successful transactions on the primary not getting replicated because something goes wrong with the transaction on the secondary. But for DBAs, this is just an extension of their daily work, as long as someone is watching the logs. With the advent of Goldengate, advanced database encryption technology, and products like our own BIG-IP WOM, you now have the ability to drive a replica of your database into the cloud. This is certainly a boon for backup purposes, but it also adds an interesting perspective to application mobility. You can turn on replication from your data center to the cloud or from cloud provider A to cloud provider B, then use VMotion to move your application VMS… And you’re off to a new location. If you think you’ll be moving frequently, this can all be configured ahead of time, so you can flick a switch and move applications at will. You will, of course, have to weigh the impact of complete or near-complete database encryption against the benefits of cloud usage. Even if you use the adaptability of the cloud to speed encryption and decryption operations by distributing them over several instances, you’ll still have to pay for that CPU time, so there is a balancing act that needs some exploration before you’ll be certain this solution is a fit for you. And at this juncture, I don’t believe putting unencrypted corporate data of any kind into the cloud is a good idea. Every time I say that, it angers some cloud providers, but frankly, cloud being new and by definition shared resources, it is up to the provider to prove it is safe, not up to us to take their word for it. Until then, encryption is your friend, both going to/from the cloud and at rest in the cloud. I say the same thing about Cloud Storage Gateways, it is just a function of the current state of cloud technology, not some kind of unreasoning bias. So the key then is to make sure your applications are ready to be moved. This is actually pretty easy in the world of portable VMs, since the entire VM will pick up and move. The only catch is that you need to make sure users can get to the application at the new location. There are a ton of Global DNS solutions like F5’s BIG-IP Global Traffic Manager that can get your users where they need to be, since your public-facing IPs will be changing when moving from organization to organization. Everything else should be set, since you can use internal IP addresses to communicate between your application VMs and database VMs. Utilizing a some form of in-flight encryption and some form of acceleration for your database replication will round out the solution architecture, and leave you with a road map that looks more like a highway map than an HSR map. More flexible, more pervasive.
Don_MacVittie_1
May 24, 2011 Place Technical Articles
455Views
0likes
0Comments
Cloud Computing: Will data integration be its Achilles Heel?
Wesley: Now, there may be problems once our app is in the cloud. Inigo: I'll say. How do I find the data? Once I do, how do I integrate it with the other apps? Once I integrate it, how do I replicate it? If you remember this somewhat altered scene from the Princess Bride, you also remember that no one had any answers for Inigo. That's apropos of this discussion, because no one has any good answers for this version of Inigo either. And no, a holocaust cloak is not going to save the day this time. If you've been considering deploying applications in a public cloud, you've certainly considered what must be the Big Hairy Question regarding cloud computing: how do I get at my data? There's very little discussion about this topic, primarily because at this point there's no easy answer. Data stored in the cloud is not easily accessible for integration with applications not residing in the cloud, which can definitely be a roadblock to adopting public cloud computing. Stacey Higginbotham at GigaOM had a great post on the topic of getting data into the cloud, and while the conclusion that bandwidth is necessary is also applicable to getting your data out of the cloud, the details are left in your capable hands. We had this discussion when SaaS (Software as a Service) first started to pick up steam. If you're using a service like salesforce.com to store business critical data, how do you integrate that back into other applications that may need it? Web services were the first answer, followed by integration appliances and solutions that included custom-built adapters for salesforce.com to more easily enable access and integration to data stored "out there", in the cloud. Amazon offers URL-based and web services access to data stored in its SimpleDB offering, but that doesn't help folks who are using Oracle, SQL Server, or MySQL offerings in the cloud. And SimpleDB is appropriately named; it isn't designed to be an enterprise class service - caveat emptor is in full force if you rely upon it for critical business data. RDBMS' have their own methods of replication and synchronization, but mirroring and real-time replication methods require a lot of bandwidth and very low latency connections - something not every organization can count on having. Of course you can always deploy custom triggers and services that automatically replicate back into the local data center, but that, too, is problematic depending on bandwidth availability and accessibility of applications and databases inside the data center. The reverse scenario is much more likely, with a daemon constantly polling the cloud computing data and pulling updates back into the data center. You can also just leave that data out there in the cloud, implement, or take advantage of if they exist, service-based access to the data and integrate it with business processes and applications inside the data center. You're relying on the availability of the cloud, the Internet, and all the infrastructure in between, but like the solution for integrating with salesforce.com and other SaaS offerings, this is likely the best of a set of "will have to do" options. The issue of data and its integration has not yet raised its ugly head, mostly because very few folks are moving critical business applications into the cloud and admittedly, cloud computing is still in its infancy. But even non-critical applications are going to use or create data, and that data will, invariably, become important or need to be accessed by folks in the organization, which means access to that data will - probably sooner rather than later - become a monkey on the backs of IT. The availability of and ease of access to data stored in the public cloud for integration, data mining, business intelligence, and reporting - all common enterprise application use of data - will certainly affect adoption of cloud computing in general. The benefits of saving dollars on infrastructure (management, acquisition, maintenance) aren't nearly as compelling a reason to use the cloud when those savings would quickly be eaten up by the extra effort necessary to access and integrate data stored in the cloud. Related articles by Zemanta SQL-as-a-Service with CloudSQL bridges cloud and premises Amazon SimpleDB ready for public use Blurring the functional line - Zoho CloudSQL merges on-site and on-cloud As a Service: The many faces of the cloud A comparison of major cloud-computing providers (Amazon, Mosso, GoGrid) Public Data Goes on Amazon's Cloud
Lori_MacVittie
Dec 09, 2008 Place Technical Articles
413Views
0likes
2Comments
What Do Database Connectivity Standards and the Pirate’s Code Have in Common?
A: They’re both more what you’d call “guidelines” than actual rules. An almost irrefutable fact of application design today is the need for a database, or at a minimum a data store – i.e. a place to store the data generated and manipulated by the application. A second reality is that despite the existence of database access “standards”, no two database solutions support exactly the same syntax and protocols. Connectivity standards like JDBC and ODBC exist, yes, but like SQL they are variable, resulting in just slightly different enough implementations to effectively cause vendor lock-in at the database layer. You simply can’t take an application developed to use an Oracle database and point it at a Microsoft or IBM database and expect it to work. Life’s like that in the development world. Database connectivity “standards” are a lot like the pirate’s Code, described well by Captain Barbossa in Pirates of the Carribbean as “more what you’d call ‘guidelines’ than actual rules.” It shouldn’t be a surprise, then, to see the rise of solutions that address this problem, especially in light of an increasing awareness of (in)compatibility at the database layer and its impact on interoperability, particularly as it relates to cloud computing . Forrester Analyst Noel Yuhanna recently penned a report on what is being called Database Compatibility Layers (DCL). The focus of DCL at the moment is on migration across database platforms because, as pointed out by Noel, they’re expensive, time consuming and very costly. Database migrations have always been complex, time-consuming, and costly due to proprietary data structures and data types, SQL extensions, and procedural languages. It can take up to several months to migrate a database, depending on database size, complexity, and usage of these proprietary features. A new technology has recently emerged for solving this problem: the database compatibility layer, a database access layer that supports another database management system’s (DBMS’s) proprietary extensions natively, allowing existing applications to access the new database transparently. -- Simpler Database Migrations Have Arrived (Forrester Research Report) Anecdotally, having been on the implementation end of such a migration I can’t disagree with the assessment. Whether the right answer is to sit down and force some common standards on database connectivity or build a compatibility layer is a debate for another day. Suffice to say that right now the former is unlikely given the penetration and pervasiveness of existing database connectivity, so the latter is probably the most efficient and cost-effective solution. After all, any changes in the core connectivity would require the same level of application modification as a migration; not an inexpensive proposition at all. According to Forrester a Database Compatibility Layer (DCL) is a “database layer that supports another DBMS’s proprietary SQL extensions, data types, and data structures natively. Existing applications can transparently access the newly migrated database with zero or minimal changes.” By extension, this should also mean that an application could easily access one database and a completely different one using the same code base (assuming zero changes, of course). For the sake of discussion let’s assume that a DCL exists that exhibits just that characteristic – complete interoperability at the connectivity layer. Not just for migration, which is of course the desired use, but for day to day use. What would that mean for cloud computing providers – both internal and external? ENABLING IT as a SERVICE Based on our assumption that a DCL exists and is implemented by multiple database solution vendors, a veritable cornucopia of options becomes a lot more available for moving enterprise architectures toward IT as a Service than might be at first obvious. Consider that applications have variable needs in terms of performance, redundancy, disaster recovery, and scalability. Some applications require higher performance, others just need a nightly or even weekly backup and some, well, some are just not that important that you can’t use other IT operations backups to restore if something goes wrong. In some cases the applications might have varying needs based on the business unit deploying them. The same application used by finance, for example, might have different requirements than the same one used by developers. How could that be? Because the developers may only be using that application for integration or testing while finance is using it for realz. It happens. What’s more interesting, however, is how a DCL could enable a more flexible service-oriented style buffet of database choices, especially if the organization used different database solutions to support different transactional, availability, and performance goals. If a universal DCL (or near universal at least) existed, business stakeholders – together with their IT counterparts – could pick and choose the database “service” they wished to employ based on not only the technical characteristics and operational support but also the costs and business requirements. It would also allow them to “migrate” over time as applications became more critical, without requiring a massive investment in upgrading or modifying the application to support a different back-end database. Obviously I’m picking just a few examples that may or may not be applicable to every organization. The bigger thing here, I think, is the flexibility in architecture and design that is afforded by such a model that balances costs with operational characteristics. Monitoring of database resource availability, too, could be greatly simplified from such a layer, providing solutions that are natively supported by upstream devices responsible for availability at the application layer, which ultimately depends on the database but is often an ignored component because of the complexity currently inherent in supporting such a varied set of connectivity standards. It should also be obvious that this model would work for a PaaS-style provider who is not tied to any given database technology. A PaaS-style vendor today must either invest effort in developing and maintaining a services layer for database connectivity or restrict customers to a single database service. The latter is fine if you’re creating a single-stack environment such as Microsoft Azure but not so fine if you’re trying to build a more flexible set of offerings to attract a wider customer base. Again, same note as above. Providers would have a much more flexible set of options if they could rely upon what is effectively a single database interface regardless of the specific database implementation. More importantly for providers, perhaps, is the migration capability noted by the Forrester report in the first place, as one of the inhibitors of moving existing applications to a cloud computing provider is support for the same database across both enterprise and cloud computing environments. While services layers are certainly a means to the same end, such layers are not universally supported. There’s no “standard” for them, not even a set of best practice guidelines, and the resulting application code suffers exactly the same issues as with the use of proprietary database connectivity: lock in. You can’t pick one up and move it to the cloud, or another database without changing some code. Granted, a services layer is more efficient in this sense as it serves as an architectural strategic point of control at which connectivity is aggregated and thus database implementation and specifics are abstracted from the application. That means the database can be changed without impacting end-user applications, only the services layer need be modified. But even that approach is problematic for packaged applications that rely upon database connectivity directly and do not support such service layers. A DCL, ostensibly, would support packaged and custom applications if it were implemented properly in all commercial database offerings. CONNECTIVITY CARTEL And therein lies the problem – if it were implemented properly in all commercial database offerings. There is a risk here of a connectivity cartel arising, where database vendors form alliances with other database vendors to support a DCL while “locking out” vendors whom they have decided do not belong. Because the DCL depends on supporting “proprietary SQL extensions, data types, and data structures natively” there may be a need for database vendors to collaborate as a means to properly support those proprietary features. If collaboration is required, it is possible to deny that collaboration as a means to control who plays in the market. It’s also possible for a vendor to slightly change some proprietary feature in order to “break” the others’ support. And of course the sheer volume of work necessary for a database vendor to support all other database vendors could overwhelm smaller database vendors, leaving them with no real way to support everyone else. The idea of a DCL is an interesting one, and it has its appeal as a means to forward compatibility for migration – both temporary and permanent. Will it gain in popularity? For the latter, perhaps, but for the former? Less likely. The inherent difficulties and scope of supporting such a wide variety of databases natively will certainly inhibit any such efforts. Solutions such as a REST-ful interface, a la PHP REST SQL or a JSON-HTTP based solution like DBSlayer may be more appropriate in the long run if they were to be standardized. And by standardized I mean standardized with industry-wide and agreed upon specifications. Not more of the “more what you’d call ‘guidelines’ than actual rules” that we already have. Database Migrations are Finally Becoming Simpler Enterprise Information Integration | Data Without Borders Review: EII Suites | Don't Fear the Data The Database Tier is Not Elastic Infrastructure Scalability Pattern: Sharding Sessions F5 Friday: THE Database Gets Some Love The Impossibility of CAP and Cloud Sessions, Sessions Everywhere Cloud-Tiered Architectural Models are Bad Except When They Aren’t
Lori_MacVittie
Feb 23, 2011 Place Technical Articles
364Views
0likes
1Comment
Defense in Depth in Context
In the days of yore, a military technique called Defense-in-Depth was used to protect kingdoms, castles, and other locations where you might be vulnerable to attack. It's a layered defense strategy where the attacker would have to breach several layers of protection to finally reach the intended target. It allows the defender to spread their resources and not put all of the protection in one location. It's also a multifaceted approach to protection in that there are other mechanisms in place to help; and it's redundant so if a component failed or is compromised, there are others that are ready to step in to keep the protection in tack. Information technology also recognizes this technique as one of the 'best practices' when protecting systems. The infrastructure and systems they support are fortified with a layered security approach. There are firewalls at the edge and often, security mechanisms at every segment of the network. Circumvent one, the next layer should net them. There is one little flaw with the Defense-in-Depth strategy - it is designed to slow down attacks, not necessarily stop them. It gives you time to mobilize a counter-offensive and it's an expensive and complex proposition if you are an attacker. It's more of a deterrent than anything and ultimately, the attacker could decide that the benefits of continuing the attack outweigh the additional costs. In the digital world, it is also interpreted as redundancy. Place multiple iterations of a defensive mechanism within the path of the attacker. The problem is that the only way to increase the cost and complexity for the attacker is to raise the cost and complexity of your own defenses. Complexity is the kryptonite of good security and what you really need is security based on context. Context takes into account the environment or conditions surrounding an event to make an informed decision about how to apply security. This is especially true when protecting a database. Database firewalls are critical components to protecting your valuable data and can stop a SQL Injection attack, for instance, in an instant. What they lack is the ability to decipher contextual data like userid, session, cookie, browser type, IP address, location and other meta-data of who or what actually performed the attack. While it can see that a particular SQL query is invalid, it cannot decipher who made the request. Web Application Firewalls on the other hand can gather user side information since many of its policy decisions are based on the user's context. A WAF monitors every request and response from the browser to the web application and consults a policy to determine if the action and data are allowed. It uses such information as user, session, cookie and other contextual data to decide if it is a valid request. Independent technologies that protect against web attacks or database attacks are available, but they have not been linked to provide unified notification and reporting. Now imagine if your database was protected by a layered, defense-in-depth architecture along with the contextual information to make informed, intelligent decisions about database security incidents. The integration of BIG-IP ASM with Oracle's Database Firewall offers the database protection that Oracle is known for and the contextual intelligence that is baked into every F5 solution. Unified reporting for both the application firewall and database firewall provides more convenient and comprehensive security monitoring. Integration between the two security solutions offers a holistic approach to protecting web and database tiers from SQL injection type of attacks. The integration gives you the layered protection many security professionals recognize as a best practice, plus the contextual information needed to make intelligent decisions about what action to take. This solution provides improved SQL injection protection to F5 customers and correlated reporting for richer forensic information on SQL injection attacks to Oracle database customers. It’s an end-to-end web application and database security solution to protect data, customers, and their businesses. ps Resources: F5 Joins with Oracle to Offer Enhanced Security for Web-Based Database Applications Security for Web-Based Database Applications Enhanced With F5 and Oracle Product Integration Using Oracle Database Firewall with BIG-IP ASM F5 Networks Adds To Oracle Database Oracle Database Firewall BIG-IP Application Security Manager The “True Security Company” Red Herring F5 Friday: Two Heads are Better Than One
PSilva
Mar 16, 2011 Place Technical Articles
346Views
0likes
0Comments
The Database Tier is Not Elastic
It is the database tier and its unique characteristics that ultimate determine where an application will be deployed. cloud computing is mostly about “elasticity.” The extraction and contraction of resources based on demand. It is the contraction of resources which is oft times forgotten but without it, cloud computing and highly dynamic, virtualized infrastructures are little more than seamless capacity growth engines. For web and application architectural tiers, the contraction of resources is as much a requirement to realize the benefits of shared, dynamic capacity as the ability to rapidly expand. But in the database tier, the application data layer, contraction is more a contradiction than anything else. WHAT COMES UP USUALLY COMES DOWN Elasticity in applications is a good thing. It is important to the overall success rate of cloud computing and dynamic infrastructure initiatives to remember that “what comes up, must come down” – especially in relation to provisioned compute resources. Applications should expand their resource consumption to meet demand, but when demand wanes, so too should their resource consumption rates. By spreading compute resources around the various applications that need them in a dynamic way, based on demand, we achieve peak efficiency and make the most of our capital expenditures. Such architectural approaches allow us to allocate “temporary” compute resources when necessary from cloud computing environments external to the organization, and release them when not necessary. This is all well and good, except when we’re talking about the database. Databases employ a number of techniques by which they can improve their performance, and most of them involve complex caching and pooling strategies that make use of lots and lots of RAM. At the database tier, RAM may increase, but it rarely decreases. It’s a different kind of workload than web and application servers, which can easily be scaled out using parallel processing strategies. Many, many copies of the same code can execute in isolated chunks around the data center because they do not need access to a centralized store of information about all the sessions that may be occurring at the same time. In order to maintain consistency, databases use indexes and locks and other computational techniques to manage access to data, especially in the case of modification. This means that even though the code to perform such tasks can be ostensibly executed on multiple copies of a database, the especial data required to ensure consistent operations is contained in a single, contiguous data structure. That data cannot be easily transferred or replicated in real-time to other copies. There is a single data overlord that must maintain a holistic view of the data and therefore must (today) run on a single machine (virtual or iron). That means all access is through a single gateway, and scaling that gateway is generally only possible through the expansion of resources available to the database application. Scale up is the traditional strategy, and until we learn how to share memory blocks across the network in a way that assures consistency we can either bow to the belief that eventual consistency is good enough or that there will be one, ginormous system that continually expands along with data growth. YOU CAN SCALE OUT READ but NOT WRITE It is the unique characteristics of data that result in a quirky architecture that allows us to scale out read but only scale up write. This makes the database tier a lot more complex than perhaps it once was. In the past, a single ginormous server housed a database and it was the only path to data. Today, however, the need for better performance and support for hyperscaling of applications has led to a functional partitioning scheme that separates reads from writes and assumes that eventual consistency is better than non-availability. This does not mean it’s impossible to put a database into an external cloud computing environment. It just means that it’s going to run, 24x7, and scalability cannot necessarily be achieved by scaling out – the traditional means by which a cloud computing environment enables scale. It means that scaling up will require migration, if you haven’t adjusted for future growth to begin with, and that there may be, depending on the cloud computing environment you choose, an upper bounds to your data growth. If you’ve only got X amount of disk and memory available, at some point your database will hit that upper bound and either it will begin to drag down performance or availability or simply be unable to continue growing. Or you’ll need to consider the use of distributed database systems which can scale out by distributing data across multiple database nodes (local or remote) either using replication or duplication. When used over a LAN – low latency, high performing, high bandwidth – the replication and/or duplication required for the master database to manage and maintain its minion databases can be successful. One would assume, then, that the use of distributed database systems in a cloud computing environment would be the appropriate marriage of the two architectural approaches to scalability. However, most enterprise applications existing today – both developed in-house and packaged – do not take advantage of such technology and there exist no standardized means by which a traditional DBMS can be morphed into a DDBMS. Additionally, the replication/duplication of database systems over a WAN – high latency, lower performing, low bandwidth – is problematic for maintaining consistency. Which often means a closed-system, LAN connected only approach to application architecture is the only feasible option. Which puts us right back where we were – with the database tier being upward-bound only, not elastic, and potentially outgrowing the ability of a provider to offer an appropriate level of compute resources to maintain performance and capacity, effectively limiting data growth. Which is not a good thing, because limiting data growth means limiting business growth. DATA GROWTH is AN INDICATOR of BUSINESS SUCCESS It is almost universally true that the growth of data is an indicator of business success. As business grows, so does the customer data. As business grows, so does the user-generated content. As business grows, so do the financial and employee records and e-mail. And, of course, the gigabytes of Power Point presentations and standard operating procedure documents that grow, morph, and are ultimately discarded – but maintained for posterity/reference in the future grow along with the business. Data grows, it doesn’t shrink. There is nothing that so accurately lives up to the “pack rat” mentality as a business. And much of it is stored in databases, which live in the data tier and are increasingly web (and mobile client) enabled. So when we talk about elastic applications we’re really talking just about the applications, not necessarily the data tier. Unless you have employed a sharded architectural approach to enabling long-term growth, you have “THE database” and it’s going to grow and grow and grow and never shrink. It isn’t elastic; the parts of an application that are are the applications that access THE database. It is this “nut” that needs to be cracked for cloud computing to truly become “the” standard for data center architectures. Until we either see DDBMS become the standard for database systems or figure out how to really share compute resources across the LAN such that RAM from multiple machines appears to be a contiguous, locally accessible chunk of memory, the database tier will be the limiting – and deciding – factor in determining how an application is architected and where it might end up residing. Related blogs & articles: Data Center Feng Shui: Normalizing Phased Deployment with Virtualized Network Appliances The Multi-Generational Datacenter: From Toddlers to Teenagers Infrastructure Scalability Pattern: Sharding Sessions Infrastructure Scalability Pattern: Partition by Function or Type Applying Scalability Patterns to Infrastructure Architecture Sessions, Sessions Everywhere I, Cloud Infrastructure 2.0 + Cloud + IT as a Service = An Architectural Parfait The Battle of Economy of Scale versus Control and Flexibility
Lori_MacVittie
Dec 01, 2010 Place Technical Articles
302Views
0likes
1Comment
CloudFucius’ Road Trip: Oracle Open World
You might not know (not that you really care) that I moved recently. I used to live in San Jose and visiting Moscone for the variety of technology shows held there was an easy drive up the 280 and a nightmare on the 101. Now that I live in Sothern California, it’s a ‘6 of 1, half dozen of the other’ scenario when it comes to ‘car vs. air’ travel time and cost – at least from where I am. Couple weeks ago I did the trip for VMworld and I have to say, it wasn’t too bad and I had a vehicle! So I’ll be doing it again this weekend to attend Oracle OpenWorld 2010 in the City by the bayeeeeaaay. Oracle OpenWorld is the largest conference for Oracle technologists, business users, and partners and F5 Networks will, of course, be there. The exhibits run Monday thru Wednesday and I encourage you to visit Booth #1427 to learn more about all the amazing solutions and deep integration F5 has with Oracle. We’ll be highlighting our Oracle Access Manager integration with BIG-IP, Oracle Data Guard sync over the WAN with F5 BIG-IP, Oracle RMAN replication with F5's BIG-IP WOM, PeopleSoft with F5's BIG-IP Web Acceleration & WAN Optimization, and a few others. We will also be shooting more video including some whiteboard episodes along with the various interviews I like to conduct with partners, customers and F5 employees. Those should be posted throughout the week so either check back here or visit our YouTube Channel for news hot off the floor. There are also a few sessions that have a F5 flavor to it: Maximum Availability Architecture Best Practices: Oracle E-Business Suite R12.1 - This session presents Oracle E-Business Suite maximum availability architecture configuration considerations and best practices for deploying high-availability and disaster recovery solutions. 21Tues – 3:30p Oracle Fusion Middleware 11g: Architecting for Continuous Availability - This session discusses continuous availability features and options available in the Oracle Fusion Middleware infrastructure, how these features can be used to provide continuous availability. 21Tues – 3:30p Top Oracle Database 11g High-Availability Best Practices That DBAs Need to Know - This session shows how Oracle maximum availability architecture best practices have evolved in Oracle Database 11g. Topics include grid infrastructure, single client access name (SCAN), and others. 22Wed – 1:00pm Hope to see you there, if not you’ll see me here. And one from Confucius: The firm, the enduring, the simple, and the modest are near to virtue. ps The CloudFucius Series: Intro, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
PSilva
Sep 17, 2010 Place Technical Articles
282Views
0likes
0Comments
CloudFucius’ Road Trip: Oracle Open World
You might not know (not that you really care) that I moved recently. I used to live in San Jose and visiting Moscone for the variety of technology shows held there was an easy drive up the 280 and a nightmare on the 101. Now that I live in Sothern California, it’s a ‘6 of 1, half dozen of the other’ scenario when it comes to ‘car vs. air’ travel time and cost – at least from where I am. Couple weeks ago I did the trip for VMworld and I have to say, it wasn’t too bad and I had a vehicle! So I’ll be doing it again this weekend to attend Oracle OpenWorld 2010 in the City by the bayeeeeaaay. Oracle OpenWorld is the largest conference for Oracle technologists, business users, and partners and F5 Networks will, of course, be there. The exhibits run Monday thru Wednesday and I encourage you to visit Booth #1427 to learn more about all the amazing solutions and deep integration F5 has with Oracle. We’ll be highlighting our Oracle Access Manager integration with BIG-IP, Oracle Data Guard sync over the WAN with F5 BIG-IP, Oracle RMAN replication with F5's BIG-IP WOM, PeopleSoft with F5's BIG-IP Web Acceleration & WAN Optimization, and a few others. We will also be shooting more video including some whiteboard episodes along with the various interviews I like to conduct with partners, customers and F5 employees. Those should be posted throughout the week so either check back here or visit our YouTube Channel for news hot off the floor. There are also a few sessions that have a F5 flavor to it: Maximum Availability Architecture Best Practices: Oracle E-Business Suite R12.1 - This session presents Oracle E-Business Suite maximum availability architecture configuration considerations and best practices for deploying high-availability and disaster recovery solutions. 21Tues – 3:30p Oracle Fusion Middleware 11g: Architecting for Continuous Availability - This session discusses continuous availability features and options available in the Oracle Fusion Middleware infrastructure, how these features can be used to provide continuous availability. 21Tues – 3:30p Top Oracle Database 11g High-Availability Best Practices That DBAs Need to Know - This session shows how Oracle maximum availability architecture best practices have evolved in Oracle Database 11g. Topics include grid infrastructure, single client access name (SCAN), and others. 22Wed – 1:00pm Hope to see you there, if not you’ll see me here. And one from Confucius: The firm, the enduring, the simple, and the modest are near to virtue. ps The CloudFucius Series: Intro, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
PSilva
Sep 17, 2010 Place Technical Articles
254Views
0likes
0Comments