Internal processes may be the best answer to mitigating risks associated with third-party virtual appliances
The enterprise data center is, in most cases, what aquarists would call a “closed system.” This is to say that from a systems and application perspective, the enterprise has control over what goes in.
The problem is, of course, those pesky parasites (viruses, trojans, worms) that find their way in. This is the result of allowing external data or systems to enter the data center without proper security measures. For web applications we talk about things like data scrubbing and web application firewalls, about proper input validation codified by developers, and even anti-virus scans of incoming e-mail.
But when we start looking at virtual appliances, at virtual machines, being hosted in “vm stores” much in the same manner as mobile applications are hosted in “app stores” today, the process becomes a little more complicated. Consider Stuxnet as a good example of the difficulty in completely removing some of these nasty contagions. Now imagine public AMIs or other virtual appliances downloaded from a “virtual appliance store”.
Hoff first raised this as a potential threat vector a while back, and reintroduced it when it was tangentially raised by Google’s announcement it had “pulled 21 popular free apps from the Android Market” because “the apps are malware aimed at getting root access to the user’s device.” Hoff continues to say:
This is going to be a big problem in the mobile space and potentially just as impacting in cloud/virtual datacenters as people routinely download and put into production virtual machines/virtual appliances, the provenance and integrity of which are questionable. Who’s going to police these stores?
-- Christofer Hoff, “App Stores: From Mobile Platforms To VMs – Ripe For Abuse”
Even if someone polices these stores, are you going to run the risk, ever so slight as it may be, that a dangerous pathogen may be lurking in that appliance? We had some similar scares back in the early days of open source, when a miscreant introduced a trojan into a popular open source daemon that was subsequently downloaded, compiled, and installed by a lot of people. It’s not a concept with which the enterprise is unfamiliar.
The QT provides a transitory stop for fish destined for the display tank that offers a chance for the fish to become acclimated to the water and light parameters of the system while simultaneously allowing the hobbyist to observe the fish for possible signs of infection.
Interestingly, the QT is used before an infection is discovered, not just afterwards as is the case with people infected with highly contagious diseases. The reason fish are placed into quarantine even though they may be free of disease or parasites is because they will ultimately be placed into a closed system and it is nearly impossible to eradicate disease and parasites in a closed system without shutting it all down first. To avoid that catastrophic event, fish go into QT first and then, when it’s clear they are healthy, they can join their new friends in the display tank.
Now, the data center is very similar to a closed system. Once a contagion gets into its systems, it can be very difficult to eradicate it. While there are many solutions to preventing contagion, one of the best solutions is to use a quarantine “tank” to ensure health of any virtual appliance prior to deployment.
Virtualization affords organizations the ability to create a walled-garden, an isolated network environment, that is suitable for a variety of uses. Replicating production environments for testing and validation of topology and architecture is often proposed as the driver for such environments, but use as a quarantine facility is also an option. Quarantine is vital to evaluating the “health” of any virtual network appliance because you aren’t looking just for the obvious – worms and trojans that are detectable using vulnerability scans – but you’re looking for the stealth infection. The one that only shows itself at certain times of the day or week and which isn’t necessarily as interested in propagating itself throughout your network but is instead focused on “phoning home” for purposes of preparing for a future attack.
It’s necessary to fire up that appliance in a constrained environment and then watch it. Monitor its network and application activity over time to determine whether or not it’s been infected with some piece of malware that only rears its ugly head when it thinks you aren’t looking. Within the confines of a quarantined environment, within the ‘turn it off and start it over clean’ architecture comprised of virtual machines, you have the luxury of being able to better evaluate the health of any third-party virtual machine (or application for that matter) before turning it loose in your data center.
Generally we’ve used that information to quarantine the end-user on a specific network with limited access to data center resources – usually just enough to clean their environment or install the proper software necessary to protect them. We’ve used a style of quarantine to aid in the application lifecycle progression from development to deployment in production in the QA or ‘test’ phase wherein applications are deployed into an environment closely resembling the production environment as a means to ensure that configurations, dependencies and integrations are properly implemented and the application works as expected.
So the concept is not new, it’s more the need to recognize the benefits of a ‘quarantine first’ policy and subsequently implementing such a process in the data center to support the use of third-party virtual network appliances. As with many cloud and virtualization-related challenges, part of the solution almost always involves process. It is in recognizing the challenges and applying the right mix of process, product and people to mitigate operational risks associated with the deployment of new technology and architectures.