What Infrastructure Should Learn from the NPM JavaScript Debacle

If you haven’t seen it (and you might not have; it was a developer kind of thing, after all), there was a big uproar caused by the removal of 11 lines of JavaScript from the popular NPM repository. You can read more about it elsewhere, but the TL;DR is this: a developer removed some JavaScript that was relied on by thousands of other projects and basically broke them all, because those projects not only relied on that code, but relied on code stored in an online repository.

Now before we go further, I have to say that despite the millions (yes, millions) of projects that rely on JavaScript and CSS stored in web-hosted repositories, this is one of the first incidents of its kind. And there’s a lot more to this story than just the ramifications. But as I’m not a lawyer (and don’t play one on the Internet, nor did I stay at a Holiday Inn Express last night), I’m just going to focus on the impact rather than on who is right/wrong/etc… After all, you probably don’t go more than an hour without loading a web app that in turn loads a script from Yahoo or Google or some other online repository, so the discussion is really about how relying on online repositories can really mess up your day.

While this story hits developers hardest right now, it’s something that could impact networkers in the future, so we should talk about it now rather than in some future post-mortem blog.

If we’re going to treat infrastructure as code (and perhaps critical infrastructure, at that) then we need to carefully consider where we’re going to manage the artifacts (templates, scripts, etc…) that make up the metastructure of our modern, automated infrastructure architecture.

First and foremost, we shouldn’t store these on individual creators’ laptops. Really, if Bob has the most current version of the “deploy critical app A” script and something happens to his system, you’re hosed. If Alice has been vigilantly maintaining the base corporate security template and malware or an inadvertently hit “delete” key gets rid of it, what do you do? There are a hundred and one different mishaps that could suddenly halt the continuous deployment of an application.

Developers learned that lesson long ago. And to avoid that they moved to network-hosted repositories. Most of them include (or are) version control systems with extra layers of authentication and authorization around them to ensure that only those people (and systems) that should have access, do have access. These are, because of the very nature of storing the corporate secret sauce (application code), usually on-premises. They are replicated, backed-up, and treated with the importance they deserve.

At some point developers started directly grabbing scripts and stylesheets and other dependencies from the web. UI elements and frameworks were automatically included in web apps and downloaded in real time from the web. This had the advantage of making sure they were always up to date. It’s like automagical updates and patching, no work required. And it was good, until one piece of the web suddenly became inaccessible (it rarely happens, but it does).
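As a rough sketch of that pattern (the CDN host and library name here are made up for illustration), the app pulls whatever the remote host happens to be serving at that moment, so an upstream change, outage, or removal lands in your app immediately:

```javascript
// Hypothetical example: a page loads a UI library straight from a public CDN
// at runtime instead of from a copy the team controls. If cdn.example.com is
// slow, serves a broken release, or drops the file, every page that does this
// breaks along with it.
async function loadDatePicker() {
  // Dynamic import over HTTPS: the dependency is resolved when the page runs,
  // not when the app is built or reviewed.
  const picker = await import("https://cdn.example.com/datepicker@latest/index.js");
  picker.attach(document.querySelector("#ship-date"));
}

loadDatePicker().catch((err) => {
  // The only real defense here is handling the failure after it has already happened.
  console.error("UI dependency failed to load from the CDN:", err);
});
```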

This is much like what recently happened with NPM. A whole lot of other projects depended on one smaller project that was loaded from the web in real time. And when it disappeared, wham! Everything that depended on it being there broke.

If you browse around online repositories of DevOps frameworks like those that host Chef or Puppet fragments, you might find that a lot of them are hosted on Git. Git – and similar systems like Puppet Forge – offer everything you need (including a great API) to let you pull, in real time, the latest and greatest versions of these infrastructure-related artifacts. Right into your CI/CD process. Because you are extending that to your (network and app service) infrastructure, right?
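A minimal sketch of what that direct approach can look like (the repository URL and file name are hypothetical): the pipeline grabs whatever the public host is serving at deploy time.

```javascript
// Hypothetical CI/CD step (Node.js 18+): fetch the newest version of a
// deployment template directly from a public repository on every pipeline run.
// Whatever sits at the tip of the branch right now is what gets deployed,
// including a version that was just changed, broken, or deleted.
const { writeFileSync } = require("node:fs");

async function fetchLatestTemplate() {
  const url =
    "https://raw.githubusercontent.com/example-org/infra-templates/master/deploy-critical-app-a.yaml";
  const response = await fetch(url);
  if (!response.ok) {
    // If the public host or the project disappears, the pipeline stops right here.
    throw new Error(`Upstream repository returned ${response.status}`);
  }
  writeFileSync("deploy-critical-app-a.yaml", await response.text());
}

fetchLatestTemplate();
```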

This is where we need to stop, take a deep breath, and consider we might not want to do that directly.

Git – and similar systems – also offer local repository support, with the ability to synchronize with the “central” repository online. Yes, that means another piece of infrastructure you have to install, manage, and maintain locally, but if you consider that you’re not only worried about availability but also the potential for an update to “break” your process, you really want to think about this now, rather than later.
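One way to set that up (a sketch, assuming a Git-based workflow; the repository names and paths are made up) is to keep a local mirror that you synchronize on your own schedule, and point your CI/CD process at the mirror instead of the public host:

```javascript
// Hypothetical sync job (Node.js): maintain a local, controlled mirror of an
// upstream repository. The CI/CD process clones from the mirror, so an
// upstream outage or a surprise change only reaches you when you choose to sync.
const { execSync } = require("node:child_process");
const { existsSync } = require("node:fs");

const UPSTREAM = "https://github.com/example-org/infra-templates.git";
const MIRROR = "/srv/git-mirrors/infra-templates.git";

if (!existsSync(MIRROR)) {
  // First run: create a full mirror of the upstream repository.
  execSync(`git clone --mirror ${UPSTREAM} ${MIRROR}`, { stdio: "inherit" });
} else {
  // Later runs: pull upstream changes into the mirror on our schedule, after
  // they have been reviewed and tested against our own environment.
  execSync(`git --git-dir=${MIRROR} remote update --prune`, { stdio: "inherit" });
}
```

The pipeline then clones from the mirror (or an internal Git server in front of it), so “latest” means the last version you pulled in and vetted, not whatever happens to be upstream at that second.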

I wholly recommend the use of version-controlled repositories as you embark on this journey to transform the network (all of it) into a more agile and automated environment. Treating infrastructure as code can make roll-backs and replacements a whole lot easier to manage if the latest and greatest configurations can just be pulled out of a controlled repository. But that controlled repository should be just that: controlled. It should be local to the environment, because otherwise it’s not controlled at all. Not really. Availability issues, unexpected changes, or outright removal of dependencies are risks that should be considered well in advance of implementing the system.
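And once the configurations live in that repository, rolling back is mostly a matter of pulling a known-good version back out. A sketch (the tag and paths are hypothetical):

```javascript
// Hypothetical rollback step (Node.js): export a tagged, known-good version of
// the configuration from the local repository into the deployment workspace,
// rather than hand-editing devices or hunting for an old copy on someone's laptop.
const { execSync } = require("node:child_process");

const REPO = "/srv/git-mirrors/infra-templates.git";
const KNOWN_GOOD_TAG = "release-known-good"; // tag applied when this version passed review

// `git archive` writes the tagged tree as a tar stream; unpack it where the
// deployment tooling expects to find its templates.
execSync(
  `git --git-dir=${REPO} archive ${KNOWN_GOOD_TAG} | tar -x -C /srv/deploy-workspace`,
  { stdio: "inherit" }
);
```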

One of the hardest cultural impacts wrought on networking by shifting left (into the CI/CD process) will be the need to implement the same kinds of controls and processes on changes to infrastructure that are currently expected on the app code itself. Code reviews, version control, and the use of repositories will be mandatory in the future if we’re to actually treat infrastructure as code. That’s a marked difference from right now, where scripts and artifacts may or may not be subject to a review/approval process, and where the latest artifacts are often stored “on that server” or “on Bob’s laptop.”

The good news is that developers have suffered through this journey already. We can learn a lot from the development organization about how to best go about transitioning to an approach like DevOps that treats infrastructure as code. Don’t let their lessons go to waste.

Published Mar 31, 2016
