About 3 years ago I was working at a small startup and had a conversation with one of my coworkers about using BitTorrent for distributed deployments in our datacenter. We got on the whiteboard, drew up some preliminary ideas, and then got our boss in the room to show him our idea. Being the “dark days” of BitTorrent, most of you can probably guess what came next. He said absolutely not, we are a legitimate company doing legitimate business and if anyone found out we were running BitTorrent internally it could tarnish our image. While that blocked us from rolling it out in our environment, we honestly didn’t really need that much throughput. We had a very heterogeneous environment where there were only double digits of any particular server class therefore we stayed with the central repository distribution model.
Fast forward a few years, BitTorrent becomes a staple in the open source software distribution arena. Almost any Linux distribution imaginable can be had via BitTorrent these days and at a fraction of the cost of what it would cost to host them centrally. I would call this the “transitional period” when BitTorrent started to receive something other than negative press.
I hadn’t really heard of anyone using BitTorrent in the capacity that we had originally discussed until a few weeks ago when Twitter’s Engineering group posted a blog on their implementation. They were using a single Git server to host all of their software packages and then instructing their application servers to all download from this one server. This was sufficient in the beginning, but as we all know Twitter has grown by leaps and bounds since its inception in 2006. Hitting a single Git server with thousands of application servers just didn’t work. Enter their new system of distributed deployments: Murder.
Murder has nothing to do with the nightly news, it is also defined as a “flock of crows,” which segues nicely into Twitter’s bird theme. It was written by Larry Gagea who is an infrastructure engineer for Twitter. Murder is deployed using Python and Capistrano. Python doing the heavy lifting for the BitTorrent traffic and Capistrano instructing the application servers. Given that BitTorrent was originally designed to run on the Internet with limited throughput and relatively high latencies, there had to be some modifications to the standard BitTorrent options. They decreased the timeouts on chunk transfers in order to not have machines hang waiting for a chunk that may not be there. Encryption was not needed to bypass ISP gateway, so it was disabled to reduce the CPU overhead. Distributed Hash Tables were also turned off in order to encourage a more linear distribution, which is discussed in length in Larry’s presentation. Lastly, UPnP was disabled as it was not needed for NAT traversal and makes traffic patterns less predictable.
If you are interested in playing with Murder, it can be downloaded from GitHub: http://github.com/lg. If you have the time, I would also encourage you to watch Larry’s half hour talk on the system. He outlines why they did what they did and what tools are available to build a similar distributed deployment system that isn’t Ruby or Python-centric. It is very cool to see such a neat and innovative protocol finally get some good press after all these years.