Programmability in the Network: Your Errors, Do not Show Them to Me
#devops Errors happen, but your users should never see them. Ever.
Every once in a while, things go wrong. Errors are as inevitable as winter in Wisconsin, rain in Seattle, and the cat picture someone will post today that ends up in your Facebook news feed. Admit it, you looked, didn't you?
The inevitability of 404 errors gave rise to an entire web design "best practice": the fun or amusing custom error page. Because looking at a standard 404 error page is really pretty ... boring.
We should, of course, be building systems to fail, or more precisely, to handle failure gracefully. But at some point a system is going to fail so spectacularly that the failure cascades back to the user. At some point, the application can't address an error because, well, it is the cause of the error.
At that point, it becomes the responsibility of the network, of the infrastructure, to handle the error gracefully. That means not splashing an ugly, text-only 503 error across the screen that will confuse 99% of a site's or application's users.
The ability to "catch" errors on the egress (outbound) data path is not new, nor is the ability to handle those errors programmatically in order to either (a) reactively attempt to redress the situation or (b) present the user with something that makes sense (translation: something that does not include raw HTTP errors or error codes).
Even something as simple as capturing as much information as possible about the failed transaction can be valuable, but it requires the ability to recognize and then act on the error. That means visibility and programmability somewhere in the network.
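A minimal sketch of both options, assuming a Python WSGI stack: the class FriendlyErrorMiddleware (a hypothetical name, not part of any product or library) stands in for an egress filter in a proxy or application delivery controller. It retries idempotent requests once when the application returns a 5xx status or raises an exception, and if the failure persists it replaces the raw response with a plain-language page. In a real deployment this logic would live in the network tier rather than in the application's own process.

```python
# Hypothetical egress filter, modeled as WSGI middleware for illustration only.
FRIENDLY_PAGE = (
    "<html><body><h1>Sorry, something went wrong.</h1>"
    "<p>We're working on it. Please try again in a few minutes.</p>"
    "</body></html>"
)


class FriendlyErrorMiddleware:
    """Intercept 5xx responses on the way out: (a) retry, then (b) hide the error."""

    def __init__(self, app, retries=1):
        self.app = app
        self.retries = retries

    def _call_app(self, environ):
        """Run the wrapped app and buffer its full response."""
        captured = {"status": "500 Internal Server Error", "headers": []}
        chunks = []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return chunks.append  # WSGI write() callable

        try:
            result = self.app(environ, capture)
            try:
                chunks.extend(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            # The application itself blew up: treat it like a 500.
            captured["status"] = "500 Internal Server Error"
        return captured["status"], captured["headers"], b"".join(chunks)

    def __call__(self, environ, start_response):
        # Only replay requests that are safe to repeat.
        idempotent = environ.get("REQUEST_METHOD") in ("GET", "HEAD")
        attempts = 1 + (self.retries if idempotent else 0)
        for _ in range(attempts):
            status, headers, body = self._call_app(environ)
            if int(status.split()[0]) < 500:
                start_response(status, headers)  # healthy response passes through
                return [body]
        # Every attempt failed: the user sees a readable page, not the raw error.
        page = FRIENDLY_PAGE.encode("utf-8")
        start_response(
            "503 Service Unavailable",
            [("Content-Type", "text/html; charset=utf-8"),
             ("Content-Length", str(len(page)))],
        )
        return [page]
```

Retrying only GET and HEAD is a deliberate choice in this sketch: replaying a POST could repeat a side effect, such as a duplicate order, and its request body has already been consumed. The replacement keeps a 503 status so caches and clients still know the request failed; only the page the user reads changes.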
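One way to capture detailed information about a failed transaction without exposing it to the user is to split the response in two: a full record for operators and a short reference ID for the user. The sketch below assumes a WSGI-style environ dict; describe_failure and friendly_page are hypothetical helper names, and the record's fields are illustrative choices rather than any standard schema.

```python
import json
import logging
import uuid

logger = logging.getLogger("egress.errors")  # hypothetical logger name


def describe_failure(environ, status, body, max_body=512):
    """Build and log a structured record of a failed transaction."""
    incident_id = uuid.uuid4().hex[:12]
    record = {
        "incident_id": incident_id,
        "method": environ.get("REQUEST_METHOD", ""),
        "path": environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
        "client": environ.get("REMOTE_ADDR", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        "status": status,
        # Keep the raw error for operators, truncated so log lines stay manageable.
        "upstream_body": body[:max_body].decode("utf-8", errors="replace"),
    }
    logger.error("upstream failure %s", json.dumps(record))
    return record


def friendly_page(incident_id):
    """User-facing page: no status codes or stack traces, just a reference."""
    return (
        "<html><body><h1>Sorry, something went wrong.</h1>"
        f"<p>If you contact support, please mention reference {incident_id}.</p>"
        "</body></html>"
    ).encode("utf-8")
```

The reference ID ties what the user reports to the logged record, so support staff can find the raw error without the user ever seeing it.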
This is where devops lives: in the network, but at the application layer. This is where the value of devops shows up outside of monthly reports and charts tracking the number of releases and time to deploy. It is in the network, at the application layer, that devops can bridge the gap between operations and applications and provide the insight, information, and services needed to deliver applications smoothly. That means, in part, addressing inevitable errors in a meaningful way. Logs, notifications, and presenting consumable information about the state of the application to the user are all part and parcel of dealing with failure.
Given the breadth and depth of programmable services available today, there is no reason a user should ever be presented with raw error messages from a misbehaving application. Unless, of course, the application's delivery infrastructure has been built on a solution that does not provide such capabilities.