DevOps is one of those things that is hard to pin down to a definition, and if you’ve read anything on the topic you probably didn’t walk away with a concrete understanding of what it is. Most definitions seem incomplete, if not a little misleading. In particular, the main thing you hear is that DevOps is about increased collaboration between dev and ops teams. However, many apps in the cloud are supported entirely by the dev team, with minimal or no ops involvement, yet still very much need DevOps. In this context, DevOps is more about the new tools, methods, and practices for building and running apps in the cloud.
“I Would Spend 55 Minutes Defining the Problem and then Five Minutes Solving It”
Okay Einstein, let’s start there. The problem is we need to run and continually update our software in a scalable, reliable, and affordable manner. For this we look to the cloud. But running applications in the cloud is fundamentally different, and to do it effectively we need to adapt to a new way of doing things.
DevOps is the “stuff” that solves that problem.
So what is DevOps? The predominant, tip-of-the-iceberg answer is that DevOps is about delivering software faster, which in turn is about automation: having a robust continuous integration and testing process, automating the deployment process, and even automating the creation of infrastructure. It’s about shortening the path from “I just finished writing code” to having that code running in an environment where it can be tested.
Before the cloud, there was often a sense of permanence and finality in both the infrastructure and the application itself. The new reality is that both are ephemeral, subject to constant change and updating.
We do DevOps because we’re not moving into a house, we’re getting on a treadmill.
So sure, DevOps is about building a deployment pipeline, but that’s really just the beginning. It’s the “day two” stuff, the operations, the scaling and monitoring, that I think is often overlooked as part of DevOps. To consider why we need new tools and methods, let’s reflect on what many of us were doing prior to the cloud. We ran our applications on a fixed number of servers that we spent weeks procuring and configuring, mostly by hand, with little to no documentation or automation. We gave these servers cutesy names as if they were pets. Deployments often involved scheduled downtime. Servers were too often in a single geographic area.
Not surprisingly, a first endeavor to deploy in the cloud often involves as little adaptation as possible. We do it a lot like we used to “back on Earth”: hand-configuring servers, imaging them, and launching VMs from those images. So we’re technically “in the cloud”, and there are some powerful benefits already, but we’re only halfway to solving our problem.
In the cloud we’re deploying onto VMs or containers that may launch and terminate in as little as a few minutes. They might sit behind a load balancer and firewall that were created just a moment earlier with an API call. Scaling is elastic, measured, and operates automatically based on real-time conditions.
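To make “operates automatically based on real-time conditions” a little more concrete, here is a minimal sketch of one common approach, proportional (target-tracking) scaling. The function name, the metric, and the minimum and maximum bounds are assumptions made up for illustration; real autoscalers add cooldowns and smoothing on top of something like this.

```python
import math


def desired_instance_count(current: int, observed: float, target: float,
                           minimum: int = 1, maximum: int = 20) -> int:
    """Size the fleet so the per-instance metric lands near the target.

    `observed` is the current average of some metric per instance
    (for example CPU percent) and `target` is the value we want to hold.
    All names and defaults here are hypothetical.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if current <= 0:
        # Nothing is running yet, so start from the configured floor.
        return minimum
    # Scale the fleet in proportion to how far the metric is from target.
    wanted = math.ceil(current * observed / target)
    # Never go below the floor or above the ceiling.
    return max(minimum, min(maximum, wanted))
```

With four instances averaging 90% CPU against a 60% target, this asks for six instances; at 30% it asks for two.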
Launching an application into the cloud is a lot like launching a satellite into space.
Your app is running on instances on the east coast, then it’s on the west. Next time you look it might be in Asia. Getting information from it is like telemetry. It’s a helpful mindset to think of servers in the cloud as inaccessible remote locations, because it reminds you to plan ahead which stats, data, and logs you want to get back from them and to build an automated means of transmitting that data to a central place. You can’t wait until something goes wrong and then expect to ssh into the server and look around; it may well be terminated by then. In fact, just to remind you how short-lived they are, we don’t even call them “servers” anymore; they’re “instances”.
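One way to act on the telemetry mindset is to have every instance emit structured, self-describing log events that a collector can ship to a central place. The sketch below uses only Python’s standard `logging` module. The `INSTANCE_ID` environment variable, the field names, and the `fields` convention for extra data are assumptions for illustration, not any particular provider’s format.

```python
import json
import logging
import os
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, instance_id: str):
        super().__init__()
        self.instance_id = instance_id

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            # Tag every event with its origin, since the instance may be
            # gone by the time anyone reads the log.
            "instance": self.instance_id,
            "message": record.getMessage(),
        }
        # Optional structured fields passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(instance_id=None, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout (or a given stream)."""
    # INSTANCE_ID is a hypothetical variable set at launch time.
    instance_id = instance_id or os.environ.get("INSTANCE_ID", "unknown")
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter(instance_id))
    logger = logging.getLogger(f"telemetry.{instance_id}")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("request served", extra={"fields": {"latency_ms": 42}})` then produces a single JSON line that a log shipper can forward without parsing free-form text.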
Before cloud architectures, the assumption was the server was a “going concern”. It was going to be there and continue to operate unless something catastrophic happened. With cloud, this assumption is turned on its head. For reasons like auto-scaling, a failed health check, or even getting outbid on your hourly price in the spot market, your instance could be terminated at a moment’s notice. All of this means we now have to build our apps to tolerate sudden shutdown and to let another node pick up where the terminated one left off. We can’t even rely on storing state in memory or on local disk for the short term. Consider them literally as what they are called: “compute instances”.
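As a rough illustration of “tolerate sudden shutdown and resume on another node”, the sketch below checkpoints progress to external storage after every item and stops cleanly when a termination signal arrives. `CheckpointStore` is a made-up in-memory stand-in for a real database or object store, and the job and function names are hypothetical.

```python
import signal
import threading


class CheckpointStore:
    """Stand-in for durable storage that lives outside the instance."""

    def __init__(self):
        self._offsets = {}

    def load(self, job_id: str) -> int:
        return self._offsets.get(job_id, 0)

    def save(self, job_id: str, offset: int) -> None:
        self._offsets[job_id] = offset


# Set when the platform asks this instance to shut down.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    stop_requested.set()


def install_handlers() -> None:
    """Call once from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_job(job_id: str, items: list, store: CheckpointStore, process) -> int:
    """Process items from the last saved offset; return the new offset."""
    offset = store.load(job_id)
    while offset < len(items) and not stop_requested.is_set():
        process(items[offset])
        offset += 1
        # Persist progress outside the instance after every item.
        store.save(job_id, offset)
    return offset
```

If the instance dies between `process` and `save`, the next node repeats that one item, so in this design `process` should be safe to run twice.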
We have to have a strategy for health checking our applications. Sure, we’ve had monitoring before, but now our infrastructure will automatically terminate any instance that fails its health checks. If you’re not careful, it can terminate your entire cluster over what might have been a momentary hiccup, leaving you completely down for 5 to 10 minutes until new VMs boot up.
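Two simple guards against the “momentary hiccup” failure mode are to require several consecutive failed checks before marking an instance unhealthy, and to cap how many instances may be replaced at once. The sketch below shows both. The thresholds, class, and function names are assumptions for illustration rather than any load balancer’s actual behavior.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthTracker:
    """Flip health state only after a streak of consistent results."""
    unhealthy_threshold: int = 3   # consecutive failures before "unhealthy"
    healthy_threshold: int = 2     # consecutive passes before "healthy" again
    healthy: bool = True
    _fail_streak: int = 0
    _ok_streak: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the current health state."""
        if check_passed:
            self._ok_streak += 1
            self._fail_streak = 0
            if not self.healthy and self._ok_streak >= self.healthy_threshold:
                self.healthy = True
        else:
            self._fail_streak += 1
            self._ok_streak = 0
            if self.healthy and self._fail_streak >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


def instances_to_replace(trackers: Dict[str, HealthTracker],
                         min_in_service: int) -> List[str]:
    """Pick unhealthy instances to replace without emptying the cluster."""
    unhealthy = sorted(name for name, t in trackers.items() if not t.healthy)
    # Keep at least min_in_service instances running, even unhealthy ones,
    # until their replacements have booted.
    budget = max(0, len(trackers) - min_in_service)
    return unhealthy[:budget]
```

With these defaults a single failed check changes nothing, and even if every instance in a three-node cluster goes unhealthy at once, a `min_in_service` of 2 lets only one be replaced at a time.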
These are among the many fascinating DevOps problems you will need to think through and solve as you devise your own way of deploying and scaling apps in the cloud.
All of a sudden you realize you’re building your own platform.
A more complete picture is that DevOps isn’t just automating deployments. It’s building new applications, or updating existing ones, as twelve-factor apps designed with scaling and ephemerality in mind and implementing your established health monitoring and logging standards. It’s also building a pipeline and a set of practices for deploying and operating those apps. We don’t just get that out of the box from our IaaS provider. And so I think a lot of the work that is currently done out of necessity under the umbrella of DevOps will increasingly be provided by platforms that sit a layer on top of IaaS. Knowing how much work goes into building all of this on your own, I believe that PaaS is the future, for the same kinds of reasons that IaaS has value over building your own data center.
We need DevOps because we need the cloud, and the reality is that DevOps is simply necessary to deploy and operate modern apps in the cloud, whether you are doing all of it yourself or leveraging a platform for some portion of these functions.