DevOps teams with a high degree of autonomy should think carefully about the way they deploy their applications, because they are now responsible from start to finish: you break it, you fix it. Whatever they deliver to production has a direct impact on themselves. There is no operations team anymore to take care of the correct functioning of their application and its dependencies, nor is there a team responsible for the technical health of the infrastructure on which the app is deployed. Because choosing the right application deployment strategy is not easy, this article highlights different strategies and explains when to use each of them.
Goals and guidelines
Deployment strategies in the cloud serve several goals, most of them focused on one or more of the following considerations. Use this list as guidance to select the strategy that fits your type of application, your team, your organization, and your SLAs.
- When uptime is critical, think of minimizing downtime (e.g. choose rolling updates).
- Think of rollbacks in case of errors during deployment or during (stability or performance) tests.
- Put everything (all scripts and configuration) in version control so you can use CI/CD pipelines to deploy new versions of your application quickly and you can track and trace them.
- Deployments should be consistent and repeatable across environments to allow smooth promotion of an application from DEV to TEST to ACC to PROD.
- Have you thought of backwards compatibility? For example: a single database to which multiple versions of an application are connected.
- Carry out a root cause analysis (RCA) in case of a deployment failure. This way you can learn and improve for the next iteration. Document it and share it with other teams.
With these in mind, let’s explore the deployment strategies themselves. Business considerations help you judge the various non-technical aspects.
The tried-and-true strategy of upgrading in-place is simple and effective.
Take down the system, swap version X of the application for version Y, then boot the system again and let version Y come up.
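The three phases can be sketched as a tiny script. The service-control and file-swap callables below are placeholders (assumptions), recorded into a list for illustration; in practice they would wrap your init system or process manager.

```python
# Minimal sketch of an in-place ("stop, swap, start") upgrade.
# The three callables are hypothetical stand-ins for real commands.

def in_place_upgrade(stop_service, swap_artifact, start_service):
    """Run the three phases of an in-place upgrade, strictly in order."""
    stop_service()    # downtime starts here
    swap_artifact()   # replace version X with version Y on disk
    start_service()   # downtime ends once the service is healthy again

# Example run, recording the steps instead of touching a real system:
steps = []
in_place_upgrade(
    stop_service=lambda: steps.append("stop"),
    swap_artifact=lambda: steps.append("swap"),
    start_service=lambda: steps.append("start"),
)
```

The ordering is the whole point: the window between "stop" and a healthy "start" is exactly the downtime your users experience.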
This is one of the easiest deployment strategies and perhaps one of the cheapest. It is rather inexpensive in terms of the knowledge required from the team and the number of infrastructure components (e.g. virtual machines) you need. Although this strategy has several drawbacks in terms of downtime, limited ability to scale out, and a lot of surprises if things go really wrong, there are still business justifications for choosing it.
Consider a highly specialized team that only needs to “keep the lights on” for a specific application. They don’t require high uptime, since the application is only needed for non-critical operations carried out by an internal team. When good agreements are made with these internal users, they are aware of the downtime and it does not impact them. Even though more modern deployment strategies are gaining popularity, this pattern still proves its value from a business perspective, especially when the application is older and on the list to be replaced with a modern equivalent.
The blue/green deployment strategy focuses on deploying the next version of an application alongside the previous version.
A blue/green deployment strategy is like a big bang with multiple ‘universes’. Suppose you want to upgrade an application from version X to version Y: you deploy exactly the same number of instances for the new version. As soon as the (extended) testing phase for version Y is finished, you switch the load balancer to the new set of instances.
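The cut-over itself can be reduced to a single pointer flip. A minimal sketch, assuming a load balancer that routes to whichever pool is marked live (the class and instance names are illustrative):

```python
# Sketch of a blue/green switch: two identical pools, one atomic flip.

class LoadBalancer:
    def __init__(self, blue, green):
        self.pools = {"blue": blue, "green": green}
        self.live = "blue"          # version X serves traffic initially

    def route(self):
        """Return the pool that currently receives user traffic."""
        return self.pools[self.live]

    def switch(self):
        # The cut-over is one pointer flip: instant, and just as easy
        # to flip back if version Y misbehaves in production.
        self.live = "green" if self.live == "blue" else "blue"

lb = LoadBalancer(blue=["x-1", "x-2", "x-3"], green=["y-1", "y-2", "y-3"])
lb.switch()   # testing of the green pool is done: go live with version Y
```

Rollback is the same operation in reverse, which is exactly why this strategy is attractive despite the doubled infrastructure.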
This is an expensive strategy, since it requires you to bring up the same number of instances for the new version of the application. If testing procedures take long, you still need to keep those instances running, maintained, and paid for. Besides testing the application, you also need to test the infrastructure itself.
If things go seriously wrong, you need to redo everything, which can be time-consuming, especially when the deployment is not defined as Infrastructure as Code (IaC). This builds up extra costs and increases the risk when things go wrong unexpectedly.
Instead of a “big bang” like the previous deployment strategy, a rolling update slowly rolls out a new version of an application.
In essence, the rolling update strategy replaces a pool of instances (e.g. virtual machines or containers) one by one. Say a pool consists of 3 instances: one instance is taken down, removed from the pool, and replaced by an instance running the new version. This process continues until all instances are replaced and the new version of the application is served by the new instances.
A big advantage of this strategy is “zero downtime”. With a minimum of 3, 5, or even 7 instances, there is almost always at least one that is operational. In the case of 3 instances, 1 can be taken out of service while 2 remain: 1 acts as the main instance, while the other acts as a fail-over. This setup adds an extra layer of protection.
Some other characteristics of this strategy are:
- The maximum number of unavailable instances can be defined, which guarantees that a minimum number of instances stays operational.
- Parallelism: the number of instances to replace in parallel during the deployment.
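The mechanics above can be sketched as a small simulation. The pool layout and `max_unavailable` knob are illustrative; real orchestrators (e.g. Kubernetes Deployments) expose similar settings:

```python
# Simulation of a rolling update: replace instances batch by batch,
# never taking down more than `max_unavailable` at once.

def rolling_update(pool, new_version, max_unavailable=1):
    """Yield a snapshot of the pool after each replaced batch."""
    for start in range(0, len(pool), max_unavailable):
        for i in range(start, min(start + max_unavailable, len(pool))):
            name, _old = pool[i]
            pool[i] = (name, new_version)   # take down + replace this instance
        yield list(pool)                     # snapshot after this batch

pool = [("vm-1", "v1"), ("vm-2", "v1"), ("vm-3", "v1")]
history = list(rolling_update(pool, "v2"))
```

With `max_unavailable=1`, each intermediate snapshot still has at least two instances serving traffic, which is exactly the zero-downtime property the strategy promises.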
One of the downsides of this strategy is the lack of control over traffic. It is (nearly) impossible to control the traffic flow: you don’t know whether it goes to the old instances or the new ones. Besides this, it’s difficult to handle multiple APIs, since different versions of the application can’t run in parallel without changing the configuration (e.g. endpoints).
As said, the main business benefit is zero downtime. Imagine an application that processes a huge number of transactions per minute: those transactions do not have to be handled by another system and reprocessed after the upgrade. Instead, the application just continues to operate. No service windows are needed, customers do not notice the upgrade, and the risk of data loss is minimized.
No service windows also means no overtime costs. Developers are relieved of the burden of deploying new releases during weekends or other off-office hours. In addition, it is possible to track and record deployments, so you can roll back in case of a severe issue.
In a “common situation”, users are routed to a number of instances to evenly spread the load. A/B testing differs from that principle: it is used to validate a new version of an application. Practical things to validate are the number of (valid) transactions or the customer revenue gained by the new feature. If the new feature brings positive results, the DevOps team can decide to roll it out on a larger scale.
A/B testing requires an intelligent load balancer to serve another version of the application in parallel. Traffic is served either by the old version of the application or by the new one. Several criteria can be used to route the traffic: the location or language of the user, browser- or operating-system-specific information, or query parameters.
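Such criteria-based routing can be sketched as a simple function. The request fields (`country`, `browser`) and the routing rule are assumptions for illustration, not a real load balancer API:

```python
# Sketch of criteria-based A/B routing: a deterministic slice of users
# gets version B, everyone else stays on version A.

def route_ab(request):
    """Route based on user attributes carried in the request (illustrative)."""
    if request.get("country") == "NL" and request.get("browser") == "firefox":
        return "version-b"
    return "version-a"

route_ab({"country": "NL", "browser": "firefox"})  # -> "version-b"
route_ab({"country": "DE", "browser": "chrome"})   # -> "version-a"
```

Because the split is deterministic, the same user always sees the same version, which keeps the measured behavior (transactions, revenue) attributable to one variant.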
From a business perspective, it’s very valuable to try new things out quickly and safely. Although this can be an expensive setup, it gives full control over the traffic distribution, and you can track the results (both positive and negative) in terms of customer behavior and revenue. It’s easy to revert to the older version if the new version does not provide significant benefits: change the load balancer back to the original configuration, delete the new deployment, and move on.
Canary deployments are often meant to test the stability and/or reliability of a new feature in which the team has rather low confidence.
The procedure is to redirect a certain percentage of the traffic entering the load balancer to the new application version. Say 25% of the traffic is routed to the new version and the remaining 75% to the original version. If things go well, you gradually increase the percentage of traffic redirected to the new version until, in the end, all traffic is served by the new version.
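A common way to implement the percentage split is to hash a stable user identifier into a bucket, so each user is pinned to one version rather than flipping between them on every request. A minimal sketch (the function name and percentage are illustrative):

```python
# Sketch of a 25/75 canary split using consistent hashing on the user id.

import hashlib

def canary_route(user_id, canary_percent=25):
    """Map a user deterministically into one of 100 buckets."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # Buckets below the threshold go to the canary; raising the
    # threshold gradually shifts more traffic to the new version.
    return "canary" if bucket < canary_percent else "stable"
```

Rolling the canary forward is then just raising `canary_percent` step by step until it reaches 100.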
An alternative is to use a feature toggle. This way you handle the traffic shaping at the application layer, which gives you more options to select the traffic itself. Think of using metadata of a user (e.g. gender, age, etc.) to split the traffic.
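An application-layer toggle might look like the following sketch. The flag name, registry, and user fields are hypothetical; real setups typically use a feature-flag service instead of an in-process dict:

```python
# Sketch of a feature toggle evaluated inside the application, splitting
# on user metadata instead of load-balancer percentages.

FEATURE_FLAGS = {
    # Flag name -> predicate over user metadata (illustrative rule).
    "new-checkout": lambda user: user.get("age", 0) >= 18,
}

def is_enabled(flag, user):
    """Evaluate a flag for a user; unknown flags default to off."""
    predicate = FEATURE_FLAGS.get(flag)
    return bool(predicate and predicate(user))
```

Because the decision lives in application code, it can use any attribute the application knows about the user, which a plain load balancer cannot see.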
This deployment pattern is not 100% fail-safe if the new version causes significant problems. However, you can switch the traffic back to the original version pretty quickly. Given these considerations, it’s great for trying out new features and seeing how the application and system behave. Extended tests of the entire application (landscape) become less relevant, since you’re focusing on the new feature.
You can quickly collect feedback from stakeholders and end users. Once they are happy with the new feature, the DevOps team can proceed. In the end, this accelerates a bunch of other processes.
A “cutting edge” deployment strategy that fully utilizes cloud technology and resilience is the “shadow deployment” strategy. Its key feature is testing production load on a new feature.
Deploy a new version alongside the old one. Duplicate user requests and redirect the copies to the new version, which handles this traffic and scales out when needed to absorb the increased load. Take special care with traffic that can cause trouble, such as duplicate transactions or requests that overwrite each other (multithreading) and thus render your data useless. One way to avoid this is to use mock services for the new feature; however, then you are not completely mimicking it.
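The request duplication can be sketched as a tiny proxy. In practice this sits in a proxy or service mesh (e.g. traffic mirroring); here the two versions are plain callables, and all names are illustrative:

```python
# Sketch of shadow traffic: every live request is also sent to the shadow
# version, but only the primary's response ever reaches the user.

def shadow_proxy(primary, shadow, request):
    try:
        shadow(dict(request))   # fire a *copy*, so mutations can't leak back
    except Exception:
        pass                    # a failing shadow must never affect users
    return primary(request)     # only the primary response is returned

seen_by_shadow = []
response = shadow_proxy(
    primary=lambda req: {"status": 200, "version": "v1"},
    shadow=lambda req: seen_by_shadow.append(req),
    request={"path": "/orders", "id": 42},
)
```

Note that swallowing shadow errors is deliberate: the shadow exists to be observed (latency, errors, resource usage under real load), not to serve anyone.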
Shadow deployments free you from setting up a dedicated load-test environment. They give you the flexibility to operate one environment and focus on your application’s functionality. Besides this, the load tests are based on actual traffic in your existing environment, which greatly enhances the usefulness of these kinds of tests and answers the question “how does your application behave under a certain load?”.
Other business benefits are small increments and failing fast and often without disrupting the actual production version of your application. Teams can learn in a safe way and build up this knowledge gradually.
A lot has changed over the last couple of years. Cloud-native technology offers a variety of deployment strategies, each with its own pros and cons. I hope this article has helped you select the one that best fits your use case.