Every great new feature of the application you push to production provides an opportunity to increase the revenue of your organization. On the other hand, every commit, and with it every release, also poses a risk.
Your application can contain vulnerabilities or bad configuration settings that can be exploited. Even worse, your application can become the point of attack that gives an unauthorized person access to your entire application landscape.
Your data is at risk. In an Agile world, things change very fast. In this article, I’d like to talk about risks and risk management in a DevOps world.
The traditional way of thinking
Imagine the following scenario. A development team wants to try out a fancy new framework to improve a certain aspect of their application. Business users are happy since it enables them to push changes to production faster than before. One big obstacle here: the security department. They control the risks that go with these kinds of changes.
Developers want to go fast, while the security department is responsible for everything that gets deployed. It’s easy for them to say “no” to every innovative new technology. In an Agile world, in which things change very fast, this slows down innovation.
But don’t avoid risks by stopping every discussion immediately. It’s far better to inform yourself about the proposed and as yet unknown situation with the new framework, even if it poses unknown risks. Inform, share, reduce, and accept are keywords that all contribute to proper risk management.
Risk management is all about dealing with uncertainties. You don’t know if, how, or when your application will be attacked. Most risk management methodologies help to identify, rate, and rank risks to get some kind of control over these uncertainties.
It’s good to understand the (possible) impact of a risk. A risk by itself does not tell you much; what matters is the potential impact it can have on your organization. Think of financial impact, damage to your image, or downtime that delays the deployment of a critical application just before an important marketing campaign kicks off. The risk score combines that impact with how likely the risk is to occur:
Risk score = risk probability * risk impact
Simply put, the risk probability is the likelihood that the risk occurs, and the risk impact is the damage it would cause if it does. Risk assessment techniques use the risk score to help you determine how much effort (if any) you should spend on minimizing the impact of the risk, even if you don’t know whether something bad will happen.
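To make the formula concrete, here is a minimal Python sketch that scores and ranks a few risks. The `Risk` class, the 1 to 10 impact scale, and the example values are assumptions for illustration only; your organization may use different scales.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: float  # estimated likelihood between 0.0 and 1.0
    impact: float       # estimated impact on a hypothetical 1-10 scale

    @property
    def score(self) -> float:
        # Risk score = risk probability * risk impact
        return self.probability * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical example values, for illustration only.
risks = [
    Risk("Verbose error page leaks stack trace", probability=0.75, impact=2),
    Risk("Outdated dependency with known vulnerability", probability=0.5, impact=8),
    Risk("Secret committed to repository", probability=0.25, impact=10),
]

for risk in rank_risks(risks):
    print(f"{risk.score:5.2f}  {risk.name}")
```

Note that the most likely risk in this list ends up at the bottom of the ranking because its impact is low. That is exactly the kind of trade-off the score is meant to expose.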
OWASP Top 10
All security-minded people are familiar with the OWASP Top 10, which defines the most common risks for web applications. Every risk on the list has the following elements, which makes it a very structured method to work with:
- Threat agents. This element describes which users pose a threat to your application and should be watched out for.
- Attack vectors and scenarios. Describes which kinds of attacks are to be expected.
- Detectability. Answers the question of how easily the risk/weak spot can be found by attackers.
- Exploitability and prevalence. Describes how easy the vulnerability is to exploit and how widespread it is.
- Impacts. This element describes the impact on the business and technical architecture.
Risks and threats are related terms but should not be mixed up. Threats are specific: what could go wrong with your system if a person or a process exploits it, and what and whom you have to protect your system from. Risks are abstract: a risk is your system’s exposure to threats. You should answer the question: what (given the costs and trade-offs) can or should you do to reduce your system’s exposure to threats?
Costs to fix are not just about money. They also include performance degradation, a delayed time to market, time spent on fixing a risk that cannot be spent on other, more valuable things, etc.
Ways to deal with risks
Risks are handled the same way in the “traditional world” and in the DevOps world. The following strategies help you to deal with them:
- Accept it. A very common strategy: accept the risk and move on. But it’s not that simple, since you need to be prepared to deal with the consequences in case something bad does happen. You need to monitor the feature which has the known risk, react to the incident in a correct and timely manner, and be able to fix it within a short time-frame. Without these “countermeasures” in place, just accepting the risk can be too problematic. How you deal with these kinds of things does change in the Agile and DevOps world.
- Avoid it. Don’t use unsafe or unproven technology. The same is true for legacy technology which is poorly maintained. Sometimes you can disable a feature or simplify it to avoid the risk. Engineers tend to “over-engineer” certain aspects, which creates more risks, so you should build simply. Creating things in a simple way also speeds up (security) reviews. Reduce your attack surface; this is especially useful when using serverless functions.
- Reduce it. This strategy actively reduces risks. Think of training your personnel, testing your applications, scanning source code and dependencies for vulnerabilities, and more.
- Share or transfer it. Although a risk does not disappear magically when it is shared or outsourced to a third party, it can help to reduce it. For example: if your company does not have the time and experience to set up an audit logging feature for your applications, it’s wise to obtain that service from a vendor that does it very well.
Make it visible
For all of these strategies, you need to make the risk visible. Not only to the DevOps teams but, more importantly, to the business. In the end, risks have an impact on the business. It’s the business that decides whether to accept a risk or not.
A couple of ways to do this:
- Create user stories on the backlog which are about reducing risks (e.g. reducing technical debt). Put them under a specific category or link them to an epic so the business can quickly get an overview of them.
- Scan your source code and dependencies in the CI/CD pipelines every time you commit something. You can scan for known vulnerabilities, exposed secrets, invalid licenses and outdated packages. Don’t deal with the outcome in isolation. The results should be made accessible to everyone in the organization, so they can learn from it and help fix problematic issues. Management reports with the proper organizational context help the management to set priorities.
- Measure how much time you need to deliver a new version of your application to production and how much time you need to recover from a broken deployment. Furthermore, be aware of how much data loss is acceptable (again: to the business) to get an indication of the real impact if you run into a problem in production.
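The two time measurements from the last item can be derived from timestamps you probably already have in your pipeline and incident tooling. The following Python sketch is one possible way to do it; the record layout and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time between the start and end timestamp of each pair."""
    seconds = mean((end - start).total_seconds() for start, end in pairs)
    return timedelta(seconds=seconds)


# Hypothetical records: (commit time, time the commit reached production).
deployments = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 0)),
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 18, 0)),
]

# Hypothetical records: (deployment broke, service restored).
incidents = [
    (datetime(2024, 1, 9, 18, 30), datetime(2024, 1, 9, 19, 0)),
    (datetime(2024, 1, 12, 10, 0), datetime(2024, 1, 12, 11, 30)),
]

lead_time = mean_duration(deployments)
time_to_recover = mean_duration(incidents)

print(f"Mean lead time to production: {lead_time}")
print(f"Mean time to recover:         {time_to_recover}")
```

Reporting these numbers next to the risk backlog gives the business a rough idea of how quickly a problem in production could be corrected.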
Agile and DevOps change the field
Traditional risk management strategies are just a baseline to deal with risks. However, they have a hard time keeping up in the new world: manual testing is too slow when you push to production with every commit, approvals by a “release manager” who doesn’t follow this approach miss their goal, penetration tests executed by an external company lag behind the actual situation, etc. None of these work in an Agile world. The speed of delivery is too high for them to keep up, and a lot of other things are different now.
Let’s name a few examples which underscore the things that are relevant in an Agile / DevOps world:
- Design documents are not created up front. Documentation is considered less important than working code and is hardly aligned with the actual implementation.
- Formal reviews by compliance officers come too late or are not carried out at all.
- DevOps teams are primarily focused on delivering new features; they do not think in terms of security all the time. Remember the phrase DevSecOps and shift security left?
- Teams which operate with a lot of autonomy tend to choose tools and solutions which suit them well, but which might not be the best for the organization as a whole. Nor do they necessarily have all the security-related knowledge needed to always make an informed decision.
From this perspective, the DevOps way of working poses new risks to the organization.
Reduce new risks
It’s important to know how to deal with these in order to keep things under control. It’s like using a parachute when you jump off a cliff 🙂
Create small changes at a time and release them as often as possible. Learn how to fix things which break and improve every time. Use deployment patterns like blue-green deployments to switch back to an old version if things cannot be fixed in a timely manner. This needs to be a practice all across the team so everyone feels confident to accept this way of working. A mutual understanding of the way of working is important to avoid friction between departments.
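The idea behind a blue-green deployment can be modeled in a few lines of Python. This is a simplified sketch, not a real deployment tool: the `BlueGreenRouter` class and the version numbers are made up for illustration, and in practice the switch is done by a load balancer, DNS record, or platform feature.

```python
class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, blue_version: str, green_version: str, live: str = "blue"):
        self.versions = {"blue": blue_version, "green": green_version}
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    @property
    def live_version(self) -> str:
        return self.versions[self.live]

    def deploy(self, version: str) -> None:
        """Install a new version on the idle environment; live traffic is untouched."""
        self.versions[self.idle] = version

    def switch(self) -> None:
        """Point traffic at the other environment. Calling it again rolls back."""
        self.live = self.idle


router = BlueGreenRouter(blue_version="1.4.0", green_version="1.3.0")
router.deploy("1.5.0")      # lands on green, the idle environment
router.switch()             # green is now live
print(router.live_version)  # the new version serves traffic
router.switch()             # something broke: switch back to blue
print(router.live_version)  # the previous version serves traffic again
```

Because the old version stays installed on the other environment, rolling back is the same cheap operation as releasing.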
Be sure to standardize and automate all steps related to deployments. Use Infrastructure as Code (IaC) to ensure infrastructure is set up in a consistent way. Only then can you quickly fix things which might break. It also reduces the risks, since human beings tend to do things a little differently every time. A machine which follows a script instead of a lot of manual steps is much more reliable. Be sure to do quality checks on your infrastructure scripts and break your pipeline if they do not meet your compliance rules.
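A compliance check that breaks the pipeline can be as simple as a script that returns a non-zero exit code. The Python sketch below assumes the infrastructure definition has already been parsed into a list of dictionaries; the rules, field names, and resources are hypothetical examples, not the format of any specific IaC tool.

```python
# Hypothetical compliance rules: each maps a description to a check on one resource.
RULES = {
    "storage must be encrypted": lambda r: r.get("type") != "storage" or r.get("encrypted") is True,
    "no resource may be publicly accessible": lambda r: r.get("public") is not True,
    "every resource needs an owner tag": lambda r: "owner" in r.get("tags", {}),
}


def find_violations(resources: list[dict]) -> list[str]:
    """Return one message for every rule that a resource fails."""
    return [
        f"{resource.get('name', '<unnamed>')}: {description}"
        for resource in resources
        for description, check in RULES.items()
        if not check(resource)
    ]


def main(resources: list[dict]) -> int:
    """Print all violations and return the exit code for the pipeline step."""
    violations = find_violations(resources)
    for message in violations:
        print(f"COMPLIANCE VIOLATION: {message}")
    return 1 if violations else 0


# Hypothetical parsed infrastructure definition.
example = [
    {"name": "logs-bucket", "type": "storage", "encrypted": True, "tags": {"owner": "team-a"}},
    {"name": "tmp-bucket", "type": "storage", "encrypted": False, "public": True, "tags": {}},
]

exit_code = main(example)
print(f"exit code: {exit_code}")
```

In a real pipeline step you would pass the result to `sys.exit()`, since a non-zero exit code is what makes most CI/CD systems mark the step as failed.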
Create resilient infrastructure components and applications to make sure your application recovers from (external) failures. Build feedback loops from production back to development to catch (the useful) incidents in a timely manner. For example: only send high-impact incident notifications to the team, not “build succeeded” notifications. You’d better avoid the “alert fatigue” syndrome: if no one responds to incidents because they receive too many alerts, this creates another risk. No news is good news in this case.
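Filtering notifications by impact is straightforward to automate. The following Python sketch shows the principle; the severity levels, the threshold, and the sample events are assumptions, and most monitoring tools offer an equivalent setting out of the box.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical threshold: only events at or above this level reach the team.
NOTIFY_THRESHOLD = Severity.HIGH


def should_notify(event: dict) -> bool:
    """True when the event is severe enough to interrupt someone."""
    return event["severity"] >= NOTIFY_THRESHOLD


# Hypothetical events coming back from the pipeline and from production.
events = [
    {"message": "Build succeeded", "severity": Severity.INFO},
    {"message": "Disk usage at 70%", "severity": Severity.WARNING},
    {"message": "Payment service returns HTTP 500", "severity": Severity.HIGH},
    {"message": "Database unreachable", "severity": Severity.CRITICAL},
]

notifications = [e["message"] for e in events if should_notify(e)]
print(notifications)
```

The low-severity events are not lost. They can still be logged for later analysis; they just don’t interrupt anyone.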
Blend security into your daily activities
Contrary to the traditional world, in which risks are assessed upfront based on design documents, formal specifications and well-written requirements, risks in an Agile world change all the time. It is important to blend risk management strategies into the daily work of the DevOps teams. A couple of ways to do this:
- Refactor old legacy code and simplify overly complicated code to make it easier to understand. Security teams can’t fall back on formal documents; they need to understand the code itself to determine the risks themselves. It goes without saying: upgrade old frameworks, libraries and other dependencies. Snyk has a new feature which creates pull requests to provide you with upgraded dependencies in case you’re not using the latest versions.
- Pay special attention to security risks when doing code reviews. Specific training might be needed to teach everyone who is involved.
- Reuse existing solutions which are proven. Don’t reinvent the wheel every time. For example: another team might have created and thoroughly tested a feature. Sharing is caring. See if you can reuse it or use the same patterns.
- During the sprint planning, record and review risks which need to be addressed in the upcoming sprint.
- Ask security officers to join your sprint demos so they can give you feedback in case they see any security risk which you have missed.
- Use the retrospectives to look back and improve upon security risks.
Besides these items, there might be plenty more examples which are practiced in your organization. Collect them and spread them through the departments. Management should encourage teams to do this, otherwise people don’t see this as a benefit for themselves. All of this helps to reduce the issues which might end up in production and give you a lot of trouble.
Risk management in an Agile and DevOps world differs from the old way of working. With the guidelines I presented, you can manage those risks in an effective way without losing your organization’s focus on delivering business features.