DevOps teams have a lot to concentrate on. Their primary focus is to deliver useful business features quickly and reliably. Feedback loops help to determine whether these features are worth the time and effort. Sometimes teams do not pay enough attention to them. This can lead to several problems, such as poor communication and a slower release cycle for their software applications. For organizations to deliver quickly and reliably, they need to identify and fix the problems in their feedback loops.
Companies that welcome new ideas about the best DevOps way of working are not always actively building a true and consistent DevOps culture. Their intentions may be good, but consider the following problem, which relates to an open feedback loop (a negative one).
If there is no clear way to bridge the gap between Dev and Ops in both directions, everyone has their own definition of a positive feedback loop. Solid definitions of the various types of feedback loops and what they entail are crucial to avoid miscommunication and misunderstandings.
In highly regulated organizations, developers are constantly surprised by new security measures that the CISO teams put on the table. They are stopped halfway through a deployment and their CI/CD pipelines keep queuing up. This slows down their delivery speed and also leads to frustration.
If the CISO team does not actively communicate its changes up front and does not gather (constructive) feedback from its end-users (in this case the DevOps teams), it is forced to jump into numerous discussions with the business representatives to explain why it thinks its security measures are needed. The risk of shadow IT grows and other people need to step in to align both worlds.
A new initiative
Imagine a team that wants to practice Trunk Based Development. The DevOps team in charge has to get used to no longer creating feature branches. Instead, every commit goes straight to the trunk of their Git repository. If this is not communicated to the Quality Assurance (QA) manager, the following happens. Since the QA manager expects every feature to be traceable through its own feature branch and no new feature branches are created anymore, the QA dashboard no longer shows any alerts. Clearly, there is a communication gap between the DevOps team and the QA manager.
Valuable time is lost explaining to each other which decisions were made and why. If the delivery manager has already approved a go-live date, that manager is now in trouble: the go-live date may shift by a couple of weeks. As the team continues its efforts, both managers become more nervous. They expect business results.
As the DevOps team hurries, it makes more mistakes than before. On top of the effort to adopt Trunk Based Development, the team also needs to fix those mistakes. This negative feedback flow affects the mindset of the team. The Trunk Based Development initiative halts and the team needs to revert its work and go back to feature branches. In the future, it will need to start all over again. Demotivated people leave the company.
Keep the feedback loops easy to understand for everyone and be clear on what to expect from everyone who is involved. If you leave feedback loops open to interpretation by everyone in the organization, you get all kinds of confusion. There is a clear need to document how feedback loops between multiple teams should work.
Technology leaders should define and embrace the de facto standard across the organization and proactively follow up on activities that deviate from it. All of this helps to generate a culture of trust and more valuable feedback for everyone. In the end, this helps to speed up delivery and to improve the quality of whatever has to be delivered.
Business KPIs matter
End-users only care about features that matter. Organizations are already aware of that. However, when it comes to feedback loops and what to measure exactly, many teams are still concentrating on:
- How fast can I deliver?
- How many times a week does my pipeline break?
- What were my up-time levels this week?
- How many defects were introduced last week?
Of course, these are important, but from a business perspective, it’s much more valuable to focus feedback loops on the most crucial business KPIs. Think of (automated) feedback loops that are much more context-specific such as:
- Which top 10 performance features are slowest to deliver (track features by ticket number versus delivery time). Focus on features that generate revenue or measurable customer satisfaction.
- How much downtime of an application is accepted given the revenue it brings? Focus on improving the quality of those applications which matter most. You need to investigate these kinds of aspects across the entire organization, not just within a single team.
- How many more customers did we on-board after the launch of a specific new feature? Was it worth spending the money on a marketing campaign?
When thinking from this perspective, you will focus on business growth instead of infrastructure-related measures which only make sense from a technical point of view. Of course, those technical measures are still important, but they can’t be seen in isolation. Looking at them in isolation would undermine your business efforts.
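As a rough illustration of the first question in the list above, the following minimal sketch ranks revenue-generating features by delivery time. The `Feature` record, its field names, and the ticket numbers are hypothetical; in practice this data would come from your own ticket system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Feature:
    ticket: str               # hypothetical ticket number
    started: date             # day the work began
    delivered: date           # day the feature reached production
    revenue_generating: bool  # assumption: someone labels this per ticket


def delivery_days(feature):
    """Number of days between start and delivery."""
    return (feature.delivered - feature.started).days


def slowest_features(features, top_n=10):
    """Return the top_n revenue-generating features that took longest to deliver."""
    relevant = [f for f in features if f.revenue_generating]
    return sorted(relevant, key=delivery_days, reverse=True)[:top_n]


# Made-up sample data, for illustration only.
features = [
    Feature("SHOP-101", date(2024, 1, 2), date(2024, 1, 30), True),
    Feature("SHOP-102", date(2024, 1, 5), date(2024, 1, 12), True),
    Feature("SHOP-103", date(2024, 1, 3), date(2024, 3, 1), False),
]

for f in slowest_features(features):
    print(f.ticket, delivery_days(f), "days")
```

The feature that does not generate revenue is left out on purpose, so the report stays focused on what matters to the business.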
Prevent alert fatigue
While the shift from DevOps to DevSecOps is well on its way, some common problems tend to become more demanding across teams that have a bunch of monitoring tools and alerts. Alert fatigue is all about the overwhelming number of alerts thrown at DevOps teams. It does not make sense to send an alert to every team member in a channel when there is yet another successful build. They don’t care: they want to carry on and only be interrupted when something goes wrong.
Instead, focus on the following aspects to keep the number of alerts limited and thus useful:
- Only alert when an action is really needed. Include contextual information like a proper error message, a proper action to take and the priority.
- To make things even clearer: critical alerts should be red, whereas less important alerts can be orange. Green alerts indicate that things are OK and act as plain notifications. Make alerts easy to filter and sort (e.g. by prefixing them).
- Include the ticket number and a URL of the (original) ticket in the alert. This helps the person who reads the alert to quickly act upon it, or to decide that nothing needs to be done. If they remember the ticket ID, they don’t even need to open the ticket system at all.
- Adding to that: don’t forget to include the status of the ticket: open or closed. If a ticket is closed and it still triggers a large number of alerts, this might reveal a bigger problem. For example, a feature is perceived to be delivered while in fact it’s not. If it’s just an open ticket, everyone might understand that things are in progress and that they can ignore the alert, or help out the person whose work fails repeatedly.
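The points above can be sketched as a small alert structure. This is only an illustration under stated assumptions: the `Alert` fields, the `Severity` names, the prefix format, and the example URL are all hypothetical and not tied to any particular monitoring tool.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "red"     # action needed now
    WARNING = "orange"   # less important, but still needs attention
    INFO = "green"       # everything is OK, a plain notification


@dataclass
class Alert:
    severity: Severity
    message: str        # a proper error message
    action: str         # the action the reader should take
    ticket_id: str
    ticket_url: str
    ticket_status: str  # "open" or "closed"


def should_notify(alert):
    """Only interrupt people when an action is really needed."""
    return alert.severity is not Severity.INFO


def format_alert(alert):
    """Prefix severity and ticket details so alerts are easy to filter and sort."""
    return (
        f"[{alert.severity.name}][{alert.ticket_id}:{alert.ticket_status}] "
        f"{alert.message} | action: {alert.action} | {alert.ticket_url}"
    )


# Made-up example alert; example.com is a placeholder URL.
alert = Alert(
    severity=Severity.CRITICAL,
    message="Deployment to staging failed: missing config flag",
    action="Add the flag and re-run the pipeline",
    ticket_id="OPS-42",
    ticket_url="https://example.com/tickets/OPS-42",
    ticket_status="closed",
)

if should_notify(alert):
    print(format_alert(alert))
```

A successful build would be created with `Severity.INFO` and would never reach the channel, while a closed ticket that keeps producing critical alerts stands out through its prefix.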
Whatever the reason for an alert is, always hunt down the exact root cause. Doing so prevents the creation of more technical debt and keeps a small issue from growing into a bigger problem. Furthermore, DevOps team members learn more about the system by investigating it. Domain and system knowledge grows, which leads to higher quality.
Introduce quality gates
In every organization, there is a lot of debate about the rather abstract term "quality". Quality gates make it concrete: they determine whether the quality of a certain piece of code is sufficient or not. Proper feedback loops should also include quality gates that break the pipeline to prevent low-quality software from progressing to the next stage.
- Design the quality gates around the context of certain types of applications. A PoC application does not need the same level of quality as an application that is already in production and serves a long list of high-paying customers.
- Quality gates should not be bypassed. Measure this before things go live, or measure it in lower environments. Think of run-time monitoring and reporting when a critical configuration flag is missing. This type of feedback should trigger immediate action if things are a real concern.
- Proper governance of quality gates gives each gate a clear owner. Teams understand who to contact when something goes wrong that is out of their control. Communication across teams improves and becomes more consistent. No emails are sent around and no tickets are created manually. All of this helps to avoid misinterpretations.
Whenever a quality gate is changed, let its owner communicate the change broadly across teams. Gradually increase the threshold of quality gates so that teams can slowly adapt to the new quality standards. It also gives them confidence in their own work. Confidence helps to speed them up. Business wins.
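A context-specific quality gate could look roughly like the sketch below. The metric (test coverage), the application types, and the threshold values are assumptions chosen for illustration; your own gates may measure something entirely different.

```python
# Hypothetical minimum test-coverage thresholds per application type.
# Raising these values in small steps lets teams adapt gradually.
THRESHOLDS = {
    "poc": 0.30,
    "production": 0.80,
}


def passes_gate(app_type, coverage):
    """True when the measured coverage meets the threshold for this app type."""
    return coverage >= THRESHOLDS[app_type]


def run_gate(app_type, coverage):
    """Return an exit code: 0 lets the pipeline continue, 1 breaks it."""
    threshold = THRESHOLDS[app_type]
    if passes_gate(app_type, coverage):
        print(f"quality gate passed: {coverage:.0%} >= {threshold:.0%}")
        return 0
    print(f"quality gate failed: {coverage:.0%} < {threshold:.0%}")
    return 1


# The same measurement passes for a PoC but breaks a production pipeline.
poc_exit_code = run_gate("poc", 0.74)
production_exit_code = run_gate("production", 0.74)
```

In a pipeline step, the returned code would typically be handed to `sys.exit` so that a non-zero value stops the next stage from running.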
Stable platforms win
Nothing is more important than a stable platform. More and more feedback is gathered from information that is produced by the deployment platform itself, so the stability of that platform is essential. Without a stable platform, all other metrics become less useful or even completely unreliable. Systems should run without interruptions for the other metrics to be a trustworthy source. Therefore, teams now also focus on measures to keep the platform as stable as possible: early detection of network interruptions, detection of large deviations in latency, prediction of peak workloads, etc.
Two key metrics are crucial here: Mean Time To Repair (MTTR) and Change Failure Rate (CFR). The first one is the average time it takes to repair a broken system, and the second one is the number of changes that result in a failure divided by the total number of changes.
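Both metrics are simple to compute once the underlying events are recorded. The sketch below assumes that incidents are available as pairs of "broken at" and "repaired at" timestamps and that failed and total changes are counted elsewhere; the function names and sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average repair time over a list of (broken_at, repaired_at) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((repaired - broken for broken, repaired in incidents), timedelta())
    return total / len(incidents)


def change_failure_rate(failed_changes, total_changes):
    """Share of changes that resulted in a failure (0.0 to 1.0)."""
    if total_changes == 0:
        raise ValueError("total_changes must be greater than zero")
    return failed_changes / total_changes


# Made-up sample data: one 30-minute and one 90-minute outage.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 15, 30)),
]

print("MTTR:", mean_time_to_repair(incidents))
print("CFR:", f"{change_failure_rate(3, 40):.1%}")
```

Tracking both values over time, rather than as a single snapshot, shows whether the platform is actually becoming more stable.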
All of the above is especially true for the CI/CD pipeline. As this is at the heart of everything, it needs to be rock solid. Make sure to treat the CI/CD pipeline as well as you would treat the end-user application. Happy customers won’t stay happy for very long if the pipeline is flaky.
As seen before, not every piece of feedback is equally valuable. Factors that cause negative feedback loops should be kept to a minimum. Successful teams demand feedback that is based on actionable alerts. Attach clear actions to each alert so that the person who checks it knows what to do. Besides this, those alerts should actually be acted upon and not left unattended. Otherwise, they are still pretty much useless and only eat up precious time.
Be sure to check whether processes have changed over a certain period of time. Zero changes point to ineffective alerts, which should be made clearer and more concrete. If they lack valuable information, you need to add it. Remember: if your alerts only result in stabilizing the current systems and preventing (potential) disasters, you lack the triggers to improve your business as a whole.
Feedback loops that are meaningful, well communicated, and that find their place in your CI system are very valuable for your DevOps team and the rest of your organization. Take some time to investigate what you want to measure and what the desired results are. Furthermore, positive feedback loops trigger everyone to take action. With the tips and tricks in this article in mind, feedback loops will considerably boost the speed and quality of your software delivery processes.