When a software metric target is met, some teams declare success. But a single data point tells you little about how the metric is trending, and it is the trend that shows what effect your process changes have on progress.
In agile teams, most sprint retrospectives happen weekly or bi-weekly. Your metrics therefore need to be measured over shorter intervals than these, so that your team has enough data to iterate not only at each retrospective but also within each sprint.
These are the most controversial metrics, because so many people have learned to hate agile story points. We list the ones most widely used in the industry, but we will also spend time explaining how and when to use them.
The two most important questions for any manager are whether the team will be on time for the next milestone or sprint, and what the risks are of being late or shipping poor quality. So this metric is essential, even for the team itself. It is also a great way to understand how your teams work, which is why being able to compare against previous releases will significantly help you get to know your team.
You could use this metric to identify issues such as a developer being stuck on a specific task. The point is NOT to use it to evaluate the developer's performance, or your team will simply game the metric without producing any meaningful work. Use it only as a way to understand how you, as the manager, can better help your team and to initiate meaningful conversations. This metric also lets you assess the "normal" speed of your team: across time and team members, the discrepancies between story points and actual complexity should even out.
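As a rough illustration, that "normal" speed can be estimated as a rolling average of completed story points per sprint; the sprint figures below are hypothetical:

```python
def normal_velocity(completed_points, window=3):
    """Estimate the team's 'normal' velocity as the average of the
    story points completed over the last `window` sprints."""
    recent = completed_points[-window:]
    return sum(recent) / len(recent)

# Hypothetical story points completed in the last five sprints.
sprints = [21, 34, 27, 30, 25]
baseline = normal_velocity(sprints)  # average of the last 3 sprints
print(baseline)
```

Comparing the current sprint against this baseline is what surfaces the outliers worth a conversation, rather than any single sprint's number.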
In the same example as for "ticket close rate", the huge bug fix could be a one-line change. How can you compare that to a developer who imported a library or changed every header of every file? You cannot (or you should not). And similarly, you should never use this metric to evaluate the individual performance of developers.
You can use LoC in the same way: to notice when your team is having difficulties, or is perhaps importing too many libraries, at the expense of the quality of the project!
Some people compute an "Impact" metric based on the amount of code in a change, the severity of the change, and the number of files it affects. The overall goal is to improve on LoC. The issue is that you still don't know what is actually inside those lines of code, so the "Impact" metric should be used in the same way as LoC. Indeed, it still fails on the one-line bug fix example mentioned above, as well as on many other real-life cases.
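To make the idea concrete, here is a deliberately naive sketch of such an "Impact" score. The weighting is invented for illustration, not a standard formula; note how the one-line critical fix still scores near the bottom, which is exactly the blind spot described above:

```python
def impact_score(lines_changed, files_touched, severity_weight=1.0):
    """Hypothetical 'Impact' score: size of the change, scaled by an
    (invented) severity weight and a small bonus per file touched."""
    return lines_changed * severity_weight * (1 + 0.1 * files_touched)

print(impact_score(1, 1))     # one-line critical bug fix: tiny score
print(impact_score(500, 40))  # mechanical header sweep: huge score
```

Whatever weights you pick, the content of the lines stays invisible, so the ranking can invert the real value of the work.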
This is a hard question. Anything based on lines of code can't be reliably linked to actual developer productivity. You could use it as a secondary signal to check whether somebody is stuck, and then initiate conversations to help them, but that's it. You should NOT use it to measure velocity, even across time.
Code churn is really useful only when its level unexpectedly moves materially above or below the individual's or team's "normal". In that case, it may reveal a problem worth looking into, especially near a deadline, when the project may be at risk.
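One hedged way to operationalize "materially above or below normal" is to flag churn only when it leaves the team's usual band, e.g. more than two standard deviations from the historical mean; the weekly figures below are hypothetical:

```python
import statistics

def churn_alert(history, current, threshold=2.0):
    """Flag a churn reading that deviates from the historical 'normal'
    by more than `threshold` sample standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(current - mean) > threshold * stdev

# Hypothetical weekly churn (lines rewritten or deleted soon after merge).
weekly_churn = [120, 150, 130, 140, 135]
print(churn_alert(weekly_churn, 138))  # within the normal band
print(churn_alert(weekly_churn, 420))  # well outside it
```

The point is the anomaly check, not the absolute number: 420 lines of churn is only alarming relative to this team's history.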
Even though actual refactoring is very hard to compute, having some indicator and tracking its trend helps you understand the team's "normal" and make sure you put enough effort into refactoring, which is essential for any software.
One way to understand your team's coding effort is to measure code churn, refactoring, and new work. Keeping an eye on this trend over time will help you understand where your team's coding effort actually goes. Depending on the stage of your project, the breakdown between these three categories may or may not make sense.
These metrics reflect the performance of your team's processes and software development workflow. They are not the output of the engineering team, but indicators of the health of your team's collaboration, which directly impacts that output.
If your priority is to implement continuous delivery, or to make your process leaner and deploy smaller batches to production more frequently, these two metrics will be very useful. Within lead time, you can also dig a bit deeper to understand where most of the time is spent.
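As a sketch, both lead time and cycle time can be computed from ticket timestamps; the stage names and dates below are hypothetical:

```python
from datetime import datetime

# Hypothetical lifecycle timestamps for one ticket.
ticket = {
    "created":  datetime(2024, 3, 1, 9, 0),   # ticket filed
    "started":  datetime(2024, 3, 3, 10, 0),  # work begins
    "deployed": datetime(2024, 3, 6, 11, 0),  # change live in production
}

def lead_time_hours(t):
    """Lead time: from ticket creation to deployment."""
    return (t["deployed"] - t["created"]).total_seconds() / 3600

def cycle_time_hours(t):
    """Cycle time: from work starting to deployment."""
    return (t["deployed"] - t["started"]).total_seconds() / 3600

print(lead_time_hours(ticket))   # 122.0 hours
print(cycle_time_hours(ticket))  # 73.0 hours
```

The gap between the two (here, two days of queueing before work starts) is exactly the kind of "where is the time spent" breakdown worth diving into.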
This metric is a good complement to lead and cycle times, in the sense that it shows their results.
Have you heard "Commit Often, Perfect Later, Publish Once"? If you fail to commit and then do something poorly thought out, you can run into trouble. Commits are the common denominator of collaboration within your team. So if your workflow pushes the team to commit more often than it currently does, it can be useful to track this metric. And, as mentioned above, if you want to understand the impact of interruptions, this metric can be a good starting point.
There are several metrics that could be interesting to you.
These metrics give you a sense of your engineering team's sustained throughput. For instance, if the number doesn't grow when you hire more people, there may be a problem with a newly introduced process, or technical debt that needs to be addressed. Conversely, if it increases too quickly, you might have a quality issue.
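A simple way to check whether throughput scales with headcount is to normalize it per engineer; the sprint figures below are hypothetical:

```python
def throughput_per_engineer(tickets_closed, headcount):
    """Tickets closed per engineer for each period. A flat or falling
    value while headcount grows can point at process friction or
    technical debt; a sudden jump can point at a quality trade-off."""
    return [t / h for t, h in zip(tickets_closed, headcount)]

closed = [40, 44, 46, 47]  # hypothetical tickets closed per sprint
people = [5, 5, 7, 8]      # hypothetical engineers per sprint
print(throughput_per_engineer(closed, people))
# per-engineer throughput declines despite hiring -> worth investigating
```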
This metric helps to avoid burnout and increase efficiency, as working on one thing at a time has been shown to improve focus.
You can determine the risks in a commit or pull request by:
It shows the amount of reflection and work that has gone into the commits, and therefore the potential impact on the product if deployed without code review.
Monitoring the average commit or pull request risk helps you understand how your team works, and whether you should strive for more frequent, simpler code changes.
Quality isn’t a goal in itself. The confidence in being able to grow and change behaviors without disruption is what matters.
This metric is very helpful if product quality matters to your business, and if so, you should track it constantly. However, once all P1s and P2s are resolved, you might want to aim for a higher quality standard by tracking P3s, for instance.
Accelerate defines a failure as a change that "results in degraded service or subsequently requires remediation (e.g., leads to service impairment or outage, requires a hotfix, a rollback, a fix-forward, or a patch)". This rate is therefore the number of deploys resulting in a failure, divided by the total number of deploys. Note that this definition does not include changes that failed to deploy. That information is useful, but it is not this KPI's focus.
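The definition translates directly into a ratio; the deploy counts below are hypothetical:

```python
def change_failure_rate(failed_deploys, total_deploys):
    """Change failure rate: deploys that required remediation (hotfix,
    rollback, fix-forward, patch) divided by total deploys. Changes that
    never made it to production are excluded, per the definition above."""
    if total_deploys == 0:
        return 0.0
    return failed_deploys / total_deploys

# Hypothetical month: 40 deploys, 3 of which needed remediation.
print(change_failure_rate(3, 40))  # 0.075, i.e. 7.5%
```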
If you focus on turning frequent deployments into an everyday habit, you need to keep the failure rate low for this to have value. This rate should decrease over time as the experience and capabilities of the DevOps team increase. A failure rate that is rising, or that stays high and does not come down over time, indicates problems in the overall DevOps process. It is a good proxy metric for quality throughout the process.
Pull requests can give you great visibility into the overall complexity of the code base. The more complex the code base, the higher the chances the following metrics will be high:
Unlike the change failure rate, this metric is not about measuring the quality of the DevOps process, but about how your team works and collaborates: how are code reviews used, and are they useful? Measuring the evolution of merged versus rejected pull requests will help you see whether your team is improving over time. You could also drill down by team member to see whether each of them is improving too.
You certainly don't need 100% test coverage. However, knowing where you stand and keeping track of it helps you see whether you are trading quality for velocity. Keep in mind that "a high quality product built on bad requirements is a poor quality product", especially where test coverage is concerned.
Both metrics measure how the software performs in the production environment. Since software failures are almost unavoidable, these software metrics attempt to quantify how well the software recovers and preserves data.
If the MTTR value shrinks over time, developers are becoming more effective at understanding issues, such as bugs, and at fixing them.
These metrics are very interesting if used by the team with a specific goal: "We need to achieve this level of MTBF or MTTR on our product." That will foster responsiveness from your team on important issues raised by customers, and will help you keep a high standard for your product, as well as for your team. To improve on these metrics, the team may realize they need to solve the root cause of issues instead of applying easy patches.
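Both metrics reduce to simple averages over incident data; the repair times and uptime below are hypothetical:

```python
def mttr_hours(repair_hours):
    """Mean time to repair: average time to restore service per failure."""
    return sum(repair_hours) / len(repair_hours)

def mtbf_hours(uptime_hours, failures):
    """Mean time between failures: operating time divided by failure count."""
    return uptime_hours / failures

# Hypothetical month: four incidents, time-to-restore in hours.
repairs = [2.0, 5.0, 1.5, 3.5]
print(mttr_hours(repairs))               # 3.0 hours on average
print(mtbf_hours(720.0, len(repairs)))   # one failure every 180 hours
```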
Every team has its own definition of an SLA, but here is the one Airbnb uses, which you may find very useful: the SLA is the percentage of bugs that your team fixed and deployed within a certain time (e.g., 24 hours for blocker bugs and 5 days for critical bugs). What is really appealing about this metric is that it gives you a great understanding of your product quality from the user's standpoint.
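An SLA of this kind boils down to a percentage-within-window calculation; the fix times below are hypothetical:

```python
def sla_percentage(fix_hours, window_hours):
    """Share of bugs whose fix was deployed within the SLA window.
    `fix_hours` holds the time from report to deployed fix, per bug."""
    within = sum(1 for h in fix_hours if h <= window_hours)
    return 100.0 * within / len(fix_hours)

# Hypothetical blocker bugs: hours from report to deployed fix.
blocker_fix_times = [6, 30, 12, 20, 48]
print(sla_percentage(blocker_fix_times, 24))  # 3 of 5 within 24h -> 60.0
```

The same function applies per priority tier (e.g., a 120-hour window for critical bugs), which is what makes the metric easy to roll up.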
This metric is very close to MTTR, but it is not limited to software failures; it extends to any type of bug. Similarly, it is most interesting when the team uses it with a specific goal: "We need to achieve this SLA on our product." It fosters ownership of product quality and responsiveness in your team. That's why Airbnb uses it.
This metric serves a similar purpose as keeping track of the evolution for your number of bugs. It might be redundant to track both of them. We have a preference for the number of bugs as you can differentiate which bug priority matters to you now and still have a notion of the overall amount (not just the trend).
Application crash rate is calculated by dividing how many times an application fails (F) by how many times it is used (U). But there are actually several ways you can compute it.
App crashes per screen view: this number compares the total screen views the app has received to the number of crashes. An acceptable range for this metric would be below 0.01%. Crashes should be categorized (e.g., by screen or feature) to understand their impact on the delivery of functionality.
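Both variants reduce to the same F/U division with a different denominator; the counts below are hypothetical:

```python
def crash_rate(crashes, uses):
    """Application crash rate: failures (F) divided by usage events (U).
    `uses` can be app loads or screen views, depending on the variant."""
    return crashes / uses

# Hypothetical week: 12 crashes over 250,000 screen views.
rate = crash_rate(12, 250_000)
print(rate)              # 0.0048% per screen view
print(rate < 0.0001)     # within the < 0.01% acceptable range
```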
Size-oriented metrics rely on lines of code, which makes them of little use on their own, so you shouldn't compare two different software projects with them. That's why you might not be a big fan of using them. Function-oriented metrics, meanwhile, are difficult to compute and to agree on. You might want to introduce control measures with them, but there are probably better indicators for that in this list.
Another indicator of technical debt is how outdated the dependencies in your code base are. It can be worth tracking this as an average across all dependencies, ideally alongside the spread, so you can spot when a single dependency is very old and requires your attention.
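A minimal sketch of such tracking, averaging dependency ages while surfacing the oldest outlier; the ages below are hypothetical:

```python
def dependency_age_stats(ages_in_days):
    """Average dependency age plus the single oldest one, so an outlier
    stands out even when the average looks healthy."""
    avg = sum(ages_in_days) / len(ages_in_days)
    return avg, max(ages_in_days)

# Hypothetical days since each dependency's last release.
ages = [30, 45, 60, 900, 15]
avg, oldest = dependency_age_stats(ages)
print(avg)     # 210.0 -- skewed upward by one dependency
print(oldest)  # 900   -- the one that actually needs attention
```

Reporting the maximum (or the variance) alongside the mean is what prevents one ancient dependency from hiding inside an otherwise reasonable average.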
This metric is mainly of interest to technical leads; it is too operational and too tied to the code base to be used by a manager. If your projects have many dependencies, keeping track of dependency age should definitely be considered.
Please note that some metrics didn't make the list because they are not popular enough or are too controversial. For instance, evaluating code complexity from the visual shape of the code (a language-agnostic measure based on the indentation depth of source code lines, also called whitespace complexity) is too controversial and not really actionable per se. What business value would you get from it, given that it doesn't necessarily mean the code quality is bad?