Software engineering productivity, or velocity, metrics have always been a hotly debated topic, largely because they have been misused most of the time. When managers rely on lines of code to measure their engineers’ velocity, you can understand why those engineers object that the measure is overly simplistic. In this article, I want to list the different velocity metrics and look at how and when you can use them, or whether you should use them at all.
This article is the 2nd part of a series of articles:
- How to Use – and NOT Abuse – Software Engineering Metrics
- How to Use Software Productivity Metrics The Right Way
- How to Use Data to Improve Your Sprint Retrospectives
- Software Quality: The Top 10 Metrics to Build Confidence
Those four articles are excerpts from the advanced guide to software engineering metrics. Whatever metrics you choose, you need to follow a few rules if you want to use engineering metrics in a non-toxic and efficient way. For more on that topic, I would recommend reading this article.
Let’s start with a review of what software engineering metrics are, and are not.
What are software engineering metrics?
In software, there are two categories of metrics, and they go by different names:
- Software engineering metrics, also known as software development metrics or software delivery performance metrics. Every team seems to have a different name for them. What matters here is that these indicators measure how software is being built and how productive the engineering team is.
- Software or application performance metrics describe the software that was delivered: the response time of the application, and so on. NewRelic is typically one of the main providers of such metrics. You could also include customer satisfaction metrics here (typically Net Promoter Score).
In this guide, we’re focusing on the first set of metrics. Those are the ones that help an organization scale and that will actually impact the second set, which is one of the end results of the work done by the team.
To understand software engineering metrics, we need to understand the main goals behind tracking and analyzing them:
- Determine the quality and productivity of the current software delivery process, and identify areas of improvement;
- Predict the quality and progress of the software development projects;
- Better manage workloads and priorities between teams and team members.
Overall, engineering metrics will help teams increase the engineering return on investment, by offering an assessment of the impact of every decision on projects, processes and performance goals.
Productivity or Velocity metrics
1. Project or Sprint Burndown
This metric is more about project status than productivity per se. But it is related to the team’s output and therefore should be listed here. It is the one metric nearly all teams already track. Let’s dive a bit deeper into it, so you can see an alternative way to look at it that may make more sense for your team and projects.
Some teams only consider the number of tasks to be done. But that would assume that all tasks are equal, which is simply not the case.
Some teams consider story points instead. That should indeed be more precise, although you would need to assign points to all kinds of tasks, including bugs. But story points are still rough estimates. A 5-point story might take longer to implement than 20 one-point stories!
The best way to do this is to look at your team’s history and compare the progress on the current release to previous ones. That gives you a better indication of whether your team is on track or not.
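As a minimal sketch of that comparison, assume each sprint is recorded as a list of remaining work per day (in whatever unit you track: tasks, story points, hours). The function names and the sample numbers below are hypothetical, and averaging past sprints is just one simple way to define the team’s usual pace.

```python
from statistics import mean

def fraction_done(remaining_by_day):
    """Convert a burndown series (remaining work per day) into the
    fraction of the initial scope completed on each day."""
    total = remaining_by_day[0]
    return [(total - remaining) / total for remaining in remaining_by_day]

def compare_to_history(current, past_sprints):
    """Return (fraction done now, average fraction done on the same day
    in past sprints). Past sprints shorter than the current day are skipped."""
    day = len(current) - 1
    now = fraction_done(current)[day]
    usual = mean(fraction_done(s)[day] for s in past_sprints if len(s) > day)
    return now, usual

# Hypothetical data: two finished sprints and the current one on day 2.
past = [
    [40, 36, 30, 22, 10, 0],
    [50, 45, 40, 25, 10, 0],
]
current = [60, 57, 54]

now, usual = compare_to_history(current, past)
print(f"done so far: {now:.0%}, usual at this point: {usual:.0%}")
```

Normalizing by the initial scope lets you compare sprints of different sizes. A current value well below the historical one is a prompt for a conversation, not a verdict.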
When to use it?
The two most important questions for any manager are whether the team will be on time for the next milestone or sprint, and what the risks are of being late or shipping poor quality. So this metric is essential, even for the team itself. Plus, this metric is a great way to understand how your teams work. That’s why the ability to compare with previous releases helps you get to know your team significantly better.
2. Ticket close rate – Beware!
Ticket close rate is the number of stories or story points your team, or any contributor, completed during a certain period of time (most likely a sprint).
If you don’t include story points or some equivalent, this might be the most misleading metric you could use. Using this metric assumes that all tickets represent approximately the same amount of work, and that is just not true. You should never use this metric to evaluate the individual performance of developers. A developer could fix one bug that nobody else managed to solve, one that is impacting every aspect of your product’s performance, and it could take him or her a full week. In the meantime, another developer could fix 20 small, low-impact bugs. Which one had the most impact on your team and company?
Even with story points, the scenario above doesn’t come out right: the hard bug would be worth 5 story points, while the 20 small bugs would be worth 1 point each, or 20 points in total. And that is without considering that most teams don’t use points for bugs, only for features.
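To make the scenario concrete, here is a small sketch that computes the close rate both ways. The developer names and the ticket data are made up, and tickets are simplified to (developer, story points) pairs.

```python
from collections import defaultdict

def close_rates(closed_tickets):
    """Return {developer: (tickets closed, story points closed)}
    from a list of (developer, story_points) pairs."""
    totals = defaultdict(lambda: [0, 0])
    for developer, story_points in closed_tickets:
        totals[developer][0] += 1
        totals[developer][1] += story_points
    return {developer: tuple(t) for developer, t in totals.items()}

# Hypothetical sprint: "ana" fixes the one critical bug (5 points),
# "ben" fixes 20 small bugs (1 point each).
closed = [("ana", 5)] + [("ben", 1)] * 20

rates = close_rates(closed)
for developer, (tickets, points) in rates.items():
    print(f"{developer}: {tickets} tickets, {points} points")
```

By ticket count the second developer looks 20 times more productive, and by story points still 4 times more productive, even though the single critical fix may have mattered most.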
So when to use this metric?
You could use this metric to identify issues such as a developer being stuck on a specific task. The point is NOT to use it as a way to evaluate the developer’s performance, or your team will just game the metric without producing any meaningful work. Use this metric only as a way to understand how you, as the manager, can better help your team and initiate meaningful conversations. This metric also enables you to assess the “normal” speed of your team. Across time and team members, the discrepancies between story points and actual complexity should iron themselves out.
3. Lines of Code (LoC) Produced or Impact – Beware!
In the same example as for “ticket close rate”, the huge bug fix could be a change of one line of code. How can you compare that to a developer who imported a library or changed every header of every file? You cannot (or you should not). And similarly, you should never use this metric to evaluate individual performances of developers.
You can use LoC in the same way: to understand when your team is having difficulties, or when it may be importing more libraries than is good for the quality of the project!
Some people compute an “Impact” metric, based on the amount of code in the changes, the severity of those changes and the number of files that the changes affected. The overall goal is to offer an improvement on the LoC. The issue is you still don’t know the actual content within those lines of code. So the “Impact” metric should be used in the same way as LoC. Indeed, the one-line bug fix example mentioned above still doesn’t work, as many other real-life examples show.
When to use it?
This is a hard question. Anything related to lines of code can’t be linked with actual developer productivity. You could use it as a secondary way to check if somebody is stuck and then to initiate conversations to help those people, but that’s it. You should NOT use it to measure velocity, even across time.
4. Code Churn
Code churn is typically measured as the percentage of a developer’s own code that represents an edit to their own recent work. To compute it, we take the lines of code (LoC) that were modified, added and deleted over a short period of time, such as a few weeks, and divide them by the total number of lines of code added.
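The sketch below shows one possible reading of that definition. It assumes you can tag every line a developer writes with the author and age of the line it replaces (for example from blame data); the `LineChange` record, the 21-day window and the sample data are all assumptions for illustration, and pure deletions are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LineChange:
    """One line written by `author`. If it replaces an existing line,
    the replaced line's author and age in days are recorded."""
    author: str
    replaced_author: Optional[str] = None   # None means a brand-new line
    replaced_age_days: Optional[int] = None

def churn_rate(changes, author, window_days=21):
    """Percentage of the lines written by `author` that rewrite
    their own code from the last `window_days` days."""
    own = [c for c in changes if c.author == author]
    if not own:
        return 0.0
    churned = sum(
        1 for c in own
        if c.replaced_author == author
        and c.replaced_age_days is not None
        and c.replaced_age_days <= window_days
    )
    return 100.0 * churned / len(own)

# Hypothetical data: 6 new lines, 3 rewrites of her own 5-day-old code,
# and 1 rewrite of a colleague's 200-day-old code.
changes = (
    [LineChange("ana")] * 6
    + [LineChange("ana", "ana", 5)] * 3
    + [LineChange("ana", "ben", 200)]
)

rate = churn_rate(changes, "ana")
print(f"churn: {rate:.0f}%")
```

Only the rewrites of the developer’s own recent code count as churn here; the rewrite of old code written by someone else does not.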
Engineers often test, rework, and explore various solutions to a problem, especially towards the beginning of a project when the problem doesn’t have a clear solution yet. Some people consider code churn to be non-productive work, and this is where the danger lies. Common causes of high churn include unclear requirements, indecisive stakeholders, a difficult problem to solve, prototyping, and polishing for a release. Code churn may also indicate that the developer is optimizing part of the code for better performance. It’s completely normal for churn to evolve over the course of a project.
Churn levels will vary between teams, individuals, types of projects and where those projects are in the development lifecycle. It is helpful to get a feeling for what your team’s “normal” looks like so you can identify when something is amiss. A sudden increase in churn rate may indicate that a developer is experiencing difficulty in solving a particular problem.
A vendor such as Gitprime could compute industry averages for what the “normal” should be for efficient teams and for less efficient ones.
When to use it?
Code churn is really useful only when its level unexpectedly moves well above or below the individual’s or team’s ‘normal’. In that case, it may reveal a problem you should look into, especially near a deadline, when the project may be at risk.
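One simple way to make “well above or below normal” concrete is to compare the latest value with the mean and standard deviation of recent history. This is only a sketch: the function name, the threshold of 2 standard deviations and the weekly numbers are assumptions, not an established rule.

```python
from statistics import mean, stdev

def is_unusual(history, latest, threshold=2.0):
    """True when `latest` lies more than `threshold` standard deviations
    away from the mean of `history` (above or below)."""
    if len(history) < 2:
        return False  # not enough data to know what "normal" is
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold

# Hypothetical weekly churn percentages for one team.
weekly_churn = [18.0, 22.0, 20.0, 19.0, 21.0]

print(is_unusual(weekly_churn, 23.0))  # within the usual range
print(is_unusual(weekly_churn, 35.0))  # far above the usual range
print(is_unusual(weekly_churn, 10.0))  # far below the usual range
```

A flagged week is a reason to ask what is going on, not a conclusion in itself.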
5. Refactoring rate
A common question for CTOs is ‘How much of your software engineering investment is spent on refactoring legacy code?’.
There are many ways you could try to measure refactoring. One is through commits, if you consider that refactoring means replacing old code, for instance code older than 3 weeks. In this case, you could define refactoring effort as the number of lines of code that replace old code, expressed as a percentage of the total number of lines of code changed.
The issue is every way you can think of will eventually fall short of truthfully measuring refactoring. That doesn’t mean the metric described above is not useful. Consider it an indicator and track its trend.
As codebases age, some percentage of developer attention is required to maintain the code and keep things current. The challenge is that teams need to properly balance this kind of work with creating new features. Keeping note of the trend and the team’s ‘normal’ will help you do that.
When to use it?
Even though it’s very hard to compute actual refactoring, having some indicator and tracking its trend is very helpful: it lets you understand the team’s ‘normal’ and make sure you put enough effort into refactoring, which is essential for any software.
6. New Work
New work is simply defined as the lines of code added to the code base that do not replace any existing code. This metric can be computed as the new code contributed, expressed as a percentage of the total number of lines of code changed. In that sense, it is complementary to code churn and refactoring.
It’s bad to have high technical debt, but it’s even worse to have a stagnant product.
When to use it?
One way to understand your team’s code effort is to measure code churn, refactoring and new work. Keeping an eye on this trend over time will help you understand your team’s actual code focus. Depending on the stage of your project, the breakdown between those 3 metrics will make sense or not.
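As a rough sketch of such a breakdown, assume every changed line is described by the age in days of the code it replaces, or `None` if it replaces nothing. The 21-day cutoff mirrors the “older than 3 weeks” example used for refactoring; the function names and sample data are hypothetical.

```python
from collections import Counter

CATEGORIES = ("new work", "churn", "refactoring")

def classify(replaced_age_days, recent_days=21):
    """Classify one changed line by the age of the code it replaces."""
    if replaced_age_days is None:
        return "new work"        # replaces nothing
    if replaced_age_days <= recent_days:
        return "churn"           # rewrites recent code
    return "refactoring"         # rewrites old code

def effort_breakdown(replaced_ages, recent_days=21):
    """Return the percentage of changed lines in each category."""
    counts = Counter(classify(age, recent_days) for age in replaced_ages)
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}

# Hypothetical sprint: 50 new lines, 20 rewrites of 3-day-old code,
# 30 rewrites of 120-day-old code.
ages = [None] * 50 + [3] * 20 + [120] * 30

breakdown = effort_breakdown(ages)
for category, share in breakdown.items():
    print(f"{category}: {share:.0f}%")
```

Tracked sprint after sprint, the three shares show where the team’s code effort is going; what counts as a healthy split depends on the stage of the project.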
Please note that some metrics may not have made the list because they were either not popular enough or too far-fetched to draw any value from. Remember that software metrics should be easy to understand, should potentially lead to change, and should have business value. Otherwise, what’s the point?
Let me know what you think, and whether I’ve missed any. The end goal is to build a comprehensive list of best practices concerning software engineering velocity metrics.
Finally, if you are interested in empowering your team with software engineering data to bolster better collaboration and decision-making, have a look at the software engineering management platform we are building at Anaxi that brings your systems, data, and people together for this purpose.