Metrics | DevOps

A DevOps team wants to be in control. Therefore, the team needs to measure relevant parameters. The resulting metrics give useful information about the status and are a starting point for improvement measures. So, the team needs metrics related to what they want to control and what they want to improve.

What is not defined cannot be measured. What is not measured cannot be improved. What is not improved, always degrades.
– William Thomson (Lord Kelvin)

What are metrics, and why and how do you use them? The first section answers these questions and sets out the fundamentals of “good” metrics. The next section explains how metrics relate to continuous improvement and how to define a set of metrics. The sections that follow present four sets of DevOps metrics.

The first is a (non-exhaustive) list of metrics for effectiveness and efficiency. The second set is part of the DORA [DORA 2019] benchmark model with which the performance level of your DevOps organization can be established. The third set contains the top 20 QA DevOps metrics as collected by Forrester [Forrester 2019]. The fourth set is a long non-exhaustive list of raw metrics you can choose from.

Keep in mind that these are just example sets; you can find other sets on the internet.

Fundamentals of good metrics

Everyone is involved in choosing “good” metrics and setting business goals: the organization, the team and individuals. Start at the organizational level, work your way down, and repeat this step until each team member has clear metrics and goals for which they are responsible. But what are “good” metrics? “Good” can be broadly defined as metrics that show whether goals that have previously been given priority are being achieved. Choosing “good” metrics is an iterative process. Rarely are the “good” metrics chosen immediately. The most successful users of metrics know that their metrics evolve and that they must actively adjust them to achieve the best measure of their goals. We define three fundamentals for “good” metrics:

Good metrics are important for the growth and goals of the organization and/or team.
The most important metrics must always be closely linked to the primary goal of the organization and/or team, whereby it is important to choose metrics that clearly indicate the current state of the organization in relation to its goals.
Good metrics focus on improvement.
Good metrics measure progress, which means that there must be room for improvement, for example a 10% shorter lead time for software delivery. With some metrics, the team may focus on maintaining a level instead of improving it, for example when customer satisfaction is already at 100%.
Good metrics inspire action.
When metrics are important and processes can be improved, the organization knows what to do or which questions to ask. Why has the software delivery speed dropped? Have we made tool changes or was there more rework? By asking questions, possible causes are determined and action can immediately be taken.

Remark: you may wonder “Do bad metrics also exist?” Yes, they do. For example, the metric of the percentage of test cases that is automated, especially if the assumption is that 100% automation is the holy grail. This is because the quality of the test cases is much more important than the level of automation. Don’t misunderstand us, we do see great value in test automation, but it is not a goal in itself.

Metrics and continuous improvement

Metrics-oriented thinking is key to continuous improvement. DevOps aims to improve business outcomes, but there are challenges in selecting the right metrics and collecting the metric data. Continuous improvement requires continuous change, measurement, and iteration. The agreed-upon metrics not only drive this cycle but also create insights for the broader organization. Metrics and continuous improvement are strongly related, especially in DevOps. In this section, we will describe the metrics, and in “Continuous improvement”, we will describe continuous improvement.

To improve the DevOps activities, measuring only the QA & testing activities is not enough. You also need a well-balanced set of metrics that lets you compare the consequences of an implemented improvement measure with the situation before the measure was adopted. Often a benchmark per industry, topic or specific area is chosen. In DevOps, we really need such a set of metrics, since we should continually evaluate the effectiveness and efficiency of the activities, both to monitor the ongoing activities and to look for improvement opportunities.
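Comparing a metric before and after an improvement measure can be as simple as a relative change. The sketch below is a minimal illustration, not a prescribed method; the function name and the lead-time figures are hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from the baseline; a negative value means a decrease."""
    if before == 0:
        raise ValueError("the baseline must be non-zero")
    return 100.0 * (after - before) / before


# Hypothetical figures: average lead time in days before and after a measure.
baseline_lead_time_days = 10.0
current_lead_time_days = 9.0

change = relative_change(baseline_lead_time_days, current_lead_time_days)
print(f"Lead time changed by {change:.1f}%")  # Lead time changed by -10.0%
```

Whether a decrease counts as an improvement depends on the metric: a shorter lead time is better, a lower pass rate is not.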

It is not always easy to explain the difference between efficiency and effectiveness. Efficiency and effectiveness are two related concepts with a delicate but nevertheless important difference in meaning. They are often mistakenly used interchangeably. Here is an explanation of the differences based on Joris in ‘t Veld’s work [Veld 1988].

Effectiveness indicates whether the outcome of the process has been realized or not. (A helpful metaphor here is determining whether the destination was reached.) Unlike efficiency, effectiveness does not relate to the process itself, but to its outcome.
Efficiency is the degree of utilization of resources to achieve a particular goal. (A metaphor such as how short the route is to the destination may be helpful.) A process is said to be efficient when few resources are used in relation to the common, agreed-upon standard. These resources could be, for example, time, effort (person-hours), commodities, or money.

The following is how these terms are defined in the ISO25010 standard [ISO25010 2011]:

Effectiveness is the accuracy and completeness with which users achieve specified goals.

Efficiency is the resources expended in relation to the accuracy and completeness with which users achieve goals.

Generally, one tries to organize processes in such a way that they are both efficient and effective – in other words, increasing productivity. Obviously, this is easier said than done, because efficiency targets may conflict with effectiveness [Black 2017].

How to define a set of metrics

A set of applicable metrics (measuring effectiveness and efficiency) can be very useful both in providing insight into the status of the process and in supporting decision-making in relation to the stated objectives. Such a set also has a strong relationship with Indicators from the VOICE model. You could even use a metric as an indicator with which to establish whether the objectives are met and whether the pursued value can be achieved. As stated, each test project is different, so the set of metrics will differ from one project to another and even from one test objective to another. In DevOps, you can set up your own set of metrics, perhaps with the Goal-Question-Metric (GQM) approach [Basili 1994], but you could also start with a few of the metrics described in this section. For more explanation refer to GQM. It is important to note that the chosen goals should be carefully considered. It is also good to realize that choosing and keeping the same metrics will stop the improvement at some point. In addition, issues regularly arise that require you to adjust the metric being measured.
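To make the GQM idea concrete, the sketch below models a goal that is refined into questions, each answered by one or more metrics. It is only an illustration of the structure; the class names, the goal and the questions are hypothetical, and the metric names are borrowed from the tables in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    text: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Goal:
    description: str
    questions: List[Question] = field(default_factory=list)

    def all_metrics(self) -> List[str]:
        """Return the distinct metrics behind this goal, in first-seen order."""
        seen: List[str] = []
        for question in self.questions:
            for metric in question.metrics:
                if metric not in seen:
                    seen.append(metric)
        return seen


# Hypothetical goal with two questions that share one metric.
goal = Goal(
    description="Shorten the lead time for software delivery",
    questions=[
        Question("How long does an item take from start to production?",
                 ["delivery time", "deployment time"]),
        Question("How often do we release?",
                 ["deployment frequency", "deployment time"]),
    ],
)

print(goal.all_metrics())
```

Deriving the metric set from the questions keeps it small: a metric that answers no question for any goal is a candidate for removal.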

If you want to define metrics, you may want to take the following best practices into account before or while defining those metrics:

Start small.
Start with a limited set of metrics and build it up slowly.
Keep the metrics simple.
The definition should appeal to the intuition of those involved. The more complicated the metrics, the more difficult they are to interpret and to use.
Define easy-to-gather metrics.
Choose metrics that are relatively simple to collect. The more difficult it is to collect data, the greater the chance that it will not be accepted or will be wrongly registered. We advise starting with data that is already available.
Avoid data errors.
Collect data electronically as much as possible. This is the quickest way of collecting data and it also avoids the introduction of manual errors into the data set. If the data is of poor quality (not consistent / not reliable), it cannot be used for the metric.
Record accurately.
In the case of billable time registration, for example, it sometimes happens that incorrect billing codes are used. Software engineers in particular tend to book time against testing codes when they are solving a problem, which is a development task, not a testing task.
Keep presentations simple.
Avoid complicated statistical techniques and models during presentations. Just use easy-to-understand figures like tables, diagrams, and pie charts.
Be transparent.
Provide feedback to people who have handed in the data as quickly as possible. Show them what you did with the information.

As we all know, different projects will have different (test) objectives. In one situation, the main objective could be risk reduction (effectiveness), while in another it could be early time to market and in a third it could be cost reduction (both efficiency). There should be a balance between all those aspects. A good balance between effectiveness and efficiency will improve productivity.

Effectiveness and efficiency metrics

In addition to the obvious QA & testing effectiveness and efficiency metric tables, we added a DevOps efficiency metrics table. In DevOps it is all about continuous delivery and shipping code as fast as possible, while not breaking things. By tracking DevOps efficiency metrics, we can evaluate just how fast we can move before things start to break.

The metrics in the tables below are not meant to be used all together. They are merely examples from which you, depending on your chosen goals, select the relevant metrics.

Effectiveness Metrics	How to Measure
Percentage of unit tests passed	Ratio between the number of passed unit tests and the total number of unit tests.
Percentage of code coverage	Ratio between the number of tested program statements and the total number of program statements.
Percentage of requirements coverage	Percentage of requirements covered by identified test cases.
Percentage of critical anomalies detected	Ratio between the number of critical anomalies detected and the total number of critical anomalies.
Percentage of identified risks covered by (executed) tests	Ratio between the number of identified risks covered by (executed) tests and the total number of identified risks.
Anomaly detection effectiveness	The total number of anomalies found during testing, divided by the total number of anomalies – estimated partly on the basis of production data.
Percentage of appropriate use of testing techniques	Ratio between actual test coverage and the planned (or needed or agreed) test coverage.
Percentage of anomalies caused by modifications	Anomalies because of modifications that are tested, as a part of the total number of anomalies arising as a result of changes.
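Most of the effectiveness metrics above are plain ratios. The following sketch shows how a few of them could be computed; the function names and all figures are hypothetical, and the total number of anomalies is approximated here as those found in testing plus those found in production.

```python
def percentage(part: float, whole: float) -> float:
    """Return part/whole as a percentage; 0.0 when there is nothing to count."""
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


def anomaly_detection_effectiveness(found_in_test: int, found_in_production: int) -> float:
    """Anomalies found during testing as a percentage of all known anomalies.

    Assumption: the total is estimated as test anomalies plus production anomalies.
    """
    return percentage(found_in_test, found_in_test + found_in_production)


# Hypothetical figures for one release.
unit_test_pass_rate = percentage(188, 200)      # 188 of 200 unit tests passed
requirements_coverage = percentage(45, 50)      # 45 of 50 requirements have tests
detection = anomaly_detection_effectiveness(72, 8)

print(unit_test_pass_rate, requirements_coverage, detection)  # 94.0 90.0 90.0
```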

Efficiency Metrics	How to Measure
Percentage of automated tests related to risk coverage	Ratio between risk coverage related automated tests and the total number of automated tests.
Fault density	The ratio between the number of faults found and the size of the application.
Percentage of test costs	Ratio between the test costs and the total development costs.
Savings achieved by reusing test products	Effort, duration, and/or resource savings achieved on the current project based on testing work product reuse from previous projects.
Cost per detected anomaly	Total test cost divided by the number of anomalies found.
Budget utilization	Ratio between the budget and the actual cost of testing.
Test efficiency	The number of required tests versus the number of anomalies found.
Number of anomalies found (relative)	The ratio between the number of anomalies found and the size of the system.
Savings of the test	Indicates how much has been saved by carrying out the test. In other words, what would the losses have amounted to if the test had not been carried out? [Black 2004] [Aalst 2010]
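A few of the efficiency metrics above can be sketched in the same way. The function names and figures are hypothetical, and application size is assumed to be expressed in thousands of lines of code (KLOC); any other agreed size measure would work as well.

```python
def cost_per_detected_anomaly(total_test_cost: float, anomalies_found: int) -> float:
    """Total test cost divided by the number of anomalies found."""
    if anomalies_found == 0:
        raise ValueError("no anomalies found; cost per anomaly is undefined")
    return total_test_cost / anomalies_found


def fault_density(faults_found: int, size_kloc: float) -> float:
    """Faults per KLOC (assumed size unit)."""
    if size_kloc <= 0:
        raise ValueError("application size must be positive")
    return faults_found / size_kloc


def share_of_test_costs(test_cost: float, total_development_cost: float) -> float:
    """Test costs as a percentage of the total development costs."""
    if total_development_cost <= 0:
        raise ValueError("total development cost must be positive")
    return 100.0 * test_cost / total_development_cost


# Hypothetical project figures.
print(cost_per_detected_anomaly(24_000, 80))   # 300.0
print(fault_density(80, 40))                   # 2.0
print(share_of_test_costs(24_000, 120_000))    # 20.0
```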

DevOps Efficiency Metrics	How to Measure
Percentage of successful code builds	Ratio between problem-free code builds and the total number of code builds.
Deployment frequency	The number of deployments per given period of time.
Deployment size	The number of user stories, features, story points, etc. deployed per given period of time.
Deployment time	The actual lead time of the deployment itself.
Delivery time	The time between starting to work on an item (e.g. user story) and releasing the result (e.g. code) into production.
Percentage down time of pipeline	Ratio between down time of the CI/CD pipeline and the total time.
Percentage of failed deployments	Ratio between number of failed deployments and total number of deployments.
Mean time to detection (MTTD)	Ratio between the sum of all incident detection times and the total number of incidents over a given period of time.
Mean time to recovery (MTTR)	Ratio between the total recovery time and the total number of recoveries over a given period of time.
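The time-based metrics in this table can be derived from incident and deployment records. The sketch below is one possible way to do so; the `Incident` record and its field names are hypothetical, and recovery time is assumed to run from detection to restoration (some teams measure it from the start of the incident instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    started: datetime    # when the fault began to affect the service
    detected: datetime   # when the team noticed it
    recovered: datetime  # when the service was restored


def mean_time_to_detection(incidents: List[Incident]) -> timedelta:
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_recovery(incidents: List[Incident]) -> timedelta:
    # Assumption: recovery time is measured from detection to restoration.
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((i.recovered - i.detected for i in incidents), timedelta())
    return total / len(incidents)


def failed_deployment_percentage(failed: int, total: int) -> float:
    return 100.0 * failed / total if total else 0.0


# Hypothetical incidents within one reporting period.
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10),
             datetime(2024, 1, 1, 11, 10)),
    Incident(datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 14, 30),
             datetime(2024, 1, 5, 15, 0)),
]

print(mean_time_to_detection(incidents))    # 0:20:00
print(mean_time_to_recovery(incidents))     # 0:45:00
print(failed_deployment_percentage(3, 60))  # 5.0
```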

DORA DevOps performance metrics

As a DevOps organization you can set up your own set of metrics, but you could also consider using an existing set, such as the one in the Accelerate State of DevOps Report from DORA [DORA 2019]. An additional advantage is that you can also use this as a benchmark. DORA assessed teams to understand how they are developing, delivering and operating software systems. In their assessment, three performance levels are identified: high, medium and low performers. In the report, the division between these levels was 48%, 37% and 15% respectively. Within the high-performance level, 7% was considered elite-level performers.

The following table gives you an idea of the performance level at which your organization operates. For more detail and actual figures refer to www.devops-research.com.

Four key metrics of the DORA benchmark.

	Elite	High	Medium	Low
Deployment frequency	On-demand (multiple deploys/day)	Between once per hour and once per day	Between once per week and once per month	Between once per week and once per month
Lead time for changes	Less than one hour	Between one day and one week	Between one week and one month	Between one month and six months
Time to restore service	Less than one hour	Less than one day	Less than one day	Between one week and one month
Change failure rate	0-15%	0-15%	0-15%	46-60%
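As an illustration of how the table can be used, the sketch below maps a measured lead time for changes to a performance level using the thresholds in the “Lead time for changes” row. This is our own simplification and not part of the DORA model: a month is approximated as 30 days, the boundary handling is an assumption, and values that fall into the gaps the table leaves open (for example between one hour and one day) are reported as "unclassified".

```python
from datetime import timedelta

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)
WEEK = timedelta(weeks=1)
MONTH = timedelta(days=30)  # assumption: a month is approximated as 30 days


def classify_lead_time(lead_time: timedelta) -> str:
    """Map a lead time for changes to a level from the table above."""
    if lead_time < HOUR:
        return "elite"
    if DAY <= lead_time <= WEEK:
        return "high"
    if WEEK < lead_time <= MONTH:
        return "medium"
    if MONTH < lead_time <= 6 * MONTH:
        return "low"
    return "unclassified"  # the table does not cover this range


print(classify_lead_time(timedelta(minutes=30)))  # elite
print(classify_lead_time(timedelta(days=3)))      # high
print(classify_lead_time(timedelta(hours=5)))     # unclassified
```

A single metric does not determine the overall level; the benchmark looks at all four key metrics together.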

Top 20 QA metrics collected by Forrester

To provide the DevOps community with an objective perspective on which quality metrics are most critical for DevOps success, Tricentis commissioned Forrester to investigate the topic. In the next table the top 20 QA DevOps metrics as collected by Forrester [Forrester 2019] are listed. Here too, metrics choices are made, depending on the chosen goals.

Top 20 QA DevOps metrics.

	Build	Functional validation	Integration testing	End-to-End regression testing
1	Automated Tests Prioritized by Risk	Requirements Covered by Tests	Requirements Covered by Tests	Percentage of automated E2E tests
2	Successful Code Builds	Critical Anomalies	New Anomalies	Requirements Covered by Tests
3	Unit Test Pass/Fail Rate	Pass/Fail Rate	Anomaly Density	Total Anomalies
4	Total Number of Anomalies	Anomaly Density	Test Pass/Fail Rate	Test Cases Executed
5	Code Coverage	Risk Coverage	Functional Code Coverage	Test Case Coverage

Build

The “Automated Tests Prioritized by Risk” DevOps quality metric is used by teams who have business risks clearly defined and tests correlated to those risks.
The “Successful Code Builds” DevOps quality metric measures the number of builds that are free of any errors or warnings.
The “Unit Test Pass/Fail Rate” DevOps quality metric measures the percentage of unit tests that pass or fail.
The “Total Number of Anomalies” DevOps quality metric measures the number of anomalies identified during each build and highlights how the total number of anomalies either diminishes or increases as teams get closer to their final production build.
The “Code Coverage” DevOps quality metric measures what percentage of the source code is exercised by tests. Code coverage can be measured in several different ways; for example, statement (line) coverage, branch coverage, etc.

Functional validation

The “Requirements Covered By Tests” DevOps quality metric looks at the percentage of requirements that corresponds to functional test cases. It can be measured at different granularities, depending on what is most important for the team.
The “Critical Anomalies” DevOps quality metric tracks how the total amount of critical functional anomalies changes over time.
The “Pass/Fail Rate” DevOps quality metric measures the percentage of functional tests that pass or fail, with a focus on the tests that matter most to the business.
The “Anomaly Density” DevOps quality metric is the number of anomalies found during functional testing in an application divided by the size of the application.
The “Risk Coverage” DevOps quality metric looks at the percentage of risks that are covered by functional tests (or, depending on the risk, possibly also by non-functional tests). It can be measured at different granularities, depending on what is most important for the team.

Integration testing

The “Requirements Covered By Tests” DevOps quality metric looks at the percentage of requirements that are covered by integration tests. It can be measured at different granularities, depending on what is most important for the team.
The “New Anomalies” DevOps quality metric measures the sum of new anomalies identified during integration testing.
The “Anomaly Density” DevOps quality metric is the number of anomalies found during integration testing in an application divided by the size of the application.
The “Test Pass/Fail Rate” DevOps quality metric measures the percentage of integration tests that pass or fail.
The “Functional Code Coverage” DevOps quality metric measures how well integration test cases cover code.

End-to-end regression testing

The “Percentage of Automated E2E tests” DevOps quality metric measures how much of the total test suite is automated. It is calculated by dividing the number of automated test cases by the total number of test cases.
(Please note that more automation is not necessarily better.)
The “Requirements Covered by Tests” DevOps quality metric looks at the percentage of requirements that are correlated to E2E tests. It can be measured at different granularities, depending on what is most important for the team.
The “Total Anomalies” DevOps quality metric measures the sum of all anomalies identified during E2E testing.
The “Test Cases Executed” DevOps quality metric measures the total number of tests executed.
The “Test Case Coverage” DevOps quality metric measures the effectiveness of tests by looking at how well test cases cover the application’s functional requirements. It can be measured at different granularities, depending on what is most important for the team.

Long non-exhaustive list of raw metrics

The table below is a long non-exhaustive list of raw metrics from which you can compile a set that applies to your chosen goals and specific situation. You may adopt these metrics as they are or adapt them to your needs.

Long non-exhaustive list of raw metrics.

automated tests prioritized by risk	blocked test cases
build failure rate	build verification tests
code churn	code coverage
critical anomalies	anomaly density
anomaly status by priority	anomaly status by severity
functional code coverage	new critical anomalies
new anomalies	new requirements added
new requirements tested	number of automated tests
open anomalies	pass/fail rate
percent of passed tests for new requirements	percentage of requirements tested
percentage of test cases passed	percentage of automated e2e tests
planned test case coverage	release readiness
requirements covered by tests	risk coverage
static analysis results	successful code builds
test case coverage	test cases executed
test effectiveness	test efficiency
test execution time	test hygiene
test pass/fail rate	tests did not run
time spent preparing test data	time spent preparing test environments
total critical anomalies	total anomalies
total number of anomalies	total test execution time
unit test coverage	unit test pass/fail rate

How do indicators and metrics relate

In TMAP we distinguish two reasons for measuring: indicators and metrics.

Indicators are used to determine whether the business value and the IT objectives are achieved by the IT system and the business process it supports. To accomplish this, indicators measure products, processes and/or people.

Metrics relate to continuous improvement (improving the IT delivery process and the people involved and indirectly also the products). If you perform a specific measurement, the result may be used as an indicator or as a metric, and in some cases the same measurement may even be used as both. The figure below shows how measurements relate to indicators and/or metrics.

An example of a measurement that is typically an indicator is “conversion rate” (the share of people who visit a website and actually buy); it relates to the business value only.

An example of a measurement that is typically a metric is “deployment frequency”; it relates to improving the IT delivery process only.

An example of a measurement that can be both an indicator and a metric is “escaped fault ratio” (anomaly detection effectiveness), which is about both how well the business value is achieved and how the IT delivery process can be improved.