Metrics | DevOps

A DevOps team wants to be in control. Therefore, the team needs to measure relevant parameters. The resulting metrics give useful information about the status and are a starting point for improvement measures. So, the team needs metrics related to what they want to control and what they want to improve.

What is not defined cannot be measured. What is not measured cannot be improved. What is not improved, always degrades.

– William Thomson (Lord Kelvin)

What are metrics, and why and how do you use them? The first section explains this, together with the fundamentals of "good" metrics. The next section explains the relation between metrics and continuous improvement, and how to define a set of metrics. The sections that follow present four sets of DevOps metrics.

The first is a (non-exhaustive) list of metrics for effectiveness and efficiency. The second set is part of the DORA [DORA 2019] benchmark model with which the performance level of your DevOps organization can be established. The third set contains the top 20 QA DevOps metrics as collected by Forrester [Forrester 2019]. The fourth set is a long non-exhaustive list of raw metrics you can choose from.

Keep in mind that these are just example sets; you can find other sets on the internet.

Fundamentals of good metrics

Everyone is involved in choosing "good" metrics and setting business goals: the organization, the teams, and individuals. Start at the organizational level and work your way down, repeating this step until each team member has clear metrics and goals for which they are responsible. But what are "good" metrics? "Good" metrics can be broadly defined as metrics that show whether goals that have previously been given priority are being achieved. Choosing "good" metrics is an iterative process; the "good" metrics are rarely chosen immediately. The most successful users of metrics know that their metrics evolve, and they actively adjust them to get the best measure of their goals. We define three fundamentals for "good" metrics:

  • Good metrics are important for the growth and goals of the organization and/or team.
    The most important metrics must always be closely linked to the primary goal of the organization and/or team, whereby it is important to choose metrics that clearly indicate the current state of the organization in relation to its goals.
  • Good metrics can be improved.
    Good metrics measure progress, which means there must be room for improvement. An example is a 10% shorter lead time for software delivery. With some metrics, the team may focus on maintaining the current level instead of improving it, for example when customer satisfaction is already at 100%.
  • Good metrics inspire action.
    When metrics are important and processes can be improved, the organization knows what to do or which questions to ask. Why has the software delivery speed dropped? Have we made tool changes or was there more rework? By asking questions, possible causes are determined and action can immediately be taken.

Remark: you may wonder "Do bad metrics also exist?" Yes, they do. For example, the metric of the percentage of test cases that is automated, especially if the assumption is that 100% automation is the holy grail. This is because the quality of the test cases is much more important than the level of automation. Don't misunderstand us, we do see great value in test automation, but it is not a goal in itself.

Metrics and continuous improvement

Metrics-oriented thinking is key to continuous improvement. DevOps aims to improve business outcomes, but there are challenges in selecting the right metrics and collecting the metric data. Continuous improvement requires continuous change, measurement, and iteration. What's more, the agreed-upon metrics drive this cycle, but also create insights for the broader organization. Metrics and continuous improvement are strongly related, especially in DevOps. In this section, we will describe the metrics, and in "Continuous improvement", we will describe continuous improvement.

To improve the DevOps activities, it is not enough to measure only the QA & testing activities. You also need a well-balanced set of metrics with which the consequences of an implemented improvement measure can be compared with the situation before the measure was adopted. Often a benchmark per industry, topic or specific area is chosen. In DevOps, we really need such a set of metrics, since we should continually evaluate the effectiveness and efficiency of the activities, both to monitor the ongoing activities and to look for improvement opportunities.

It is not always easy to explain the difference between efficiency and effectiveness. Efficiency and effectiveness are two related concepts with a delicate but nevertheless important difference in meaning. They are often mistakenly used interchangeably. Here is an explanation of the differences based on Joris in 't Veld's work [Veld 1988].

  • Effectiveness indicates whether the outcome of the process has been realized or not. (A helpful metaphor here is determining whether the destination was reached.) Unlike efficiency, effectiveness does not relate to the process itself, but to its outcome.
  • Efficiency is the degree of utilization of resources to achieve a particular goal. (A metaphor such as how short the route is to the destination may be helpful.) A process is said to be efficient when few resources are used in relation to the common, agreed-upon standard. These resources could be, for example, time, effort (person-hours), commodities, or money.

The following is how these terms are defined in the ISO25010 standard [ISO25010 2011]:

Effectiveness is the accuracy and completeness with which users achieve specified goals.
Efficiency is the resources expended in relation to the accuracy and completeness with which users achieve goals.

Generally, one tries to organize processes in such a way that they are both efficient and effective – in other words, increasing productivity. Obviously, this is easier said than done, because efficiency targets may conflict with effectiveness [Black 2017].

[Figure: effectiveness vs efficiency]

How to define a set of metrics

A set of applicable metrics – measuring effectiveness and efficiency – can be very useful both in providing insight into the status of the process and in supporting decision-making in relation to the stated objectives. It also has a strong relationship with the Indicators from the VOICE model. You could even use a metric as an indicator with which you establish whether the objectives are met and the pursued value can be achieved. As stated, each test project is different, so the set of metrics will differ from one project to another and even from one test objective to another. In DevOps, you can set up your own set of metrics, perhaps with the Goal-Question-Metric (GQM) approach [Basili 1994], but you could also start with a few of the metrics described in this section. For more explanation, refer to GQM. Note that the chosen goals should be carefully considered. It is also good to realize that holding on to the same metrics indefinitely will stall improvement at some point. In addition, issues regularly arise that require you to adjust the metrics you measure.
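As a minimal sketch of how a GQM breakdown could be written down, the snippet below models one goal, the questions that refine it, and the metrics that answer each question. The class names, the goal, the questions and the metric choices are all hypothetical and only illustrate the structure; they are not prescribed by GQM or by this section.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

    def metric_set(self) -> list[str]:
        """Return the unique metrics needed to answer all questions, in first-seen order."""
        seen: dict[str, None] = {}
        for question in self.questions:
            for metric in question.metrics:
                seen.setdefault(metric, None)
        return list(seen)


# Hypothetical goal, questions and metric choices, for illustration only.
goal = Goal(
    statement="Deliver changes to production faster without lowering quality",
    questions=[
        Question(
            "How quickly do changes reach production?",
            ["delivery time", "deployment frequency"],
        ),
        Question(
            "How often does a deployment go wrong?",
            ["percentage of failed deployments", "deployment frequency"],
        ),
    ],
)

print(goal.metric_set())
```

Deriving the metric set from the questions, rather than listing metrics first, keeps every metric traceable to a goal, which fits the "start small" advice.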

If you want to define metrics, you may want to take the following best practices into account before or while defining those metrics:

  • Start small.
    Start with a limited set of metrics and build it up slowly.
  • Keep the metrics simple.
    The definition should appeal to the intuition of those involved. The more complicated the metrics, the more difficult they are to interpret and to use.
  • Define easy-to-gather metrics.
    Choose metrics that are relatively simple to collect. The more difficult it is to collect data, the greater the chance that it will not be accepted or will be wrongly registered. We advise starting with data that is already available.
  • Avoid data errors.
    Collect data electronically as much as possible. This is the quickest way to collect data, and it also avoids introducing manual errors into the data set. If the data is of poor quality (not consistent / not reliable), it cannot be used for the metric.
  • Record accurately.
    In the case of billable time registration, for example, it sometimes happens that incorrect billing codes are used. For example, software engineers especially tend to book time against testing codes when they are solving a problem, which is not a testing task but a development task.
  • Keep presentations simple.
    Avoid complicated statistical techniques and models during presentations. Just use easy-to-understand figures like tables, diagrams, and pie charts.
  • Be transparent.
    Provide feedback to people who have handed in the data as quickly as possible. Show them what you did with the information.

As we all know, different projects will have different (test) objectives. In one situation, the main objective could be risk reduction (effectiveness), while in another it could be early time to market and in a third it could be cost reduction (both efficiency). There should be a balance between all those aspects. A good balance between effectiveness and efficiency will improve productivity.

Effectiveness and efficiency metrics

In addition to the obvious QA & testing effectiveness and efficiency metric tables, we added a DevOps efficiency metrics table. In DevOps it is all about continuous delivery and shipping code as fast as possible, while not breaking things. By tracking DevOps efficiency metrics, we can evaluate just how fast we can move before things start to break.

The metrics in the tables below are not meant to be used all together. They are merely examples from which you, depending on your chosen goals, select the relevant metrics.

| Effectiveness metric | How to measure |
|---|---|
| Percentage of unit tests passed | Ratio between the number of passed unit tests and the total number of unit tests. |
| Percentage of code coverage | Ratio between the number of tested program statements and the total number of program statements. |
| Percentage of requirements coverage | Percentage of requirements covered by identified test cases. |
| Percentage of critical anomalies detected | Ratio between the number of critical anomalies detected and the total number of critical anomalies. |
| Percentage of identified risks covered by (executed) tests | Ratio between the number of identified risks covered by (executed) tests and the total number of identified risks. |
| Anomaly detection effectiveness | The total number of anomalies found during testing, divided by the total number of anomalies – estimated partly on the basis of production data. |
| Percentage of appropriate use of testing techniques | Ratio between actual test coverage and the planned (or needed or agreed) test coverage. |
| Percentage of anomalies caused by modifications | Anomalies caused by modifications that were tested, as a part of the total number of anomalies arising as a result of changes. |
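Most of the effectiveness metrics above are simple ratios. The following sketch computes two of them, the percentage of unit tests passed and the anomaly detection effectiveness. All counts are hypothetical, and the helper name `percentage` is our own.

```python
def percentage(part: float, total: float) -> float:
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Hypothetical counts for one release.
unit_tests_passed, unit_tests_total = 188, 200
anomalies_found_in_test = 45
anomalies_found_in_production = 5  # estimate based on production data

unit_test_pass_rate = percentage(unit_tests_passed, unit_tests_total)
detection_effectiveness = percentage(
    anomalies_found_in_test,
    anomalies_found_in_test + anomalies_found_in_production,
)

print(f"Unit tests passed: {unit_test_pass_rate:.1f}%")
print(f"Anomaly detection effectiveness: {detection_effectiveness:.1f}%")
```

With these numbers the script reports 94.0% and 90.0%. Note that the detection effectiveness can only be finalized once production data is available, so early values are estimates.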

 

| Efficiency metric | How to measure |
|---|---|
| Percentage of automated tests related to risk coverage | Ratio between risk coverage related automated tests and the total number of automated tests. |
| Fault density | The ratio between the number of faults found and the size of the application. |
| Percentage of test costs | Ratio between the test costs and the total development costs. |
| Savings achieved by reusing test products | Effort, duration, and/or resource savings achieved on the current project based on testing work product reuse from previous projects. |
| Cost per detected anomaly | Total test cost divided by the number of anomalies found. |
| Budget utilization | Ratio between the budget and the actual cost of testing. |
| Test efficiency | The number of required tests versus the number of anomalies found. |
| Number of anomalies found (relative) | The ratio between the number of anomalies found and the size of the system. |
| Savings of the test | Indicates how much has been saved by carrying out the test. In other words, what would the losses have amounted to if the test had not been carried out? [Black 2004] [Aalst 2010] |
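To make a few of these efficiency metrics concrete, here is a small sketch that computes the cost per detected anomaly, the fault density and the percentage of test costs. The figures are invented, and measuring application size in thousands of lines of code (KLOC) is an assumption; function points or story points would work in the same way.

```python
def cost_per_detected_anomaly(total_test_cost: float, anomalies_found: int) -> float:
    """Total test cost divided by the number of anomalies found."""
    if anomalies_found <= 0:
        raise ValueError("no anomalies found, so the metric is undefined")
    return total_test_cost / anomalies_found


def fault_density(faults_found: int, size_kloc: float) -> float:
    """Faults per KLOC (the size unit is an assumption of this sketch)."""
    if size_kloc <= 0:
        raise ValueError("application size must be positive")
    return faults_found / size_kloc


def percentage_of_test_costs(test_cost: float, development_cost: float) -> float:
    """Test costs as a percentage of the total development costs."""
    if development_cost <= 0:
        raise ValueError("development cost must be positive")
    return 100.0 * test_cost / development_cost


# Hypothetical project figures.
test_cost = 30_000.0
development_cost = 150_000.0
anomalies = 120
size_kloc = 80.0

print(cost_per_detected_anomaly(test_cost, anomalies))        # cost per anomaly
print(fault_density(anomalies, size_kloc))                    # faults per KLOC
print(percentage_of_test_costs(test_cost, development_cost))  # percent
```

A low cost per anomaly is not automatically good: it can also mean that the software under test was of poor quality, so read it together with an effectiveness metric.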

 

| DevOps efficiency metric | How to measure |
|---|---|
| Percentage of successful code builds | Ratio between problem-free code builds and the total number of code builds. |
| Deployment frequency | The number of deployments per given period of time. |
| Deployment size | The number of user stories, features, story points, etc. deployed per given period of time. |
| Deployment time | The actual lead time of the deployment itself. |
| Delivery time | The time between starting to work on an item (e.g. user story) and releasing the result (e.g. code) into production. |
| Percentage downtime | Ratio between the downtime of the CI/CD pipeline and the total time. |
| Percentage of failed deployments | Ratio between the number of failed deployments and the total number of deployments. |
| Mean time to detection (MTTD) | Ratio between the sum of all incident detection times and the total number of incidents over a given period of time. |
| Mean time to recovery (MTTR) | Ratio between the total recovery time and the total number of recoveries over a given period of time. |
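The sketch below shows how MTTD, MTTR, the percentage of failed deployments and the deployment frequency could be derived from simple records. The `Incident` record and all timestamps and counts are hypothetical. Measuring recovery time from the moment of detection is an assumption of this sketch; some teams measure it from the start of the incident instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime
    detected: datetime
    recovered: datetime


def mean_time_to_detection(incidents: list[Incident]) -> timedelta:
    """Sum of all detection times divided by the number of incidents."""
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Total recovery time (assumed: detection to recovery) per incident."""
    total = sum((i.recovered - i.detected for i in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents in one reporting period.
incidents = [
    Incident(datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 10),
             datetime(2024, 3, 4, 11, 0)),
    Incident(datetime(2024, 3, 11, 14, 0), datetime(2024, 3, 11, 14, 30),
             datetime(2024, 3, 11, 15, 0)),
]

# Hypothetical deployment counts over a four-week period.
deployments, failed_deployments, weeks = 40, 3, 4

mttd = mean_time_to_detection(incidents)
mttr = mean_time_to_recovery(incidents)
failed_percentage = 100.0 * failed_deployments / deployments
deployments_per_week = deployments / weeks

print(f"MTTD: {mttd}, MTTR: {mttr}")
print(f"Failed deployments: {failed_percentage}%")
print(f"Deployment frequency: {deployments_per_week} per week")
```

For this data the MTTD is 20 minutes and the MTTR is 40 minutes. Whatever start point you choose for recovery time, keep it the same across periods, otherwise the trend is meaningless.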

 

DORA DevOps performance metrics

As a DevOps organization you can set up your own set of metrics, but you could also consider using an existing set, for example the one in the Accelerate State of DevOps Report from DORA [DORA 2019]. An additional advantage is that you can also use this set as a benchmark. DORA assessed teams to understand how they develop, deliver and operate software systems. The assessment identifies three performance levels: high, medium and low performers. In the report, the division between these levels was 48%, 37% and 15% respectively. Within the high-performance level, 7% were considered elite-level performers.

The following table gives you an idea of the performance level at which your organization operates. For more detail and actual figures refer to www.devops-research.com.

Four key metrics of the DORA benchmark.

| | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment frequency | On-demand (multiple deploys/day) | Between once per hour and once per day | Between once per week and once per month | Between once per week and once per month |
| Lead time for changes | Less than one hour | Between one day and one week | Between one week and one month | Between one month and six months |
| Time to restore service | Less than one hour | Less than one day | Less than one day | Between one week and one month |
| Change failure rate | 0-15% | 0-15% | 0-15% | 46-60% |
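Because several bands in the table overlap and some values fall between bands, a single measurement does not always map to exactly one level. The sketch below looks up which levels match a measured value for two of the four metrics. The band values are copied from the table; treating the boundaries as inclusive and a month as 30 days are assumptions of this sketch, as is the function name `matching_levels`.

```python
from datetime import timedelta

LEVELS = ("Elite", "High", "Medium", "Low")

# (lower bound, upper bound) per level, taken from the table above.
# Assumptions: bounds are inclusive and one month equals 30 days.
TIME_TO_RESTORE = {
    "Elite": (timedelta(0), timedelta(hours=1)),
    "High": (timedelta(0), timedelta(days=1)),
    "Medium": (timedelta(0), timedelta(days=1)),
    "Low": (timedelta(weeks=1), timedelta(days=30)),
}

CHANGE_FAILURE_RATE = {  # in percent
    "Elite": (0.0, 15.0),
    "High": (0.0, 15.0),
    "Medium": (0.0, 15.0),
    "Low": (46.0, 60.0),
}


def matching_levels(value, bands):
    """Return every level whose band contains the value (possibly none)."""
    return [level for level in LEVELS
            if bands[level][0] <= value <= bands[level][1]]


print(matching_levels(timedelta(minutes=30), TIME_TO_RESTORE))
print(matching_levels(timedelta(hours=5), TIME_TO_RESTORE))
print(matching_levels(timedelta(days=3), TIME_TO_RESTORE))
print(matching_levels(50.0, CHANGE_FAILURE_RATE))
```

Returning a list rather than a single level makes the ambiguity visible: a restore time of 30 minutes matches Elite, High and Medium, while three days matches no band at all. Use the combination of all four metrics, not a single one, to judge the performance level.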

Top 20 QA metrics collected by Forrester

To provide the DevOps community with an objective perspective on which quality metrics are most critical for DevOps success, Tricentis commissioned Forrester to investigate the topic. The next table lists the top 20 QA DevOps metrics as collected by Forrester [Forrester 2019]. Here too, which metrics you select depends on your chosen goals.

Top 20 QA DevOps metrics.

| | Build | Functional validation | Integration testing | End-to-End regression testing |
|---|---|---|---|---|
| 1 | Automated Tests Prioritized by Risk | Requirements Covered by Tests | Requirements Covered by Tests | Percentage of automated E2E tests |
| 2 | Successful Code Builds | Critical Anomalies | New Anomalies | Requirements Covered by Tests |
| 3 | Unit Test Pass/Fail Rate | Pass/Fail Rate | Anomaly Density | Total Anomalies |
| 4 | Total Number of Anomalies | Anomaly Density | Test Pass/Fail Rate | Test Cases Executed |
| 5 | Code Coverage | Risk Coverage | Functional Code Coverage | Test Case Coverage |

Build
  1. The "Automated Tests Prioritized by Risk" DevOps quality metric is used by teams who have business risks clearly defined and tests correlated to those risks.
  2. The "Successful Code Builds" DevOps quality metric measures the number of builds that are free of any errors or warnings.
  3. The "Unit Test Pass/Fail Rate" DevOps quality metric measures the percentage of unit tests that pass or fail.
  4. The "Total Number of Anomalies" DevOps quality metric measures the number of anomalies identified during each build and highlights how the total number of anomalies either diminishes or increases as teams get closer to their final production build.
  5. The "Code Coverage" DevOps quality metric measures what percentage of the source code is exercised by tests. Code coverage can be measured in several different ways; for example, statement (line) coverage, branch coverage, etc.
Functional validation
  1. The "Requirements Covered by Tests" DevOps quality metric looks at the percentage of requirements that correspond to functional test cases. It can be measured at different granularities, depending on what is most important for the team.
  2. The "Critical Anomalies" DevOps quality metric tracks how the total number of critical functional anomalies changes over time.
  3. The "Pass/Fail Rate" DevOps quality metric measures the percentage of functional tests that pass or fail, with the focus on the tests that matter most to the business.
  4. The "Anomaly Density" DevOps quality metric is the number of anomalies found during functional testing in an application divided by the size of the application.
  5. The "Risk Coverage" DevOps quality metric looks at the percentage of risks that are covered by functional tests (or, depending on the risk, possibly also by non-functional tests). It can be measured at different granularities, depending on what is most important for the team.
Integration testing
  1. The "Requirements Covered by Tests" DevOps quality metric looks at the percentage of requirements that are covered by integration tests. It can be measured at different granularities, depending on what is most important for the team.
  2. The "New Anomalies" DevOps quality metric measures the sum of new anomalies identified during integration testing.
  3. The "Anomaly Density" DevOps quality metric is the number of anomalies found during integration testing in an application divided by the size of the application.
  4. The "Test Pass/Fail Rate" DevOps quality metric measures the percentage of integration tests that pass or fail.
  5. The "Functional Code Coverage" DevOps quality metric measures how well integration test cases cover code.
End-to-end regression testing
  1. The "Percentage of automated E2E tests" DevOps quality metric measures how much of the total test suite is automated. It is calculated by dividing the number of automated test cases by the total number of test cases.
    (Please note that more automation is not necessarily better.)
  2. The "Requirements Covered by Tests" DevOps quality metric looks at the percentage of requirements that are correlated to E2E tests. It can be measured at different granularities, depending on what is most important for the team.
  3. The "Total Anomalies" DevOps quality metric measures the sum of all anomalies identified during E2E testing.
  4. The "Test Cases Executed" DevOps quality metric measures the total number of tests executed.
  5. The "Test Case Coverage" DevOps quality metric measures the effectiveness of tests by looking at how well test cases cover the application's functional requirements. It can be measured at different granularities, depending on what is most important for the team.
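Two of the metrics above, the percentage of automated E2E tests and the requirements covered by tests, can be computed from a simple traceability mapping. The sketch below assumes a hypothetical mapping from requirement IDs to the E2E tests that cover them; the IDs and the data layout are invented for illustration.

```python
# Hypothetical E2E test suite: test ID -> is the test automated?
e2e_tests = {"T1": True, "T2": True, "T3": False, "T4": True}

# Hypothetical traceability: requirement ID -> E2E tests that cover it.
requirement_tests = {
    "R1": ["T1", "T2"],
    "R2": ["T3"],
    "R3": [],
    "R4": ["T4"],
    "R5": [],
}

# Number of automated test cases divided by the total number of test cases.
automated_percentage = 100.0 * sum(e2e_tests.values()) / len(e2e_tests)

# A requirement counts as covered when at least one test is linked to it.
covered = [req for req, tests in requirement_tests.items() if tests]
requirements_coverage = 100.0 * len(covered) / len(requirement_tests)
uncovered = sorted(set(requirement_tests) - set(covered))

print(f"Automated E2E tests: {automated_percentage}%")
print(f"Requirements covered by tests: {requirements_coverage}%")
print(f"Requirements without tests: {uncovered}")
```

Here 75% of the E2E tests are automated and 60% of the requirements are covered. Listing the uncovered requirements next to the percentage gives the team something to act on, which a bare percentage does not.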

Long non-exhaustive list of raw metrics

The table below is a long, non-exhaustive list of raw metrics from which you can compile a set that applies to your chosen goals and specific situation. You may adopt these metrics as they are or use them as a basis and adapt them.

Long non-exhaustive list of raw metrics.

automated tests prioritized by risk

blocked test cases

build failure rate

build verification tests

code churn

code coverage

critical anomalies

anomaly density

anomaly status by priority

anomaly status by severity

functional code coverage

new critical anomalies

new anomalies

new requirements added

new requirements tested

number of automated tests

open anomalies

pass/fail rate

percent of passed tests for new requirements

percentage of requirements tested

percentage of test cases passed

percentage of automated e2e tests

planned test case coverage

prelease readiness

requirements covered by tests

risk coverage

static analysis results

successful code builds

test case coverage

test cases executed

test effectiveness

test efficiency

test execution time

test hygiene

test pass/fail rate

tests did not run

time spent preparing test data

time spent preparing test environments

total critical anomalies

total anomalies

total number of anomalies

total test execution time

unit test coverage

unit test pass/fail rate