Skip to main content

Maintainability

What it is

Maintainability is your ability to spend time on planned work.

Your ability to focus on planned work depends on how often you are forced to deal with unplanned work. If you spend too much time on unplanned work, you will not be able to achieve high levels of velocity.

What it covers

Maintainability covers all forms of "infrastructure debt."

The debt analogy

In the financial world, debt means we owe someone money. The more debt we have, the more we pay in interest. If debt grows faster than we can service it, we risk financial bankruptcy.

In the infrastructure world, debt takes on a variety of forms:

  • Drift - Your infrastructure code doesn't match what's actually deployed in your cloud
  • Non-codified assets - Resources exist in your cloud but aren't represented in code
  • Outdated IaC - You're using old patterns or tools when better options exist
  • Non-standardization - Your organization solves the same problem in many different ways

We pay off infrastructure debt with resources, mainly time, focus, and money.

The more infrastructure debt we accumulate, the more resources we must allocate just to maintain the status quo. If we can't allocate enough resources to both maintain current systems and achieve our velocity goals, we risk DevOps bankruptcy, a state where the infrastructure becomes unmaintainable and requires fundamental restructuring.

How to improve it

Identify the sources of debt, and for each one, put in place processes and tooling that systematically and proactively address the debt.

Automated drift detection

Debt source: Drift

Prevention and remediation: Automatically detect drift on a scheduled basis, along with a proposal on how to resolve it. E.g. Open a pull request on a weekly basis that identifies drift and can resolve it by merging the pull request.

Related: Drift detector component

Streamlined resource imports

Debt source: Non-codified assets

Prevention and remediation: Use tooling that can discover unmanaged resources and generate the necessary code to bring them under IaC management.

Related: Importer component

Automated IaC updates

Debt source: Outdated IaC

Prevention and remediation: Automate the process of updating your IaC to use the latest approved versions of tools, modules, and patterns. Track available updates and provide automated pull requests that upgrade dependencies while running tests to ensure compatibility.

Related: IaC updater component

Infrastructure estate insights

Debt source: Non-standardization

Prevention and remediation: Provide visibility into each repo, environment, and unit to identify where teams do not adhere to your standards. Make this information easy to discover, both for the platform engineers and application teams.

Related: Scorecard component

How to measure it

As we've seen, maintainability breaks down into specific sources of debt. For each debt source, focus on the critical metric that drives the most insight.

Let's look at those now, though your own mileage may vary.

Debt source: Drift

Measure drift by tracking the drift rate, which is the percentage of your IaC resources that have drifted from their codified state.

Debt source: Non-codified assets

Measure non-codified assets by tracking the IaC coverage rate, which is the percentage of your cloud resources that are managed with Infrastructure as Code.

Debt source: Outdated IaC

Measure outdated IaC by tracking the up-to-date coverage rate, which is the percentage of your deployed infrastructure that uses the latest versions of your approved tools and patterns.

Debt source: Non-standardization

You can break standardization down into a discrete set of categories such as:

  • Tooling choices: IaC tool, CI/CD tool, etc.
  • Tooling configuration: IaC patterns, CI/CD configuration, etc.
  • Component Use: Catalog, Runbooks, etc.
  • Governance Status: Static analysis, Security, cost management, policies, etc.

You can evaluate how well these standards are applied at the repo, environment, or unit level. You can measure each standard as either a binary value (complies / does not comply) or range value (e.g. 0 to 10).

For example, you could assess whether a given unit uses Terragrunt, which might be your standard IaC orchestrator. You could then ask how many units in a given environment or repo use Terragrunt.

Next

You've now covered the three fundamental concerns! Now it's time to talk about how we build a developer platform to address them. Let's start by covering the principles of such a platform.