Automation
Overview
Automation focuses on the codification of otherwise manual processes to improve the speed, scale, and reliability of those processes. Codified processes are typically represented as programs or scripts that can either be manually run or automatically triggered at various points in the software development lifecycle.
There are three domains where automation is deployed to the greatest effect:
-
Local Development: programs that aid developers in setting up their local development environment, interact easily and safely with live systems, and generally reduce overall developer toil
-
Continuous Integration (CI): programs that focus on rapidly integrating high quality code into shared branches within a version control system. This includes: unit and integration testing, static code analysis, vulnerability testing, change previews, conflict resolution, approval gating, etc.
-
Continuous Deployment (CD): programs for moving integrated code frequently and resiliently to our various environments and ultimately to production. This includes several components: infrastructure-as-code (IaC), automated and consistent deployments, deployment monitoring and rollback mechanisms, end-to-end tests, notifications and approval gating, etc.
While "continuous" is a part of the common nomenclature for describing these buckets, this is more aspirational than practical in many circumstances. For example, manually triggering automated deployments is a completely acceptable practice in many organizational contexts.
Motivation
This pillar provides organizational value for the following reasons:
-
Improves development velocity by
- providing better means and tooling to quickly review code changes
- reducing risk and cost of re-work through testing and code analysis
- reducing collaboration friction by creating consistent development standards and workflows across the organization
- reducing toil that forces developers to context switch away from feature work
-
Minimizes the change failure rate (CFR) by reducing the number of bugs that make it to production and reduces the mean time to resolution (MTTR) by improving the speed to deliver fixes
-
De-silos key knowledge through codification and integration into a CI / CD platform, enhancing long-term organizational sustainability
-
Improves system security by gating production access to automated change management systems, not only limiting scope of access but also automatically collecting detailed audit logs for each change
-
Provides a mechanism for automating compliance objectives across a distributed number of teams
Benchmark
Metrics
These are common measurements that can help an organization assess their performance in this pillar. These are intended to be assessed within the context of performance on the key platform metrics, not used in a standalone manner.
Indicator | Business Impact | Ideal Target |
---|---|---|
Average latency between opening and merging a PR | Increased levels of work-in-progress (WIP) lead to decreased code quality, higher delivery latency, higher reviewer friction, and code stagnation and conflicts. | < 1 day |
Average latency between integration and production release | ROI for new code is not realized until deployed to production. Improves MTTR. | < 1 day |
Average execution time of CI pipelines on PR | Beyond a certain threshold, the CI pipeline stops being a part of a developer's synchronous workflow. This results in larger PRs, increasing reviewer friction and delivery latency, and a larger amount of WIP. | < 5 minutes |
Average execution time end-to-end tests | E2E tests are generally the primary gatekeeper in CD pipelines; E2E tests should be built in such a way as to provide resiliency guarantees without unduly blocking deployments. | < 1 hour |
Average execution time of CD pipelines | Faster deployments improve the speed of testing changes in lower environments and the ability to deploy break-fixes to production. | < 5 minutes |
Average time to set up developer tooling | Improves developer flexibility in onboarding to work on different parts of the ecosystem. | < 5 minutes |
Average time to set up an isolated, production-like environment | Allowing easy creation of production-like environments greatly improves developer velocity, improves the feasibility of E2E testing, and ensures quick recovery when infrastructure breaks. | < 10 minutes |
Percent of repos with CI | See above. | 100% |
Percent of repos with CD | See above. | 100% |
Percent of repos with changelogs | Provides a straightforward way to assess changes in any given release. Improves communication with external stakeholders, is critical for effective QA, and critical for triaging production issues or downtime. | 100% |
Percent of system fully defined in infrastructure-as-code | Codifying infrastructure as code allows an organization to apply all of the best practices it uses with application to code on its infrastructure. This leads to substantially better outcomes for development velocity and system reliability. | 100% |
Organization Goals
These are common goals to help organizations improve their performance on the key platform metrics. While each goal represents a best practice, the level of impact and optimal approach will depend on your organization's unique context.
Category | Code | Goal |
---|---|---|
Developer Env | A.E.1 | All tooling necessary to begin developing on a codebase should be installable in a single command. |
A.E.2 | All developers in the same organizational unit should share the exact same versions of all required tooling. | |
A.E.3 | A developer should be able to launch an isolated copy of the organizational unit's entire system, including equivalent infrastructure. | |
A.E.4 | Where feasible, integration tests should be able to be triggered locally prior to committing and pushing code. | |
A.E.5 | Where applicable, developers should have scripts that automatically seed test databases with production-like data. | |
A.E.6 | Where applicable, developers should have scripts that automatically re-apply the latest database schema changes. | |
A.E.7 | Multi-step operations that are either done frequently or in times of urgency (disaster recovery) are codified as scripts that can be run locally. | |
Infrastructure | A.F.1 | All infrastructure should be codified using infrastructure-as-code. |
A.F.2 | All system settings should be codified using configuration-as-code. | |
Standards | A.S.1 | Each organizational unit's standard code integration and deployment processes should be included in shared documentation. |
A.S.2 | All repos in the same organizational unit should share the same CI/CD automation code. | |
A.S.3 | All repos in the same organizational unit should share the same developer environment configuration. | |
A.S.4 | All repos should have an automatic means to generate a Changelog in standard format. | |
Integration | A.I.1 | Prior to integration, every commit should generate build artifacts. |
A.I.2 | Prior to integration, every commit should run unit and integration tests and generate an associated code coverage report. | |
A.I.3 | Prior to integration, every commit should should trigger static code analysis of 1st-party code. | |
A.I.4 | Prior to integration, every commit should should trigger dependency vulnerability scanning. | |
A.I.5 | Prior to integration, every commit should should verify alignment with organization standards by running code linting. | |
A.I.6 | Proposed integrations should create previews of system under the proposed changes. | |
A.I.7 | Authors should be notified of commits that cannot be integrated. | |
Deployment | A.D.1 | Deploying new changes to an environment should be attainable in a single command. |
A.D.2 | Newly integrated code should be automatically deployed to a development / testing environment. | |
A.D.3 | Deployment to production environments should be gated by end-to-end test suites. | |
A.D.4 | Triggering a deploying to a live environment should create a diff documenting the changes. This may be used as a deployment gate. | |
A.D.5 | Infrastructure, application, and configuration code changes should be deployed (and rolled back) atomically, in the same pipeline. | |
A.D.6 | Notifications should be sent for deployment successes and failures. | |
A.D.7 | Errors or significant increases in error rates should trigger automatic rollbacks of deployments. |