According to this survey from driftctl, 96% of the teams asked report manual changes as the main cause of infrastructure drift. Other concerns are not moving from development to production fast enough and introducing many changes at once.

Drift is a big problem in system development. Test environments break and halt the whole development process, or they end up in an unknown state.

As a software developer, you have probably experienced drift many times due to parallel, isolated development processes and not merging into production fast enough. We mitigate some of this drift by introducing continuous integration and continuous deployment processes that can reliably move software from development to production faster, while still guaranteeing quality through test automation and gated check-ins. We use DevOps to shift knowledge left in order to remove blocking phases and speed up the process, and we use GitOps to put operational changes under source control, so that configuration no longer ends up in an unknown state.

Development Goals

Before we further explore which concepts we have adopted to mitigate drift in application development, and how we can use them to also mitigate infrastructure drift, let’s have a look at some objectives for general product development. In particular, let’s study three core objectives:

Fast Deliveries
Quality
Simplicity

Fast deliveries make the product reach the market and potential consumers fast, preferably faster than the competition. Quality makes the customer stay with the product, and simplicity makes both of the previous objectives easier to achieve and maintain.

Automation

A key part of moving fast is automation of repetitive tasks. Automation decreases the risk of drift: it moves us forward faster and with higher accuracy by replicating and reproducing test scenarios, releases and deployments, and it continuously reports feedback that helps us steer in the right direction. The more we automate, the faster we move, the lower the risk of drift, and the higher the quality.

Least Privilege / Zero Trust / BeyondCorp

Security is something that should be embraced by software developers during the development cycle, not something that is forced upon them from the side or added as an afterthought. This is maybe even more important when building infrastructure. When security becomes a real problem, it is often too late to do something about it. Trust is fragile, and so is the weakest link to our customers' precious data.

Applying a least privilege policy not only minimizes the risk of granting god mode to perpetrators, it also minimizes the possibility of introducing manually applied drift.
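To make the least privilege idea concrete, here is a minimal sketch of an audit that compares what a role has been granted with what it actually needs. The permission strings, the `audit_role` function and the example role are all hypothetical and not tied to any cloud provider's API; a real audit would read grants and usage from the provider.

```python
from __future__ import annotations


def audit_role(granted: set[str], required: set[str]) -> dict[str, list[str]]:
    """Report grants that go beyond what a role needs.

    Permission names are hypothetical strings such as "storage:read".
    A wildcard grant is reported separately because it tends to be
    the closest thing to "god mode".
    """
    wildcard = {p for p in granted if "*" in p}
    unused = granted - required - wildcard
    return {"wildcard": sorted(wildcard), "unused": sorted(unused)}


if __name__ == "__main__":
    # Hypothetical deploy role: it only needs to read from storage.
    report = audit_role(
        granted={"storage:read", "storage:write", "compute:*"},
        required={"storage:read"},
    )
    print(report)
```

Every entry in the report is a permission that could be used to change infrastructure by hand, so shrinking the report also shrinks the room for manually applied drift.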

While least privilege can lower the attack surface, Zero Trust simplifies the way we do security. If there are no hurdles in the way of development progress, there is less risk of succumbing to the temptation to disable security mechanisms in order to make life easier.

Infrastructure as Code / GitOps

Source controlling application code has been common practice for many years. When manifests define the wanted state and declarative code describes how to get there, source control follows naturally for infrastructure too. Infrastructure as code is powerful for the same reasons as application source code: we can track and visualize how functionality changes, and apply those changes reproducibly through automation.

Less risk of drift.
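Once the wanted state lives in source control, drift can be measured as the difference between that state and what is actually running. The following is a minimal sketch of that comparison, assuming both states can be flattened into simple key/value dictionaries. The `detect_drift` function, the `MISSING` marker and the example settings are illustrative assumptions, not the output format of any particular tool.

```python
MISSING = "<missing>"  # marker for a key that exists on only one side


def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired_value, actual_value)} for every key that differs."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Hypothetical wanted state, as it would be declared in a manifest.
    desired = {"instance_type": "small", "port": 443}
    # Hypothetical live state after someone made a manual change.
    actual = {"instance_type": "large", "port": 443, "debug": True}

    for key, (want, have) in detect_drift(desired, actual).items():
        print(f"{key}: wanted {want!r}, found {have!r}")
```

An empty result means the environment matches its declaration. Anything else is either a manual change to revert or a change that should be moved into the manifest.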

GitOps makes it possible to detect risky configuration changes by introducing reviews and running gated checks. It simplifies moving changes forward (and backwards) in a controllable way by enabling small increments while keeping a breadcrumb trail on how we got where we are.
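A gated check of this kind can be sketched as a function that inspects a proposed change and lists the reasons it may not be merged yet. The `ChangeRequest` fields and the thresholds below are hypothetical and only illustrate the idea of enforcing reviews, passing checks and small increments; in practice these rules usually live in the settings of the source control platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical summary of a proposed infrastructure change."""
    changed_resources: list[str]
    approvals: int
    checks_passed: bool


def gate(change: ChangeRequest, max_resources: int = 5, min_approvals: int = 1) -> list[str]:
    """Return the reasons a change is blocked; an empty list means it may merge."""
    reasons = []
    if not change.checks_passed:
        reasons.append("automated checks failed")
    if change.approvals < min_approvals:
        reasons.append(f"needs at least {min_approvals} approval(s)")
    if len(change.changed_resources) > max_resources:
        reasons.append(f"touches more than {max_resources} resources; split it up")
    return reasons


if __name__ == "__main__":
    small = ChangeRequest(changed_resources=["firewall-rule"], approvals=1, checks_passed=True)
    print(gate(small))  # an empty list: nothing blocks this change
```

The limit on changed resources is one way to encourage the small increments mentioned above; the right limit is a team decision.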

DevSecInfraOps

Cross-functional teams help us bridge knowledge gaps by removing handovers and synchronization problems. They bring the needed knowledge closer to the development process, shortening the time to production and getting security and operational requirements built in. Infrastructure is very much part of this process.

Local Environments

Using a different architecture for local development increases drift as well as the cost to build and maintain it. Concepts like security, network and scalability are built into cloud infrastructure, and often provided by products that are not available for local development. As for distributed systems, these are hard to use locally since they run elastically over multiple hosts possibly across geographic boundaries.

What if we could minimize both application and infrastructure drift by reusing the same cloud native infrastructure architecture and automation to produce the same type of environment, anywhere, any time, for any purpose, while adhering to the above criteria? Test and staging would all shift left into the development stage, shortening the path to production and making it possible to use every environment as a production-like environment.

In the next part we will take a deep dive into how we can work with cloud infrastructure the same way we work with application development, while adopting all of the above concepts.

Mitigating Infrastructure Drift by using Software Development Principles