As organizations adopt infrastructure as code (IaC) practices and tools like Terraform, it becomes increasingly important to keep the infrastructure in a consistent and predictable state. However, managing Terraform at scale presents several challenges, such as drift caused by manual changes made to the infrastructure through the console, or by changes made in Git branches that are never merged into the upstream branch. These drifts can be hard to trace and lead to longer troubleshooting times, even for small or quick tasks.
One of the main challenges of running Terraform at scale is drift. Drift occurs when the actual state of the infrastructure differs from the desired state defined in the Terraform code. This can happen for several reasons, including manual changes made to the infrastructure outside of Terraform, or changes made in Git branches that are not merged into the master branch. Drift can also occur when provider APIs introduce changes, such as default fields that are not present in the HCL.
Managing drift at scale can be tedious and time-consuming. It can be challenging to trace the source of the drift and determine how and when it occurred, especially if you have hundreds of stacks (Terraform directories).
To address these challenges, the open-source tool Terradrift was created. Terradrift is a Terraform drift detection tool that uses terraform-exec under the hood to run terraform plan and report drift either immediately or as Prometheus metrics. This allows organizations to continuously monitor their infrastructure for drift and quickly identify and address any issues as they arise.
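To picture the check itself: terraform plan accepts a -detailed-exitcode flag that makes the exit code distinguish "no changes" (0) from "error" (1) and "changes present" (2). The Python sketch below wraps that behaviour. It is only an illustration, not Terradrift's implementation (Terradrift uses the Go terraform-exec library), and the function names are hypothetical. It also assumes terraform init has already been run in the stack directory.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {
    0: "no-drift",  # plan succeeded and the diff is empty
    1: "error",     # plan failed
    2: "drift",     # plan succeeded and the diff is not empty
}


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a drift status; unknown codes count as errors."""
    return PLAN_EXIT_MEANING.get(code, "error")


def check_stack(stack_dir: str) -> str:
    """Run a read-only plan in stack_dir and report its drift status.

    Hypothetical helper: assumes the `terraform` binary is on PATH and
    that the stack has already been initialised.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         "-lock=false", "-no-color"],
        cwd=stack_dir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Treating unknown exit codes as errors keeps a broken stack from being silently reported as drift-free.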
Terradrift is designed to work in two modes: server and CLI.
Both modes scan all Terraform stacks (directories) in a given working directory and run terraform plan to detect whether drift exists. The difference is that terradrift-cli runs the scan once, prints the current state, and exits, while terradrift-server scans continuously on a defined schedule and exposes the drift results as Prometheus exporter metrics on the /metrics endpoint. This gives you flexibility in how drift is reported: you can create Prometheus alerts based on how long a drift has been detected, and build dashboards from the metrics stored in Prometheus or any other monitoring platform.
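As a rough illustration of the scanning step, the sketch below walks a working directory and collects every directory that directly contains a .tf file. This is an assumption about what counts as a stack, made for the example only; Terradrift's own discovery logic (and the role of its config file) may differ, and discover_stacks is a hypothetical name.

```python
import os


def discover_stacks(workdir: str) -> list[str]:
    """Return paths (relative to workdir) of directories holding .tf files.

    Hidden directories such as .terraform and .git are skipped, because
    the .terraform module cache also contains .tf files that are not stacks.
    """
    stacks = []
    for root, dirs, files in os.walk(workdir):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        if any(name.endswith(".tf") for name in files):
            stacks.append(os.path.relpath(root, workdir))
    return sorted(stacks)
```

Each returned path could then be handed to whatever runs the plan for that stack.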
terradrift-cli discovers the stacks in the given workdir and then runs the terraform plan command to detect drift based on the plan output.
$ terradrift-cli --workdir ./examples/ --config examples/config.yaml
STACK-NAME       DRIFT  ADD  CHANGE  DESTROY  PATH                 TF-VERSION
api-production   false  0    0       0        gcp/api              1.2.7
api-staging      false  0    0       0        gcp/api              1.2.7
core-production  true   0    0       1        aws/core-production  1.2.7
core-staging     true   1    0       0        gcp/core-staging     1.0.6
See all details in terradrift-cli
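If you want to act on that report in a script, for example to fail a CI job when any stack has drifted, the table can be parsed with a few lines of Python. This is a minimal sketch that assumes the output stays whitespace-separated with no spaces inside a field, as in the sample above; parse_report is a hypothetical helper, not part of Terradrift.

```python
SAMPLE_REPORT = """\
STACK-NAME DRIFT ADD CHANGE DESTROY PATH TF-VERSION
api-production false 0 0 0 gcp/api 1.2.7
api-staging false 0 0 0 gcp/api 1.2.7
core-production true 0 0 1 aws/core-production 1.2.7
core-staging true 1 0 0 gcp/core-staging 1.0.6
"""


def parse_report(text: str) -> list[dict]:
    """Turn the whitespace-separated report into a list of typed rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    # "STACK-NAME" becomes "stack_name", "TF-VERSION" becomes "tf_version".
    header = [column.lower().replace("-", "_") for column in lines[0].split()]
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        row["drift"] = row["drift"] == "true"
        for key in ("add", "change", "destroy"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


drifted = [row["stack_name"] for row in parse_report(SAMPLE_REPORT) if row["drift"]]
print(drifted)  # ['core-production', 'core-staging']
```

A CI wrapper could exit non-zero whenever the drifted list is not empty.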
Server mode (terradrift-server)
You can run the server as in the example below, after setting the required environment variables for the GitHub token and the cloud provider.
$ ./terradrift-server --repository https://github.com/username/reponame \
    --git-token "$GIT_TOKEN" \
    --config ./config.yaml
Retrieving the drifts as Prometheus metrics
$ curl http://localhost:8080/metrics
# HELP terradrift_plan_add_resources Number of resources to be added based on tf plan
# TYPE terradrift_plan_add_resources gauge
# HELP terradrift_plan_change_resources Number of resources to be changed based on tf plan
# TYPE terradrift_plan_change_resources gauge
# HELP terradrift_plan_destroy_resources Number of resources to be destroyed based on tf plan
# TYPE terradrift_plan_destroy_resources gauge
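The output above shows only the HELP and TYPE lines, so the exact sample lines and label names are not visible here. As a sketch of how the gauges could be consumed outside Prometheus, the Python below parses the text exposition format and lists the stacks with a non-zero value. The sample lines and the stack label are invented for illustration and may not match what terradrift-server really emits. The parser is deliberately simplified: it ignores samples that carry timestamps and does not handle escaped quotes in label values.

```python
import re

# name, optional {labels}, then a single value (no timestamp support).
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Hypothetical scrape: the "stack" label and the values are assumptions.
EXAMPLE_SCRAPE = """\
# HELP terradrift_plan_add_resources Number of resources to be added based on tf plan
# TYPE terradrift_plan_add_resources gauge
terradrift_plan_add_resources{stack="core-staging"} 1
# HELP terradrift_plan_destroy_resources Number of resources to be destroyed based on tf plan
# TYPE terradrift_plan_destroy_resources gauge
terradrift_plan_destroy_resources{stack="core-production"} 1
terradrift_plan_destroy_resources{stack="api-staging"} 0
"""


def parse_samples(text: str) -> list[tuple[str, dict, float]]:
    """Return (metric name, labels, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # unsupported line shape, e.g. a trailing timestamp
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


drifting = sorted({
    labels["stack"]
    for name, labels, value in parse_samples(EXAMPLE_SCRAPE)
    if value > 0 and "stack" in labels
})
print(drifting)  # ['core-production', 'core-staging']
```

In practice an alerting rule inside Prometheus is the more natural consumer; a script like this is mainly useful for quick ad hoc checks against the /metrics endpoint.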
Terradrift supports all cloud providers, whatever your current IaC uses. If your normal terraform commands require certain environment variables to be exported, all you have to do is include the flag
In conclusion, managing Terraform at scale can present several challenges, including drifts. Terradrift helps organizations address these drifts by continuously providing real-time visibility into the state of the infrastructure. This allows organizations to maintain a consistent and predictable state of their infrastructure while reducing the risk of non-tracked changes.