In late 2016, we released the Comprehensive Guide to Terraform blog post series. It got so much attention (the series is now approaching 2 million views!) that a few months later, in early 2017, we turned it into a book, Terraform: Up & Running. It’s remarkable just how much has changed in the two years since then: 4 major Terraform releases, a language change (HCL to HCL2), a revamp of Terraform state (backends, locking, and workspaces), the Terraform providers split, the Terraform Registry, massive community growth, and much more.
Today, I have two exciting announcements to share:
In this blog post, I’ll cover the highlights of what has changed in the Terraform world in the last 2 years by going over the top 10 problems that have been fixed in Terraform since the release of the 1st edition of Terraform: Up & Running:
If you haven’t looked at Terraform in a while, I hope this blog post will give you a reason to look again!
Before Terraform 0.12, all expressions — that is, all references to variables, resources, and functions — had to be wrapped in interpolation syntax of the form ${...}:
variable "ami_id" {
description = "ID of the AMI to deploy"
}
resource "aws_launch_configuration" "example" {
name = "example"
instance_type = "t2.micro"
# Variable references had to be wrapped in interpolation syntax
image_id = "${var.ami_id}"
# Function calls had to be wrapped in interpolation syntax
user_data = "${file("user-data.sh")}"
}
resource "aws_autoscaling_group" "example" {
name = "example"
min_size = 2
max_size = 5
# References to other resources had to be wrapped in
# interpolation syntax
launch_configuration = "${aws_launch_configuration.example.name}"
}
Terraform 0.12 moved to HCL2 as the underlying language, and in HCL2, expressions are a first-class construct that can be used anywhere, without all the extra wrapping. Here’s the same code, rewritten for Terraform 0.12:
variable "ami_id" {
description = "ID of the AMI to deploy"
}
resource "aws_launch_configuration" "example" {
name = "example"
instance_type = "t2.micro"
# No need to wrap variables anymore
image_id = var.ami_id
# No need to wrap functions.
user_data = file("user-data.sh")
}
resource "aws_autoscaling_group" "example" {
name = "example"
min_size = 2
max_size = 5
# No need to wrap resource references
launch_configuration = aws_launch_configuration.example.name
}
Ah, that’s much easier on the eyes. All of Terraform: Up & Running has been updated to use the new 0.12 syntax.
The type system in earlier versions of Terraform had a number of limitations:
# String types worked mostly as expected. However, note the use of
# an empty string to indicate this variable is "unset."
variable "ssh_key_name" {
  description = "The SSH key to use"
  type        = "string"
  default     = ""
}

# This should be a number, but only strings were supported, so
# we relied on magical type coercion.
variable "num_instances" {
  description = "The number of EC2 instances to deploy"
  default     = 3
}

# We'd like to enforce this is a list of maps with specific keys,
# but can only enforce that it is a list.
variable "tags" {
  description = "The tags to apply to the EC2 instances"
  type        = "list"
}
Terraform 0.12 has a richer type system that officially supports:
string
number
bool
list(<TYPE>)
map(<TYPE>)
set(<TYPE>)
tuple([<TYPE_1>, <TYPE_2>, ...])
object({ATTR_1 = <TYPE_1>, ATTR_2 = <TYPE_2>, ...})
any
null
Here are the same variables defined with Terraform 0.12:
# Again, the type is string, but now you can use null to indicate
# a variable is "unset."
variable "ssh_key_name" {
  description = "The SSH key to use"
  type        = string
  default     = null
}

# Number is now a first-class type!
variable "num_instances" {
  description = "The number of EC2 instances to deploy"
  type        = number
  default     = 3
}
# You can now define nested types (e.g., list(<TYPE>)) and custom
# types with specific keys using object({ATTR = <TYPE>}).
variable "tags" {
  description = "The tags to apply to the EC2 instances"
  type = list(object({
    key                 = string
    value               = string
    propagate_at_launch = bool
  }))
}
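For illustration, here's what a value matching that type constraint might look like in a terraform.tfvars file (the tag values below are hypothetical):

# terraform.tfvars (hypothetical values)
tags = [
  {
    key                 = "Name"
    value               = "example"
    propagate_at_launch = true
  },
  {
    key                 = "Team"
    value               = "platform"
    propagate_at_launch = false
  },
]

If a value doesn't match the declared type, Terraform 0.12 rejects it with a clear type error instead of silently coercing it.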
Error messages in older versions of Terraform were often confusing, misleading, or downright missing. For example, let's say you were using the terraform_remote_state data source as follows:
data "terraform_remote_state" "vpc" {
backend = "s3"
config = {
bucket = "terraform-up-and-running-state"
region = "us-east-1"
# Note the accidental typo in the key value
key = "produtcion/vpc/terraform.tfstate"
}
}
Due to the typo in the key, Terraform won't be able to find the state file. Unfortunately, older versions of Terraform would swallow that error completely:
$ terraform apply
data.terraform_remote_state.vpc: Refreshing state...
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
In newer versions of Terraform, here’s what happens when you run the exact same code:
$ terraform apply
data.terraform_remote_state.vpc: Refreshing state...
Error: Unable to find remote state
on main.tf line 1, in data "terraform_remote_state" "vpc":
1: data "terraform_remote_state" "vpc" {
No stored state was found for the given workspace in the given backend.
Not only do you get a clear error message, but you even get to see the relevant snippet of code that caused the problem! The handling of just about all types of errors (type errors, runtime errors, crashes, and so on) has seen similar improvements.
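As an aside, once the typo in key is fixed, reading values out of the other state file is straightforward. Here's a minimal sketch, assuming the VPC configuration exposes a vpc_id output (in Terraform 0.12 and up, remote state outputs live under the outputs attribute):

# Read an output from the VPC's remote state (assumes the VPC
# configuration defines a "vpc_id" output; the CIDR is hypothetical)
resource "aws_subnet" "example" {
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  cidr_block = "10.0.1.0/24"
}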
Terraform is a declarative language, so imperative logic such as for-loops has always been tricky. In older versions of Terraform, your only option was to use the count meta-parameter:
resource "aws_instance" "example" {
# Use count to create 3 EC2 instances
count = 3
ami = "ami-abcd1234"
instance_type = "t2.micro"
}
By setting count to 3, the code above will create 3 EC2 Instances when you run apply, similar to a for-loop. Unfortunately, this approach to loops and conditionals had a number of major limitations in older versions of Terraform. For example, one major limitation of count was that it did not allow you to reference any data sources or resources:
data "aws_subnet_ids" "default" {
vpc_id = "vpc-abcd1234"
}
resource "aws_instance" "example" {
# This used to cause an error
count = length(data.aws_subnet_ids.default.ids)
ami = "ami-abcd1234"
instance_type = "t2.micro"
subnet_id = data.aws_subnet_ids.default.ids[count.index]
}
The code above uses the [aws_subnet_ids](https://www.terraform.io/docs/providers/aws/d/subnet_ids.html) data source to fetch the list of subnets in a VPC and then tries to create one EC2 instance in each of those subnets. This is a perfectly reasonable thing to do, but in older versions of Terraform, referencing any resource or data source in count would lead to an error:
aws_instance.example: resource count can't reference resource variable: data.aws_subnet_ids.default.ids
Another major limitation was that the count meta-parameter could only be used on entire resources and data sources, not on the contents of those resources. For example, consider how tags are set in the aws_autoscaling_group resource:
resource "aws_autoscaling_group" "example" {
# (...)
tag {
key = "Name"
value = var.cluster_name
propagate_at_launch = true
}
}
Each tag must be specified as an inline block. In previous versions of Terraform, there was no way to take in a list of tags from a user and loop over them to create the tag inline blocks dynamically.
In newer versions of Terraform, the count parameter can reference data sources, so the exact same code works just fine:
data "aws_subnet_ids" "default" {
vpc_id = "vpc-abcd1234"
}
resource "aws_instance" "example" {
# This works now!
count = length(data.aws_subnet_ids.default.ids)
ami = "ami-abcd1234"
instance_type = "t2.micro"
subnet_id = data.aws_subnet_ids.default.ids[count.index]
}
As for dealing with inline blocks, such as tag in aws_autoscaling_group, in Terraform 0.12 and newer, you can use a for_each expression. For example, you can add an input variable called custom_tags:
variable "custom_tags" {
description = "Custom tags to set on the Instances in the ASG"
type = map(string)
}
And loop over it using the for_each construct to dynamically generate tag blocks:
resource "aws_autoscaling_group" "example" {
# (...)
dynamic "tag" {
# Use for_each to loop over var.custom_tags
for_each = var.custom_tags
# In each iteration, set the following arguments in the
# tag block
content {
key = tag.key
value = tag.value
propagate_at_launch = true
}
}
}
Chapter 5 of Terraform: Up & Running, 2nd edition, has several sections dedicated to iteration, including lots of examples of how to use count, for_each expressions, for expressions, and for string directives.
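As a quick taste, here's a minimal sketch of a for expression, assuming a hypothetical list variable called var.names:

# A for expression builds a new list from an existing one: this
# one upper-cases every element of the (hypothetical) var.names
output "upper_names" {
  value = [for name in var.names : upper(name)]
}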
Terraform allows you to use ternary syntax for conditionals:
instance_type = var.env == "prod" ? "m4.large" : "t2.micro"
The code above will set instance_type to m4.large when var.env is set to "prod" and to t2.micro otherwise. Older versions of Terraform had two major limitations with conditionals. The first limitation was that conditionals were not short-circuiting, meaning that both clauses of the conditional would be evaluated, no matter what the boolean value was:
# Won't work! var.foo[0] will always be evaluated!
length(var.foo) > 0 ? var.foo[0] : "default"
The code above would fail, as var.foo[0] would always be evaluated, even if var.foo was empty.
The second limitation was that conditionals only worked with primitive values like strings and numbers, but not lists or maps:
# Won't work! Conditionals can only return primitives.
length(var.foo) > 0 ? var.foo ? ["default"]
The code above would fail, as var.foo and ["default"] are both lists, but the clauses of a conditional in older versions of Terraform could only return primitives.
Both of these problems have been fixed in Terraform 0.12: conditionals are now short-circuiting and work with arbitrary types, not just primitives.
# Both of these work as expected in Terraform 0.12 and above!
length(var.foo) > 0 ? var.foo[0] : "default"
length(var.foo) > 0 ? var.foo ? ["default"]
Chapter 5 of Terraform: Up & Running, 2nd edition dives into the details of conditionals, including lots of examples of how to do different types of if-statements and if-else statements using count, for_each expressions, for expressions, if string directives, and ternary syntax.
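For example, here's a minimal sketch of the classic if-statement pattern, which combines a ternary with count (the variable and resource names below are hypothetical):

variable "create_eip" {
  description = "If true, attach an Elastic IP to the instance"
  type        = bool
  default     = false
}

resource "aws_eip" "example" {
  # If-statement pattern: create zero or one copies of this
  # resource, depending on the (hypothetical) boolean variable
  count    = var.create_eip ? 1 : 0
  instance = aws_instance.example.id
}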
When the 1st edition of Terraform: Up & Running came out, Terraform supported remote state storage (e.g., storing your state in an S3 bucket), but there were several limitations. The first limitation was that there was no way to define your remote state settings as code. Instead, every developer on your team had to remember to run a complicated remote config command before they could run terraform apply:
$ terraform remote config \
-backend=s3 \
-backend-config="bucket=my-bucket" \
-backend-config="key=terraform.tfstate" \
-backend-config="region=us-east-1" \
-backend-config="encrypt=true"
The second limitation was that there was no support for locking of state files, so if two developers ran terraform apply at the same time, you ran the risk of their changes conflicting with or overwriting each other's.
Newer versions of Terraform introduced remote state backends, which allow you to define your remote state configuration as part of your Terraform code:
# main.tf
terraform {
  backend "s3" {
    bucket  = "my-bucket"
    key     = "terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
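With this backend block checked into version control, each developer only needs to run terraform init once, and every command after that uses the remote state automatically:

$ terraform init

Initializing the backend...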
Moreover, most remote state backends now support locking. For example, the S3 backend supports locking using DynamoDB:
# main.tf
terraform {
  backend "s3" {
    bucket  = "my-bucket"
    key     = "terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Enable locking via DynamoDB
    dynamodb_table = "TerraformLocks"
  }
}
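The S3 backend requires the DynamoDB table to have a primary key named LockID. Here's a minimal sketch of how you might create that table in Terraform itself (using the same table name as the backend configuration above):

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "TerraformLocks"
  billing_mode = "PAY_PER_REQUEST"

  # The S3 backend requires the primary key to be named LockID
  hash_key = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}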
With locking enabled, any time you run terraform apply, it will acquire the lock before making changes and release the lock after making changes:
$ terraform apply
Acquiring state lock. This may take a few moments...
(...)
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Releasing state lock. This may take a few moments...
If someone else is running terraform apply at the same time, they will already have the lock, and you will have to wait (e.g., you can use the -lock-timeout=10m parameter to tell Terraform to wait up to 10 minutes for the lock to be released). Chapter 3 of Terraform: Up & Running, 2nd edition goes into the details of managing state with Terraform, including backends, locking, Terraform workspaces, and isolating state files across different environments.
When the 1st edition of Terraform: Up & Running came out, Terraform didn’t have much of a story around writing automated tests. Most people built their Terraform modules the best they could and hoped for the best. But to quote Google’s Site Reliability Engineering book, hope is not a strategy.
In 2018, Gruntwork open sourced Terratest, a Swiss Army knife for testing infrastructure code. It's a Go library that provides a large collection of helpers for writing automated tests for your Terraform code, as well as for many other infrastructure as code tools, including Docker, Packer, and Kubernetes. This library has made it possible for our small team to build and maintain an Infrastructure as Code Library that has more than 300,000 lines of code used in production by hundreds of companies.
Automated tests for Terraform code typically have the following structure:
1. Deploy the infrastructure for real using terraform init and terraform apply.
2. Validate that the infrastructure works as expected (e.g., by making HTTP requests or API calls against it).
3. Undeploy the infrastructure at the end of the test using terraform destroy.

For example, here's what an automated test for a Terraform module that deploys a web server might look like:
package test

import (
  "testing"
  "time"

  http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
  "github.com/gruntwork-io/terratest/modules/terraform"
)

func TestWebServer(t *testing.T) {
  terraformOptions := &terraform.Options{
    // The path to where your Terraform code is located
    TerraformDir: "../web-server",
  }

  // At the end of the test, run `terraform destroy`
  defer terraform.Destroy(t, terraformOptions)

  // Run `terraform init` and `terraform apply`
  terraform.InitAndApply(t, terraformOptions)

  // Run `terraform output` to get the value of an output variable
  url := terraform.Output(t, terraformOptions, "url")

  // Verify that we get back a 200 OK with the expected text. It
  // takes ~1 min for the Instance to boot, so retry a few times.
  status := 200
  text := "Hello, World"
  retries := 15
  sleep := 5 * time.Second
  http_helper.HttpGetWithRetry(t, url, status, text, retries, sleep)
}
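The test above assumes the module in ../web-server exposes the URL of the deployed server as an output. A minimal sketch of what that might look like (the resource name and port are hypothetical):

# ../web-server/outputs.tf (hypothetical): the test reads this
# value via `terraform output url`
output "url" {
  description = "URL of the deployed web server"
  value       = "http://${aws_instance.example.public_ip}:8080"
}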
The 2nd edition of Terraform: Up & Running has a new chapter entirely dedicated to testing that walks through unit tests, integration tests, end-to-end tests, dependency injection, static analysis, property checking, and using test stages and parallelism to reduce test times.
When you use Terraform in the real world, you’re typically using it as part of a team. When the 1st edition of Terraform: Up & Running came out, effective workflows for teamwork with Terraform were not well understood. We saw teams struggling with a number of challenges, including:
Chapter 8 of Terraform: Up & Running, 2nd edition, has been completely rewritten to address these exact challenges. In that chapter, you'll find a set of techniques you can use to get your boss on board with adopting Terraform, as well as a detailed walkthrough of two workflows: one for deploying application code and one for deploying infrastructure code.
The book includes a concise side-by-side summary of those two workflows (you'll have to grab a copy of the book for the full details).
In a general purpose programming language (e.g., Ruby, Python, Java), you can put reusable code into a function, combine and compose simpler functions to create more complicated ones, and build up a library of reusable functions that can be shared across your whole team. In Terraform, you can do the same thing by creating modules, which are like reusable “blueprints” for your infrastructure.
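For example, using a module looks a lot like calling a function: you pass in inputs and get back outputs. Here's a minimal sketch (the module name, source path, and inputs are all hypothetical):

module "web_cluster" {
  # Reuse the "blueprint" defined in the web-cluster folder,
  # customizing it via input variables (all names hypothetical)
  source = "./modules/web-cluster"

  cluster_name  = "example"
  instance_type = "t2.micro"
  min_size      = 2
  max_size      = 5
}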
Modules are a big deal. They are the key ingredient to writing reusable, maintainable, and testable Terraform code. Once you start using them, there’s no going back. You’ll start building everything as a module, creating a library of modules to share within your company, and start thinking of your entire infrastructure as a collection of reusable modules.
All of this was already possible when Terraform: Up & Running first came out, but there was one problem: there were very few modules available publicly that you could use or learn from. Pretty much every company was reinventing the wheel, building up more or less the same modules from scratch, over and over again.
The Terraform community has grown enormously over the last two years, and there are now hundreds of modules available for you to use, including the community modules in the Terraform Registry and commercially supported collections such as the Gruntwork Infrastructure as Code Library.
Terraform: Up & Running, 2nd edition includes two chapters dedicated to how to build modules: Chapter 4, Terraform modules, introduces the basics of creating modules and Chapter 6, Production-grade Terraform code, dives into the details of designing module APIs, module composition, module testing, module versioning, production checklists, and everything else you need to build Terraform modules that you can rely on in production.
In September 2016, while writing the 1st edition of Terraform: Up & Running, I gathered a bunch of data to see how Terraform’s community compared to that of other popular infrastructure as code (IAC) tools, including Chef, Puppet, Ansible, CloudFormation, and Heat. I looked at whether each IAC tool was open source or closed source, what cloud providers it supported, the total number of contributors and stars on GitHub, how many commits and active issues there were over a one-month period, how many open source libraries were available for the tool, the number of questions listed for that tool on StackOverflow, and the number of jobs that mention the tool on Indeed.com. The following table shows the results:
Obviously, this was not a perfect apples-to-apples comparison. For example, some of the tools had more than one repository, and some used other methods for bug tracking and questions; searching for jobs with common words like “chef” or “puppet” is tricky; and so on.
That said, it was clear that, at the time, Chef, Puppet, and Ansible were all more popular than Terraform. Terraform seemed to be growing quickly, but compared to the other tools, the community was still relatively small.
I revisited all the same numbers in May 2019, and here's what I found:
Again, the numbers are far from perfect, but the trends still seem to shine through clearly: Ansible and Terraform are both seeing explosive growth. Check out the percentage change in the numbers between the 2016 and 2019 comparisons:
(Note: the decline in Terraform’s commits and issues is solely due to the fact that I’m only measuring the core Terraform repo, whereas in 2017, all the provider code was extracted into separate repos, so the vast amount of activity across the more than 100 provider repos is not being counted.)
The increase in the number of contributors, stars, open source libraries, StackOverflow posts, and jobs for Terraform is through the roof. Moreover, Terraform has grown from supporting a handful of major cloud providers (e.g., AWS, GCP, and Azure) to over 100 official providers and many more community providers.
That means you can now use Terraform not only to manage many other types of clouds (e.g., there are now providers for Alicloud, Oracle Cloud Infrastructure, VMware vSphere, and others), but also to manage many other aspects of your world as code, including version control systems (e.g., using the GitHub, GitLab, or Bitbucket providers), data stores (e.g., using the MySQL, PostgreSQL, or InfluxDB providers), monitoring and alerting systems (e.g., using the Datadog, New Relic, or Grafana providers), platform tools (e.g., using the Kubernetes, Helm, Heroku, Rundeck, or RightScale providers), and much more. Moreover, each provider has much better coverage these days: e.g., the AWS provider now covers the majority of important AWS services and often adds support for new services even before CloudFormation!
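For instance, here's a minimal sketch of managing a GitHub repository as code with the GitHub provider (the organization and repository names below are hypothetical):

provider "github" {
  organization = "example-org"
}

resource "github_repository" "example" {
  name        = "example-repo"
  description = "Managed by Terraform"

  # Keep the repository private
  private = true
}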
Over the last couple of years, Terraform has become a more mature, robust, and powerful tool. And based on the rate at which the community is growing, I fully expect it to get even better, and I’m very excited to see how it evolves in the future.
In this blog post, you got a small taste of the major changes from just the last couple of years. For the full details, grab yourself a copy of the Early Release of Terraform: Up & Running, 2nd edition, and let me know what you think!