Terraform: Up & Running, 2nd edition Early Release is now available!

Learn about the top 10 problems that have been fixed in Terraform since the 1st edition

In late 2016, we released the Comprehensive Guide to Terraform blog post series. It got so much attention (the series is now approaching 2 million views!) that a few months later, in early 2017, we turned it into a book, Terraform: Up & Running. It’s remarkable just how much has changed in the two years since then: 4 major Terraform releases, a language change (HCL to HCL2), a revamp of Terraform state (backends, locking, and workspaces), the Terraform providers split, the Terraform Registry, massive community growth, and much more.

Today, I have two exciting announcements to share:

  1. We’ve updated the Comprehensive Guide to Terraform blog post series all the way through Terraform 0.12!
  2. The Early Release of the 2nd edition of *Terraform: Up & Running* is now available! The 2nd edition of the book is nearly double the length of the 1st edition (~160 more pages), and it has been fully updated through Terraform 0.12, including two completely new chapters: Production-grade Terraform Code and How to Test Terraform Code.

In this blog post, I’ll cover the highlights of what has changed in the Terraform world in the last 2 years by going over the top 10 problems that have been fixed in Terraform since the release of the 1st edition of Terraform: Up & Running:

  1. Expressions
  2. Type system
  3. Error handling
  4. Iteration
  5. Conditionals
  6. Terraform state
  7. Testing
  8. Workflow
  9. Modules
  10. Community

If you haven’t looked at Terraform in a while, I hope this blog post will give you a reason to look again!

1. Expressions

The problem

Before Terraform 0.12, all expressions — that is, all references to variables, resources, and functions — had to be wrapped in interpolation syntax of the form ${...}:

variable "ami_id" {
  description = "ID of the AMI to deploy"
}

resource "aws_launch_configuration" "example" {
  name          = "example"
  instance_type = "t2.micro"

  # Variable references had to be wrapped in interpolation syntax
  image_id = "${var.ami_id}"

  # Function calls had to be wrapped in interpolation syntax
  user_data = "${file("user-data.sh")}"
}

resource "aws_autoscaling_group" "example" {
  name     = "example"
  min_size = 2
  max_size = 5

  # References to other resources had to be wrapped in
  # interpolation syntax
  launch_configuration = "${aws_launch_configuration.example.name}"
}

The solution

Terraform 0.12 moved to HCL2 as the underlying language, and in HCL2, expressions are a first-class construct that can be used anywhere, without all the extra wrapping. Here’s the same code, rewritten for Terraform 0.12:

variable "ami_id" {
  description = "ID of the AMI to deploy"
}

resource "aws_launch_configuration" "example" {
  name          = "example"
  instance_type = "t2.micro"

  # No need to wrap variables anymore
  image_id = var.ami_id

  # No need to wrap functions
  user_data = file("user-data.sh")
}

resource "aws_autoscaling_group" "example" {
  name     = "example"
  min_size = 2
  max_size = 5

  # No need to wrap resource references
  launch_configuration = aws_launch_configuration.example.name
}

Ah, that’s much easier on the eyes. All of Terraform: Up & Running has been updated to use the new 0.12 syntax.
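
One nuance is worth a quick illustration: interpolation syntax has not disappeared in Terraform 0.12. It is still how you embed an expression inside a larger string. The snippet below is a minimal sketch; the cluster_name variable and the local values are hypothetical names introduced only for this illustration:

```hcl
# Hypothetical input variable, introduced only for this illustration
variable "cluster_name" {
  description = "Name prefix for the cluster's resources"
  type        = string
}

locals {
  # A bare reference needs no wrapping in Terraform 0.12
  plain_name = var.cluster_name

  # Interpolation is still used to build a larger string
  asg_name = "${var.cluster_name}-asg"
}
```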

2. Type system

The problem

The type system in earlier versions of Terraform had a number of limitations:

  1. Only strings, lists, and maps were officially supported.
  2. Unofficially, you could also use numbers and booleans, but that meant relying on somewhat magical type coercion with strings.
  3. There was no way to enforce type constraints on more complicated types (e.g., a list of maps or an object with specific keys).
  4. There was no support for saying a variable or argument should be unset (i.e., that a resource should use its default behavior for that argument), so some resources used zero values (e.g., empty string) to indicate something should be unset, while others didn’t support it at all.

# String types worked mostly as expected. However, note the use of
# an empty string to indicate this variable is "unset."
variable "ssh_key_name" {
  description = "The SSH key to use"
  type        = "string"
  default     = ""
}

# This should be a number, but only strings were supported, so
# we relied on magical type coercion.
variable "num_instances" {
  description = "The number of EC2 instances to deploy"
  default     = 3
}

# We'd like to enforce this is a list of maps with specific keys,
# but can only enforce that it is a list.
variable "tags" {
  description = "The tags to apply to the EC2 instances"
  type        = "list"
}

The solution

Terraform 0.12 has a richer type system that officially supports:

  1. string
  2. number
  3. bool
  4. list(<TYPE>)
  5. map(<TYPE>)
  6. set(<TYPE>)
  7. tuple([<TYPE_1>, <TYPE_2>, ...])
  8. object({ATTR_1 = <TYPE_1>, ATTR_2 = <TYPE_2>, ...})
  9. any
  10. null (strictly a value rather than a type constraint; it marks a variable or argument as unset)

Here are the same variables defined with Terraform 0.12:

# Again, the type is string, but now you can use null to indicate a
# variable is "unset."
variable "ssh_key_name" {
  description = "The SSH key to use"
  type        = string
  default     = null
}

# Number is now a first-class type!
variable "num_instances" {
  description = "The number of EC2 instances to deploy"
  type        = number
  default     = 3
}

# You can now define nested types (e.g., list(<TYPE>)) and custom
# types with specific keys using object({ATTR = <TYPE>}).
variable "tags" {
  description = "The tags to apply to the EC2 instances"
  type = list(object({
    key                 = string
    value               = string
    propagate_at_launch = bool
  }))
}
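
To see why null matters, here is a rough sketch of how the ssh_key_name variable above might be consumed. It assumes the AWS provider's aws_instance resource and uses a placeholder AMI ID; when the variable is null, Terraform treats key_name as if it had been omitted:

```hcl
resource "aws_instance" "example" {
  ami           = "ami-abcd1234" # placeholder AMI ID
  instance_type = "t2.micro"

  # If var.ssh_key_name is null, this argument is treated as unset,
  # so the instance is launched without an SSH key pair.
  key_name = var.ssh_key_name
}
```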

3. Error handling

The problem

Error messages in older versions of Terraform were often confusing, misleading, or downright missing. For example, let’s say you were using the terraform_remote_state data source as follows:

data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "terraform-up-and-running-state"
    region = "us-east-1"

    # Note the accidental typo in the key value
    key = "produtcion/vpc/terraform.tfstate"
  }
}

Due to the typo in the key, Terraform won’t be able to find the state file. Unfortunately, older versions of Terraform would swallow that error completely:

$ terraform apply
data.terraform_remote_state.vpc: Refreshing state...
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

The solution

In newer versions of Terraform, here’s what happens when you run the exact same code:

$ terraform apply
data.terraform_remote_state.vpc: Refreshing state...

Error: Unable to find remote state

  on main.tf line 1, in data "terraform_remote_state" "vpc":
   1: data "terraform_remote_state" "vpc" {

No stored state was found for the given workspace in the given backend.

Not only do you get a clear error message, you even get to see the relevant snippet of code that caused the problem! The handling of just about all types of errors (e.g., type errors, runtime errors, and crashes) has seen similar improvements.

4. Iteration

The problem

Terraform is a declarative language, so imperative logic such as for-loops has always been tricky. In older versions of Terraform, your only option was to use the count meta-parameter:

resource "aws_instance" "example" {
  # Use count to create 3 EC2 instances
  count = 3

  ami           = "ami-abcd1234"
  instance_type = "t2.micro"
}

By setting count to 3, the code above will create 3 EC2 Instances when you run apply, similar to a for-loop. Unfortunately, this approach to loops and conditionals had a number of major limitations in older versions of Terraform. For example, one major limitation of count was that it did not allow you to reference any data sources or resources:

data "aws_subnet_ids" "default" {
  vpc_id = "vpc-abcd1234"
}

resource "aws_instance" "example" {
  # This used to cause an error
  count = length(data.aws_subnet_ids.default.ids)

  ami           = "ami-abcd1234"
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet_ids.default.ids[count.index]
}

The code above uses the [aws_subnet_ids](https://www.terraform.io/docs/providers/aws/d/subnet_ids.html) data source to fetch the list of subnets in a VPC, and then tries to create one EC2 instance in each of those subnets. This is a perfectly reasonable thing to do, but in older versions of Terraform, referencing any resource or data source in count would lead to an error:

aws_instance.example: resource count can't reference resource variable: data.aws_subnet_ids.default.ids

Another major limitation was that the count meta-parameter could only be used on entire resources and data sources, not on the contents of those resources. For example, consider how tags are set in the aws_autoscaling_group resource:

resource "aws_autoscaling_group" "example" {
  # (...)

  tag {
    key                 = "Name"
    value               = var.cluster_name
    propagate_at_launch = true
  }
}

Each tag must be specified as an inline block. In previous versions of Terraform, there was no way to take in a list of tags from a user and loop over them to create the tag inline blocks dynamically.

The solution

In newer versions of Terraform, the count parameter can reference data sources, so the exact same code works just fine:

data "aws_subnet_ids" "default" {
  vpc_id = "vpc-abcd1234"
}

resource "aws_instance" "example" {
  # This works now!
  count = length(data.aws_subnet_ids.default.ids)

  ami           = "ami-abcd1234"
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet_ids.default.ids[count.index]
}

As for dealing with inline blocks, such as tag in aws_autoscaling_group, in Terraform 0.12 and newer, you can use a dynamic block with a for_each expression. For example, you can add an input variable called custom_tags:

variable "custom_tags" {
  description = "Custom tags to set on the Instances in the ASG"
  type        = map(string)
}

And loop over it using the for_each construct to dynamically generate tag blocks:

resource "aws_autoscaling_group" "example" {
  # (...)

  dynamic "tag" {
    # Use for_each to loop over var.custom_tags
    for_each = var.custom_tags

    # In each iteration, set the following arguments in the
    # tag block
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }
}

Chapter 5 of Terraform: Up & Running, 2nd edition, has several sections dedicated to iteration, including lots of examples of how to use count, for_each expressions, for expressions, and for string directives.
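
To give a flavor of the other two constructs mentioned above, here is a small sketch of a for expression and a for string directive. The names variable and the two outputs are hypothetical, introduced only for this illustration:

```hcl
# Hypothetical input variable, introduced only for this illustration
variable "names" {
  description = "A list of names"
  type        = list(string)
  default     = ["neo", "trinity", "morpheus"]
}

# A for expression transforms a list into a new list
output "upper_names" {
  value = [for name in var.names : upper(name)]
}

# A for string directive loops inside a string
output "names_string" {
  value = "%{ for name in var.names }${name}, %{ endfor }"
}
```

The for expression is handy when you need a transformed collection; the string directive is the better fit when the end result is a single rendered string.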

5. Conditionals

The problem

Terraform allows you to use ternary syntax for conditionals:

instance_type = var.env == "prod" ? "m4.large" : "t2.micro"

The code above will set instance_type to m4.large when var.env is "prod" and to t2.micro otherwise. Older versions of Terraform had two major limitations with conditionals. The first limitation was that conditionals were not short-circuiting, which means that both clauses of the conditional would be evaluated, no matter what the boolean value was:

# Won't work! var.foo[0] will always be evaluated!
length(var.foo) > 0 ? var.foo[0] : "default"

The code above would fail, as var.foo[0] would always be evaluated, even if var.foo was empty.

The second limitation was that conditionals only worked with primitive values like strings and numbers, but not lists or maps:

# Won't work! Conditionals can only return primitives.
length(var.foo) > 0 ? var.foo : ["default"]

The code above would fail, as var.foo and ["default"] are both lists, but the clauses of a conditional in older versions of Terraform could only return primitives.

The solution

Both of these problems have been fixed in Terraform 0.12. Conditionals are now short-circuiting and work with any arbitrary type.

# Both of these work as expected in Terraform 0.12 and above!
length(var.foo) > 0 ? var.foo[0] : "default"
length(var.foo) > 0 ? var.foo : ["default"]

Chapter 5 of Terraform: Up & Running, 2nd edition dives into the details of conditionals, including lots of examples of how to do different types of if-statements and if-else statements using count, for_each expressions, for expressions, if string directives, and ternary syntax.
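
As one example of an if-statement built from count, you can combine the ternary syntax with the count parameter so that a resource is created only when a flag is set. This is a minimal sketch: the enable_autoscaling variable and the schedule values are hypothetical, and it assumes the aws_autoscaling_group.example resource from earlier in this post:

```hcl
# Hypothetical toggle, introduced only for this illustration
variable "enable_autoscaling" {
  description = "If set to true, enable auto scaling"
  type        = bool
}

resource "aws_autoscaling_schedule" "scale_out" {
  # Create 1 copy of this resource if the toggle is true, 0 otherwise
  count = var.enable_autoscaling ? 1 : 0

  scheduled_action_name  = "scale-out-during-business-hours"
  min_size               = 2
  max_size               = 10
  desired_capacity       = 10
  recurrence             = "0 9 * * *"
  autoscaling_group_name = aws_autoscaling_group.example.name
}
```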

6. Terraform state

The problem

When the 1st edition of Terraform: Up & Running came out, Terraform supported remote state storage (e.g., storing your state in an S3 bucket), but there were several limitations. The first limitation was that there was no way to define your remote state settings as code. Instead, every developer on your team had to remember to run a complicated remote config command before they could run terraform apply:

$ terraform remote config \
    -backend=s3 \
    -backend-config="bucket=my-bucket" \
    -backend-config="key=terraform.tfstate" \
    -backend-config="region=us-east-1" \
    -backend-config="encrypt=true"

The second limitation was that there was no support for locking of state files. So if two developers ran terraform apply at the same time, you ran the risk of their changes conflicting or overwriting each other.

The solution

Newer versions of Terraform introduced remote state backends, which allow you to define your remote state configuration as part of your Terraform code:

# main.tf
terraform {
  backend "s3" {
    bucket  = "my-bucket"
    key     = "terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

Moreover, most remote state backends now support locking. For example, the S3 backend supports locking using DynamoDB:

# main.tf
terraform {
  backend "s3" {
    bucket  = "my-bucket"
    key     = "terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Enable locking via DynamoDB
    dynamodb_table = "TerraformLocks"
  }
}

With locking enabled, any time you run terraform apply, it will obtain the lock before making changes, and release the lock after making changes:

$ terraform apply
Acquiring state lock. This may take a few moments...
(...)
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Releasing state lock. This may take a few moments...

If someone else is already running terraform apply, they will hold the lock, and you will have to wait (e.g., you can use the -lock-timeout=10m parameter to tell Terraform to wait up to 10 minutes for the lock to be released). Chapter 3 of Terraform: Up & Running, 2nd edition goes into the details of managing state with Terraform, including backends, locking, Terraform workspaces, and isolating state files across different environments.
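
For locking to work, the DynamoDB table referenced by the backend has to exist already. Here is a minimal sketch of what that table might look like, assuming the TerraformLocks table name from the backend configuration above; the resource name and the billing mode are illustrative choices:

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "TerraformLocks"
  billing_mode = "PAY_PER_REQUEST"

  # The S3 backend expects a string primary key named LockID
  hash_key = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Since the backend needs this table before it can lock anything, it is typically created in a separate step before the backend block is added.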

7. Testing

The problem

When the 1st edition of Terraform: Up & Running came out, Terraform didn’t have much of a story around writing automated tests. Most people built their Terraform modules the best they could and hoped for the best. But to quote Google’s Site Reliability Engineering book, hope is not a strategy.

The solution

In 2018, Gruntwork open sourced Terratest, a Swiss Army knife for testing infrastructure code. It’s a Go library that provides a large collection of helpers for writing automated tests for your Terraform code, as well as many other types of infrastructure as code tools, including Docker, Packer, and Kubernetes. This library has made it possible for our small team to build and maintain an Infrastructure as Code Library that has more than 300,000 lines of code used in production by hundreds of companies.

Automated tests for Terraform code typically have the following structure:

  1. Deploy your Terraform module via terraform init and terraform apply.
  2. Validate the module works as expected: e.g., if the module deploys a web server, send HTTP requests; if the module deploys a database, read/write data; etc.
  3. Undeploy the Terraform module via terraform destroy.

For example, here’s what an automated test for a Terraform module that deploys a web server might look like:

package test

import (
	"testing"
	"time"

	http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestWebServer(t *testing.T) {
	terraformOptions := &terraform.Options{
		// The path to where your Terraform code is located
		TerraformDir: "../web-server",
	}

	// At the end of the test, run `terraform destroy`
	defer terraform.Destroy(t, terraformOptions)

	// Run `terraform init` and `terraform apply`
	terraform.InitAndApply(t, terraformOptions)

	// Run `terraform output` to get the value of an output variable
	url := terraform.Output(t, terraformOptions, "url")

	// Verify that we get back a 200 OK with the expected text. It
	// takes ~1 min for the Instance to boot, so retry a few times.
	// (Newer Terratest releases take a *tls.Config argument after url.)
	status := 200
	text := "Hello, World"
	retries := 15
	sleep := 5 * time.Second
	http_helper.HttpGetWithRetry(t, url, status, text, retries, sleep)
}

The 2nd edition of Terraform: Up & Running has a new chapter entirely dedicated to testing that walks through unit tests, integration tests, end-to-end tests, dependency injection, static analysis, property checking, and using test stages and parallelism to reduce test times.

8. Workflow

The problem

When you use Terraform in the real world, you’re typically using it as part of a team. When the 1st edition of Terraform: Up & Running came out, effective workflows for teamwork with Terraform were not well understood. We saw teams struggling with a number of challenges, including:

  1. How to convince your boss to adopt Terraform in the first place.
  2. How to develop Terraform code locally, including manual and automated tests.
  3. How to set up code review, continuous integration (CI), and continuous delivery (CD) processes for Terraform code.
  4. How to roll out Terraform changes across environments (dev, stage, prod) and what to do in case of errors.
  5. How to combine Terraform with other tools, such as Ansible, Packer, Docker, and Kubernetes.

The solution

Chapter 8 of Terraform: Up & Running, 2nd edition, has been completely rewritten to answer these exact questions. In that chapter, you’ll find a set of techniques you can use to get your boss on board with adopting Terraform, as well as a detailed walkthrough of two workflows:

  1. A workflow for taking application code (e.g., a Ruby on Rails or Java/Spring app) from development all the way to production.
  2. A workflow for taking infrastructure code (e.g., Terraform modules) from development to production.

Here’s a concise summary of the two workflows, side by side (you’ll have to grab a copy of the book for the full details):

9. Modules

The problem

In a general purpose programming language (e.g., Ruby, Python, Java), you can put reusable code into a function, combine and compose simpler functions to create more complicated ones, and build up a library of reusable functions that can be shared across your whole team. In Terraform, you can do the same thing by creating modules, which are like reusable “blueprints” for your infrastructure.

Modules are a big deal. They are the key ingredient to writing reusable, maintainable, and testable Terraform code. Once you start using them, there’s no going back. You’ll start building everything as a module, creating a library of modules to share within your company, and start thinking of your entire infrastructure as a collection of reusable modules.

All of this was already possible when Terraform: Up & Running first came out, but there was one problem: there were very few modules available publicly that you could use or learn from. Pretty much every company was reinventing the wheel, building up more or less the same modules from scratch, over and over again.

The solution

The Terraform community has grown enormously over the last two years, and there are now hundreds of modules available for you to use, including:

  1. Terraform Registry: In 2017, HashiCorp launched the Terraform Registry, a collection of open source, community-maintained modules for Terraform. Since the release, the Registry has grown to well over 1,000 modules across a variety of clouds (AWS, GCP, Azure, etc.).
  2. Gruntwork Infrastructure as Code Library: At Gruntwork, we’ve created a library of production-grade modules for all major cloud providers. These modules are thoroughly documented and tested, commercially supported and maintained, and are used directly in production by hundreds of companies around the world.

Terraform: Up & Running, 2nd edition includes two chapters dedicated to how to build modules: Chapter 4, Terraform modules, introduces the basics of creating modules, and Chapter 6, Production-grade Terraform Code, dives into the details of designing module APIs, module composition, module testing, module versioning, production checklists, and everything else you need to build Terraform modules that you can rely on in production.
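
If you have not used modules before, here is a rough sketch of what consuming a versioned module looks like. The Git URL, the tag, the input variables, and the url output are all hypothetical placeholders and do not refer to a real module:

```hcl
module "web_server" {
  # Hypothetical Git repository and tag; pinning to a ref keeps
  # every environment on a known version of the module
  source = "git::https://example.com/your-org/modules.git//web-server?ref=v0.1.0"

  # Hypothetical input variables exposed by the module
  cluster_name  = "web-server-stage"
  instance_type = "t2.micro"
}

output "url" {
  # Assumes the module defines an output called url
  value = module.web_server.url
}
```

Pinning the source to a specific tag is what makes it possible to try a new module version in staging while production stays on the old one.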

10. Community

The problem

In September 2016, while writing the 1st edition of Terraform: Up & Running, I gathered a bunch of data to see how Terraform’s community compared to that of other popular infrastructure as code (IAC) tools, including Chef, Puppet, Ansible, CloudFormation, and Heat. I looked at whether each IAC tool was open source or closed source, what cloud providers it supported, the total number of contributors and stars on GitHub, how many commits and active issues there were over a one-month period, how many open source libraries were available for the tool, the number of questions listed for that tool on StackOverflow, and the number of jobs that mention the tool on Indeed.com. The following table shows the results:

Obviously, this was not a perfect apples-to-apples comparison. For example, some of the tools had more than one repository, and some used other methods for bug tracking and questions; searching for jobs with common words like “chef” or “puppet” is tricky; and so on.

That said, it was clear that, at the time, Chef, Puppet, and Ansible were all more popular than Terraform. Terraform seemed to be growing quickly, but compared to the other tools, the community was still relatively small.

The solution

I revisited all the same numbers in May 2019, and here’s what I found:

Again, the numbers are far from perfect, but the trends still seem to shine through clearly: Ansible and Terraform are both seeing explosive growth. Check out the percentage change in the numbers between the 2016 and 2019 comparisons:

(Note: the decline in Terraform’s commits and issues is solely due to the fact that I’m only measuring the core Terraform repo, whereas in 2017, all the provider code was extracted into separate repos, so the vast amount of activity across the more than 100 provider repos is not being counted.)

The increase in the number of contributors, stars, open source libraries, StackOverflow posts, and jobs for Terraform is through the roof. Moreover, Terraform has grown from supporting a handful of major cloud providers (e.g., AWS, GCP, and Azure) to over 100 official providers and many more community providers.

That means you can now use Terraform to not only manage many other types of clouds (e.g., there are now providers for Alicloud, Oracle Cloud Infrastructure, VMware vSphere, and others), but also to manage many other aspects of your world as code, including version control systems (e.g., using the GitHub, GitLab, or Bitbucket providers), data stores (e.g., using the MySQL, PostgreSQL, or InfluxDB providers), monitoring and alerting systems (e.g., using the Datadog, New Relic, or Grafana providers), platform tools (e.g., using the Kubernetes, Helm, Heroku, Rundeck, or RightScale providers), and much more. Moreover, each provider has much better coverage these days: e.g., the AWS provider now covers the majority of important AWS services and often adds support for new services even before CloudFormation!

Conclusion

Over the last couple of years, Terraform has become a more mature, robust, and powerful tool. And based on the rate at which the community is growing, I fully expect it to get even better, and I’m very excited to see how it evolves in the future.

In this blog post, you got a small taste of the major changes from just the last couple of years. For the full details, grab yourself a copy of the Early Release of Terraform: Up & Running, 2nd edition, and let me know what you think!
