Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.
Hello Grunts,
This month, we made major updates to Terratest, our IaC testing library, including support for Google Cloud, RDS, SSH agents, and log file gathering; updated our ECS modules to support Service Discovery (based on Route 53), daemon services (e.g., running exactly one DataDog container on each ECS node), and volumes; fixed some important issues with our cross-account IAM Roles and aws-auth script; fixed a bug in our SQS module that revealed an important Terraform issue; and made many other fixes and tweaks. Read on for all the details.
As always, if you have any questions or need help, email us at support@gruntwork.io!
We’re continuing to grow and improve Terratest, our open source Swiss Army knife for testing infrastructure code. With lots of help from the open source community, we’ve added a number of major new capabilities to Terratest this month. Here are the highlights:
Google Cloud Support: We added support for GCP to Terratest! Terratest now supports a broad collection of GCP features including getting public IPs, adding labels, getting a random region or zone, reading/writing in GCS buckets, and much more. This is available in Terratest, v0.10.0.
Gather log files: Added methods that allow you to fetch files over SSH from servers, EC2 Instances, and ASGs. The main use case for this is to fetch the contents of log files (or any other files you want to see!) from EC2 Instances at the end of the test to make debugging a test failure 10x easier. This is available in Terratest, v0.9.17.
SSH Agent: You can now use ssh-agent authentication with Terratest’s SSH methods by simply setting the SshAgent field to true in the ssh.Host struct (Terratest, v0.9.16). You can run an in-process SSH agent using the ssh.NewSshAgent method and have Terratest use that SSH agent for SSH connections via the new OverrideSshAgent parameter in ssh.Host (Terratest, v0.10.1). And you can use the new SshAgent property in terraform.Options to specify an in-process SSH agent to use when running Terraform, which makes it easier at test time to use custom SSH keys with remote-exec, file, and other SSH-based provisioners (Terratest, v0.10.2).
RDS: You can now use Terratest to test your RDS databases! Check out the new methods in the aws package, including GetAddressOfRdsInstance, GetWhetherSchemaExistsInRdsMySqlInstance, GetParameterValueForParameterOfRdsInstance, and GetAllParametersOfRdsInstance. This is available in Terratest, v0.10.3. Support for other DBs and checks will be coming in the future.
Motivation: AWS ECS makes it easy to deploy Docker containers as long-running ECS services, but as ECS selects which EC2 instance will run the container, and as EC2 Instances can be replaced or scaled up or down, ECS services have dynamically assigned IPs. If you want your services to be able to talk to each other, you need some form of service discovery so they can find out which service lives at which IPs. In the past, we used internal load balancers for this purpose, but a few months ago, Amazon announced integrated support for Service Discovery in ECS that allows you to reach your ECS services through hostnames managed by Route 53.
For example, service foo can talk to service bar at bar.internal-domain.com and service baz at baz.internal-domain.com. With integrated support for service discovery, ECS can now take care of automatically registering IPs when a new container is deployed and de-registering them when the container is undeployed or crashes. This approach has several advantages over using an internal load balancer.
Solution: We created a Terraform module in our module-ecs package called ecs-service-with-discovery. It allows you to deploy an ECS service with Service Discovery in AWS, taking care of registering the ECS service, configuring the network, and creating the necessary Route 53 alias for public hostnames. Currently, our module supports public or private hostnames (examples are provided for both scenarios) and tasks with the awsvpc network mode. The host and bridge network modes will be supported in future updates.
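To give a feel for how this fits together, here is a minimal sketch of using the module. The source URL follows our standard Git/ref convention, but the input variable names below are illustrative assumptions, so check the module’s vars.tf for the real interface:

```hcl
# Sketch only: input names are assumptions, not the module's actual API
module "bar_service" {
  source = "git::git@github.com:gruntwork-io/module-ecs.git//modules/ecs-service-with-discovery?ref=v0.8.0"

  # Registered in Route 53, e.g., as bar.internal-domain.com
  service_name    = "bar"
  ecs_cluster_arn = "${module.ecs_cluster.ecs_cluster_arn}"

  # Service Discovery currently requires the awsvpc network mode
  ecs_task_container_definitions = "${data.template_file.bar.rendered}"
}
```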
What to do about it: Upgrade to version v0.8.0 of module-ecs if this functionality would be useful to you.
Motivation: Many of our customers needed a way to run exactly one copy of a specific Docker container, such as a DataDog agent, on each EC2 Instance in their ECS cluster. This was hard to do with the ECS scheduler, so typically, some sort of hack was required.
Solution: We’ve added a new ecs-daemon-service module that you can use to deploy exactly one task on each active container instance that meets all of the task placement constraints specified in your cluster.
What to do about it: The ecs-daemon-service module is available in module-ecs, v0.8.2.
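As a rough sketch, using it to run one DataDog agent per ECS node might look like this (again, the input names are illustrative assumptions rather than the module’s actual interface):

```hcl
# Sketch only: input names are assumptions, not the module's actual API
module "datadog_agent" {
  source = "git::git@github.com:gruntwork-io/module-ecs.git//modules/ecs-daemon-service?ref=v0.8.2"

  service_name    = "datadog-agent"
  ecs_cluster_arn = "${module.ecs_cluster.ecs_cluster_arn}"

  # The daemon strategy schedules exactly one copy of this task on each
  # active container instance in the cluster
  ecs_task_container_definitions = "${data.template_file.datadog.rendered}"
}
```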
We had several other updates to ECS this month:
- The cidr_blocks parameter in the ecs-fargate module now properly handles lists of CIDR blocks.
- The ecs-fargate module now outputs the IAM Role ID and name via the fargate_task_execution_iam_role_id and fargate_task_execution_iam_role_name output variables, respectively.
- You can now attach volumes to tasks in the ecs-service-with-alb module using the new volumes parameter.

Motivation: One of our customers wanted to use our aws-auth script to assume an IAM Role, use MFA, and set an expiration time longer than the 1 hour default. Upon trying this, he was hitting the error The requested DurationSeconds exceeds the 1 hour session limit for roles assumed by role chaining.
Solution: This issue is fixed in module-security, v0.15.0. Note that you need to update two things:
- aws-auth: The new version of the script can assume an IAM Role and use MFA without “role chaining,” so you can use longer expiration times.
- cross-account-iam-roles: After fixing the aws-auth issue, we hit a new one: we were able to successfully assume IAM Roles in other accounts, but every API call would fail with the error Access Denied. It turns out that the IAM Roles we were creating in the cross-account-iam-roles module required an MFA token not only to assume the IAM Role, but also for every API call afterwards, which doesn’t work with aws sts assume-role. We’ve updated the cross-account-iam-roles module so it only requires MFA to assume the role in its Trust Policy, which is all that’s really necessary for MFA protection.

What to do about it: Update your cross-account-iam-roles usage in each of your AWS accounts and update your local copy of the aws-auth script to module-security, v0.15.0.
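For context, here is a minimal sketch of the underlying IAM pattern (not the module’s actual code): the MFA check lives only in the role’s Trust Policy, so it is enforced once at assume-role time rather than on every subsequent API call. The account ID is a placeholder:

```hcl
# Sketch of the pattern: require MFA only when the role is assumed
data "aws_iam_policy_document" "trust_policy" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111122223333:root"] # trusted account (placeholder)
    }

    # Checked once, during aws sts assume-role
    condition {
      test     = "Bool"
      variable = "aws:MultiFactorAuthPresent"
      values   = ["true"]
    }
  }
}

resource "aws_iam_role" "cross_account" {
  name               = "allow-access-from-other-account"
  assume_role_policy = "${data.aws_iam_policy_document.trust_policy.json}"
}
```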
Motivation: A Gruntwork customer was importing an existing RDS instance whose resources (subnet name, description, etc.) did not match the naming conventions used within the module-data-storage package. This was causing Terraform to indicate that their RDS instance would be destroyed and recreated, an obviously unacceptable consequence.
Solution: We updated the module-data-storage package to expose several new variables that let you specify the names and descriptions of all the sub-resources, so they can match those of the already-running instance being imported. If these new variables are not specified, the previous default naming schemes are used.
What to do about it: Upgrade to version v0.6.7 of module-data-storage if this functionality would be useful to you.
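As a sketch of how this might look when importing an existing instance (all variable names below are illustrative assumptions; see the module’s vars.tf for the real ones):

```hcl
# Sketch only: variable names are assumptions, not the module's actual API
module "imported_db" {
  source = "git::git@github.com:gruntwork-io/module-data-storage.git//modules/rds?ref=v0.6.7"

  name = "legacy-mysql"

  # Override the module's default naming conventions so terraform plan doesn't
  # try to destroy and recreate the imported instance's sub-resources
  subnet_group_name        = "legacy-mysql-subnets"
  subnet_group_description = "Subnet group created before we adopted the module"
}
```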
We’ve made a number of updates to Terragrunt this month:
- You can now override the folder into which Terragrunt downloads remote Terraform code by setting the TERRAGRUNT_DOWNLOAD environment variable.
- When Terragrunt runs terraform init -from-module=xxx to download the code for xxx, it now sets the -get=false, -get-plugins=false, and -backend=false params. These will all be handled in a later call to init instead. This should improve iteration speed when calling Terragrunt commands.
- Terragrunt now distinguishes between init-from-module, which only executes when downloading remote Terraform configurations based on the source parameter, and init, which executes for all other init invocations (e.g., to configure backends, download providers, and download modules).
- You can now set environment variables for Terraform via extra_arguments by specifying key-value pairs in the env_vars parameter, as shown in the sketch after this list.
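Here is a minimal sketch of the env_vars syntax in a Terragrunt terraform.tfvars file (the specific variable is just an example):

```hcl
# terraform.tfvars
terragrunt = {
  terraform {
    extra_arguments "custom_env" {
      commands = ["plan", "apply"]

      # Each key-value pair is exported as an environment variable when the
      # matching Terraform commands run
      env_vars = {
        TF_VAR_region = "us-east-1"
      }
    }
  }
}
```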
- You can now tag the security group of the consul-cluster module by specifying the security_group_tags parameter.
- You can now attach additional security groups to the consul-cluster module using the new additional_security_group_ids parameter.
- You can now tag the load balancer created by the vault-elb module using lb_tags.
- You can now customize the health check port of the vault-elb module using the optional param health_check_port.
- You can now specify which VPC to use for the vault-cluster and vault-elb modules via the vpc_id parameter.
- fetch should now work with GitHub Enterprise URLs! If the repo URL you specify is not GitHub.com, fetch will automatically assume it’s GitHub Enterprise and use the proper API calls. This defaults to GitHub Enterprise version v3, but that can be overridden with the new --github-api-version option (e.g., vX.Y).
- aws_get_instances_with_tag and aws_wrapper_get_ips_with_tag were added to allow retrieving EC2 instances and their (public/private) IPs using the value of a specific tag.
- The aws_get_instances_with_tag and aws_wrapper_get_ips_with_tag functions were updated to return only pending and running EC2 instances.
- The iam_user_self_mgmt policy in the iam-policies module now includes the iam:DeleteVirtualMFADevice permission, which now seems to be required to add an MFA device, but is also useful for deleting one.
- There is a new required parameter, allow_inbound_from_security_group_ids_num, which is the number of elements in var.allow_inbound_from_security_group_ids. We should be able to compute this automatically, and we used to, but due to a Terraform limitation, we can’t if there are any dynamic resources in var.allow_inbound_from_security_group_ids. See hashicorp/terraform#11482 and the sketch after this list.
- The git-add-commit-push script will now retry on “cannot lock ref” errors that seem to come up if two git push calls happen simultaneously.
- Fixed a bug in our SQS module: you can now create queues with fifo_queue set to true without running into a naming error. Due to a Terraform limitation, our conditions on resources weren’t taking effect, so the .fifo suffix required for FIFO queue names wasn’t being appended to the supplied name. See: hashicorp/terraform#13389.
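To illustrate the Terraform limitation behind the _num parameter above, here is a minimal, self-contained sketch in Terraform 0.11-era syntax (resource names and ports are illustrative):

```hcl
variable "allow_inbound_from_security_group_ids" {
  type = "list"
}

# Must be a value the caller knows statically, e.g., 2. Computing it with
# length(var.allow_inbound_from_security_group_ids) fails when the list
# contains attributes of resources that haven't been created yet
# (hashicorp/terraform#11482), because count can't be a computed value.
variable "allow_inbound_from_security_group_ids_num" {}

resource "aws_security_group" "example" {
  name = "example"
}

resource "aws_security_group_rule" "allow_inbound" {
  count = "${var.allow_inbound_from_security_group_ids_num}"

  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.example.id}"
  source_security_group_id = "${element(var.allow_inbound_from_security_group_ids, count.index)}"
}
```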
What happened: AWS has announced that Aurora Serverless for MySQL is now generally available in several regions!
Why it matters: Aurora Serverless is an on-demand, auto-scaling, serverless relational database. That means you don’t need to deploy any servers or configure it with a certain amount of capacity ahead of time. You just provision a “cluster” in a few seconds and after that, it will scale CPU, memory, and storage capacity up and down in response to load. Moreover, if you don’t use the database for a while, you can configure Aurora Serverless to shut it down automatically (e.g., after 60 min of inactivity) so that you only pay for storage costs. This can be very useful for infrequently used databases (e.g., in a pre-prod environment).
What to do about it: Aurora Serverless for MySQL is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). Note that while it has many advantages, there are also many limitations, so use it with care! We’ll be updating module-data-storage with support for Aurora Serverless soon.
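In the meantime, if you want to experiment, here is a minimal sketch of an Aurora Serverless MySQL cluster using the vanilla aws_rds_cluster resource (identifiers and capacity values are examples):

```hcl
resource "aws_rds_cluster" "serverless" {
  cluster_identifier = "example-serverless"
  engine             = "aurora"     # MySQL 5.6-compatible Aurora
  engine_mode        = "serverless"

  master_username = "admin"
  master_password = "change-me"     # example only; use a proper secrets store

  scaling_configuration {
    min_capacity             = 2    # capacity units to scale between
    max_capacity             = 16
    auto_pause               = true # shut down when idle...
    seconds_until_auto_pause = 3600 # ...after 60 minutes of inactivity
  }

  skip_final_snapshot = true
}
```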
What happened: HashiCorp has released Vault 0.11.
Why it matters: Vault 0.11 includes a number of key new features and improvements.
What to do about it: Try out the new version and let us know how it works for you!
Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.