Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.
Hello Grunts,
In the last month, we added production-ready usage patterns for ELK, updated our ECS modules to do deployment checks, added support for Aurora serverless, made a large number of updates to Terratest, including a log parser that makes it easier to debug failing tests, and fixed a large number of bugs.
As always, if you have any questions or need help, email us at support@gruntwork.io!
Motivation: Last August, we released a set of modules for running the ELK Stack (Elasticsearch, Logstash, Kibana) on top of AWS: package-elk. We got requests from customers about the best ways to use those modules, both in pre-prod and production environments.
Solution: We’ve updated our Acme Reference Architecture with two examples:
What to do about it: Use the code above to deploy package-elk in your own AWS accounts!
Motivation: When using Terraform to deploy Docker containers to ECS, the built-in aws_ecs_service resource will report “success” as soon as the container has been scheduled for deployment. However, this doesn’t check if ECS actually managed to deploy your service! So terraform apply completes successfully, and you don’t get any errors, but in reality, the container may fail to deploy due to a bug or the ECS cluster being out of resources.
Solution: All of the ecs-service modules in module-ecs will now run a separate binary as part of the deployment to verify the container is actually running before completing apply. This binary will wait for up to 10 minutes (configurable via the deployment_check_timeout_seconds input parameter) before timing out the check. Upon a check failure, the binary will output the last 5 events on the ECS service, helping you debug potential deployment failures during the terraform apply. In addition, if you set up an ALB or NLB with the service, the binary will check the ALB/NLB to verify the container is passing health checks.
What to do about it: The binary will automatically be triggered with each deploy when you update to module-ecs, v0.10.0. This binary requires a working Python install to run (supports versions 2.7, 3.5, 3.6, and 3.7). If you do not have a working Python install, you can get the old behavior by setting the enable_ecs_deployment_check module input to false.
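To make the idea of a deployment check concrete, here is a minimal Go sketch of the general "poll until healthy or time out" pattern described above. It is not the code of the Gruntwork binary (which requires Python); the names waitForDeployment, isRunning, and lastEvents are hypothetical, and the ECS call is replaced by a simulated function so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForDeployment polls isRunning until it returns true or the timeout
// elapses. isRunning is a hypothetical stand-in for a call that asks ECS
// whether the service's containers are running and passing health checks.
func waitForDeployment(isRunning func() bool, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if isRunning() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the service to deploy")
		}
		time.Sleep(interval)
	}
}

// lastEvents returns at most n of the most recent events, assuming the
// slice is ordered oldest first.
func lastEvents(events []string, n int) []string {
	if len(events) <= n {
		return events
	}
	return events[len(events)-n:]
}

func main() {
	// Simulated service that becomes healthy on the third poll.
	polls := 0
	isRunning := func() bool {
		polls++
		return polls >= 3
	}

	// The check described above waits up to 10 minutes by default; short
	// values are used here only to keep the demo fast.
	err := waitForDeployment(isRunning, 2*time.Second, 10*time.Millisecond)
	if err != nil {
		// On failure, print recent (made-up) service events to aid debugging.
		events := []string{"event 1", "event 2", "event 3", "event 4", "event 5", "event 6"}
		for _, e := range lastEvents(events, 5) {
			fmt.Println(e)
		}
		return
	}
	fmt.Printf("service is running after %d polls\n", polls)
}
```

The point of the pattern is that apply only reports success once the check passes, and a failure comes with enough recent context to start debugging.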
Motivation: Customers have been asking us for the ability to use Aurora Serverless, which is an on-demand relational database (MySQL compatible) that will start up, shut down, and scale on demand, without you having to provision or manage servers in advance. This is especially useful in pre-prod environments and for sporadically used apps, where you want the database to power down when not in use, so you don’t have to pay for it.
Solution: Our aurora module now supports Aurora serverless! Just set the engine_mode parameter to "serverless" and you’re good to go! You can also configure scaling settings using the new scaling_configuration_xxx parameters and enable deletion protection using the deletion_protection parameter.
What to do about it: Update to module-data-storage, v0.7.1 and give Aurora Serverless a try!
Motivation: Infrastructure tests can be slow. Therefore, you typically (a) log every action the test code takes so that you can debug issues purely from the logs, without having to re-run the slow tests, and (b) run as many tests in parallel as you can. However, when you do this, the logs from concurrently running tests get interleaved, which makes it difficult to piece together what happened when a test fails.
Solution: Terratest now ships a log parsing binary that can be used to piece out what is happening in automated tests written in Go. To use the binary, you must first extract the logs to a file and then feed that file to the log parser. Here’s an example:
# Run your Go tests and send the output to a file
go test | tee test-logs.txt
# Pass the file through the log parser
terratest_log_parser --testlog test-logs.txt --outputdir /tmp/logs
The command will then break out the interleaved entries by test, outputting each test log to its own file in a specified directory. Here’s what that output looks like in CircleCI:
In addition, the log parser will emit a junit XML report so that it can be used by CI engines for additional insights, such as making it much easier to see which test failed:
What to do about it: You can install the helper binary using the gruntwork-installer and then take a look at the README for a walk-through of how to use the command. There is no need to upgrade your test code to terratest v0.13.8, as the binary does not depend on any updates to the tests themselves.
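To illustrate what "breaking out interleaved entries by test" means, here is a minimal Go sketch. It is not how terratest_log_parser is implemented. It assumes, purely for illustration, that every log line begins with the name of the test that wrote it. The function name splitByTest and the sample log lines are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitByTest groups interleaved log lines by the test that produced them.
// Assumption (for illustration only): each line starts with the test name,
// e.g. "TestVpc starting terraform apply". Lines that do not start with
// "Test" are collected under the key "unattributed".
func splitByTest(log string) map[string][]string {
	byTest := make(map[string][]string)
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines
		}
		key := "unattributed"
		if fields := strings.Fields(line); strings.HasPrefix(fields[0], "Test") {
			key = fields[0]
		}
		byTest[key] = append(byTest[key], line)
	}
	return byTest
}

func main() {
	// Two tests running in parallel write to the same output stream.
	interleaved := strings.Join([]string{
		"TestVpc starting terraform apply",
		"TestEcs starting terraform apply",
		"TestVpc apply complete",
		"TestEcs apply failed",
	}, "\n")

	logs := splitByTest(interleaved)
	for _, name := range []string{"TestVpc", "TestEcs"} {
		fmt.Printf("%s: %d lines\n", name, len(logs[name]))
	}
}
```

Once the lines are grouped this way, writing each group to its own file gives you one readable log per test, which is the kind of output the log parser produces.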
Motivation: We have several Lambda functions in module-data-storage that make it easy to automatically back up databases running in RDS to another AWS account on a scheduled basis. We wrote these Lambda functions a while ago, and they were using our old Lambda code, had deprecation warnings, and did not allow names to be customized, so it was possible to end up with a name that exceeded the maximum length allowed by AWS.
Solution: We’ve updated and refactored all these Lambda functions! The lambda-cleanup-snapshots, lambda-copy-shared-snapshot, lambda-create-snapshot, and lambda-share-snapshot modules now all use package-lambda under the hood (instead of the older lambda code that used to live in module-ci) and expose optional lambda_namespace and schedule_namespace parameters that you can use to completely customize all the names of resources created by these modules.
What to do about it: Update to module-data-storage, v0.7.0.
terraform.OutputList function for reading and parsing lists returned by terraform output.
GetAmiPubliclyAccessible and GetAccountsWithLaunchPermissionsForAmi.
GetInstances and GetPublicIps.
OutputMap / OutputMapE functions to read and parse maps from terraform output.
GetRandomZone() function now accepts an argument for forbiddenRegions.
terratest_log_parser as described earlier in this blog post.
terratest_log_parser.
github.com/codegangsta/cli to its new name, gopkg.in/urfave/cli.v1. There should be no change in behavior.
stdin were not showing up correctly.
cloud-nuke will now delete ECS services and tasks.
file_fill_template was added to allow replacing specific template strings in a file with actual values.
consul-cluster module using the new (optional) service_linked_role_arn parameter.
scheduled-lambda-job module now namespaces all of its resources with the format "${var.lambda_function_name}-scheduled" instead of "${var.lambda_function_name}-scheduled-lambda-job". This makes names shorter and less likely to exceed AWS name length limits. If you wish to override the namespacing behavior, you now set a new input variable called namespace.
asg-disk-alarms module to include the file system and mount path. This ensures that if you create multiple alarms for multiple disks on the same auto scaling groups, they each get a unique name, rather than overwriting each other.
asg-rolling-deploy module so the script it uses within works with either Python 2 or Python 3.
server-group modules in a sequential order instead of only in parallel. This is useful when creating a collection of clusters where Cluster A may depend on Cluster B.
module-asg version v0.6.18 so that the rolling deploy script works with either Python 2 or 3. Upgrade Oracle JDK installer to version 8u192-b12.
zookeeper_servers to use the latest module-asg and then expose the rolling_deployment_done as an output so that other modules can be launched after this module deploys.
git-add-commit-push script to check there are files staged for commit before trying to commit.
run-kafka script now exposes params to configure SSL protocols and ciphers, SASL authentication, ACLs, ZooKeeper chroot, and JMX. The Kafka Connect, Schema Registry, and REST Proxy modules now allow you to configure a keystore for validating SSL connections.
ecs-service-with-discovery module will now create an IAM role for the ECS task that can be extended with custom policies, similar to the ecs-service module. Note: this is a backwards incompatible change. Refer to the release notes for more information.
What happened: AWS has announced that Lambda functions can now run for up to 15 minutes!
Why it matters: Lambda functions used to be limited to a max runtime of 5 minutes. This made them useful for short, one-off tasks, but any workload that took longer than that would have to be executed elsewhere (e.g., in an ECS Cluster). The time limit has now been increased to 15 minutes, which means you can use Lambda functions for an even larger variety of use cases.
What to do about it: This feature is available everywhere immediately. Simply set the timeout of your Lambda function to 15 minutes (900 seconds), and you’ll be good to go!
What happened: At HashiConf 2018, the HashiCorp team announced some major changes with Terraform Enterprise:
plan and apply commands remotely, in the Terraform Enterprise SaaS product (rather than on your own computer), while still streaming logs and data back to your own computer.
Why it matters: It looks like the HashiCorp team is making a push for all Terraform users to move to Terraform Enterprise (at one of the three tiers) to simplify collaboration and team workflows.
What to do about it: For now, the only thing you can do is to sign up for a waitlist for the free remote state management functionality. However, keep your eye on this space, as new functionality will likely be rolling out soon.
What happened: Amazon’s ECS-optimized AMI now supports Amazon Linux 2.
Why it matters: Amazon Linux 2 includes systemd, newer versions of the Linux kernel, C library, compiler, and tools, and access to more/newer software packages. It’s also the version of Amazon Linux that will get long-term support (at least through 2023).
What to do about it: If you are using ECS, you may wish to update your Packer templates to use the new ECS-optimized AMIs. You can find the AMI name pattern to put into the source_ami_filter param in your Packer template, as well as the latest AMI IDs, on this page. Note that you need to use the following versions of Gruntwork modules to get Amazon Linux 2 support:
fail2ban on Amazon ECS-optimized Linux 2, you’ll need to install yum-utils first (e.g., add sudo yum install -y yum-utils earlier in your Packer template).
Once you’ve built a new AMI, follow this guide to roll it out across your ECS cluster.