Once a month, we send out a newsletter to all Gruntwork customers that describes all the updates we’ve made in the last month, news in the DevOps industry, and important security updates. Note that many of the links below go to private repos in the Gruntwork Infrastructure as Code Library and Reference Architecture that are only accessible to customers.
Hello Grunts,
In the last month, we’ve made a major upgrade to the Infrastructure as Code Library: in partnership with Google, we’ve added a collection of open source, production-grade, reusable modules for deploying your infrastructure on Google Cloud Platform (GCP)! We also launched a new documentation website, replaced our OracleJDK module with an OpenJDK module, added a module to automatically issue and validate TLS certs, made major updates to our Kubernetes/EKS code (including support for private endpoints, log shipping, ingress controllers, external DNS, etc), and fixed a number of critical bugs.
As always, if you have any questions or need help, email us at support@gruntwork.io!
Motivation: Up until recently, we had been primarily focused on AWS, but this month, we’re excited to announce, in partnership with Google, that we’ve added first-class support for Google Cloud Platform (GCP)! And best of all, thanks to this partnership, all of our GCP modules are open source!
Solution: We worked directly with Google engineers to develop a set of reusable, production-grade infrastructure modules for GCP, including modules for GKE, VPC networks, Cloud SQL, load balancers, and static assets.
We also now offer commercial support for both AWS and GCP. Check out our announcement blog post for the details.
What to do about it: To get started with these modules, check out our post on the Google Cloud Blog, Deploying a production-grade Helm release on GKE with Terraform. This blog post will walk you through setting up a Kubernetes cluster, configuring Helm, and using Helm to deploy a web service on Google Cloud in minutes. You can even try out the code samples from that blog post directly in your browser, without having to install anything or write a line of code, using Google Cloud Shell!
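If you'd like a feel for what using these modules looks like, below is a minimal Terraform sketch of spinning up a GKE cluster with the open source terraform-google-gke repo. The module path, version ref, and variable values are illustrative, so check the repo's docs and examples for the exact interface:

# Deploy a GKE cluster using the open source terraform-google-gke modules.
# NOTE: the ref and variable values below are illustrative; see the repo docs.
module "gke_cluster" {
  source = "github.com/gruntwork-io/terraform-google-gke//modules/gke-cluster?ref=v0.1.0"

  name     = "example-cluster"
  project  = "my-gcp-project-id"
  location = "europe-west3"

  # The VPC network and subnetwork in which to run the cluster
  network    = "default"
  subnetwork = "default"
}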
Motivation: DevOps is hard. There seem to be 1,001 little details to get right, and you never have the time to learn them all.
Solution: We’ve launched a new Gruntwork Docs site that helps you get up and running even faster! You can already find guides for Deploying a Dockerized App on GCP/GKE and Deploying a Production Grade EKS cluster.
What to do about it: Head to the Gruntwork Docs site at docs.gruntwork.io. We’ll be adding much more content in the future, so let us know in the comments and via support what DevOps issues you’re struggling with, and we’ll do our best to write up guides to answer your questions.
Motivation: We discovered that Oracle has changed its policies to require authentication for all downloads of its JDK. This broke our install-oracle-jdk module, which in turn impacted all of our Java-based infrastructure packages: Kafka, Zookeeper, and ELK.
Solution: We created a new install-open-jdk
module that will install OpenJDK instead of Oracle’s JDK. It was created to be a drop-in replacement for our other module. In the past, the Oracle JDK used to be the best option, as OpenJDK was missing many features, had worse performance, and didn’t offer commercial support. However, in recent years, the differences between the JDKs in terms of features and performance have become negligible and Oracle no longer allows you to use their JDK for free (a paid license is required for production usage!). Therefore, most teams are now better off going with OpenJDK, which you can install using this module. Note that if you need commercial support for the JDK, you may wish to use Azul or AdoptOpenJdk instead. We’re updating all of our own Java based infrastructure packages to use this new module.
What to do about it: The new install-open-jdk module is available as part of Zookeeper's v0.5.4 release. Check it out and use it instead of install-oracle-jdk, which we will be deprecating and removing shortly.
Motivation: AWS Certificate Manager (ACM) makes it easy to issue free, auto-renewing TLS certificates. So far, we’ve mostly been creating these certificates manually via the AWS Console, but we’ve always wanted to manage them as code.
Solution: We’ve created a new Terraform module called acm-tls-certificate that can issue and automatically validate TLS certificates in ACM! Usage couldn’t be simpler:
# Create a TLS certificate for example.your-domain.com
module "cert" {
  source = "git::git@github.com:gruntwork-io/module-load-balancer.git//modules/acm-tls-certificate?ref=v0.13.2"

  domain_name    = "example.your-domain.com"
  hosted_zone_id = "ZABCDEF12345"
}
You pass in the domain name you want to use and the ID of the Route 53 Hosted Zone for that domain, and you get a free, auto-renewing TLS certificate that you can use with ELBs, CloudFront, API Gateway, etc! For example, here’s how you can use this certificate with an Application Load Balancer (ALB):
# Create a TLS certificate for example.your-domain.com
module "cert" {
  source = "git::git@github.com:gruntwork-io/module-load-balancer.git//modules/acm-tls-certificate?ref=v0.13.2"

  domain_name    = "example.your-domain.com"
  hosted_zone_id = "ZABCDEF12345"
}

# Attach the TLS certificate to an ALB
module "alb" {
  source = "git::git@github.com:gruntwork-io/module-load-balancer.git//modules/alb?ref=v0.13.2"

  alb_name = "example-alb"

  https_listener_ports_and_acm_ssl_certs = [
    {
      port            = 443
      tls_domain_name = "${module.cert.certificate_domain_name}"
    },
  ]

  # ... other params omitted ...
}
And now your load balancer is using the TLS certificate on its listener for port 443!
What to do about it: The acm-tls-certificate module is available in module-load-balancer, v0.13.2. Check out this example for fully-working sample code.
Motivation: Since December of last year, we have been busy building a production-grade IaC module for EKS that makes it 10x easier to deploy and manage EKS clusters. What makes infrastructure production grade is how many items of our Production Grade Checklist it covers. This month we shipped multiple new modules that enhance the security and monitoring capabilities of EKS clusters deployed with our modules.
Solution: Over the last month we enhanced our EKS modules with the following updates:
- Added support for private API endpoints in the eks-cluster-control-plane module. Check out the module docs for more info. (v0.2.3)
- You no longer need kubectl to access the cluster; the modules now use the kubernetes provider and kubergrunt instead. (v0.3.0)
- Added support for shipping logs to CloudWatch via the fluentd-cloudwatch module and introduced a module to create reciprocating security group rules (the eks-cluster-workers-cross-access module). (v0.3.1)
- Added support for EKS control plane logging via the new enabled_cluster_log_types variable (see the sketch below). You can read more about this feature in the official AWS documentation. (v0.4.0)
- Added the eks-alb-ingress-controller and eks-alb-ingress-controller-iam-policy modules, which allow you to map Ingress resources to AWS ALBs. See the module documentation for more information. (v0.5.0)
- Added the eks-k8s-external-dns and eks-k8s-external-dns-iam-policy modules, which map Ingress resource host paths to Route 53 domain records so that host name routes to the Ingress endpoints are configured automatically. See the module documentation for more information. (v0.5.1, v0.5.2, v0.5.3)

What to do about it: Upgrade to the latest version of terraform-aws-eks (v0.5.4) to start taking advantage of all the new features!
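As a concrete example, here's a minimal sketch of turning on control plane logging via the enabled_cluster_log_types variable mentioned above. The ref and the variables other than enabled_cluster_log_types are illustrative, so check the eks-cluster-control-plane module docs for the exact interface:

# Deploy the EKS control plane with API server, audit, and authenticator logs
# shipped to CloudWatch Logs.
# NOTE: the ref and the variables other than enabled_cluster_log_types are illustrative.
module "eks_cluster" {
  source = "git::git@github.com:gruntwork-io/terraform-aws-eks.git//modules/eks-cluster-control-plane?ref=v0.5.4"

  cluster_name = "example-eks-cluster"

  vpc_id                = "vpc-abcd1234"
  vpc_master_subnet_ids = ["subnet-abcd1234", "subnet-bcde2345"]

  enabled_cluster_log_types = ["api", "audit", "authenticator"]
}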
Motivation: In Ben Whaley’s VPC reference architecture, it is common to set up a management VPC that acts as a gateway to other application VPCs. In this setup, operators typically VPN into the management VPC and access the other VPCs in your infrastructure over a VPC peering connection. One challenge with this setup is that domain names in Route 53 private hosted zones are not available across the peering connection.
Solution: To allow DNS lookups of private hosted zones over a peering connection, we can use Route 53 Resolvers to forward DNS queries for specific endpoints to the application VPCs. We created two new modules in module-vpc to support this use case: vpc-dns-forwarder and vpc-dns-forwarder-rules.
What to do about it: The vpc-dns-forwarder and vpc-dns-forwarder-rules modules are available in module-vpc, v0.5.7. Take a look at the updated vpc-peering example for fully working sample code.
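To give a rough idea of how the pieces fit together, here's a minimal sketch of forwarding DNS queries from a mgmt VPC to an app VPC over a peering connection. The variable names and values below are illustrative (only the module paths come from the release), so consult the vpc-peering example for the real interface:

# Create Route 53 Resolver endpoints that can forward DNS queries from the
# mgmt VPC to the app VPC over the peering connection.
# NOTE: variable names and values are illustrative; see the module docs.
module "dns_forwarder" {
  source = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-dns-forwarder?ref=v0.5.7"

  origin_vpc_id      = "vpc-mgmt1234"
  destination_vpc_id = "vpc-app5678"

  # ... subnet IDs and other params omitted ...
}

# Create forwarding rules so lookups for the app VPC's private hosted zone
# (e.g., *.stage.internal) are resolved inside the app VPC.
module "dns_forwarder_rules" {
  source = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-dns-forwarder-rules?ref=v0.5.7"

  domain_names = ["stage.internal"]

  # ... resolver endpoint IDs and other params omitted ...
}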
Motivation: The Gruntwork server-group module includes a script called rolling_deployment.py, which can be used to hook up a load balancer to perform health checks on the server-group. This script relied on an API call that recently started throwing an exception we were not handling, and the unhandled exception in the health-checker script could cause a deployment of the server-group to fail erroneously.
Solution: We updated the rolling_deployment script to properly handle the exception. See this PR for more details.
What to do about it: Update to module-asg v0.6.26 to pick up the fix.
Motivation: The Gruntwork ecs-cluster module includes a script called roll-out-ecs-cluster-update.py, which can be used to roll out updates (e.g., a new AMI or instance type) to the Auto Scaling Group that underlies the ECS cluster. This script should work without downtime, but recently, one of our customers ran it, and when the script finished, it had left the cluster with some of the instances updated to the new AMI, some still running the old AMI, and the old ones stuck in the DRAINING state. Clearly, something was wrong!
Solution: It looks like AWS made backwards-incompatible changes to the default termination policy for Auto Scaling Groups, and as roll-out-ecs-cluster-update.py depended on the behavior of this termination policy as part of its roll-out procedure, this change ended up breaking the script. To fix this issue, we’ve updated the ecs-cluster module to expose the termination policy via a new termination_policies input variable, and we’ve set the default to OldestInstance (instead of Default) to fix the roll-out issues.
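If you'd rather set the termination policy explicitly instead of relying on the new default, here's a minimal sketch. The ref and the variables other than termination_policies are illustrative, so check the ecs-cluster module docs for the full interface:

# Run an ECS cluster whose Auto Scaling Group terminates the oldest instances
# first, which is the behavior roll-out-ecs-cluster-update.py relies on.
# NOTE: the ref and the variables other than termination_policies are illustrative.
module "ecs_cluster" {
  source = "git::git@github.com:gruntwork-io/module-ecs.git//modules/ecs-cluster?ref=v0.13.0"

  cluster_name     = "example-ecs-cluster"
  cluster_min_size = 2
  cluster_max_size = 4

  # OldestInstance is the new default in v0.13.0; shown here for clarity
  termination_policies = ["OldestInstance"]

  # ... other params omitted ...
}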
What to do about it: Update to module-ecs, v0.13.0 to pick up the fix. Update, 05.09.19: it’s possible this does not fix the issue fully. See #134 for ongoing investigation.
Motivation: One of our customers was connected to VPN servers in two different accounts (stage and prod) and noticed connectivity wasn’t working quite right. It turns out the cause was that the Gruntwork Reference Architecture was using conflicting CIDR blocks for the “mgmt VPCs” (where the VPN servers run) in those accounts.
Solution: We’ve updated the Reference Architecture to use different CIDR blocks for the mgmt VPCs in each account. The app VPCs were already using different CIDR blocks.
What to do about it: If you wish to connect to multiple VPN servers at once, or you need to peer the various mgmt VPCs together for some reason, you’ll want to ensure each one has a different CIDR block. The code change is easy: see this commit for an example. However, VPC CIDR blocks are considered immutable in AWS, so to roll this change out, you’ll need to undeploy everything in that mgmt VPC, undeploy the VPC, deploy the VPC with the new CIDR block, and then deploy everything back into the VPC.
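As a rough illustration of the shape of the change (the module path, ref, and variable names are illustrative; the commit linked above shows the real diff), the idea is simply to give each account's mgmt VPC its own, non-overlapping CIDR block:

# In the stage account's Terraform configuration
module "mgmt_vpc" {
  source = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-mgmt?ref=v0.5.7"

  vpc_name   = "mgmt-stage"
  cidr_block = "172.31.80.0/20"
}

# In the prod account's Terraform configuration: same module, different CIDR block
module "mgmt_vpc" {
  source = "git::git@github.com:gruntwork-io/module-vpc.git//modules/vpc-mgmt?ref=v0.5.7"

  vpc_name   = "mgmt-prod"
  cidr_block = "172.31.96.0/20"
}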
Here are the other updates we shipped across our open source tools and the Infrastructure as Code Library this month:
- Terratest now supports a SetStrValues argument for helm.Options, which corresponds to the --set-string argument. This can be used to force certain values to be cast to a string as opposed to another data type.
- The GetAccountId and GetAccountIdE methods now use STS GetCallerIdentity instead of IAM GetUser under the hood, so they should now work whether you're using an IAM User, an IAM Role, or another AWS authentication method while running Terratest.
- Added GetEcsService and GetEcsTaskDefinition, which can be used to retrieve ECS Service and ECS Task Definition objects, respectively. Check out the new Terraform example and corresponding test to see it in action.
- …the -var command line option.
- Added new methods for working with SSM Parameter Store: GetParameter, GetParameterE, PutParameter, and PutParameterE.
- Added new methods for working with S3 bucket versioning: PutS3BucketVersioning, PutS3BucketVersioningE, GetS3BucketVersioning, GetS3BucketVersioningE, AssertS3BucketVersioningExists, and AssertS3BucketVersioningExistsE.
- Added new methods for working with S3 bucket policies: PutS3BucketPolicy, PutS3BucketPolicyE, GetS3BucketPolicy, GetS3BucketPolicyE, AssertS3BucketPolicyExists, and AssertS3BucketPolicyExistsE.
- You can now set skip = true in your Terragrunt configuration to tell Terragrunt to skip processing a terraform.tfvars file. This can be used to temporarily protect modules from changes or to skip over terraform.tfvars files that don't define infrastructure by themselves (see the sketch after this list).
- Terragrunt also has a new terragrunt-info command you can run to get a JSON dump of Terragrunt settings, including the config path, download dir, working dir, IAM role, etc.
- You can now configure the run-vault script to run Vault in agent mode rather than server mode by passing the --agent argument, along with a set of new --agent-xxx configs (e.g., --agent-vault-address, --agent-vault-port, etc.). The Vault agent is a client daemon that provides auto auth and caching features.
- kubergrunt has a new k8s wait-for-ingress subcommand, which can be used to wait until an Ingress resource has an endpoint associated with it.
- We've updated the tls gen command to use the new way of authenticating to Kubernetes (specifically, passing in server and token info directly) and to use JSON to configure the TLS subject. This release also introduces a new helm wait-for-tiller command, which can be used to wait for a Tiller deployment to roll out Pods and have at least one Pod that can be pinged. This enables chaining calls to helm after helm is deployed when using a different helm deployment process that doesn't rely on the helm client (e.g., creating the deployment resources manually).
- kubergrunt helm configure has a new --as-tf-data option, which enables you to call it from an external data source. Passing this flag causes the command to output the configured helm home directory as JSON on stdout at the end of the command.
- We've added a new k8s-tiller module, which can be used to manage Tiller deployments using Terraform. The difference from the kubergrunt approach is that this supports using Terraform to apply updates to the Tiller Deployment resource: e.g., you can now upgrade Tiller using Terraform, or update the number of replicas of Tiller Pods to deploy. Note that this still assumes the use of kubergrunt to manage the TLS certificates.
- The k8s-namespace and k8s-namespace-roles modules now support conditionally creating the namespace and roles via the create_resources input variable.
- We've added a new list-remove module, which can be used to remove items from a Terraform list. See the module docs for more info.
- You can now set the redirect_http_to_https variable to true on the jenkins-server module to automatically redirect all HTTP requests to HTTPS.
- Fixed a bug where the fargate_without_lb resource incorrectly set health_check_grace_period_seconds. From the Terraform documentation, "Health check grace period is only valid for services configured to use load balancers".
- You can now set a custom name prefix for the task execution resources created by the ecs-service module using the new task_execution_name_prefix input variable. The default is var.service_name, as before.
- Fixed an issue where fail2ban was not working correctly on non-Ubuntu instances. Specifically, fail2ban had a bug that prevented it from correctly banning brute force SSH attempts on CentOS and Amazon Linux 1 platforms. Check out the release notes for more details.
- You can now (a) set custom tags on the alarms modules via a new tags input variable and (b) configure the snapshot period and snapshot evaluation period for the elasticsearch-alarms module using the new snapshot_period and snapshot_evaluation_period input variables, respectively.
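To make the Terragrunt skip feature above concrete, here's a minimal sketch of a terraform.tfvars file that Terragrunt will skip. This assumes the pre-0.19 terraform.tfvars configuration format, and the exact placement of the flag may vary by Terragrunt version, so check the Terragrunt docs:

# terraform.tfvars in a folder that doesn't define any infrastructure by itself
# (e.g., a root folder that only holds shared settings).
# NOTE: assumes the pre-0.19 terraform.tfvars config format; see the Terragrunt docs.
terragrunt = {
  # Tell Terragrunt to skip processing this file (e.g., during the *-all commands)
  skip = true
}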
What happened: HashiCorp has released Terraform 0.12, release candidate 1 (rc1).
Why it matters: The final release of Terraform 0.12 draws closer and closer! Terraform 0.12 brings with it a number of powerful new features, but will also require a significant upgrade. We’ve already started updating our modules with support for 0.12, including updating Terratest to work with 0.11 and 0.12.
What to do about it: For now, continue to sit tight, and await 0.12 final, as well as our word that all of our modules have been updated. We’ll send upgrade instructions when everything is ready!
What happened: AWS has announced, rather quietly, that path-style S3 URLs will no longer be supported after September 30th, 2020. Update: AWS just released a new blog post that says path-style URLs will only be deprecated for new S3 buckets created after September 30th, 2020.
Why it matters: In the past, for an S3 bucket called my-bucket, you could build S3 URLs in one of two formats:
- Path-style: s3.amazonaws.com/my-bucket/image.jpg
- Virtual-host style: my-bucket.s3.amazonaws.com/image.jpg

The former supported both HTTP and HTTPS, whereas the latter used to only support HTTP. Now, both support HTTPS, but path-style URLs will no longer be supported after September 30th, 2020. Update: AWS just released a new blog post that says path-style URLs will continue to work for S3 buckets created before September 30th, 2020, but will not be available for S3 buckets created after that date.
What to do about it: If you’re using path-style S3 URLs, update your apps to use virtual-host style URLs instead. Note that if your bucket name contains dots, virtual-host style URLs will NOT work, so you’ll have to migrate to a new S3 bucket!
Below is a list of critical security updates that may impact your services. We notify Gruntwork customers of these vulnerabilities as soon as we know of them via the Gruntwork Security Alerts mailing list. It is up to you to scan this list and decide which of these apply and what to do about them, but most of these are severe vulnerabilities, and we recommend patching them ASAP.
Docker Hub: Docker Hub recently disclosed unauthorized access to one of its databases, which may have exposed GitHub and BitBucket access tokens used for its autobuild feature. The tokens of affected users have already been revoked by Docker, but if you have not received an email from Docker Hub notifying you of the revocation, we recommend revoking the tokens in GitHub or BitBucket. Additionally, we recommend auditing the security logs to see if any unexpected actions have taken place. You can view security actions on your GitHub or BitBucket accounts to verify if any unexpected access has occurred (see this article for GitHub and this article for BitBucket). Now would also be a good time to review your organization’s OAuth App access, and consider enabling access restrictions on your org. We notified the Security Alerts mailing list about this vulnerability on May 1st, 2019.