Terraform: Patterns and Anti-Patterns [Part 3] - Remote State

Sat, Aug 19, 2017 terraform, devops, infra-as-code

Remote state is (IMO) one of Terraform’s most powerful and unsung features. It’s also a feature that I notice a lot of first-time users (and, unfortunately, sometimes people who’ve been using Terraform for some time) tend to gloss over and ignore. For the first-time user, the light bulb for the problem that remote state solves usually goes on when a scenario like this comes to pass:

You’ve built out some POC or app environment with Terraform (which is awesome - well done!). Your code (and because you didn’t know any better, your statefile) are in a centrally accessible (but hopefully private - more on that later) git repo. Since your last push, though, you’ve made some changes to your code and have created some new resources. Then one of your coworkers says, “Hey, I’ve added some resources to this project, and I’m building them out in the existing environment. I’ll let you know if I run into any problems!”

This scenario paints a backdrop for what I believe are two problem domains that remote state addresses well.

Domain 1: Distribution and Parity

From here, one of two things happens in the mind of the new user (based on their understanding of Terraform):

1. Nothing. You don’t realize yet that your workmate was working off of a statefile that didn’t reflect the true state of the environment you have created, and that when he applies his local changes he’s going to destroy the difference of what you’ve created since your last push to the git remote. That terminal bell you hear? That’s your SSH session getting kicked because the server you’ve been using to write Ansible playbooks or packer templates for the last few hours just got nuked (hope you had all your work saved somewhere outside of that host)!
2. Panic. In your mind’s eye, you see the scenario from #1 playing itself out. “Hang on! - Don’t apply your changes!” you say. You slack or email or push your local statefile, which is a complete pain for your workmate (especially if it’s a push, super especially if it causes some merge conflict with code he’s got locally).

Neither of these is pleasant. There is also the third scenario where no real problems are caused because your environment is in parity with the statefile in the repo (in which case you’ll get to benefit from my mistakes by taking this post to heart).

Remote state takes care of the distribution and parity problems for you. It also keeps your statefile out of git. In fact, at a minimum, your .gitignore in a Terraform repo should contain these lines:

*.tfstate*
.terraform/
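Keep in mind that ignore rules only stop new files from being added; they do nothing for a statefile that is already tracked. As a rough sketch (the function name `find_state_paths` is made up for this post and is not part of git or Terraform), a check like the following could sit in a pre-commit hook and flag staged paths that match the patterns above:

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads file paths on stdin (one per line) and prints
# the ones that look like Terraform state or the local .terraform/ directory.
find_state_paths() {
    local path
    while IFS= read -r path; do
        case "${path}" in
            *.tfstate*|.terraform/*|*/.terraform/*)
                echo "${path}"
                ;;
        esac
    done
}

# Example use in a pre-commit hook (assumes you run it inside a git repo):
#   offenders=$(git diff --cached --name-only | find_state_paths)
#   if [[ -n "${offenders}" ]]; then
#       echo "Refusing to commit state files:"
#       echo "${offenders}"
#       exit 1
#   fi
```

If a statefile has already been committed, `git rm --cached terraform.tfstate` will untrack it while leaving the local copy in place, though the old versions remain in the repo's history.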

Domain 2: Security

Whether you realize it or not, sensitive data winds up in a statefile. Things like RDS passwords - and depending on things like userdata scripts (where you could have potentially embedded other juicy things like API keys for services, etc.) - wind up in the statefile. Considering this, you can see why it makes sense to build a habit of keeping these sorts of things out of a git repo (especially if you’re using some centrally hosted git service like GitHub, Bitbucket, etc.).
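To get a feel for this on your own project, a quick (and admittedly crude) check is to grep a local statefile for attribute names that tend to hold secrets. This is only a sketch: the function name `scan_state` and the list of key names are my own assumptions, and a clean result does not prove that the statefile is free of sensitive data (values embedded in userdata, for example, won't be caught).

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the line number and name of each JSON key in a
# statefile that looks like it holds a secret. Only the key is printed, not
# the value, so the secret itself doesn't land in your terminal scrollback.
scan_state() {
    local statefile="$1"

    grep -n -i -o -E '"[^"]*(password|secret|token)[^"]*"[[:space:]]*:' "${statefile}"
}

# Example: scan_state terraform.tfstate
```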

These two domains are my chief arguments for remote state. With S3 as a backend, it costs you essentially nothing extra (buckets are free, and statefiles are small enough to incur an absolutely minimal storage charge, if any at all). And, if you set up your bucket correctly, you get security (ACLs and encryption [in fact, you can even enforce encryption, which isn’t a bad idea], restricted deletion if you use MFA delete [also not a bad idea]), versioning, and the “11 9s” of durability as well.

With that said, go ahead and create an S3 bucket with a sound access control policy and enable versioning. You’ll use this bucket as your backend. For the sake of this post we’ll call this bucket state-bucket and assume you created it in us-east-1.
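If you'd rather script the bucket setup than click through the console, something along these lines should get you most of the way there. Treat it as a sketch under a few assumptions: it expects a reasonably recent AWS CLI with credentials already configured, the helper names `run` and `create_state_bucket` are invented for this post, and the settings shown (versioning, blocking public access, default AES256 encryption) are one reasonable baseline rather than a complete access control policy. Also note that bucket names are globally unique, so the literal name state-bucket is almost certainly taken.

```shell
#!/usr/bin/env bash

# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: creates a versioned, encrypted, non-public state bucket.
create_state_bucket() {
    local bucket="$1"
    local region="$2"

    # us-east-1 is the default location and must not be passed as a
    # LocationConstraint; every other region needs one.
    if [[ "${region}" == "us-east-1" ]]; then
        run aws s3api create-bucket --bucket "${bucket}" --region "${region}" || return 1
    else
        run aws s3api create-bucket --bucket "${bucket}" --region "${region}" \
            --create-bucket-configuration "LocationConstraint=${region}" || return 1
    fi

    run aws s3api put-bucket-versioning --bucket "${bucket}" \
        --versioning-configuration "Status=Enabled" || return 1

    run aws s3api put-public-access-block --bucket "${bucket}" \
        --public-access-block-configuration \
        "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" || return 1

    run aws s3api put-bucket-encryption --bucket "${bucket}" \
        --server-side-encryption-configuration \
        '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}' || return 1
}

# Example (dry run, only prints the commands):
#   DRY_RUN=1 create_state_bucket state-bucket us-east-1
```

Running it with `DRY_RUN=1` first lets you read the exact commands before anything is created in your account.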

Assuming you’re using a 0.9.x or greater version of Terraform, you’ll ensure that your current codebase uses remote state by adding this to your code (I usually opt for main.tf or somewhere top-of-mind):

terraform {
    backend "s3" {}
}

Astute observers will notice that there’s not much configuration here; in that, they would be correct. I am a big fan of what Terraform refers to as “partial configuration”. I opt to orchestrate the remaining parameters through command-line flags available to the init subcommand. “Wouldn’t it just be easier to keep all of that config in the Terraform code itself since it allows for it?”, you may ask. My answer to that is “yes and no, but mostly no”, for a variety of reasons:

Cons

No variable interpolation. It does not work in this block, meaning if I go ahead and hardcode my values in that block I’m now storing production environment assets alongside non-production environment assets in my statefile if I use the same Terraform repo to provision non-prod and prod environments. I’ll leave it to Charity Majors to detail why this is a bad idea. You can get this ability back by wrapping the init subcommand in some sort of wrapper script.
Higher maintenance. If you try to work around the point above by creating separate repos for different environments and just update the backend block to point to different statefiles, you will now need another repo to serve as the upstream development repo for each and every repo you create per environment. Yuck.
Less flexibility. Mostly for the reasons above, your single Terraform configuration is now unable to support the entire application lifecycle. Yuck (again).
Higher overhead. It should also be apparent that you will be multiplying your CI/CD overhead by doing this as well. You really don’t realize how yucky things are until you have to begin to sort this out.

Pros

Higher reasonability. It’s easier to reason about where state for a given set of configuration files lives when the values are hard-coded right there in front of you, versus having to work them out through a bunch of variables in some sort of wrapper script (though, a well-written script should make this clearly visible to any user).
Audit/Traceability. If your config is hard-coded in versioned files, it’s much easier to trace back who/where/when any changes were made to that configuration. However, if you use a wrapper script that’s also versioned with your config files (that also does some decent validation), then this is mitigated to a certain degree.
Portability. A bash/sh script isn’t guaranteed to run everywhere, especially if you have folks on your team who use Windows. This can be solved somewhat readily in that these teammates likely (though it’s not guaranteed) already have git-bash, cygwin, or something at least resembling a (mostly) POSIX-compliant emulation environment set up locally. You could also use this as an opportunity to hone your PowerShell scripting skills.

As with all things, these pros and cons represent a tradeoff that you as a smart and savvy devops engineer must weigh and make a decision based upon. For me, they seem to represent a tradeoff between flexibility and quick insight. It is my opinion that the flexibility and reuse benefits far outweigh any insight benefits (as mentioned, I think you can get most of that back by having a good wrapper script, which is mostly a one-time upfront cost if you’re going to be using Terraform across multiple projects). For you, different factors may lead you to a different decision, and that’s OK. Just think through the potential consequences of that decision first!

Assuming that you opted for the partial config and the use of init in some sort of wrapper, you might be wondering what a wrapper script would look like. A very simplistic implementation might look something like this:

#!/usr/bin/env bash

#################################################
# Init script - I live alongside your TF files
# Run me before you try to run anything like:
#   terraform plan ...
#   terraform apply ...
#   terraform destroy ...
#################################################

set -x

TFENV=$1
BUCKET=$2
BUCKET_REGION=$3

function check_env() {
    local tfenv=$1

    if ! [[ "${tfenv}" == "dev" || "${tfenv}" == "prod" ... ]]; then
        echo -e "Environment ${tfenv} isn't an option!  Aborting."
        exit 1
    fi
}

function init() {
    local tfenv="$1"
    local bucket="$2"
    local bucket_region="$3"

    local init_ret_code=""
    local project_dir=$(basename "$PWD")
    local bucket_path="tf/${project_dir}/${tfenv}/terraform.tfstate"

    echo -e "Running 'terraform init' for ${tfenv} env from s3://${bucket}/${bucket_path} in ${bucket_region}."
    terraform init \
        -backend=true \
        -backend-config "bucket=${bucket}" \
        -backend-config "region=${bucket_region}" \
        -backend-config "encrypt=true" \
        -backend-config "key=${bucket_path}"

    init_ret_code="$?"
    if [[ "${init_ret_code}" == "0" ]]; then
        echo "'terraform init' completed successfully."
    else
        echo "'terraform init' failed with code ${init_ret_code}.  Aborting."
        exit 1
    fi
}

# ... maybe you have other functions or pre-checks you want to run

check_env "${TFENV}"
init "${TFENV}" "${BUCKET}" "${BUCKET_REGION}"

As the comments note, this lives alongside your Terraform files. Call it tfinit.sh or whatever you like and invoke it like this:

$ ./tfinit.sh dev state-bucket us-east-1

(Hint: You may notice that this script forms a good foundation for other commands like plan or apply and want to name the script accordingly and incorporate those additional functions. Go for it! If you’re a make/Makefile fan, those work really well for this purpose, too [I stuck with a bash script for the sake of simplicity here].)
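As a rough sketch of that idea, the init logic can be folded into a function that takes the subcommand as a fourth argument and passes anything after it straight through to Terraform. The names `state_key` and `tf_run` (and the `dev.tfvars` file in the example) are hypothetical; the `terraform init` flags are the same ones used in the init script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: builds the S3 key for a project/environment pair,
# using the same layout as the init script above.
state_key() {
    local project="$1"
    local tfenv="$2"

    echo "tf/${project}/${tfenv}/terraform.tfstate"
}

# Hypothetical helper: runs 'terraform init' against the right statefile,
# then the requested subcommand (plan, apply, destroy, ...).
tf_run() {
    if [[ $# -lt 4 ]]; then
        echo "usage: tf_run <env> <bucket> <region> <subcommand> [args...]" >&2
        return 1
    fi

    local tfenv="$1"
    local bucket="$2"
    local bucket_region="$3"
    local subcommand="$4"
    shift 4   # anything left over is passed through to the subcommand

    local key
    key=$(state_key "$(basename "$PWD")" "${tfenv}")

    terraform init \
        -backend=true \
        -backend-config "bucket=${bucket}" \
        -backend-config "region=${bucket_region}" \
        -backend-config "encrypt=true" \
        -backend-config "key=${key}" || return 1

    terraform "${subcommand}" "$@"
}

# Example: tf_run dev state-bucket us-east-1 plan -var-file=dev.tfvars
```

Running init before every subcommand is slower than running it once, but it means a plan or apply never runs against whatever statefile the previous invocation happened to point at.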

I should also state a few underlying assumptions here if you’re using a script like this:

As in the previous posts, you’re authenticating using env variables (e.g. AWS_PROFILE).
This bucket is in the same account your deployment will go into, hence AWS_PROFILE doesn’t need to change between init and plan or apply phases. If this isn’t the case, the situation is a bit stickier, but we’ll cover it in a future post.
This post/config doesn’t address state file locking, but it’s definitely something I’ll address in a future post.
The terraform config above can and should be configured in light of previous posts (I’ve simply omitted other parameters here for pedagogical purposes).

And until next time, happy building!