Abusing Terraform to Upload Static Websites to S3

It can be a pain to set up static websites by hand with S3. We can automate the process with Terraform.

Greg Schaberg, Staff Infrastructure and Web

Programming Insights

Oct 6, 2021

S3 has been a great option for hosting static websites for a long time, but it's still a pain to set up by hand. You need to traverse dozens of pages in the AWS Console to create and manage users, buckets, certificates, a CDN, and about a hundred different configuration options. If you do this repeatedly, it gets old fast. We can automate the process with Terraform, a well-known "infrastructure as code" tool, which lets us declare resources (e.g. servers, storage buckets, users, policies, DNS records) and lets Terraform figure out how to build and connect them.

Terraform can create the infrastructure needed for a static website on AWS (e.g. users, bucket, CDN, DNS), and it can also create and update the content (e.g. webpages, CSS/JS files, images). Managing content goes beyond the "infrastructure" part of "infrastructure as code", which is why I'm labeling this an abuse or misuse of Terraform. Still, it works and has a few benefits:

You can define the bucket, properties, DNS, CDN, etc. in the same place as your content
You have a fully-automated process for standing up websites that only requires a single tool, Terraform

... and a few downsides:

Uploading files is slow compared to something like the AWS CLI's sync command
Terraform isn't meant for transforming or managing content, so you may outgrow Terraform's capabilities if you want advanced features or optimization

This article will breeze over the infrastructure parts of creating a static website on AWS and focus more on how to upload content and manage content metadata (MIME types and caching behavior). If you want to learn more about the infrastructure parts (e.g. setting up CloudFront, an SSL certificate, DNS routes), there are many great tutorials out there. Here are a few:

Let's get on to the code! If you want just the code, you can find it here: https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload

The Boilerplate

We need some boilerplate to set up infrastructure before we can upload files to an S3 bucket. So, let's create a bucket with Terraform and the AWS provider. We'll configure the provider and create the bucket in a main.tf file containing the following:
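As a rough sketch of what that boilerplate might contain, assuming the 3.x series of the AWS provider (the one that still has aws_s3_bucket_object): the resource name my_static_website, the website_root variable, and the aws_admin profile appear elsewhere in this article, while the region, bucket name, and default content path below are placeholders of my own.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0" # assumption: 3.x syntax is used throughout
    }
  }
}

provider "aws" {
  region  = "us-west-1" # placeholder region
  profile = "aws_admin" # profile name from ~/.aws/credentials
}

variable "website_root" {
  type        = string
  description = "Path to the root of the website content"
  default     = "../content" # placeholder path
}

resource "aws_s3_bucket" "my_static_website" {
  bucket = "my-static-website-example" # placeholder; must be globally unique
  acl    = "private"

  # Turn on S3 static website hosting
  website {
    index_document = "index.html"
  }
}
```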

AWS Credentials

To create or interact with AWS resources, we need to provide credentials. The AWS Terraform provider accepts authentication in a variety of ways, but I'm going to use a credential file. That file is located at ~/.aws/credentials and looks like:

[aws_admin]

If you don't have credentials handy, you can follow AWS documentation to create a new user with a policy that grants S3 permissions.

Uploading Files to S3 with Terraform

Here's where we start using Terraform... creatively, i.e. for managing content instead of just infrastructure. For the content, I've created a basic multi-page website: a couple of HTML files, a CSS file, and a couple of images. By using Terraform's fileset function and the AWS provider's aws_s3_bucket_object resource, we can collect all the files in a directory and upload each one as an object in S3:

The for_each meta-argument loops over all files in the website directory tree, binding the file path (index.html, assets/normalize.css, etc.) to each.key, which can be used elsewhere in the block. The source_hash argument hashes the file, which helps Terraform determine when the file has changed and needs to be re-uploaded to the S3 bucket. (There's a similar etag argument, but it doesn't work when some kinds of S3 encryption are enabled.)
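Put together, the upload resource described above might look like the following sketch. It reuses the names that show up later in this article (var.website_root and aws_s3_bucket.my_static_website) and leaves out content types for now.

```hcl
resource "aws_s3_bucket_object" "file" {
  # One object per file found under the website root (recursive glob)
  for_each = fileset(var.website_root, "**")

  bucket = aws_s3_bucket.my_static_website.id
  key    = each.key # e.g. index.html, assets/normalize.css
  source = "${var.website_root}/${each.key}"

  # Changes whenever the file's content changes, which triggers a re-upload
  source_hash = filemd5("${var.website_root}/${each.key}")

  acl = "public-read"
}
```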

Terraform Apply

With our trusty main.tf file in hand, we can now invoke dark and mysterious powers, conjuring infinite computational power out of nothing! With the merest flourish of our terminal, unfathomable forces precipitate to our whim — we are the tactician, the champion and commander over greater numbers than were ever deployed in any Greek myth!

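Ahem... anyway, run terraform init (to download the AWS provider) and then terraform apply (to create the bucket and upload the files).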

At the end of the output from the apply command, you should see the website endpoint:
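Terraform only prints that endpoint if main.tf declares an output for it. A minimal sketch, assuming the 3.x AWS provider (where the bucket resource exposes a website_endpoint attribute) and an output name of my own choosing:

```hcl
output "website_endpoint" {
  description = "S3 static website hosting endpoint"
  value       = aws_s3_bucket.my_static_website.website_endpoint
}
```

Running terraform output website_endpoint afterwards prints the same value without re-applying.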

Content Types, MIME Types, Oh My

Let's visit that URL in a browser and...

That's not what we expected. It turns out that S3 assigns a content type of binary/octet-stream to uploaded files by default. When visiting the website endpoint URL (which serves the index.html file), the browser sees that Content-Type: binary/octet-stream header and thinks "This is a binary file, so I'll prompt the user to download it".

We would prefer the browser to treat our HTML files as HTML, the CSS files as CSS, and so on. For that, we need the browser to receive the correct MIME type (e.g. text/html, text/css, image/png) in the Content-Type header. The easiest way to do that is to specify the correct content type when uploading files. To determine the correct type of our files, there are two approaches.

Determining MIME Types with a CLI Tool

The first approach is to use a command-line tool like file, xdg-mime or mimetype. These tools use different approaches:

file uses "magic tests" (looking for identifying bits at a small fixed offset into the file) to determine the type of files
xdg-mime and mimetype match against the file extension first, falling back to using file if the file doesn't have an extension

The below shell session demonstrates basic usage of each command (a dollar sign is used to distinguish input commands from output results):

A subtle detail in the above is that file may not label text files very precisely — it outputs the CSS file as text/plain instead of text/css because there's no magic test or consistent file header that can identify CSS files (nor the many other variations of text file types).

To determine MIME types with a CLI tool in our Terraform file, we'll add three pieces:

An external data source which, for each file to be uploaded, will call...
An external script that calls a CLI tool (e.g. mimetype) to determine the file's MIME type
The content_type argument of the aws_s3_bucket_object resource to assign the MIME type for each uploaded file

The external data source is a new block in main.tf as follows (I've turned the file list into a local value, because we're using it in multiple places now):

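locals {
  website_files = fileset(var.website_root, "**")
}

data "external" "get_mime" {
  for_each = local.website_files
  program  = ["bash", "./get_mime.sh"]
  # Sent to the script as a JSON object on stdin
  query = {
    filepath = "${var.website_root}/${each.key}"
  }
}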

The data source calls bash ./get_mime.sh once for each file, passing the filepath as JSON to stdin. Using the example from the Terraform docs, we can implement the bash script to grab the JSON filepath from stdin, run mimetype on the file, and export the result as a JSON object on stdout.

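#!/bin/bash

# Exit if any of the intermediate steps fail
set -e

# Extract "filepath" from the input JSON into FILEPATH shell variable.
eval "$(jq -r '@sh "FILEPATH=\(.filepath)"')"

# Run mimetype on the file (--brief prints only the type, not the filename)
MIME=$(mimetype --brief "$FILEPATH")

# Emit the result as a JSON object on stdout (the "mime" key name is assumed)
jq -n --arg mime "$MIME" '{"mime":$mime}'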

And finally in main.tf, we associate the correct MIME type from the bash script with the file when uploading to S3:

resource "aws_s3_bucket_object" "file" {
  for_each = local.website_files

  bucket       = aws_s3_bucket.my_static_website.id
  key          = each.key
  source       = "${var.website_root}/${each.key}"
  source_hash  = filemd5("${var.website_root}/${each.key}")
  acl          = "public-read"
  # added (assumes get_mime.sh prints a JSON object with a "mime" key):
  content_type = data.external.get_mime[each.key].result.mime
}

Determining MIME Types with a File Extension Map

The second approach to determining correct MIME types for our files is to simply provide a map of file extensions to MIME types. I first ran into this approach (for uploading files with Terraform) in this article on the StateFarm engineering blog, but it's a common approach in general:

The hashicorp/dir/template Terraform module has a mapping of extensions and MIME types
- Sidenote: An open Terraform issue requesting native MIME type detection directs users to use this Terraform module.
The AWS CLI uses the python mimetypes module, which has a built-in mapping as a fallback if it can't read a mapping from the system (at /etc/mime.types)
In non-desktop environments, the xdg-mime tool falls back to using the mimetype tool, which checks file extensions before performing magic tests (for the most part)

To use this approach, we add a mime.json file that maps file extensions to MIME types for whatever files we need to upload. It could be as simple as the below:

And we load that file as a local variable in Terraform and use it when looking up the content type:

locals {
  website_files = fileset(var.website_root, "**")

  mime_types = jsondecode(file("mime.json"))
}

resource "aws_s3_bucket_object" "file" {
  for_each = local.website_files

  bucket       = aws_s3_bucket.my_static_website.id
  key          = each.key
  source       = "${var.website_root}/${each.key}"
  source_hash  = filemd5("${var.website_root}/${each.key}")
  acl          = "public-read"
  # Look up the file extension (e.g. ".css"); null leaves S3's default type
  content_type = lookup(local.mime_types, regex("\\.[^.]+$", each.key), null)
}

This mapping-based approach has the advantages of being simple and more cross-platform than shelling out to CLI tools. The downside is that you need to make sure all filetypes you're using exist in the extension-to-MIME mapping and are correct.
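If you'd rather not maintain a separate JSON file, the same idea could live entirely in Terraform. The sketch below is a variation of my own, not code from the repository: inline_mime_types and content_types are hypothetical names, the keys include the leading dot, and files with a missing or unknown extension fall back to application/octet-stream instead of raising an error.

```hcl
locals {
  # Hypothetical inline alternative to mime.json
  inline_mime_types = {
    ".html" = "text/html"
    ".css"  = "text/css"
    ".js"   = "application/javascript"
    ".png"  = "image/png"
    ".jpg"  = "image/jpeg"
    ".svg"  = "image/svg+xml"
  }

  # Map of file path => content type
  content_types = {
    for f in fileset(var.website_root, "**") :
    f => lookup(
      local.inline_mime_types,
      # regex() errors when nothing matches, so try() substitutes ""
      try(regex("\\.[^.]+$", f), ""),
      "application/octet-stream"
    )
  }
}
```

The upload resource would then use content_type = local.content_types[each.key].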

Fixing a Stale CloudFront Cache

Now we have a working static website that we can visit in our browser! If you don't care about SSL or caching for some reason, you could stop here. But, I would argue that an important part of modern websites is making them secure and fast, so you'll likely want to put a CloudFront distribution in front of your S3 bucket. There are many other tutorials (such as all the ones linked at the top of this article) that cover CloudFront, so I won't dig into the details of that. However, I do want to dig into a problem that you run into when serving a static website via CloudFront: a stale cache.

By default, CloudFront applies a TTL of 86400 seconds (1 day), meaning CloudFront will fetch website files from your S3 bucket and serve the same files to visitors for a full day before re-fetching from S3. If you update website content (e.g. change CSS styles or JavaScript behavior) in S3, visitors may continue receiving cached versions from CloudFront and won't see your updates for up to a whole day! We'd prefer visitors to see the latest version of all website content, but we'd also like CloudFront to cache files as long as possible, so files can be served faster (directly from cache).

Cache Busting

One solution is cache-busting, which involves adding a hash (or "fingerprint") to non-HTML files' names. If the files' content changes, then the hash changes, so the browser downloads a completely different file (which can be cached forever).

I tried to implement this with Terraform, but uh... Terraform isn't meant for this sort of thing. Between the Terraform filemd5 and regex functions, you can get close, but I hit a wall when trying to replace filenames with their hashed version in all files. This could maybe work if you used template variables (e.g. <link href="${main.css}"> instead of <link href="main.css">), but then you can no longer browse your website via the filesystem or a local server. Alas, here dies my ill-advised dream of making a Terraform-based static-site generator/bundler.
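For the curious, the "getting close" half might look something like this sketch. It is my own illustration under stated assumptions, not code from the repository: fingerprinted_keys is a hypothetical name, the hash is truncated to 8 characters, HTML files keep their names, and files without an extension are left untouched. It only computes the new object keys; it does nothing about the references inside your HTML, which is exactly the wall described above.

```hcl
locals {
  # Map of original path => fingerprinted path,
  # e.g. "assets/main.css" => "assets/main.<hash>.css"
  fingerprinted_keys = {
    for f in fileset(var.website_root, "**") :
    f => (
      length(regexall("\\.html$", f)) > 0
      ? f # leave HTML filenames alone
      : replace(
        f,
        "/(\\.[^./]+)$/", # capture the final extension
        ".${substr(filemd5("${var.website_root}/${f}"), 0, 8)}$1"
      )
    )
  }
}
```

The upload resource could then set key = local.fingerprinted_keys[each.key] while still reading from the original source path.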

Fun fact: the melting face emoji was recently approved!

Cache Invalidation

The other solution to a stale CloudFront cache is invalidating files. This approach does not fit into Terraform's declarative paradigm — there are no resources for invalidations in the AWS provider and no third-party modules either. So, it requires more hacky-ness, in the form of a null_resource that triggers based on changes in file hashes and shells out to the AWS CLI to create a new invalidation. That approach might look something like the below:
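Here is a sketch of that idea under a few assumptions: aws_cloudfront_distribution.cdn is a hypothetical name for a distribution defined elsewhere, the AWS CLI is installed and has an aws_admin profile, and invalidating everything with /* is acceptable for a small site.

```hcl
locals {
  # Map of file path => content hash; any change alters the triggers below
  file_hashes = {
    for f in fileset(var.website_root, "**") :
    f => filemd5("${var.website_root}/${f}")
  }
}

resource "null_resource" "invalidate_cache" {
  # Re-create this resource (and re-run the provisioner) when any hash changes
  triggers = local.file_hashes

  # Wait until the new files have been uploaded
  depends_on = [aws_s3_bucket_object.file]

  provisioner "local-exec" {
    # Hypothetical distribution name; swap in your own
    command = "aws cloudfront create-invalidation --profile aws_admin --distribution-id ${aws_cloudfront_distribution.cdn.id} --paths '/*'"
  }
}
```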

The null_resource comes from a separate provider (hashicorp/null), so you'll need to run terraform init again.

What About Browser Caching?

We've talked about CloudFront caching, but there's another cache in between your content and your visitor: the browser. The browser cache and the Cache-Control header are a big topic all on their own; Harry Roberts's Cache-Control for Civilians is a great resource if you want to learn more.

For the purpose of this article, it's important to note that you shouldn't set an aggressive cache control header (e.g. Cache-Control: public, max-age=604800, immutable) on your website files without fingerprinting them. Otherwise, visitors' browsers will keep serving a file from their local cache for the max-age duration (one week, in the above example) before they send a request to CloudFront to check if the file is stale. CloudFront invalidations force CloudFront to fetch fresh content, but have no impact on the caching of visitors' browsers.
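If you do want to set the header from Terraform, the aws_s3_bucket_object resource has a cache_control argument. A cautious sketch for files that are not fingerprinted follows; the five-minute value is an arbitrary choice of mine, not a recommendation from the article. Keep in mind that CloudFront reads this header too, so it can also shorten how long the CDN keeps a file.

```hcl
resource "aws_s3_bucket_object" "file" {
  for_each = fileset(var.website_root, "**")

  bucket      = aws_s3_bucket.my_static_website.id
  key         = each.key
  source      = "${var.website_root}/${each.key}"
  source_hash = filemd5("${var.website_root}/${each.key}")
  acl         = "public-read"

  # added (assumed value): short browser cache, since names aren't fingerprinted
  cache_control = "public, max-age=300"
}
```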

That's all for this adventure — thanks for joining me in pushing Terraform out of its comfort zone! If you have any suggestions or corrections, please let me know or send us a tweet, and if you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

Corrections

2022-06-13: Thanks to Antoine Bolvy on Twitter for catching a couple typos (locals.file_hashes -> local.file_hashes and website_content_filepath -> website_root)!
