Forem: Tangram Vision

Abusing Terraform to Upload Static Websites to S3

Greg Schafer — Wed, 06 Oct 2021 18:27:29 +0000

S3 has been a great option for hosting static websites for a long time, but it's still a pain to set up by hand. You need to traverse dozens of pages in the AWS Console to create and manage users, buckets, certificates, a CDN, and about a hundred different configuration options. If you do this repeatedly, it gets old fast. We can automate the process with Terraform, a well-known "infrastructure as code" tool, which lets us declare resources (e.g. servers, storage buckets, users, policies, DNS records) and let Terraform figure out how to build and connect them.

Terraform can create the infrastructure needed for a static website on AWS (e.g. users, bucket, CDN, DNS), and it can also create and update the content (e.g. webpages, CSS/JS files, images). Managing content goes beyond the "infrastructure" in "infrastructure as code", which is why I'm labeling this an abuse or misuse of Terraform. Still, it works and has a few benefits:

You can define the bucket, properties, DNS, CDN, etc. in the same place as your content
You have a fully-automated process for standing up websites that only requires a single tool, Terraform

... and a few downsides:

Uploading files is slow compared to something like the AWS CLI's sync command
Terraform isn't meant for transforming or managing content, so you may outgrow Terraform's capabilities if you want advanced features or optimization

This article will skim over the infrastructure parts of creating a static website on AWS and focus more on how to upload content and manage content metadata (MIME types and caching behavior). If you want to learn more about the infrastructure parts (e.g. setting up CloudFront, an SSL certificate, DNS routes), there are many great tutorials out there. Here are a few:

Let's get on to the code! If you want just the code, you can find it here: https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload

The Boilerplate

We need some boilerplate to set up infrastructure before we can upload files to an S3 bucket. So, let's create a bucket with Terraform and the AWS provider. We'll configure the provider and create the bucket in a main.tf file containing the following:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.60.0"
    }
  }
}

provider "aws" {
  # This should match the profile name in the credentials file described below
  profile = "aws_admin"
  # Choose the region where you want the S3 bucket to be hosted
  region  = "us-west-1"
}

# To avoid repeatedly specifying the path, we'll declare it as a variable
variable "website_root" {
  type        = string
  description = "Path to the root of website content"
  default     = "../content"
}

resource "aws_s3_bucket" "my_static_website" {
  bucket = "blog-example-m9wtv64y"
  acl    = "private"

  website {
    index_document = "index.html"
  }
}

# To print the bucket's website URL after creation
output "website_endpoint" {
  value = aws_s3_bucket.my_static_website.website_endpoint
}

AWS Credentials

To create or interact with AWS resources, we need to provide credentials. The AWS Terraform provider accepts authentication in a variety of ways, but I'm going to use a credential file. That file is located at ~/.aws/credentials and looks like:

[aws_admin]
aws_access_key_id = AKIA...
aws_secret_access_key = ...

If you don't have credentials handy, you can follow AWS documentation to create a new user with a policy that grants S3 permissions.

Uploading Files to S3 with Terraform

Here's where we start using Terraform... creatively, i.e. for managing content instead of just infrastructure. For the content, I've created a basic multi-page website: a couple of HTML files, a CSS file, and a couple of images. By using Terraform's fileset function and the AWS provider's aws_s3_bucket_object resource, we can collect all the files in a directory and upload each of them as an object in S3:

# in main.tf, below the aforementioned boilerplate
resource "aws_s3_bucket_object" "file" {
  for_each = fileset(var.website_root, "**")

  bucket      = aws_s3_bucket.my_static_website.id
  key         = each.key
  source      = "${var.website_root}/${each.key}"
  source_hash = filemd5("${var.website_root}/${each.key}")
  acl         = "public-read"
}

The for_each meta-argument loops over all files in the website directory tree, binding each file's path (index.html, assets/normalize.css, etc.) to each.key, which can be used elsewhere in the block. The source_hash argument takes a hash of the file's contents (computed here with filemd5), which Terraform uses to detect when the file has changed and needs to be re-uploaded to the S3 bucket. (There's a similar etag argument, but it doesn't work when some kinds of S3 encryption are enabled.)
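Conceptually, for_each over fileset plus filemd5 gives Terraform a map from relative file path to content hash. Below is a rough Python analogue of that map. It is a sketch and not Terraform's implementation. The website_file_hashes name is hypothetical, and details such as how fileset treats hidden files aren't modeled. It shows why editing one file touches only one S3 object: only that file's hash changes.

```python
import hashlib
import tempfile
from pathlib import Path


def website_file_hashes(website_root: str) -> dict[str, str]:
    """Map each file's path (relative to website_root, with forward slashes)
    to the MD5 hex digest of its contents, roughly like fileset + filemd5."""
    root = Path(website_root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():  # fileset only returns files, not directories
            key = path.relative_to(root).as_posix()
            hashes[key] = hashlib.md5(path.read_bytes()).hexdigest()
    return hashes


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "assets").mkdir()
    (root / "index.html").write_text("<h1>Hello</h1>")
    (root / "assets" / "normalize.css").write_text("body { margin: 0; }")

    before = website_file_hashes(tmp)
    (root / "index.html").write_text("<h1>Hello, world</h1>")
    after = website_file_hashes(tmp)

    # Only the edited file's hash differs, so only it would be re-uploaded.
    changed = [key for key in after if before.get(key) != after[key]]
    print(sorted(before))  # ['assets/normalize.css', 'index.html']
    print(changed)         # ['index.html']
```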

Terraform Apply

With our trusty main.tf file in hand, we can now invoke dark and mysterious powers, conjuring infinite computational power out of nothing! With the merest flourish of our terminal, unfathomable forces precipitate to our whim — we are the tactician, the champion and commander over greater numbers than were ever deployed in any Greek myth!

Ahem... anyway, do the following:

# Initialize terraform in the current directory and download the AWS provider
terraform init
# Preview what changes will be made
terraform plan
# Make the changes (create and populate the S3 bucket)
terraform apply

At the end of the output from the apply command, you should see the website endpoint:

...
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

Outputs:

website_endpoint = "blog-example-m9wtv64y.s3-website-us-west-1.amazonaws.com"

Content Types, MIME Types, Oh My

Let's visit that URL in a browser and...

That's not what we expected. It turns out that S3 assigns a content type of binary/octet-stream to uploaded files by default. When visiting the website endpoint URL (which serves the index.html file), the browser sees that Content-Type: binary/octet-stream header and thinks "This is a binary file, so I'll prompt the user to download it".

We would prefer the browser to treat our HTML files as HTML, the CSS files as CSS, and so on. For that, we need the browser to receive the correct MIME type (e.g. text/html, text/css, image/png) in the Content-Type header. The easiest way to do that is to specify the correct content type when uploading files. There are two approaches to determining the correct type for each file.

Determining MIME Types with a CLI Tool

The first approach is to use a command-line tool like file, xdg-mime or mimetype. These tools use different approaches:

file uses "magic tests" (looking for identifying bits at a small fixed offset into the file) to determine the type of files
xdg-mime and mimetype match against the file extension first, falling back to using file if the file doesn't have an extension

The below shell session demonstrates basic usage of each command (a dollar sign is used to distinguish input commands from output results):

# Demo of file
$ file --brief --mime-type index.html
text/html
$ file --brief --mime-type assets/normalize.css
text/plain

# Demo of xdg-mime
$ xdg-mime query filetype index.html
text/html
$ xdg-mime query filetype assets/normalize.css
text/css

# Demo of mimetype
$ mimetype --brief index.html
text/html
$ mimetype --brief assets/normalize.css
text/css

A subtle detail in the above is that file may not label text files very precisely — it outputs the CSS file as text/plain instead of text/css because there's no magic test or consistent file header that can identify CSS files (nor the many other variations of text file types).
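Here is a toy Python version of both strategies, which illustrates why content sniffing struggles with CSS. It's a hypothetical sketch, not how file is implemented. The real tool has a large database of magic tests, including some that spot HTML markup, which is why it got index.html right. This sketch only knows the PNG and GIF signatures. For anything else it falls back to text/plain for UTF-8 text, or to application/octet-stream.

```python
import os
from typing import Optional

# A few well-known file signatures ("magic numbers") found at offset 0.
MAGIC_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

# A tiny extension map, like the ones extension-first tools consult.
EXTENSION_TYPES = {".html": "text/html", ".css": "text/css", ".png": "image/png"}


def sniff_mime(content: bytes) -> str:
    """Guess a MIME type from the file's bytes alone (a toy magic test)."""
    for signature, mime in MAGIC_SIGNATURES.items():
        if content.startswith(signature):
            return mime
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    # Readable text with no signature: nothing in the bytes says "CSS".
    return "text/plain"


def mime_from_extension(path: str) -> Optional[str]:
    """Guess a MIME type from the file extension alone."""
    return EXTENSION_TYPES.get(os.path.splitext(path)[1])


css = b"body { margin: 0; }\n"
print(sniff_mime(css))                              # text/plain
print(mime_from_extension("assets/normalize.css"))  # text/css
```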

To determine MIME types with a CLI tool in our Terraform file, we'll add three pieces:

An external data source which, for each file to be uploaded, will call...
An external script that calls a CLI tool (e.g. mimetype) to determine the file's MIME type
The content_type argument of the aws_s3_bucket_object resource to assign the MIME type for each uploaded file

The external data source is a new block in main.tf as follows (I've turned the file list into a local value, because we're using it in multiple places now):

locals {
  website_files = fileset(var.website_root, "**")
}

data "external" "get_mime" {
  for_each = local.website_files
  program  = ["bash", "./get_mime.sh"]
  query = {
    filepath : "${var.website_root}/${each.key}"
  }
}

The data source calls bash ./get_mime.sh once for each file, passing the filepath as JSON to stdin. Using the example from the Terraform docs, we can implement the bash script to grab the JSON filepath from stdin, run mimetype on the file, and export the result as a JSON object on stdout.

#!/bin/bash

# Exit if any of the intermediate steps fail
set -e

# Extract "filepath" from the input JSON into FILEPATH shell variable.
eval "$(jq -r '@sh "FILEPATH=\(.filepath)"')"

# Run mimetype on filepath to get the correct mime type.
MIME=$(mimetype --brief "$FILEPATH")

# Safely produce a JSON object containing the result value.
jq -n --arg mime "$MIME" '{"mime":$mime}'

Finally, in main.tf, we pass the MIME type returned by the bash script to the content_type argument when uploading each file to S3:

resource "aws_s3_bucket_object" "file" {
  for_each = local.website_files

  bucket       = aws_s3_bucket.my_static_website.id
  key          = each.key
  source       = "${var.website_root}/${each.key}"
  source_hash  = filemd5("${var.website_root}/${each.key}")
  acl          = "public-read"
  # added:
  content_type = data.external.get_mime[each.key].result.mime
}
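You can meet the external-program contract in Python if you'd rather not depend on bash, jq, and mimetype. Terraform writes the query as a JSON object to the program's stdin and expects a JSON object with only string values on stdout. The sketch below is an alternative built on assumptions and is not part of the sample repo. It swaps in Python's mimetypes module for the mimetype CLI, so detection is purely extension-based. The handle name and the application/octet-stream fallback are illustrative choices.

```python
import json
import mimetypes


def handle(raw_query: str) -> str:
    """Turn Terraform's JSON query, e.g. {"filepath": "..."}, into the JSON
    result object it expects, e.g. {"mime": "text/html"}."""
    query = json.loads(raw_query)
    mime, _encoding = mimetypes.guess_type(query["filepath"])
    # Terraform requires every value in the result object to be a string,
    # so fall back to a generic binary type instead of emitting null.
    return json.dumps({"mime": mime or "application/octet-stream"})


print(handle('{"filepath": "../content/index.html"}'))  # {"mime": "text/html"}
```

To use it:

1. Save it as something like get_mime.py.
2. Replace the print line with sys.stdout.write(handle(sys.stdin.read())), after adding import sys.
3. Point the data source at it with program = ["python3", "./get_mime.py"].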

Determining MIME Types with a File Extension Map

The second approach to determining correct MIME types for our files is to simply provide a map of file extensions to MIME types. I first ran into this approach (for uploading files with Terraform) in this article on the StateFarm engineering blog, but it's a common approach in general:

The hashicorp/dir/template Terraform module has a mapping of extensions and MIME types
- Sidenote: An open Terraform issue requesting native MIME type detection directs users to use this Terraform module.
The AWS CLI uses Python's mimetypes module, which combines a built-in mapping with any mapping files it finds on the system (such as /etc/mime.types)
In non-desktop environments, the xdg-mime tool falls back to using the mimetype tool, which checks file extensions before performing magic tests (for the most part)
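The AWS CLI leans on Python's mimetypes module, so you could also borrow that module's table to build the mime.json file used in the approach below rather than typing it by hand. The following is a minimal sketch, and the build_mime_map helper is hypothetical. It raises an error for unknown extensions, so a missing type shows up before upload rather than as a wrong Content-Type.

```python
import json
import mimetypes


def build_mime_map(extensions: list[str]) -> dict[str, str]:
    """Build an extension-to-MIME map (like mime.json) from Python's mimetypes
    table, keeping only the extensions the website actually uses."""
    mimetypes.init()  # loads the built-in table plus any system mime.types files
    mapping = {}
    for ext in extensions:
        mime = mimetypes.types_map.get(ext.lower())
        if mime is None:
            raise ValueError(f"no MIME type known for {ext!r}")
        mapping[ext] = mime
    return mapping


print(json.dumps(build_mime_map([".html", ".css", ".png"]), indent=4))
```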

To use this approach, we add a mime.json file that maps file extensions to MIME types for whatever files we need to upload. It could be as simple as the below:

{
    ".html": "text/html",
    ".css": "text/css",
    ".png": "image/png"
}

And we load that file as a local value in Terraform and use it when looking up the content type:

locals {
  website_files = fileset(var.website_root, "**")

  mime_types = jsondecode(file("mime.json"))
}

resource "aws_s3_bucket_object" "file" {
  for_each = local.website_files

  bucket       = aws_s3_bucket.my_static_website.id
  key          = each.key
  source       = "${var.website_root}/${each.key}"
  source_hash  = filemd5("${var.website_root}/${each.key}")
  acl          = "public-read"
  content_type = lookup(local.mime_types, regex("\\.[^.]+$", each.key), null)
}

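This mapping-based approach has the advantages of being simple and more cross-platform than shelling out to CLI tools. The downside is that you need to make sure every file type you're using exists in the extension-to-MIME mapping with the correct type. Also note that Terraform's regex function raises an error when its pattern doesn't match, so a file without an extension will make terraform plan fail unless you wrap the regex call in try().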
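Here's a Python sketch of what that content_type expression computes for each file. It can be handy for checking a mime.json against your file list before running terraform plan. It uses the same regular expression, with one deliberate difference: Python's re.search returns None for a path with no extension, so the sketch reports "unknown" rather than failing.

```python
import re
from typing import Optional

# Same contents as the mime.json example.
MIME_TYPES = {".html": "text/html", ".css": "text/css", ".png": "image/png"}

# Same pattern as the Terraform expression: a dot followed by non-dot
# characters up to the end of the path.
EXTENSION_PATTERN = re.compile(r"\.[^.]+$")


def content_type_for(key: str, mime_types: dict[str, str]) -> Optional[str]:
    """Mirror lookup(local.mime_types, regex("\\.[^.]+$", key), null)."""
    match = EXTENSION_PATTERN.search(key)
    if match is None:
        # Terraform's regex() raises an error here; this sketch returns None.
        return None
    return mime_types.get(match.group(0))  # None plays the role of null


for key in ["index.html", "assets/normalize.css", "images/logo.svg", "CNAME"]:
    print(key, "->", content_type_for(key, MIME_TYPES))
```

Both versions do an exact, case-sensitive key match. A file named LOGO.PNG won't pick up image/png unless you add ".PNG" to the map or lowercase the extension first; Terraform has a lower function for that.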

Fixing a Stale CloudFront Cache

Now we have a working static website that we can visit in our browser! If you don't care about SSL or caching for some reason, you could stop here. But, I would argue that an important part of modern websites is making them secure and fast, so you'll likely want to put a CloudFront distribution in front of your S3 bucket. There are many other tutorials (such as all the ones linked at the top of this article) that cover CloudFront, so I won't dig into the details of that. However, I do want to dig into a problem that you run into when serving a static website via CloudFront: a stale cache.

By default, CloudFront applies a TTL of 86400 seconds (1 day), meaning CloudFront will fetch website files from your S3 bucket and serve the same files to visitors for a full day before re-fetching from S3. If you update website content (e.g. change CSS styles or JavaScript behavior) in S3, visitors may continue receiving cached versions from CloudFront and won't see your updates for up to a whole day! We'd prefer visitors to see the latest version of all website content, but we'd also like CloudFront to cache files as long as possible, so files can be served faster (directly from cache).

Cache Busting

One solution is cache-busting, which involves adding a hash (or "fingerprint") of each non-HTML file's content to its name and updating the HTML to reference the new names. If a file's content changes, its hash and therefore its URL change, so the browser downloads a completely different file (which can be cached forever).
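Outside of Terraform, the mechanics fit in a few lines. The Python sketch below uses hypothetical helpers that are not from the sample repo. It inserts the first 8 hex characters of a file's MD5 digest before its extension and rewrites references in an HTML string. A real bundler would parse HTML and CSS properly instead of doing plain string replacement.

```python
import hashlib
import posixpath


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Return e.g. 'assets/main.<hash>.css' for 'assets/main.css'."""
    stem, ext = posixpath.splitext(path)
    digest = hashlib.md5(content).hexdigest()[:length]
    return f"{stem}.{digest}{ext}"


def rewrite_references(html: str, renames: dict[str, str]) -> str:
    """Naively replace each original asset path with its fingerprinted name."""
    for original, renamed in renames.items():
        html = html.replace(original, renamed)
    return html


name_v1 = fingerprinted_name("assets/main.css", b"body { color: black; }")
name_v2 = fingerprinted_name("assets/main.css", b"body { color: navy; }")
print(name_v1, name_v2)  # different content gives a different URL

html = '<link rel="stylesheet" href="assets/main.css">'
print(rewrite_references(html, {"assets/main.css": name_v2}))
```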

I tried to implement this with Terraform, but uh... Terraform isn't meant for this sort of thing. Between the Terraform filemd5 and regex functions, you can get close, but I hit a wall when trying to replace filenames with their hashed versions in all files. This could maybe work if you used template variables (e.g. <link href="${main.css}"> instead of <link href="main.css">), but then you can no longer browse your website via the filesystem or a local server. Alas, here dies my ill-advised dream of making a Terraform-based static-site generator/bundler.

Fun fact: the melting face emoji was recently approved!

Cache Invalidation

The other solution to a stale CloudFront cache is invalidating files. This approach does not fit into Terraform's declarative paradigm — there are no resources for invalidations in the AWS provider and no third-party modules either. So, it requires more hacky-ness, in the form of a null_resource that triggers based on changes in file hashes and shells out to the AWS CLI to create a new invalidation. That approach might look something like the below:

locals {
  website_files = fileset(var.website_root, "**")

  file_hashes = {
    for filename in local.website_files :
    filename => filemd5("${var.website_root}/${filename}")
  }
}

resource "null_resource" "invalidate_cache" {
  triggers = local.file_hashes

  provisioner "local-exec" {
    # Quote /* so the shell never treats it as a glob
    command = "aws --profile=aws_admin cloudfront create-invalidation --distribution-id=${aws_cloudfront_distribution.my_distribution.id} --paths='/*'"
  }
}

null_resource comes from a separate provider (hashicorp/null), so you'll need to run terraform init again to download it.
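Conceptually, Terraform stores the triggers map in its state. On the next apply, it replaces the null_resource, re-running its local-exec provisioner, whenever any entry differs. Added or removed keys count as differences. Here's a Python sketch of that comparison, with made-up hash values standing in for the file_hashes map from the previous and current runs:

```python
def changed_keys(previous: dict[str, str], current: dict[str, str]) -> set[str]:
    """Keys whose values were added, removed, or modified between two maps."""
    return {
        key
        for key in previous.keys() | current.keys()
        if previous.get(key) != current.get(key)
    }


def should_replace(previous: dict[str, str], current: dict[str, str]) -> bool:
    """Any changed trigger means the resource is replaced and the provisioner re-runs."""
    return bool(changed_keys(previous, current))


before = {"index.html": "aaa", "assets/normalize.css": "bbb"}
after = {"index.html": "aaa", "assets/normalize.css": "ccc", "about.html": "ddd"}
print(should_replace(before, after))        # True
print(sorted(changed_keys(before, after)))  # ['about.html', 'assets/normalize.css']
```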

What About Browser Caching?

We've talked about CloudFront caching, but there's another cache in between your content and your visitor: the browser. The browser cache and the Cache-Control header are a big topic all on their own; Harry Roberts's Cache-Control for Civilians is a great resource if you want to learn more.

For the purpose of this article, it's important to note that you shouldn't set an aggressive cache control header (e.g. Cache-Control: public, max-age=604800, immutable) on your website files without fingerprinting them. Otherwise, visitors' browsers will keep serving a file from their local cache for the max-age duration (one week, in the above example) before they send a request to CloudFront to check if the file is stale. CloudFront invalidations force CloudFront to fetch fresh content, but have no impact on the caching of visitors' browsers.
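One way to reconcile the two caches is to choose the header per file. The hedged Python sketch below applies such a policy:

- Names that carry a content hash get a week-long, immutable max-age.
- Everything else gets no-cache. This lets the browser keep a copy but makes it revalidate before each reuse.

The naming convention (8 hex characters before the extension, as in main.1a2b3c4d.css) and the header values are assumptions. To wire something like this into the upload, you could use the aws_s3_bucket_object resource's cache_control argument.

```python
import re

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800, the max-age from the example above

# Assumed naming convention for fingerprinted files, e.g. "main.1a2b3c4d.css".
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8}\.[^./]+$")


def cache_control_for(key: str) -> str:
    """Pick a Cache-Control value for one uploaded file."""
    if FINGERPRINTED.search(key):
        # The URL changes whenever the content does, so long-lived caching is safe.
        return f"public, max-age={ONE_WEEK_SECONDS}, immutable"
    # Unfingerprinted names keep the same URL across deploys, so make browsers
    # revalidate with CloudFront before reusing a cached copy.
    return "no-cache"


for key in ["index.html", "assets/main.css", "assets/main.1a2b3c4d.css"]:
    print(f"{key}: {cache_control_for(key)}")
```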

That's all for this adventure — thanks for joining me in pushing Terraform out of its comfort zone! If you have any suggestions or corrections, please let me know or send us a tweet, and if you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

Creating PostgreSQL Test Data with SQL, PL/pgSQL, and Python

Greg Schafer — Fri, 30 Apr 2021 21:18:30 +0000

After exploring various ways to load test data into PostgreSQL for my last blog post, I wanted to dive into different approaches for generating test data for PostgreSQL. Generating test data, rather than using static manually-created data, can be valuable for a few reasons:

Writing the logic for generating test data forces you to take a second look at your data model and consider what values are allowed and which values are edge cases.
Tools for generating test data make it easier to set up data per test. I would argue this is better than the alternatives of (a) hand-creating data per test or (b) trying to maintain a single dataset that is used across the entire test suite. The first option is tedious, and the second option can be brittle. For example, suppose you're testing an e-commerce website with a shared test dataset. If deactivating one product in that dataset causes many tests to fail unexpectedly, then those tests relied on a pre-condition that merely happened to be satisfied by the dataset. Generating data per test can make such pre-conditions explicit, especially for colleagues who inherit your tests and test data in the future.
Unless you already have a large dataset from a production environment or a partner company that you can use (hopefully after anonymization!), generating test data is the only way to get large datasets for benchmarking and load testing.

Similar to the previous article, if you're using an Object-Relational Mapping (ORM) library, then you'll probably create and persist objects into the database using the ORM or use the ORM to dump and restore test data fixtures using JSON or CSV. If you're not using an ORM, the approaches in this article may provide some learning or inspiration for how you can best generate data for your particular testing situation.

Follow Along with Docker

Similar to the previous article, you can follow along using Docker and the scripts in a subfolder of our Tangram Vision blog repo: https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.30_GeneratingTestDataInPostgreSQL

Unlike the previous article, I've provided a Dockerfile to add Python into the Postgres Docker image so we can run Python inside the PostgreSQL database. As described in the repo's README, you can build the docker image and run examples with:

docker build . --tag=postgres-test-data-blogpost

# The base postgres image requires a password to be set, but we'll just be
# testing locally, so no need to set a strong password.
docker run --name=postgres --rm --env=POSTGRES_PASSWORD=foo \
    --volume=$(pwd)/schema.sql:/docker-entrypoint-initdb.d/schema.sql \
    --volume=$(pwd):/repo \
    postgres-test-data-blogpost -c log_statement=all

The repo contains a variety of files that start with add-data- which demonstrate different ways of loading and generating test data. After the Postgres Docker container is running, you can run add-data- files in a new terminal window with a command like:

docker exec --workdir=/repo postgres \
    psql --host=localhost --username=postgres \
         --file=add-data-insert-random.sql

If you want to interactively poke around the database with psql, use:

docker exec --interactive --tty postgres \
    psql --host=localhost --username=postgres

Sample Schema

For example code and data, I'll use the following simple schema again:

Musical artists have a name
An artist can have many albums (one-to-many), which have a title and release date
Genres have a name
Albums can belong to many genres (many-to-many)

Sample schema relating musical artists, albums, and genres.

Generating Data

Using static datasets has advantages (you know exactly what data is in your database), but they can be tedious to maintain over time and impractical to create if you need a lot of data (e.g. for benchmarking or load testing). Generating data is an alternative approach which lets you define how data should look in one place and then generate and use as much data as you like.

There are a few different tools for generating test data that are worth exploring, from plain ol' SQL to higher-level programming languages like Python.

SQL

If you're like me, you may have started this article not expecting SQL to be capable of generating test data. With [generate_series](https://www.postgresql.org/docs/current/functions-srf.html) and [random](https://www.postgresql.org/docs/current/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE) and a little creativity, however, SQL is well-equipped to generate a variety of data.

To create 5 artists with 8 random hex characters for their names, you can do the following:

INSERT INTO artists (name)
SELECT substr(md5(random()::text), 1, 8) FROM generate_series(1, 5) as _g;
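If the SQL expression is hard to parse, here's a rough Python equivalent of substr(md5(random()::text), 1, 8). It takes a random number, renders it as text, computes its MD5, and keeps the first 8 hex characters. SQL's substr is 1-indexed, so positions 1 through 8 correspond to Python's [:8]. This only illustrates what each function contributes; Python formats floats differently from PostgreSQL, so the exact strings won't match.

```python
import hashlib
import random


def random_hex_name(rng: random.Random) -> str:
    """Python analogue of substr(md5(random()::text), 1, 8)."""
    as_text = str(rng.random())                          # random()::text
    digest = hashlib.md5(as_text.encode()).hexdigest()   # md5(...): 32 hex chars
    return digest[:8]                                    # substr(..., 1, 8)


rng = random.Random()
# Like generate_series(1, 5): one name per generated row.
names = [random_hex_name(rng) for _ in range(5)]
print(names)
```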

If you want to use random words instead of random hex characters, you can pick words from the system dictionary. I've copied Ubuntu's american-english word list to /usr/share/dict/words in the Docker image, so we just need to load it and pick a word randomly:

-- Temporary tables are only accessible to the current psql session and are
-- dropped at the end of the session.
CREATE TEMPORARY TABLE words (word TEXT);

-- The WHERE clause excludes possessive words (almost 30k of them!)
COPY words (word) FROM '/usr/share/dict/words' WHERE word NOT LIKE '%''%';

-- Randomly order the table and pick the first result
SELECT * FROM words ORDER BY random() LIMIT 1;

No joke, the first word that the above query returned for me was "bravo". I don't know whether to be encouraged or creeped out.

On a separate note, the dictionary contains words that may be offensive and inappropriate in some settings. If you're pulling test data from the dictionary and don't want these words to pop up in your next demo to customers/bosses, make sure to take appropriate precautions!
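Outside the database, the same word-picking step is a short Python sketch. It drops possessives (any word containing an apostrophe, mirroring WHERE word NOT LIKE '%''%') and then picks uniformly at random, which is what ORDER BY random() LIMIT 1 amounts to. The function names are hypothetical. The lines are passed in so the sketch works without /usr/share/dict/words.

```python
import random
from typing import Iterable


def load_words(lines: Iterable[str]) -> list[str]:
    """Keep non-empty words, excluding possessives (anything with an apostrophe)."""
    words = []
    for line in lines:
        word = line.strip()
        if word and "'" not in word:
            words.append(word)
    return words


def random_word(words: list[str], rng: random.Random) -> str:
    """Equivalent in spirit to ORDER BY random() LIMIT 1: a uniform random pick."""
    return rng.choice(words)


sample = ["bravo", "bravo's", "Zulu", "tango", "tango's"]
words = load_words(sample)
print(words)  # ['bravo', 'Zulu', 'tango']
print(random_word(words, random.Random()))

# With the real dictionary (the path used in the Docker image):
# with open("/usr/share/dict/words", encoding="utf-8") as f:
#     words = load_words(f)
```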

Anyway, moving on... using these tools (and a few more), we can generate interesting test data for all of our tables. Comments in the code below explain extra functions and techniques being used.

-- Excerpt from add-data-insert-random.sql in the sample code repo

-- Use 8 random hex chars as the genre name.
INSERT INTO genres (name)
SELECT substr(md5(random()::text), 1, 8) FROM generate_series(1, 5) AS _g;

INSERT INTO artists (name)
SELECT
  -- Pick one random word as the artist name.
  (SELECT * FROM words ORDER BY random() LIMIT 1)
FROM generate_series(1, 4) AS _g;

INSERT INTO albums (artist_id, title, released)
SELECT
  -- Select a random artist from the artists table.
  -- NOTE: random() is only evaluated once in this subquery unless it depends on
  -- the outer query, hence the "_g*0" after random().
  (SELECT id FROM artists ORDER BY random()+_g*0 LIMIT 1),

  -- Select the first 1-3 rows after randomly sorting the word list, then join
  -- them with spaces between each word and capitalize the first letter of each
  -- word.
  initcap(array_to_string(array(
    SELECT * FROM words ORDER BY random()+_g*0 LIMIT ceil(random() * 3)
  ), ' ')),

  -- Subtract between 0-5 years from today as the album release date.
  (now() - '5 years'::interval * random())::date
FROM generate_series(1, 8) AS _g;

-- Assign a random album a random genre. Repeat 10 times.
INSERT INTO album_genres (album_id, genre_id)
SELECT
  (SELECT id FROM albums ORDER BY random()+_g*0 LIMIT 1),
  (SELECT id FROM genres ORDER BY random()+_g*0 LIMIT 1)
FROM generate_series(1, 10) AS _g
-- If we insert a row that already exists, do nothing (don't raise an error)
ON CONFLICT DO NOTHING;
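For comparison, here's the albums and album_genres part of that script sketched in plain Python with in-memory lists. The IDs are made up and nothing touches a database. In SQL, the +_g*0 trick forces per-row randomness; here it comes for free because rng.choice runs on every loop iteration. A set of (album_id, genre_id) pairs stands in for ON CONFLICT DO NOTHING, assuming album_genres has a unique constraint on that pair, as the ON CONFLICT clause suggests. Duplicate random pairs are dropped, so 10 attempts can produce fewer than 10 links.

```python
import datetime
import random

rng = random.Random()
today = datetime.date.today()
artist_ids = [1, 2, 3, 4]    # hypothetical IDs of the generated artists
genre_ids = [1, 2, 3, 4, 5]  # hypothetical IDs of the generated genres

albums = []
for album_id in range(1, 9):  # like generate_series(1, 8)
    artist_id = rng.choice(artist_ids)  # a fresh random pick for every row
    # Roughly 0-5 years before today, using 365.25-day years to approximate
    # now() - '5 years'::interval * random().
    released = today - datetime.timedelta(days=rng.random() * 5 * 365.25)
    albums.append((album_id, artist_id, released))

# A set silently drops duplicate (album_id, genre_id) pairs, much like
# ON CONFLICT DO NOTHING does against a unique constraint.
album_genres = set()
for _ in range(10):
    album_genres.add((rng.choice(albums)[0], rng.choice(genre_ids)))

print(len(albums), "albums;", len(album_genres), "genre links (at most 10)")
```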

But that's not all! We can define functions in SQL to reuse logic — if we want genres, artist names, and album titles to all be random words, then we can move random-word-picking into a function and use it in many places:

-- Excerpt from add-data-insert-random-function.sql in the sample code repo
CREATE OR REPLACE FUNCTION generate_random_title(num_words int default 1) RETURNS text AS $$
  SELECT initcap(array_to_string(array(
    SELECT * FROM words ORDER BY random() LIMIT num_words
  ), ' '))
$$ LANGUAGE sql;

INSERT INTO genres (name)
SELECT generate_random_title()
FROM generate_series(1, 5) AS _g;

INSERT INTO artists (name)
-- Generate 1-2 random words as the artist name.
SELECT generate_random_title(ceil(random() * 2 + _g * 0)::int)
FROM generate_series(1, 4) AS _g;

-- ...
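In Python terms, generate_random_title samples some distinct words, joins them with spaces, and title-cases the result, with the caller deciding how many words. The sketch below uses a made-up word list. Python's str.title() is close to PostgreSQL's initcap but not identical; for example, they treat digits inside words differently.

```python
import random


def generate_random_title(words: list[str], rng: random.Random, num_words: int = 1) -> str:
    """Pick num_words distinct words, join them with spaces, capitalize each word."""
    return " ".join(rng.sample(words, num_words)).title()


rng = random.Random()
words = ["bravo", "tango", "zulu", "echo", "delta"]  # hypothetical word list

genres = [generate_random_title(words, rng) for _ in range(5)]
# 1-2 words per artist name, like ceil(random() * 2).
artists = [generate_random_title(words, rng, rng.randint(1, 2)) for _ in range(4)]
print(genres)
print(artists)
```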

PL/pgSQL

If the declarative style of SQL is awkward/difficult, we can turn to PL/pgSQL to generate test data in PostgreSQL using a more procedural/imperative programming style. PL/pgSQL provides familiar programming concepts like variables, conditionals, loops, return statements, and exception handling.

To demonstrate some of what PL/pgSQL can do, let's specify some more requirements for our generated data — roughly half of our artists should have names starting with "DJ" and all albums by DJ artists should belong to an "Electronic" genre. That implementation might look like:

-- Excerpt from add-data-plpgsql-insert.sql in the sample code repo
DO $$
DECLARE
  -- Declare (and optionally assign) variables used in the below code block.
  genre_options text[] := array['Hip Hop', 'Jazz', 'Rock', 'Electronic'];
  artist_name text;
  dj_album RECORD;
BEGIN
  -- Convert each array option into a row and insert them into genres table.
  INSERT INTO genres (name) SELECT unnest(genre_options);

  FOR i IN 1..8 LOOP
    SELECT generate_random_title(ceil(random() * 2)::int) INTO artist_name;
    -- About 50% of the time, add 'DJ ' to the front of the artist's name.
    IF random() > 0.5 THEN
      artist_name = 'DJ ' || artist_name;
    END IF;
    INSERT INTO artists (name)
    SELECT artist_name;
  END LOOP;

  -- ...

  -- Ensure all albums by a 'DJ' artist belong to the Electronic genre.
  FOR dj_album IN
    SELECT albums.* FROM albums
    INNER JOIN artists ON albums.artist_id = artists.id
    WHERE artists.name LIKE 'DJ %'
  LOOP
    RAISE NOTICE 'Ensuring DJ album % belongs to Electronic genre!', quote_literal(dj_album.title);
    INSERT INTO album_genres (album_id, genre_id)
    SELECT dj_album.id, (SELECT id FROM genres WHERE name = 'Electronic')
    -- If we insert a row that already exists, do nothing (don't raise an error)
    ON CONFLICT DO NOTHING;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

As you can see in the above code snippet, PL/pgSQL lets us:

Test conditions with IF statements (which can have ELSIF and ELSE blocks or alternately be represented with CASE statements),
Loop over a range of integers with FOR i IN 1..8 LOOP (which can loop in reverse or with a step),
Loop over rows from a query, as in the FOR dj_album IN ... example above,
Print helpful log statements with RAISE,
and do all the above in a performant way, because the client can send the whole code block to the server to execute, rather than serializing and sending each statement to the server one at a time as it would with raw SQL.

There's much more to learn about PL/pgSQL than I can cover here in a reasonable amount of space, but hopefully the above provides some insight into its capabilities to help you decide what tool makes sense for you!
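Here is a hypothetical plain-Python rendering of the two requirements, separated from PL/pgSQL syntax. Dicts and a set stand in for tables, with a made-up word list, random albums, and no database. The closing assert checks the invariant that the final PL/pgSQL loop is meant to guarantee: every album by a DJ artist is linked to the Electronic genre.

```python
import random

rng = random.Random()
words = ["bravo", "tango", "zulu", "echo", "delta"]  # hypothetical word list
# Genre name -> id, standing in for the genres table.
genres = {name: i for i, name in enumerate(["Hip Hop", "Jazz", "Rock", "Electronic"], start=1)}

artists = {}  # artist_id -> name
for artist_id in range(1, 9):  # FOR i IN 1..8 LOOP
    name = " ".join(rng.sample(words, rng.randint(1, 2))).title()
    if rng.random() > 0.5:  # about 50% of the time, prefix 'DJ '
        name = "DJ " + name
    artists[artist_id] = name

# Hypothetical albums as (album_id, artist_id) pairs.
albums = [(album_id, rng.choice(list(artists))) for album_id in range(1, 13)]

album_genres = set()  # duplicates are ignored, like ON CONFLICT DO NOTHING
for album_id, artist_id in albums:
    if artists[artist_id].startswith("DJ "):  # WHERE artists.name LIKE 'DJ %'
        album_genres.add((album_id, genres["Electronic"]))

# The invariant the PL/pgSQL loop guarantees: every DJ album is Electronic.
dj_albums = [album_id for album_id, artist_id in albums if artists[artist_id].startswith("DJ ")]
assert all((album_id, genres["Electronic"]) in album_genres for album_id in dj_albums)
print(f"{len(dj_albums)} of {len(albums)} albums are by DJ artists")
```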

Using Python

PL/pgSQL isn't the only procedural language available in PostgreSQL; it also supports Python! The Python procedural language, plpython3u for Python 3, is "untrusted" (hence the u at the end of the name). This means you must be a superuser to create functions, and Python code can access and do anything that a superuser could. Luckily, we're generating test data in non-production environments, so Python is an acceptable option despite these security concerns.

To use plpython3u, we need to install python3 and postgresql-plpython3-$PG_MAJOR system packages and create the extension in the SQL script with the command below. I've already taken these steps for the Docker image and plpython script in the sample code repo.

CREATE EXTENSION IF NOT EXISTS plpython3u;

The main difference to be aware of when using Python in PostgreSQL is that all database access happens via the plpy module that is automatically imported in plpython3u blocks. The following example should help clarify some basics of using plpython3u and the plpy module:

-- Excerpt from add-data-plpython-intro.sql in the sample code repo
DO $$
    print("Print statements don't appear anywhere!")

    # Manually convert value to string, quote it, and interpolate
    artist_name = plpy.quote_nullable("DJ Okawari")
    returned = plpy.execute(f"INSERT INTO artists (name) VALUES ({artist_name})")
    plpy.info(returned)  # Outputs the next line
    # INFO:  <PLyResult status=7 nrows=1 rows=[]>

    # Let PostgreSQL parameterize the query
    artist_name = "Ella Fitzgerald"
    plan = plpy.prepare("INSERT INTO artists (name) VALUES ($1) RETURNING *", ["text"])
    returned = plan.execute([artist_name])
    plpy.info(returned)  # Outputs the next line
    # INFO:  <PLyResult status=11 nrows=1 rows=[{'artist_id': 2, 'name': 'Ella Fitzgerald'}]>

    returned = plpy.execute("SELECT * FROM artists")
    plpy.info(returned)  # Outputs the next line
    # INFO:  <PLyResult status=5 nrows=2 rows=[{'artist_id': 1, 'name': 'DJ Okawari'}, {'artist_id': 2, 'name': 'Ella Fitzgerald'}]>
$$ LANGUAGE plpython3u;

Here are the most important insights from the above code:

You can't print debugging information with Python's print function; use the logging functions in the plpy module instead (such as info and warning). Note that plpy.error raises an exception, which aborts the current transaction unless it's caught.
The [plpy.execute function](https://www.postgresql.org/docs/12/plpython-database.html) can execute a simple string as a query. If you're interpolating variables into the query, you are responsible for converting the variable value into a string and properly quoting it.
Alternately, use plan = plpy.prepare then plan.execute to prepare and execute a query, which allows you to leave data conversion and quoting up to PostgreSQL. As a bonus, you can save plans so the database only has to parse the query string and formulate an execution plan once.
The return value of plpy.execute can tell you the status of the query, how many rows were inserted or returned, and the rows themselves.
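plpy only exists inside PostgreSQL, but any database API has the same split between interpolating values yourself and letting the database bind parameters. As a stand-in, here's a sketch using Python's built-in sqlite3 module. This is sqlite3, not plpy, and sqlite3 uses ? placeholders where PostgreSQL uses $1, $2, and so on. The sketch shows that a bound parameter is stored verbatim even when it contains a quote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT)")

name = "Guns N' Roses"  # the apostrophe breaks naive string interpolation

# Fragile: f"INSERT INTO artists (name) VALUES ('{name}')" is invalid SQL here.
# Safe: pass the value separately and let the database handle quoting.
conn.execute("INSERT INTO artists (name) VALUES (?)", (name,))

rows = conn.execute("SELECT artist_id, name FROM artists").fetchall()
print(rows)  # [(1, "Guns N' Roses")]
```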

Now that we have an understanding of how to use Python in PostgreSQL, let's apply it to generating test data for our sample schema. While we could translate the previous section's PL/pgSQL code to Python with very few changes, doing so wouldn't capitalize on the biggest advantage of using Python — the plethora of standard and third-party libraries available.

The Faker Package

Faker is a Python package that provides many helpers for generating fake data. You can generate realistic-looking first and last names, addresses, emails, URLs, job titles, company names, and much more. Faker also supports generating random words and sentences, and generating random data across many different data types (numbers, strings, dates, JSON, and more). Using Faker is straightforward:

-- Excerpt from add-data-plpython-faker.sql in the sample code repo
DO $$
    from random import randint, choice
    from faker import Faker

    fake = Faker()

    # Prepare each plan once and reuse it for every row
    artist_plan = plpy.prepare("INSERT INTO artists (name) VALUES ($1)", ["text"])
    for _ in range(6):
        artist_plan.execute([fake.name()])

    # Alternately, we could add "RETURNING artist_id" to the above query and
    # save those values to avoid making this extra query for all artist_ids
    artist_ids = [row["artist_id"] for row in plpy.execute("SELECT artist_id FROM artists")]
    album_plan = plpy.prepare(
        "INSERT INTO albums (artist_id, title, released) VALUES ($1, $2, $3)",
        ["int", "text", "date"],
    )
    for _ in range(10):
        title = " ".join(word.title() for word in fake.words(nb=randint(1, 3)))
        album_plan.execute([choice(artist_ids), title, fake.date()])

    # ...
$$ LANGUAGE plpython3u;

The dataclasses Module

If you prefer to create Python objects to represent rows from your different tables, you could use a variety of different packages, such as attrs, factory_boy, or the built-in module dataclasses. These packages allow you to declare a field per table column and associate data types and factories for generating test data.

Please note that if you go very far down this path of representing rows as Python objects, you will find yourself re-creating a lot of ORM functionality. In that case, you should probably just use an ORM!

Here's an example of how you could use the dataclasses module to generate test data for our sample schema:

-- Excerpt from add-data-plpython-dataclasses.sql in the sample code repo
DO $$
    from dataclasses import dataclass, field
    import datetime
    from random import randint, choice
    from typing import List, Any, Type, TypeVar

    from faker import Faker

    T = TypeVar("T", bound="DataGeneratorBase")
    fake = Faker()

    # This is a useful base class for tracking instances so we can use them in
    # relationships (picking a random artist or genre to foreign key to).
    class DataGeneratorBase:
        def __new__(cls: Type[T], *args: Any, **kwargs: Any) -> T:
            "Track class instances in a list on the class"
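            # object.__new__ rejects extra arguments once __new__ is overridden,
            # so pass only cls (the dataclass __init__ handles the fields)
            instance = super().__new__(cls)  # type: ignore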
            if "instances" not in cls.__dict__:
                cls.instances = []
            cls.instances.append(instance)
            return instance

    @dataclass
    class Genre(DataGeneratorBase):
        genre_id: int = field(init=False)
        name: str = field(default_factory=fake.street_name)

    @dataclass
    class Artist(DataGeneratorBase):
        artist_id: int = field(init=False)
        name: str = field(default_factory=fake.name)

    @dataclass
    class Album(DataGeneratorBase):
        album_id: int = field(init=False)
        artist: Artist = field(default_factory=lambda: choice(Artist.instances))
        title: str = field(
            default_factory=lambda: " ".join(
                word.title() for word in fake.words(nb=randint(1, 3))
            )
        )
        released: datetime.date = field(default_factory=fake.date)
        genres: List[Genre] = field(
            # Use Faker to pick a list of genres to avoid duplicates
            default_factory=lambda: fake.random_elements(Genre.instances, length=randint(0, 3), unique=True)
        )

    for _ in range(6):
        g = Genre()
        # "RETURNING genre_id" lets us get the database-generated ID and store
        # it on the Python object for later reference without needing to issue
        # additional queries.
        plan = plpy.prepare(
            "INSERT INTO genres (name) VALUES ($1) RETURNING genre_id", ["text"]
        )
        g.genre_id = plan.execute([g.name])[0]["genre_id"]
    for _ in range(6):
        artist = Artist()
        plan = plpy.prepare(
            "INSERT INTO artists (name) VALUES ($1) RETURNING artist_id", ["text"]
        )
        artist.artist_id = plan.execute([artist.name])[0]["artist_id"]
    for _ in range(8):
        album = Album()
        plan = plpy.prepare(
            "INSERT INTO albums (artist_id, title, released) VALUES ($1, $2, $3) RETURNING album_id",
            ["int", "text", "date"],
        )
        album.album_id = plan.execute(
            [album.artist.artist_id, album.title, album.released]
        )[0]["album_id"]

        # Insert album_genres rows
        for g in album.genres:
            plan = plpy.prepare(
                "INSERT INTO album_genres (album_id, genre_id) VALUES ($1, $2)",
                ["int", "int"],
            )
            plan.execute([album.album_id, g.genre_id])
$$ LANGUAGE plpython3u;

The above snippet defines classes for each main table in our example schema: Genre, Artist, and Album. Then, it defines fields for each column along with a default_factory function that tells Python (or the Faker package, in many cases) how to generate suitable test data. I made the Album class the "owner" of the many-to-many relationship with Genres, so when an Album is created, it automatically picks 0-3 existing Genres to associate itself with during initialization.
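Because this pattern is ordinary Python, you can try out the generation logic without a database. The following is a minimal, stdlib-only sketch of the same instance-tracking base class and default_factory fields. It is not the sample repo's code: it substitutes the random module for Faker (an assumption, to avoid the third-party dependency), trims the classes to Genre and Album, and random_title is a hypothetical helper.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


class DataGeneratorBase:
    """Record every instance in a list on its own class."""

    def __new__(cls, *args: Any, **kwargs: Any) -> Any:
        # object.__new__ rejects extra arguments once __new__ is overridden,
        # so only cls is passed on; the dataclass __init__ sets the fields.
        instance = super().__new__(cls)
        # Check cls.__dict__ (not hasattr) so each class gets its own list.
        if "instances" not in cls.__dict__:
            cls.instances = []
        cls.instances.append(instance)
        return instance


def random_title() -> str:
    # Hypothetical stand-in for the Faker-based title factory.
    words = ["blue", "night", "mirror", "logic", "return"]
    return " ".join(w.title() for w in random.sample(words, k=random.randint(1, 3)))


@dataclass
class Genre(DataGeneratorBase):
    name: str = field(default_factory=random_title)


@dataclass
class Album(DataGeneratorBase):
    title: str = field(default_factory=random_title)
    # Pick 0-3 distinct existing genres; random.sample never repeats an element.
    # Assumes at least one Genre was created first (so Genre.instances exists).
    genres: List[Genre] = field(
        default_factory=lambda: random.sample(
            Genre.instances, k=random.randint(0, min(3, len(Genre.instances)))
        )
    )


random.seed(0)
genres = [Genre() for _ in range(4)]
albums = [Album() for _ in range(5)]
jazz = Genre(name="Jazz")  # keyword arguments pass through __new__ safely
```

Checking cls.__dict__ rather than using hasattr matters for deeper hierarchies: hasattr would find a parent class's list, so a subclass of Album would append to Album's list instead of getting its own.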

The second half of the code passes the Python objects into SQL INSERT queries, returning the primary key IDs (which weren't generated during object creation, due to the init=False field argument) so they can be saved on the objects and used later when setting foreign keys. This highlights a difficulty with doing this sort of object-relational mapping yourself — you have to figure out dependencies between your types of data and enforce an ordering (in Python and SQL) so that you have database-created IDs at the right times. This can be a bit tedious and messy, especially if you have circular dependencies or self-referencing relationships in your tables.
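For simple schemas, one way to derive that ordering automatically is a topological sort over foreign-key dependencies. Here's a sketch using the standard library's graphlib module (Python 3.9+). The FOREIGN_KEYS map is written by hand for the sample schema (an assumption; you could instead build it from the database's catalog), and insert_order is a hypothetical helper. graphlib raises CycleError on circular dependencies, which is the hard case the paragraph above warns about.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each table maps to the set of tables it references via foreign keys
# (hand-written for the sample schema).
FOREIGN_KEYS: Dict[str, Set[str]] = {
    "genres": set(),
    "artists": set(),
    "albums": {"artists"},
    "album_genres": {"albums", "genres"},
}


def insert_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order tables so every referenced table is populated first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = insert_order(FOREIGN_KEYS)
print(order)

# Hypothetical tables that reference each other: no valid order exists.
cycle_detected = False
try:
    insert_order({"employees": {"departments"}, "departments": {"employees"}})
except CycleError:
    cycle_detected = True
```

For cyclic schemas, you'd need a different strategy than simple ordering, which is one more reason that hand-rolled object-relational mapping gets messy.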

Importing External .py Files

If your data model or data-generation code starts to get complex, keeping a lot of Python code in SQL files becomes annoying, because your IDE won't lint, type-check, or auto-format Python embedded in SQL. Luckily, you can keep your Python code in external .py files that you import and execute from inside a plpython3u block, using the technique shown below:


-- Excerpt from add-data-plpython-external-pyfile.sql in the sample code repo
DO $$
    import importlib.util

    # The second argument is the filepath on the server (inside the container)
    spec = importlib.util.spec_from_file_location("add_test_data", "/repo/add_test_data.py")
    add_test_data = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(add_test_data)
    add_test_data.main(plpy)
$$ LANGUAGE plpython3u;

The add_test_data.py file can contain the same code as the body of the plpython3u block from the previous example, but you'll need to wrap the bottom half (which uses plpy to run queries) in a function that accepts plpy as an argument, so it looks like:

# Excerpt from add_test_data.py in the sample code repo

# ...
def main(plpy: Any) -> None:
    for _ in range(6):
        g = Genre()
    # ...
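A side benefit of moving the code into add_test_data.py is that you can exercise it outside PostgreSQL by passing in a stand-in for plpy. The sketch below is a minimal illustration, not part of the sample repo: RecordingPlpy and RecordingPlan are hypothetical stubs that mimic only the plpy.prepare(...).execute(...) calls used in this article, and the tiny module body is invented for the demo. It loads the file with the same importlib calls as the DO block above.

```python
import importlib.util
import pathlib
import tempfile
from typing import Any, List, Tuple

# A tiny, hypothetical stand-in for add_test_data.py.
MODULE_SOURCE = '''
def main(plpy):
    plan = plpy.prepare("INSERT INTO artists (name) VALUES ($1)", ["text"])
    for name in ["DJ Okawari", "Steely Dan"]:
        plan.execute([name])
'''


class RecordingPlan:
    """Stub plan that records each execute() call instead of running SQL."""

    def __init__(self, log: List[Tuple[str, list]], query: str) -> None:
        self.log = log
        self.query = query

    def execute(self, args: list) -> List[dict]:
        self.log.append((self.query, args))
        return []  # no result rows


class RecordingPlpy:
    """Stub for the plpy module, supporting only prepare()."""

    def __init__(self) -> None:
        self.executed: List[Tuple[str, list]] = []

    def prepare(self, query: str, types: List[str]) -> RecordingPlan:
        return RecordingPlan(self.executed, query)


def load_module(name: str, path: str) -> Any:
    # Same importlib steps as the DO block above.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    module_path = pathlib.Path(tmp) / "add_test_data.py"
    module_path.write_text(MODULE_SOURCE)
    add_test_data = load_module("add_test_data", str(module_path))

stub = RecordingPlpy()
add_test_data.main(stub)
print(stub.executed)
```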

Other (Trusted) Ways to Use Python

I want to briefly touch on two ways of using Python outside of PostgreSQL — running Python externally may be preferable if you want or need to avoid the untrusted nature of plpython3u. These approaches let you maintain your Python code completely independent of the database, which may be beneficial for reusability and maintainability.

You could use Python scripts to generate test data into CSV files and then load those into PostgreSQL with the COPY command. With this approach, however, you will likely end up with a multi-step process to generate and load test data. If you invoke a Python script (which outputs CSV) within the SQL COPY command (via COPY ... FROM PROGRAM), then you can't populate multiple tables with a single command. If you use multiple SQL COPY commands, it becomes convoluted to keep foreign-key IDs consistent across multiple Python script executions. The remaining reasonable approach is a multi-step one: run a Python script that saves multiple CSV files to disk (one per database table) and then run an SQL COPY command per CSV file to load the data.
You could run Python scripts that connect to PostgreSQL via a client library such as psycopg2. The psycopg2 package is used by many ORMs, such as the Django ORM and SQLAlchemy, but it doesn't impose any restrictions on how you handle your data — it just provides a Python interface for connecting to PostgreSQL, sending SQL commands, and receiving results.
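The multi-step CSV option above might look roughly like the following sketch. It's an illustration under assumptions, not the sample repo's code: write_test_csvs is a hypothetical helper, the names and dates are placeholders, and only the artists and albums tables are covered. The key idea is that the script assigns IDs itself, so foreign keys line up across files; the demo writes into a temporary directory and reads the files back.

```python
import csv
import pathlib
import random
import tempfile
from typing import List


def write_test_csvs(out_dir: str, n_artists: int = 3, n_albums: int = 5, seed: int = 0) -> List[pathlib.Path]:
    """Write artists.csv and albums.csv with matching IDs; return their paths."""
    rng = random.Random(seed)
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # IDs are assigned here (1..n), so albums.csv can reference artists.csv.
    artists = [(i, f"Artist {i}") for i in range(1, n_artists + 1)]
    albums = [
        # (album_id, artist_id, title, released)
        (i, rng.choice(artists)[0], f"Album {i}", f"20{rng.randint(10, 20)}-01-01")
        for i in range(1, n_albums + 1)
    ]

    tables = {
        "artists": (["artist_id", "name"], artists),
        "albums": (["album_id", "artist_id", "title", "released"], albums),
    }
    paths = []
    for table, (header, rows) in tables.items():
        path = out / f"{table}.csv"
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # matches COPY ... CSV HEADER
            writer.writerows(rows)
        paths.append(path)
    return paths


# Demo: write into a temporary directory and read the files back.
with tempfile.TemporaryDirectory() as tmp:
    artists_csv, albums_csv = write_test_csvs(tmp)
    with open(artists_csv, newline="", encoding="utf-8") as f:
        artist_rows = list(csv.reader(f))
    with open(albums_csv, newline="", encoding="utf-8") as f:
        album_rows = list(csv.reader(f))
```

Each file would then be loaded with its own command, referenced tables first, e.g. COPY artists FROM '/repo/artists.csv' CSV HEADER; before the albums file.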

Thank you for joining me on this exploration of loading test data (in the previous blog post) and generating test data for PostgreSQL! We tried out a variety of approaches and got some hands-on experience with code — I hope this helps you understand how to use these different approaches, weigh their tradeoffs, and choose which approach makes the most sense for your team and project.

If you have any suggestions or corrections, please let me know or send us a tweet, and if you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

Loading Test Data into PostgreSQL

Greg Schafer — Wed, 28 Apr 2021 22:47:27 +0000

Most web apps/services that use a relational database are built around a web framework and an Object-Relational Mapping (ORM) library, which typically have conventions that prescribe how to create and load test fixtures/data into the database for testing. If you're building a webapp without an ORM [1], the story for how to create and load test data is less clear. What tools and approaches are available, and which work best? There are a lot of articles around the internet that describe specific techniques or example code in isolation, but few that provide a broader survey of the many different approaches that are possible. I hope this article will help fill that gap, exploring and discussing different approaches for creating and loading test data in PostgreSQL.

[1] Wait a minute, why would you build a webapp without an ORM?! This question could spawn an entire article of its own, and in fact, many other articles have debated ORMs over the last couple of decades. I won't dive into that debate — it's up to the creator to decide if a project should use an ORM or not, and that decision depends on a lot of project-specific factors, such as the expertise of the creator and their team, the types and velocity of data involved, the performance and scaling requirements, and much more.

If you're interested in generating test data instead of (or in addition to) loading test data, please check out the follow-up article that explores generating test data for PostgreSQL using SQL, PL/pgSQL, and Python!

Follow Along with Docker

Want to follow along? I've collected sample data and scripts in a subfolder of our Tangram Vision blog repo: https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.28_LoadingTestDataIntoPostgreSQL

As described in the repo's README, you can run examples using the official Postgres Docker image with:

# The base postgres image requires a password to be set, but we'll just be
# testing locally, so no need to set a strong password.
docker run --name=postgres --rm --env=POSTGRES_PASSWORD=foo \
    --volume=$(pwd)/schema.sql:/docker-entrypoint-initdb.d/schema.sql \
    --volume=$(pwd):/repo \
    postgres:latest -c log_statement=all

To explain this Docker command a bit:

The base postgres image requires a password to be set (via the POSTGRES_PASSWORD environment variable), but we'll just be testing locally, so no need to set a strong password.
Executable scripts (*.sh and *.sql files) in the /docker-entrypoint-initdb.d folder inside the container will be executed as PostgreSQL starts up. The above command mounts schema.sql into that folder, so the database tables will be created.
The repo is also mounted to /repo inside the container, so example SQL and CSV files are accessible.
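The PostgreSQL server is started with the log_statement=all config override, which increases the logging verbosity.

While the container is running (e.g. from another terminal), you can execute one of the example SQL files with:
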
docker exec --workdir=/repo postgres \
    psql --host=localhost --username=postgres \
         --file=add-data-sql-copy-csv.sql

If you want to interactively poke around the database with psql, use:

docker exec --interactive --tty postgres \
    psql --host=localhost --username=postgres

Sample Schema

For example code and data, I'll use the following simple schema:

Musical artists have a name
An artist can have many albums (one-to-many), which have a title and release date
Genres have a name
Albums can belong to many genres (many-to-many)

Sample schema relating musical artists, albums, and genres.

Loading Static Data

The simplest way to get test data into PostgreSQL is to make a static dataset, which you can save as CSV files or embed in SQL files directly.

SQL COPY from CSV Files

In the code repo accompanying this blogpost, there are 4 small CSV files, one for each table of the sample schema. The CSV files contain headers and data rows as shown in the image below.

A small, static sample dataset of musical artists, albums, and genres.

We can import the data from these CSV files into a PostgreSQL database with the SQL COPY command:

-- Excerpt from add-data-copy-csv.sql in the sample code repo
COPY artists FROM '/repo/artists.csv' CSV HEADER;
COPY albums FROM '/repo/albums.csv' CSV HEADER;
COPY genres FROM '/repo/genres.csv' CSV HEADER;
COPY album_genres FROM '/repo/album_genres.csv' CSV HEADER;

The COPY command has a variety of options for controlling quoting, delimiters, escape characters, and more. You can even limit which rows are imported with a WHERE clause. One potential downside is that COPY to or from a server-side file must be run by a database superuser or by a user granted one of the built-in roles pg_read_server_files, pg_write_server_files, or pg_execute_server_program. This isn't a concern when loading data for local testing, but keep it in mind if you ever want to use it in a more restrictive or production-like environment.

Psql Copy from CSV Files

The PostgreSQL interactive terminal (called psql) provides a \copy meta-command that is very similar to SQL COPY:

-- Excerpt from add-data-copy-csv.psql in the sample code repo
\copy artists from 'artists.csv' csv header
\copy albums from 'albums.csv' csv header
\copy genres from 'genres.csv' csv header
\copy album_genres from 'album_genres.csv' csv header

There are some important differences between SQL COPY and psql copy:

Like other psql commands, the psql version of the copy command starts with a backslash (\) and doesn't need to end with a semicolon (;).
SQL COPY runs in the server environment whereas psql copy runs in the client environment. To clarify, the filepath you provide to SQL COPY should point to a file on the server's filesystem. The filepath you provide to psql copy points to a file on the filesystem where you're running the psql client. If you're following along using the Docker image and commands provided in this blogpost, the server and client are the same container, but if you ever want to load data from your local machine to a database on a remote server, then you'll want to use psql copy.
As a corollary to the above, psql copy is less performant than SQL COPY, because all the data must travel from the client to the server, rather than being directly loaded by the server.
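SQL COPY resolves relative filepaths against the server process's working directory (normally the cluster's data directory), so in practice you'll give it absolute paths; psql copy resolves relative filepaths against the client's current directory.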
Psql copy runs with the privileges of the user you're connecting to the server as, so it doesn't require superuser or local file read/write/execute permissions like SQL COPY does.

Putting Data in SQL Directly

As an alternative to storing data in separate CSV files (which are loaded with SQL or psql commands), you can store data in SQL files directly.

SQL COPY from stdin and pg_dump

The SQL COPY and psql copy commands can load data from stdin instead of a file. They will parse and load all the lines between the copy command and \. as rows of data.

-- Excerpt from add-data-copy-stdin.sql in the sample code repo
COPY public.artists (artist_id, name) FROM stdin CSV;
1,"DJ Okawari"
2,"Steely Dan"
3,"Missy Elliott"
4,"TWRP"
5,"Donald Fagen"
6,"La Luz"
7,"Ella Fitzgerald"
\.

COPY public.albums (album_id, artist_id, title, released) FROM stdin CSV;
1,1,"Mirror",2009-06-24
2,2,"Pretzel Logic",1974-02-20
3,3,"Under Construction",2002-11-12
4,4,"Return to Wherever",2019-07-11
5,5,"The Nightfly",1982-10-01
6,6,"It's Alive",2013-10-15
7,7,"Pure Ella",1994-02-15
\.

...
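If you'd rather generate a block like this from Python data than write it by hand, the csv module can take care of quoting. Here's a minimal sketch; copy_from_stdin_block is a hypothetical helper, and the second row is invented to show how embedded quotes and commas get escaped.

```python
import csv
import io
from typing import Any, Sequence


def copy_from_stdin_block(table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Render rows as a COPY ... FROM stdin CSV block ending with a \\. line."""
    buf = io.StringIO()
    buf.write(f"COPY {table} ({', '.join(columns)}) FROM stdin CSV;\n")
    # lineterminator="\n" avoids the csv module's default "\r\n".
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    # A line containing only \. ends the data section.
    buf.write("\\.\n")
    return buf.getvalue()


block = copy_from_stdin_block(
    "public.artists",
    ["artist_id", "name"],
    [(1, "DJ Okawari"), (2, 'The "Band", Inc.')],
)
print(block)
```

One caveat, per PostgreSQL's CSV rules: an unquoted empty field loads as NULL, and csv.writer writes None (and, in multi-column rows, empty strings) as unquoted empty fields, so an empty string would likely arrive as NULL.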

In fact, this COPY ... FROM stdin approach is how pg_dump (https://www.postgresql.org/docs/current/app-pgdump.html) outputs data if you're creating a dump or backup from an existing PostgreSQL database. However, pg_dump uses a tab-separated format by default, rather than the comma-separated format shown above.

By default, pg_dump also outputs SQL to re-create everything about the database (tables, constraints, views, functions, reset sequences, etc.), but you can instruct it to output only data with the --data-only flag. To try out pg_dump with the example Docker image, run:

docker exec --workdir=/repo postgres \
    pg_dump --host=localhost --username=postgres postgres

SQL INSERTs

Another way to put data directly in SQL is to use INSERT statements. This approach could look like the following:

-- Excerpt from add-data-insert-static-ids.sql in the sample code repo
INSERT INTO artists (artist_id, name)
OVERRIDING SYSTEM VALUE
VALUES
  (1, 'DJ Okawari'),
  (2, 'Steely Dan'),
  (3, 'Missy Elliott'),
  (4, 'TWRP'),
  (5, 'Donald Fagen'),
  (6, 'La Luz'),
  (7, 'Ella Fitzgerald');

INSERT INTO albums (album_id, artist_id, title, released)
OVERRIDING SYSTEM VALUE
VALUES
  (1, 1, 'Mirror', '2009-06-24'),
  (2, 2, 'Pretzel Logic', '1974-02-20'),
  (3, 3, 'Under Construction', '2002-11-12'),
  (4, 4, 'Return to Wherever', '2019-07-11'),
  (5, 5, 'The Nightfly', '1982-10-01'),
  (6, 6, 'It''s Alive', '2013-10-15'),
  (7, 7, 'Pure Ella', '1994-02-15');

...

The OVERRIDING SYSTEM VALUE clause lets us INSERT values into the primary key ID columns explicitly even though they are defined as GENERATED ALWAYS.

The pg_dump command's --column-inserts option will output data as INSERT statements (a separate statement per row), rather than as the default TSV format. Using INSERTs instead of COPY will run much slower when restoring the data, so this is only recommended if you're restoring the data to a database that doesn't support COPY, such as sqlite3. Using INSERTs can be sped up somewhat with the --rows-per-insert option, allowing you to INSERT many rows at a time per command, reducing the overhead of back-and-forth communication between client and server for every SQL statement.
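To get a feel for what --rows-per-insert does, here's a sketch that builds multi-row INSERT statements from Python tuples. It's illustrative only: batched_inserts and sql_literal are hypothetical helpers, and sql_literal does only minimal escaping (doubling single quotes, assuming standard_conforming_strings is on, which is the default), so use it only for trusted fixture data. The rows omit album_id so the database generates it (no OVERRIDING SYSTEM VALUE needed), and they assume artists with IDs 1 to 7 already exist.

```python
from typing import Any, List, Sequence


def sql_literal(value: Any) -> str:
    """Render a Python value as an SQL literal (minimal, trusted data only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # A single quote inside an SQL string literal is escaped by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def batched_inserts(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]], rows_per_insert: int
) -> List[str]:
    """Split rows into INSERT statements of at most rows_per_insert rows each."""
    statements = []
    for start in range(0, len(rows), rows_per_insert):
        batch = rows[start:start + rows_per_insert]
        values = ",\n  ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        statements.append(f"INSERT INTO {table} ({', '.join(columns)})\nVALUES\n  {values};")
    return statements


albums = [
    (1, "Mirror", "2009-06-24"),
    (2, "Pretzel Logic", "1974-02-20"),
    (3, "Under Construction", "2002-11-12"),
    (4, "Return to Wherever", "2019-07-11"),
    (5, "The Nightfly", "1982-10-01"),
    (6, "It's Alive", "2013-10-15"),
    (7, "Pure Ella", "1994-02-15"),
]
statements = batched_inserts("albums", ["artist_id", "title", "released"], albums, rows_per_insert=3)
print("\n\n".join(statements))
```

With three rows per statement, the seven albums become three INSERTs instead of seven, which is the client/server round-trip saving the option is after.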

Using INSERT statements, we could start moving away from statically declaring everything about our datasets: we could omit the primary key ID columns and look up IDs as needed when inserting foreign keys, as in the following example:

-- Excerpt from add-data-insert-queried-ids.sql in the sample code repo
INSERT INTO artists (name)
VALUES
  ('DJ Okawari'),
  ('Steely Dan'),
  ('Missy Elliott'),
  ('TWRP'),
  ('Donald Fagen'),
  ('La Luz'),
  ('Ella Fitzgerald');

INSERT INTO albums (artist_id, title, released)
VALUES
  ((SELECT artist_id FROM artists WHERE name = 'DJ Okawari'), 'Mirror', '2009-06-24'),
  ((SELECT artist_id FROM artists WHERE name = 'Steely Dan'), 'Pretzel Logic', '1974-02-20'),
  ((SELECT artist_id FROM artists WHERE name = 'Missy Elliott'), 'Under Construction', '2002-11-12'),
  ((SELECT artist_id FROM artists WHERE name = 'TWRP'), 'Return to Wherever', '2019-07-11'),
  ((SELECT artist_id FROM artists WHERE name = 'Donald Fagen'), 'The Nightfly', '1982-10-01'),
  ((SELECT artist_id FROM artists WHERE name = 'La Luz'), 'It''s Alive', '2013-10-15'),
  ((SELECT artist_id FROM artists WHERE name = 'Ella Fitzgerald'), 'Pure Ella', '1994-02-15');

...

This is hardly convenient, though, because we need to duplicate other row information (such as the artist name) in order to look up the corresponding ID. It gets even more complex if multiple artists have the same name! So, if you have a static dataset, I'd suggest sticking to one of the previously mentioned approaches that use SQL COPY or psql copy.

Putting Data in CSVs vs in SQL Files

Is there a reason to prefer putting static datasets in CSVs or directly in SQL files? My thoughts boil down to the following points:

CSVs are a widely understood and supported format (just make sure to be clear and consistent with encoding!). If your datasets will be maintained or created by people who prefer spreadsheet programs to database-admin and command-line tools, CSVs may be preferable.
If you want to keep all your test data and database setup in one place, SQL files are a convenient way to do that.
If your testing or continuous integration processes use pg_dump or its output, then you're already using datasets embedded in an SQL file — keep doing what makes sense for you!

I hope you learned something new and useful about the different approaches and tools available for loading static datasets into PostgreSQL. If you're looking to learn more, check out the follow-up article about generating test data for PostgreSQL!

If you have any suggestions or corrections, please let me know or send us a tweet, and if you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

Cover Photo by Susan Q Yin on Unsplash

Why Rust for Robots?

Adam Rodnitzky — Fri, 09 Apr 2021 18:21:37 +0000

Robotics largely runs on C++. Or at least it does for now. Here at Tangram Vision, we believe that there is a better language for robotics, and that language is Rust.

C++ for Robotics

For robotic platforms where the intention is to deploy commercially and at scale, C++ has emerged as the standard over the last few decades, and there are a few key reasons why. For one, its ubiquity has become self-reinforcing, with an ecosystem of libraries, tools, and engineers that almost exclusively work in C++. In fact, the main client library of ROS (the framework used by most roboticists) is written in C++. Similarly, the popular OpenCV computer vision library is also written in and accessed with C++.

However, ubiquity of tools and engineers alone doesn't explain the popularity of C++ for robotics. There is a more fundamental reason: most robotics platforms must meet acceptable performance thresholds under known resource constraints. C++ is well suited for these embedded applications because of how "close to the metal" the language can get.

This "close to the metal" advantage also makes C++ a potentially huge hazard. Testing, architecture, and memory management practices aren't consistent across widely used C++ dependencies, yet these dependencies might capture and manipulate essential low-level resources. This means it becomes very easy to inadvertently build in a critical bug without realizing it... that is, until you're well into production. That critical bug won't be discovered until the product has been deployed in the real world, where an edge case scenario that slipped past QA triggers a memory leak, or system crash, or [your favorite disaster scenario here].

Unless you resemble this Venn diagram, there's a good chance that you'll inadvertently program in some level of trouble as you build out your robotics codebase in C++.

This is where Rust starts to make sense.

Enter Rust for Robotics

Rust is a relatively new language for robotics, but there's a rapidly growing set of projects and libraries that provide key frameworks for robotics development. Why the change?

For starters, the biggest benefit of building with Rust is memory safety and management. In safe Rust, common gotchas like null pointer dereferences, use-after-free bugs, and data races are blocked altogether and won't compile. (Memory leaks are still possible, for example through reference-counted cycles, but they're rare in typical code.) Likewise, Rust manages memory without a garbage collector: values live on the stack by default, larger or dynamically sized data structures live on the heap behind owning pointers such as Box and Vec, and heap memory is freed automatically when its owner goes out of scope.

Every value has exactly one owner, and the borrow checker allows either any number of shared (read-only) references or a single mutable reference at a time, which prevents multiple variables from modifying the same data simultaneously. Efficient? Yes. Safe? Also, yes. The best part: developers maintain the "close to the metal" access that they would normally go to C++ for. This makes Rust a highly efficient, extremely safe language that also allows low-level access, something well-suited to the world of robotics where resource constraints and code safety are critical.

Rust is an obvious choice for robotics, but the transition to platforms written in Rust will take time. Rust itself is just over a decade old, whereas C++ has been around for nearly four decades. That's a lot of inertia to work against, but the Rust community is moving fast.

Resources for Rust in Robotics

There's a small but growing community of companies and developers (Tangram Vision included!) that are taking some of robotics' most commonly used libraries and tools and making them Rust compatible, as well as developing new tools to ease the development path for creating a Rust-programmed robot.

Here are a few of our favorites that cover some of the critical areas of robotics development. We've chosen to highlight resources that have been actively maintained within the last year.

Frameworks

OpenRR: An open-source Rust robotics platform

ROS

rosrust: A pure Rust implementation of the ROS client library
ros2-rust: Bindings, a code generator and code examples for ROS2
rustros_tf: A Rust port of the ROS tf library for keeping track of three dimensional transforms
Optimization Engine: Embedded optimization for robots and autonomous systems

Computer Vision

realsense-rust: High-level bindings for using Intel RealSense depth cameras (disclosure: Tangram Vision maintains this library!)
opencv-ros-camera: An OpenCV-compatible geometric model of a camera
adskalman: Kalman filter smoothing
cam-geom: Geometric models of cameras
bayes_estimate: A Bayesian estimation library

Collision Detection

openrr-planner: Path planning with collision avoidance

Controls

stepper: A universal stepper motor driver and controller interface for Rust

Simulation

nphysics: A 2D and 3D physics engine that can be used for robot simulation

Mathematics

Nalgebra: It's linear algebra...for Rust
petgraph: A graph data structure library for Rust

If there are other Rust resources that you've used for a robotics product that worked well, let us know! We plan on keeping this list curated and updated over time.

Should you build your Robot in Rust?

We think the answer will increasingly be yes. The fundamental benefits of the language already make it a great fit for the needs of roboticists and robots. The growing set of libraries and resources make it easier than ever to get started with foundations — and that includes the perception sensor tools and APIs that we are building here at Tangram Vision. Finally, more and more engineers are adopting Rust as a language of choice, particularly for embedded scenarios like robots. So, yes, the future is robots, and those robots will be built with Rust.

Making Great Docs with Rustdoc

Brandon Minor — Tue, 16 Mar 2021 00:26:01 +0000

At Tangram Vision, one of the things we've come to love about Rust is the tight integration of the tooling and ecosystem. Rustdoc, the official documentation system, is no exception to this; it's simple to use, creates beautiful pages, and makes documentation a joy.

We take documentation seriously. Documentation is for many the first entry point into the code, and good documentation is a primary driver of code adoption (along with functionality). This left us wondering: what are the best practices for making quality Rust documentation via Rustdoc? Through the course of our research, we found a couple of key resources that the community has provided:

The rustdoc book: A great place to start if you're looking to learn how to write documentation in your crates.
RFC 505: API Comment Conventions and RFC 1574: More API Documentation Conventions: Explanations on how the core Rust team documents the standard library and language interfaces.

However, while these are useful resources, we felt that a more approachable guide to crate documentation would be helpful to those starting out with their own crates. Through the course of this article, we'll find that Rust's crate+module structure naturally facilitates great documentation creation; we simply play to Rust's strengths.

Note: this guide is about achieving good documentation quality. It does not lay out any technical or formatting guidelines. For instance, we internally wrap our documentation at 120 characters a line... but a 120-character line limit won't do much to improve bad docs.

We assume some familiarity with rustdoc below. If you are unfamiliar with the tool, we encourage you to read the rustdoc book linked above as a primer.

Goals of Documentation

Here at Tangram Vision, we structure our documentation around two ideas:

What: Explaining what was built
How: Explaining how to use it

Anything beyond this is generally not helpful, for one singular reason: our documentation assumes that our users know Rust. We believe this assumption is critical to creating good documentation around the What and How. The docs should rarely discuss why a decision was made, unless that decision goes against intuition; instead, the docs only explain how to capitalize on that decision in one's own code.

This guideline of User Knowledge naturally leads us to another rule: Keep It Short. The aim is to be succinct by telling the user the What and the How in as little time as possible. This often means using simple, active language and avoiding technical jargon. If a longer discussion is needed, make sure to put the What and the How first, and the discussion after.

Both the What and the How can be seen at all levels of documentation in Rust. We'll see how best to organize that in a large repository below while maintaining consistent style and language.

Documentation Across a Crate

There's a natural top-down pattern to follow for Rust documentation. The top-level lib.rs or main.rs is the first thing users see, so it's the perfect spot to introduce the big What ideas. As the documentation gets more into the weeds, from modules to types to functions, documentation shifts more to the How. We'll see this play out as we discuss the different levels below.

We have added relevant links to our own documentation in the realsense-rust crate maintained by Tangram Vision OSS. Check these out for added context.

Crate

Crate-level example here

The crate documentation in lib.rs or main.rs should describe the purpose of the crate (the big What) alongside instructions and examples for getting started (the big How). This is a place for big ideas, since the details will come later in the lower levels. Any counter-intuitive design decisions or common gotchas should also be documented here, but remember: the What and the How always come first, and discussion follows.

Notice how the first sections are:

Summary
Features
Usage

...all explaining the big What and How. The documentation goes into more detail afterwards, with headings like "Architecture & Guiding Principles", "Prefer Rust-native types to types used through the FFI", etc. However, these sections are there only to explain the non-intuitive design decisions that go behind creating an FFI interface like this one. For those that don't care, the What and How are presented first, front and center.

Modules

Module-level example here

Modules should contain more direct documentation than the crate itself. The focus again should be on describing the types found in that module and how they interact with the rest of the crate. From this alone, users should understand the Why, i.e. why they would reach for a module from your crate.

This can get a bit trickier with sub-modules. Avoid deeply nested sub-modules, since they complicate the structure of a crate. Modules with sub-modules more than two layers deep can probably be flattened out. Exceptions exist, but if this layering is needed, it makes sense to add a Why discussion to explain what made this necessary.

Types

Type-level example here

Types are our primary way of defining abstraction and ontological separations in our code. Documentation here should focus on:

Construction of the type (when and how, if users are allowed to construct their own).
Destruction / dropping the type → what happens when you drop a value of a given type? If you don't implement the Drop trait, then this is probably OK to ignore.
Performance characteristics of the type, if it is a data structure.

As one can see, the documentation naturally de-emphasizes the What and builds on the How as we go down. Again, counter-intuitive or non-obvious cases might have a Why, but the What and the How together should suffice.

Functions

Function-level example here

The last thing to document is functions and associated functions (functions in impl blocks). This could include constructors, mutable functions, or data returned by accessors on a type or types. Semantic and code examples are especially welcome here because they describe the How in practical terms.

Common Sections Across a Crate

Crate and module level documentation can be broken down into multiple sections with heading levels 1 (# in markdown), 2 (##), and 3 (###). As you move towards documenting types and functions, aim to be as flat as possible; only use heading level 1 if a section is needed.

These headings are ordered below (in our humble opinion) according to their usefulness for the user in conveying the What and the How for a Rust crate. It's important to note that not all headings need to be present at all times. If it doesn't make sense, don't add it, since it just increases the cognitive burden on the user.

`# Examples`

Examples are, by far, the easiest and most concise way to convey How. They are welcome at all levels: module, type, function, or crate. Write examples for both common use cases and corner cases, such as Err or None results.

Examples in rustdoc documentation will compile and run as doc-tests. This is an important point: all code in the documentation will actually compile and run! This means they automatically provide users with a starting point for understanding. This is one of Rustdoc's greatest strengths, and it should be utilized whenever possible.

Make these examples functional code whenever possible. If this is not possible, e.g. with an example explaining improper usage, then add the text or ignore attribute after the opening fence of the code block:

/// # Examples
///
/// ```ignore
/// let bummer = this_code_wont_work(); // but it's an illustrative example.
/// ```

Notice the heading is "Examples", plural. Be consistent with the plurality here. Even if there is only one example, the consistency helps with searching.

`# Errors`

At their best, Errors help a user understand why a certain action is prevented by the crate and how to respond to it. It's a rare instance where explaining the Why is not just encouraged, but necessary for proper use.

At a function level, this section is only needed if that function returns a Result type. In this case, it describes the type returned and when the error should be expected, e.g. an invalid internal state, Foreign Function Interface (FFI) interactions, bad arguments, etc.

Make errors actionable by either passing them to a higher level function or allowing a reaction from the caller. Writing error types with a reaction in mind also makes it easier for users to understand how they got there in the first place. Moreover, if a caller can't act on an error, then there's little reason to expose it at all.
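Here is a sketch of an actionable error type, assuming a hypothetical `parse_exposure` function and an invented valid range of `1..=10000` microseconds. Each variant maps to something the caller can do (re-prompt, clamp), and the `# Errors` section states when each one is returned.

```rust
use std::fmt;

/// Errors returned by [`parse_exposure`].
#[derive(Debug, PartialEq)]
pub enum ExposureError {
    /// The input was not an unsigned integer; the caller can re-prompt.
    NotANumber(String),
    /// The value was outside the supported range; the caller can clamp it.
    OutOfRange(u32),
}

impl fmt::Display for ExposureError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExposureError::NotANumber(s) => write!(f, "{s:?} is not a number"),
            ExposureError::OutOfRange(v) => write!(f, "exposure {v} is outside 1..=10000"),
        }
    }
}

impl std::error::Error for ExposureError {}

/// Parses an exposure time in microseconds.
///
/// # Errors
///
/// Returns [`ExposureError::NotANumber`] if `input` is not an unsigned
/// integer, and [`ExposureError::OutOfRange`] if it is outside `1..=10000`.
pub fn parse_exposure(input: &str) -> Result<u32, ExposureError> {
    let value: u32 = input
        .trim()
        .parse()
        .map_err(|_| ExposureError::NotANumber(input.to_string()))?;
    if (1..=10_000).contains(&value) {
        Ok(value)
    } else {
        Err(ExposureError::OutOfRange(value))
    }
}

fn main() {
    // The caller reacts differently to each variant.
    let exposure = match parse_exposure("20000") {
        Ok(v) => v,
        Err(ExposureError::OutOfRange(_)) => 10_000, // clamp to the maximum
        Err(e @ ExposureError::NotANumber(_)) => {
            eprintln!("{e}");
            return;
        }
    };
    println!("using exposure {exposure}");
}
```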

`# Safety`

First and foremost: Try to minimize unsafe interfaces where possible. Rust is a language built on memory safety, and violating this tenet should only be done with conscious intention. The Safety section should convey that intention.

When documenting Safety, be explicit about what is "unsafe" about the interface and explain best practices for using it from safe code. If misusing an unsafe interface can leave your program in an undefined state, spell out which conditions lead to undefined behavior.
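A minimal sketch of a `# Safety` section (the function names are hypothetical): the unsafe function states exactly which precondition the caller must uphold and what happens otherwise, and a safe wrapper upholds it so most users never need the unsafe version.

```rust
/// Returns the element at `index` without bounds checking.
///
/// # Safety
///
/// `index` must be less than `values.len()`. Passing an out-of-bounds index
/// reads past the end of the slice, which is undefined behavior.
pub unsafe fn value_unchecked(values: &[f32], index: usize) -> f32 {
    // SAFETY: the caller guarantees `index < values.len()`.
    unsafe { *values.get_unchecked(index) }
}

/// Returns the element at `index`, or `None` if it is out of bounds.
pub fn value_checked(values: &[f32], index: usize) -> Option<f32> {
    if index < values.len() {
        // SAFETY: we just checked that `index` is in bounds.
        Some(unsafe { value_unchecked(values, index) })
    } else {
        None
    }
}

fn main() {
    let data = [1.0, 2.0, 3.0];
    println!("{:?}", value_checked(&data, 1));
    println!("{:?}", value_checked(&data, 5));
}
```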

`# Panics`

Use this section if a function can panic!. Complete coverage would aim to document any and all calls to .unwrap(), debug_assert!, etc.

Complete coverage is a good goal to aim for, but realize that a panic! call isn't always necessary. Many cases can be guarded against with small code changes. Returning a Result type avoids the panic entirely and gives the caller better error handling. Low-level FFI calls can unwrap and panic! if passed a null pointer; yet this can be prevented if you start with NonNull<T> as the input, making an unwrap() call superfluous.

In any case, you should aim to have all error cases implemented whenever possible. If there is a case that can cause a panic!, list it in the docs.
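For instance, a hypothetical `mean` function that asserts on empty input should say so under `# Panics`, and can point to a non-panicking variant that returns an Option instead. This is a sketch; the names are invented for illustration.

```rust
/// Returns the arithmetic mean of `values`.
///
/// # Panics
///
/// Panics if `values` is empty. Use [`try_mean`] to handle that case without
/// panicking.
pub fn mean(values: &[f64]) -> f64 {
    assert!(!values.is_empty(), "cannot take the mean of an empty slice");
    values.iter().sum::<f64>() / values.len() as f64
}

/// Returns the arithmetic mean of `values`, or `None` if `values` is empty.
pub fn try_mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}

fn main() {
    println!("{}", mean(&[1.0, 2.0, 3.0]));
    println!("{:?}", try_mean(&[]));
}
```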

`# Lifetimes`

Include a Lifetimes section if a type or function has special lifetime considerations that the user needs to keep in mind. Most of the time, a lifetime itself doesn't need describing; again, always assume users know Rust. Rather, this section should explain why the lifetime has been modeled in a certain way.

Rule of thumb: If you only have one lifetime (explicit or implicit), it probably doesn't need documentation.

/// Wrapper type for an underlying C-FFI pointer
/// 
/// # Lifetimes
///
/// The underlying pointer is generated from some C-FFI.
///
/// Adding a lifetime that only references phantom data may seem strange
/// and artificial. However, enforcing this lifetime is useful because
/// the C API may have undefined behavior or odd semantics outside of
/// whatever "assumed" lifetime the library writers intended. We make
/// this explicit in Rust with the hope of preventing mistakes in using
/// this API. 
/// 
pub struct SomeType<'a> {
    /// Pointer to data from a C-FFI
    ///
    /// We need to store this as NonNull because its use in the C API
    /// is covariant.
    pointer: std::ptr::NonNull<std::os::raw::c_void>,
    /// Phantom to annotate this type with an explicit lifetime.
    ///
    /// See type documentation for why this is done.
    _phantom: std::marker::PhantomData<&'a ()>,
}

The above example shows one instance where a (single) strange lifetime is applied to enforce Rust's lifetime rules on a type. The lifetime seems superfluous, but may exist because there is some implicit assumption in C that is being made more explicit here.
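To show what such a phantom lifetime buys in practice, here is a sketch with hypothetical `Context` and `Handle` types standing in for a C context and a pointer derived from it. The handle holds no real reference, yet because its constructor borrows the context, the borrow checker refuses to let the handle outlive it.

```rust
use std::marker::PhantomData;

/// Hypothetical owner of some foreign resource (think of a C "context").
pub struct Context {
    id: u32,
}

/// A handle that must not outlive the [`Context`] it came from.
///
/// # Lifetimes
///
/// `'a` ties the handle to its parent context. The handle stores no
/// reference, only [`PhantomData`], so the lifetime exists purely to make
/// the borrow checker reject uses of the handle after the context is gone.
pub struct Handle<'a> {
    context_id: u32,
    _phantom: PhantomData<&'a Context>,
}

impl Context {
    /// Creates a new context with the given id.
    pub fn new(id: u32) -> Self {
        Self { id }
    }

    /// Creates a handle whose lifetime is bound to `self`.
    pub fn handle(&self) -> Handle<'_> {
        Handle { context_id: self.id, _phantom: PhantomData }
    }
}

impl Handle<'_> {
    /// Returns the id of the context this handle belongs to.
    pub fn context_id(&self) -> u32 {
        self.context_id
    }
}

fn main() {
    let ctx = Context::new(7);
    let handle = ctx.handle();
    println!("{}", handle.context_id());
    // drop(ctx); // uncommenting this fails to compile: `handle` is used below
    println!("{}", handle.context_id());
}
```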

`# Arguments`

Avoid listing arguments explicitly. Instead, names and types themselves should adequately describe arguments and their relationships.

That being said, an # Arguments section can make sense if there is a non-obvious assumption about the arguments or their types that needs to be made explicit, like if passing in certain values for a type invokes undefined behaviour. In such a case, an # Arguments or # Generic Arguments section is useful to clarify that care is needed when passing in data.

For generic arguments that may require trait bounds, "document" these by adding where clauses to your function or type. This is much more descriptive and useful, and has the added benefit of the compiler doing some of the work to validate these bounds for you.
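As a small sketch (the `largest` function is hypothetical), the where clause below tells readers everything they need to know about `T`, and the compiler rejects any call with a type that isn't PartialOrd, so no # Generic Arguments section is needed.

```rust
/// Returns the largest item in `items`, or `None` if `items` is empty.
pub fn largest<T>(items: &[T]) -> Option<&T>
where
    T: PartialOrd,
{
    let mut iter = items.iter();
    let mut best = iter.next()?;
    for item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    println!("{:?}", largest(&[3, 7, 2]));
    println!("{:?}", largest::<f64>(&[]));
}
```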

Take a look at the standard library for examples of good argument names and types.

A Few Last Details

The below points deserve to be noted, but don't necessarily fit into any specific framework.

Document Traits, Not Trait Implementations

Traits are trickier than regular types and associated functions. Since a trait can be applied to multiple types (even those outside of its crate), don't document implementations of a trait. Instead, document the trait itself.

Good

/// A trait providing methods for adding different types of numbers to a type.
pub trait NumberAdder {
    /// Adds an integer to the type
    fn add_integer(&mut self, number: i32);

    /// Adds a float to the type
    fn add_float(&mut self, number: f32);
}

Bad

impl NumberAdder for Foo {
    /// Adds an integer to Foo
    fn add_integer(&mut self, number: i32) {
        // ...
    }

    /// Adds a float to Foo
    fn add_float(&mut self, number: f32) {
        // ...
    }
}

Traits are used primarily in generic code, so users will look at the trait itself to understand the interfaces. Any two types implementing the same trait should share common functionality; if two implementations of a trait require vastly differing documentation, the trait itself may not be modeled correctly.
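To see why, consider a sketch of generic code consuming the NumberAdder trait from the Good example. The `Foo` implementation and the `add_one_of_each` function are hypothetical. Inside `add_one_of_each`, only the trait's documentation is visible, so that is where the behavior should be described.

```rust
/// A trait providing methods for adding different types of numbers to a type.
pub trait NumberAdder {
    /// Adds an integer to the type.
    fn add_integer(&mut self, number: i32);

    /// Adds a float to the type.
    fn add_float(&mut self, number: f32);
}

/// Hypothetical implementor; the impl below carries no docs of its own.
#[derive(Debug, Default)]
pub struct Foo {
    total: f64,
}

impl NumberAdder for Foo {
    fn add_integer(&mut self, number: i32) {
        self.total += f64::from(number);
    }

    fn add_float(&mut self, number: f32) {
        self.total += f64::from(number);
    }
}

/// Generic code only knows `T: NumberAdder`, so it relies on the trait docs.
pub fn add_one_of_each<T: NumberAdder>(target: &mut T) {
    target.add_integer(1);
    target.add_float(0.5);
}

fn main() {
    let mut foo = Foo::default();
    add_one_of_each(&mut foo);
    println!("{:?}", foo);
}
```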

References Should Always Be Links

Whenever a reference is made, whether it be to another part of the documentation, a website, a paper, or another crate: add a link to it. This applies even when referencing a common type found in the standard library (e.g. String, Vec, etc.).

Linking to other parts of the documentation avoids repeating information. It is also an easy way to point to higher level architecture decisions that might affect the lower level documentation.

rustdoc handles documentation links natively. Example:

/// See [`String`](std::string::String) documentation for more details.
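Rustdoc's intra-doc links also resolve names that are already in scope, so the explicit path is only needed for items that aren't. A sketch with a hypothetical `Readings` type:

```rust
/// Collects sensor readings into a [`Vec`].
///
/// Items in scope can be linked by name alone, e.g. [`String`] or
/// [`Readings::push`]; a full path such as
/// [`HashMap`](std::collections::HashMap) works for anything else.
#[derive(Debug, Default)]
pub struct Readings {
    values: Vec<f32>,
}

impl Readings {
    /// Appends a reading. See also [`Readings::len`].
    pub fn push(&mut self, value: f32) {
        self.values.push(value);
    }

    /// Returns how many readings have been pushed.
    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns `true` if no readings have been pushed.
    pub fn is_empty(&self) -> bool {
        self.values.is_empty()
    }
}

fn main() {
    let mut readings = Readings::default();
    readings.push(1.5);
    println!("{}", readings.len());
}
```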

Examples and Resources

Open-Source Documentation Examples

We've worked hard at Tangram Vision to follow our own guidelines and create world-class Rust documentation. You can read the fruits of our labor by visiting any of the repositories at our Open-Source software group. The RealSense Rust package that we maintain is some of our most complete documentation, and acts as a good starting point.

Templates for Types and Functions

Most of these examples come in the form:

/// Summary line -> what is this
///
/// Longer description of what is returned, or semantics regarding the type.
/// ...
///
/// # Examples
///
/// ```
/// <some-rust-code>
/// ```

Types

/// Type for describing errors that result from trying to set an option  
/// on a sensor.
#[derive(Debug)]
pub enum SetOptionError {
    /// The option is not supported on the sensor.
    OptionNotSupported,
    /// The option is supported on the sensor but is immutable.
    OptionIsImmutable,
    /// Setting the option failed due to an internal exception.
    ///
    /// See the enclosed string for a reason why the internal exception occurred.
    CouldNotSetOption(String),
}

Functions

/// Sets a `value` for the provided `option` in `self`. 
/// 
/// Returns `Ok(())` on success, otherwise returns an error.
///
/// # Errors
/// 
/// Returns [`OptionNotSupported`](SetOptionError::OptionNotSupported) if the
/// option is not supported on this sensor.
///
/// Returns [`OptionIsImmutable`](SetOptionError::OptionIsImmutable) if the
/// option is supported but is immutable.
///
/// Returns [`CouldNotSetOption`](SetOptionError::CouldNotSetOption) if the
/// option could not be set due to an internal exception.
///
/// # Examples
/// 
/// ```
/// let option = SomeOption::Foo;
/// let value = 100.5;
/// 
/// match sensor.set_option(option, value) {
///     Ok(()) => {
///         println!("Success!");
///     }
///     Err(SetOptionError::OptionNotSupported) => {
///         println!("This option isn't supported, try another one!");
///     }
///     Err(SetOptionError::OptionIsImmutable) => {
///         println!("This option is immutable, we can't set it!");
///     }
///     _ => {
///         panic!();
///     }
/// }
/// ```
pub fn set_option(
    &self,
    option: SomeOption,
    value: f32,
) -> Result<(), SetOptionError> {
    // implementation here
    unimplemented!();
}

Bad Documentation

/// Get the value associated with the provided Rs2Option for the sensor.
///
/// # Arguments
///
/// - `option` - The option key that we want the associated value of.
///
/// # Returns
///
/// An f32 value corresponding to that option within the librealsense2 library, 
/// or None if the option is not supported.
///
pub fn get_option(&self, option: Rs2Option) -> Option<f32>;

Why is this bad?

The Arguments section is superfluous, since the names and types of the arguments make their use self-evident. See the Arguments section above.
The # Returns section isn't needed at all. First off, "Returns" should not be a header category; this information can be more concisely expressed in the function summary. Secondly, the return type (Option<f32>) already makes the possible return values clear to the user.

A more correct way to write this example would be:

/// Get the value associated with the provided `option` for the sensor,
/// or `None` if no such value exists.
pub fn get_option(&self, option: Rs2Option) -> Option<f32>;

Cover photo credit: Photo by Beatriz Pérez Moya on Unsplash

Exploring Ansible via Setting Up a WireGuard VPN

Greg Schafer — Thu, 04 Mar 2021 17:41:20 +0000

Photo by Thomas Jensen on Unsplash

In my previous blogpost, we set up a WireGuard VPN server and client and learned about various configuration options for WireGuard, how to improve VPN server uptime, how to relay traffic, and more. Setting up a server and client like that is a lot of work! If the server dies or you want to set up a new server (maybe for a friend or family member this time), you have to go back to the walk-through and follow all the steps, remembering if you deviated from those instructions at any point.

There's a better way — automation! If you're only going to do a thing once (e.g. set up a VPN), investing in automation probably doesn't make sense. But if you anticipate doing a thing repeatedly, automating it frees up your time to learn and accomplish more in the future. You can also share your automation, empowering others to build and achieve more, faster.

Automation is the heart of computing, and many different automation tools and approaches have sprung up over time. For our project of automating VPN server setup, we can consider a variety of tools:

Shell scripts
- The simplest approach from a tooling perspective, writing shell scripts would involve running the commands from the previous WireGuard tutorial blogpost, using ssh for the commands that run on the server and rsync to copy configuration files to the server.
SSH scripting libraries like Capistrano or Fabric
- If shell scripting isn't ideal, there are libraries that expose similar scripting functionality in a more ergonomic interface for developers familiar with higher-level languages like Ruby and Python.
Infrastructure/configuration automation tools like Puppet, Chef, or Ansible
- Tools in this category are even more specialized for automating server infrastructure and configuration, often including an ecosystem of packages and plugins to automatically set up or configure nearly anything you can think of.
Infrastructure-as-code tools like Terraform
- Infrastructure-as-code (IaC) tools have a lot of overlap with the above category, but support provisioning cloud resources in a more first-class/native way.
Containers like Docker
- You could also run WireGuard in containers, deploying a server-configured container image to a cloud provider and running a client-configured container image locally to connect to the server. There are a few existing examples of this approach.

For this tutorial, I'm going to focus on the middle category above — infrastructure/configuration automation tools — and specifically, I'll focus on Ansible. There is a great comparison of different tools in this area by Gruntwork and, even though that article favors Terraform, Ansible is still a useful general-purpose tool, especially if you're working with servers that aren't "in the cloud", such as a Raspberry Pi at home.

Let's get started with automating VPN setup with Ansible! By the end of this article, we'll be able to set up a VPN server and client with a single command. Similar to the previous blogpost, I'll use Ubuntu 20.04 and DigitalOcean droplets.

Setting up Ansible

Ansible can be installed via an OS package manager like apt, but I prefer to use pip so I can get the latest updates and avoid cluttering system package management with third-party PPAs (Personal Package Archives). We'll also use pyenv (as suggested by Hypermodern Python) to make sure we're not breaking or cluttering the system Python installation. Install pyenv with the following:



# From https://github.com/pyenv/pyenv/wiki#suggested-build-environment
sudo apt-get update

sudo apt-get install --no-install-recommends make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

curl https://pyenv.run | bash

It's a good habit when a tutorial gives you curl <url> | bash to open up that URL and see what it's going to do. In this case, you'll see that it'll download and execute a shell script on GitHub that will clone 6 repos from GitHub to your ~/.pyenv folder and prompt you to add a few lines to your shell's initialization script.

Follow the output prompt from above, which asks you to put lines like the below in your shell initialization script (e.g. ~/.bashrc if you use the bash shell). Make sure to fill in your own username!



export PATH="/home/YOUR_USERNAME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Install a recent python version:



# List available python versions
pyenv install --list

# Install a specific version
pyenv install 3.9.2

# (Suggested) If you want to always use that version when running `python`
# in your terminal
pyenv global 3.9.2

If you want, you can also create a virtualenv to further isolate the Ansible installation, and make that virtualenv automatically activate when you're in a particular folder/repo. That would look like:



# (Optional)

# Feel free to pick a different virtualenv name than "ansible-tutorial"
pyenv virtualenv 3.9.2 ansible-tutorial

# Create a .python-version file that pyenv will find when your shell is in the 
# same directory (or a sub-directory) and automatically activate the named
# virtualenv
pyenv local ansible-tutorial

Install the ansible pip package, which will install various command-line tools, including ansible-playbook, which we'll use to run a "playbook" of commands that will set up a VPN server and client for us.



pip install ansible

# Confirm installation worked
ansible --version

Get a Server

To use Ansible for a VPN server, we need... a server! Ansible could provision a server from a cloud provider for us (and I'll touch on this briefly later), but we'll keep our playbook hardware-provider-agnostic for now, so you can run it as easily against a cloud server as a Raspberry Pi on your home network. I'm going to create a $5/month DigitalOcean droplet to test against, but you could also use Vagrant (to test against a local VM) or any server you can SSH to.

Testing Ansible playbooks against VMs, rather than a bare-metal machine, comes with an advantage — after you've written the playbook, you can start a new, empty VM and test the whole playbook start to finish to ensure that it works consistently.

Connecting to the Server with Ansible

Once you have your server or VM, take note of its IP address and use it to create an inventory.ini file like the below:



[vpn]
vpn_server ansible_host=203.0.113.1 ansible_user=root

An inventory file tells Ansible what servers it can act upon and how to access them. Let's use the above inventory file as an example. When we run Ansible and target the vpn group of servers or the vpn_server host, it will try to connect to the server using a command like:



ssh root@203.0.113.1

So, if you can't SSH to the server, then Ansible won't be able to connect either!

Connecting to the server with an SSH key is strongly recommended! Add your SSH key to your server to connect without needing a password. If you must connect with a password, you can sudo apt install sshpass and then provide your SSH password when using Ansible by adding the --ask-pass flag to all ansible commands.

Let's test to make sure that Ansible can connect to the server:



ansible -i inventory.ini -m ping vpn

This runs the ping Ansible module, targeting the vpn group of servers. You should see "pong" in the output, meaning that Ansible could connect to the server and the server has a Python installation that Ansible can use.

Ansible's Built-in Variables and Facts

There are other useful Ansible modules that we can use with the ansible command:

The setup module fetches system information, also known as "facts", about the server. You can use these facts as variables in Ansible commands and playbooks.
The debug module can evaluate variables, which is useful for... well, debugging!

Try running both of these modules with your server so you can see what facts and information Ansible makes available:



ansible -i inventory.ini -m setup vpn
ansible -i inventory.ini -m debug -a "var=hostvars" vpn

This was one of the most confusing parts for me when learning Ansible — figuring out what all these built-in variables and facts (like groups, inventory_dir, and ansible_distribution) were and how to find them.

Writing an Ansible Playbook

The ansible command lets you run ad-hoc commands across groups of servers. This is powerful, but we probably shouldn't try to automate server setup and configuration in a single ansible command... probably. 🤔 Instead, we can organize multiple tasks in one or multiple YAML files, which we will run with the ansible-playbook command.

Let's write a playbook.yml file in the same folder as inventory.ini. Here are its contents:



---
- name: setup vpn server
  hosts: vpn_server
  tasks:
  - name: ping
    ping:
  - name: show variables and facts
    debug: var=hostvars

If you're not familiar with YAML, the above is equivalent to this JSON structure:



[{"name": "setup vpn server",
  "hosts": "vpn_server",
  "tasks": [{"name": "ping", "ping": null},
            {"name": "show variables and facts", "debug": "var=hostvars"}]}]

Breaking down the above:

The top-level structure is a "play" in Ansible lexicon. Our play above has a name, a hosts pattern which describes which servers the play will run against, and a list of tasks.
We have 2 tasks, each with a name and the name of an Ansible module that will do something.

Run the playbook...



ansible-playbook -i inventory.ini playbook.yml

... and you'll see that it gathers facts from the server (just like the ansible -m setup command above did), and then runs the "ping" task and the "debug" task to show all the gathered facts and variables defined for vpn_server.

There are tons of built-in Ansible modules, even more curated Ansible community modules, and even more published to Ansible Galaxy (an open repository for Ansible collections and roles).

WireGuard Server Setup

There's much more to learn about Ansible! But let's stop here and apply what we've learned in order to set up a WireGuard server.

Referring to the steps we took in the previous tutorial, we want to:

Install the wireguard system package
Create public and private keys with correct permissions
Create the server's WireGuard configuration file
(Optionally) Enable IP forwarding for relaying traffic
Start the VPN

Managing the Keys

As hinted at in the previous tutorial, if we want to repeatably deploy the VPN server without needing to reconfigure all VPN clients, we need to use the same private key every time.

Put another way: if we generated a private key while deploying the server and used the corresponding public key on various clients, and the server ends up dying, we could deploy it again by generating a new private key. However, all of our VPN clients would then need to update to the new public key to be able to connect to the new VPN server. This would be inconvenient!

Instead, we'll generate the server keys once by hand and use them in the playbook so they're consistent between every deploy. This means we won't include step #2 from above in the Ansible playbook.

Generate the keys with the wg genkey and wg pubkey commands. You can output both with the following command:



privkey=$(wg genkey) sh -c 'echo "
    server_privkey: $privkey
    server_pubkey: $(echo $privkey | wg pubkey)"'

Copy the output lines and add them to a new vars mapping under the play in playbook.yml. Here's what mine looks like now (your keys will be different):



---
- name: setup vpn server
  hosts: vpn_server
  vars:
    server_privkey: aBYk1JZyP8ck+FeaTjb3xi94U4Nv8V+gWoTW1hRLQlo=
    server_pubkey: 7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=
  tasks:
  # ...

Encrypting the Private Key

It's a good practice to AVOID having secrets in plaintext (like the VPN private key above). This is especially true if those secrets will be shared with anyone else, like via a git repo. Let's prevent this by using Ansible Vault. Vault is a tool for encrypting secret values and using them in playbooks. Encrypt the private key with:



ansible-vault encrypt_string --ask-vault-password --stdin-name server_privkey

You'll be prompted twice for a Vault encryption password, after which you'll paste your privkey value and hit Ctrl+d twice. If the command completed after a single Ctrl+d, try again and make sure you're not copy-pasting an invisible newline character at the end of the privkey value. Copy the output into your playbook, which will now look like:



---
- name: setup vpn server
  hosts: vpn_server
  vars:
    server_privkey: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          646438636565343063343631326136386239623935393637336539653636386135363
          663386639393232346534643163656363316234306439306566306534610a31326664
          363763663139383034636632343230376365333130333230373866353033326563303
          5636138373830633534373033303536303566663166616539360a3936353033663263
          336662663034376661616631343661333164363134373061343739633637623739306
          465653532383838393662396333623966343165366635353132396332313762343534
          65313761623964653532623839356633343838
    server_pubkey: 7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=
  tasks:
  ...

Make sure to remember your encryption password (and save it in a password manager); you'll need to enter it every time you run the playbook.

Installing and Configuring WireGuard

Next, we'll remove our testing ping and debug tasks and write tasks for steps 1, 3, 4, and 5 from the above list. These steps translate neatly into Ansible tasks in our updated playbook.yml:



---
- name: setup vpn server
  hosts: vpn_server
  vars:
    server_privkey: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          646438636565343063343631326136386239623935393637336539653636386135363
          663386639393232346534643163656363316234306439306566306534610a31326664
          363763663139383034636632343230376365333130333230373866353033326563303
          5636138373830633534373033303536303566663166616539360a3936353033663263
          336662663034376661616631343661333164363134373061343739633637623739306
          465653532383838393662396333623966343165366635353132396332313762343534
          65313761623964653532623839356633343838
    server_pubkey: 7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=
  tasks:
  # https://docs.ansible.com/ansible/latest/collections/ansible/builtin/apt_module.html
  - name: install wireguard package
    apt:
      name: wireguard
      state: present
      update_cache: yes

  # https://docs.ansible.com/ansible/latest/collections/ansible/builtin/template_module.html
  - name: create server wireguard config
    template:
      dest: /etc/wireguard/wg0.conf
      src: server_wg0.conf.j2
      owner: root
      group: root
      mode: '0600'

  # https://docs.ansible.com/ansible/latest/collections/ansible/posix/sysctl_module.html
  - name: enable and persist ip forwarding
    sysctl:
      name: net.ipv4.ip_forward
      value: "1"
      state: present
      sysctl_set: yes
      reload: yes

  # https://docs.ansible.com/ansible/latest/collections/ansible/builtin/systemd_module.html
  - name: start wireguard and enable on boot
    systemd:
      name: wg-quick@wg0
      enabled: yes
      state: started

Ok ok, yes, this is a bit like drawing an owl.

Source: https://29.media.tumblr.com/tumblr_l7iwzq98rU1qa1c9eo1_500.jpg

...but usually an ansible playbook like the above can be written quickly. I follow a cycle:

Type "ansible module install package" into a search engine
Open the docs.ansible.com result that looks most helpful
Read through available parameters and the (often helpful) examples at the bottom
Copy an example into my playbook and modify parameters as needed
Go back to step 1, searching for the next task (e.g. "ansible module template file")

I've included a comment line linking to the Ansible docs page for each module used in the playbook.yml above, in case you want to read about the parameters.

Testing our First Attempt

Let's test our playbook.



$ ansible-playbook -i inventory.ini --ask-vault-password playbook.yml
Vault password: 

PLAY [setup vpn server] ********************************************************

TASK [Gathering Facts] *********************************************************
ok: [vpn_server]

TASK [install wireguard package] ***********************************************
changed: [vpn_server]

TASK [create server wireguard config] ******************************************
fatal: [vpn_server]: FAILED! => {"changed": false, "msg": "Could not find or access 'server_wg0.conf.j2'\nSearched in: ..."}

PLAY RECAP *********************************************************************
vpn_server                 : ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Oh no! Installing WireGuard was successful, but creating the config failed. Ansible's error messages are usually helpful, and this one indicates that the template file (server_wg0.conf.j2) we're trying to use to create the server's configuration couldn't be found. Let's create it at templates/server_wg0.conf.j2:



# {{ ansible_managed }}
[Interface]
Address = 10.0.1.1/24
ListenPort = 51820
PrivateKey = {{ server_privkey }}

A few notes about the above:

Ansible automatically searches in relative paths like templates/ and files/ when running Ansible modules that have a src parameter. Our template task has a parameter src: server_wg0.conf.j2, so Ansible will search for it in the templates/ folder.
It's convention to suffix template files with .j2, to indicate that the file will be templated with Jinja2.
In Jinja2, values inside double curly braces ({{ variable }}) will be replaced with the value of the variable. In this template, the server_privkey variable will be decrypted and its value inserted into the resulting file in place of {{ server_privkey }}.
The {{ ansible_managed }} text is replaced with the string "Ansible managed". It's a good convention to put this in a comment at the top of templated files, because it signals to anyone reading the file on the server that the file is managed by Ansible — any edits they make could be overwritten when Ansible next runs, so they should find and make edits in the corresponding Ansible playbook and template files instead.

Let's run the test again:



$ ansible-playbook -i inventory.ini --ask-vault-password playbook.yml
Vault password: 

PLAY [setup vpn server] ********************************************************

TASK [Gathering Facts] *********************************************************
ok: [vpn_server]

TASK [install wireguard package] ***********************************************
ok: [vpn_server]

TASK [create server wireguard config] ******************************************
changed: [vpn_server]

TASK [enable and persist ip forwarding] ****************************************
changed: [vpn_server]

TASK [start wireguard and enable on boot] **************************************
changed: [vpn_server]

PLAY RECAP *********************************************************************
vpn_server                 : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

It succeeded! The WireGuard interface is now running on the server.

Notice that the "install wireguard package" step shows ok instead of changed this time. The apt module (like most modules) detects that the server is already in the desired state (the wireguard package was installed last time we ran the playbook, so it satisfies state=present) and performs no actions. The task is idempotent, meaning you can run it repeatedly and the outcome is the same. Idempotent tasks make it easy to see what changed and what didn't each time a playbook is run.
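This is not how Ansible's apt module is implemented; it's only a toy sketch of the check-then-act pattern behind idempotent tasks, with "installed packages" modeled as an in-memory set and the hypothetical `ensure_present` function reporting ok or changed the way Ansible does.

```rust
use std::collections::HashSet;

/// Mirrors the two outcomes Ansible reports for a successful task.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Already in the desired state; nothing was done.
    Ok,
    /// Had to act to reach the desired state.
    Changed,
}

/// Ensures `package` is in the set of installed packages (like state=present).
fn ensure_present(installed: &mut HashSet<String>, package: &str) -> Outcome {
    // `insert` returns true only if the value was not already present.
    if installed.insert(package.to_string()) {
        Outcome::Changed
    } else {
        Outcome::Ok
    }
}

fn main() {
    let mut installed = HashSet::new();
    println!("{:?}", ensure_present(&mut installed, "wireguard")); // first run
    println!("{:?}", ensure_present(&mut installed, "wireguard")); // second run
}
```

Running the same "task" twice acts once and then reports that nothing needed to change, which is the behavior the playbook output above shows for the wireguard package.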

WireGuard Client Setup

Ansible can also operate on the local machine. To set up our local machine as a client, we want to:

Install the wireguard system package
Create public and private keys with correct permissions
Create the client's WireGuard configuration file, which must include the server's public key
Start the VPN

We also need to update the server's configuration file with a [Peer] section including the client's public key, so the client can connect to the server. The client's public key isn't known until after we create it — we could create client keys manually like we did for the server's keys, but then the playbook wouldn't be able to set up multiple clients without having to manually edit the keys for each client.

Acting on Localhost

Because we're targeting a new host (localhost), we need to write a new play in playbook.yml. We can put it above the existing play (which targets vpn_server), so the client's keys are generated before the server config is templated.



---
- name: setup vpn client
  hosts: localhost
  connection: local
  become: yes
  vars:
    # Use system python so apt package is available
    ansible_python_interpreter: "/usr/bin/env python"
  tasks:
    # Coming soon

- name: setup vpn server
  hosts: vpn_server
  # Rest of server vars/tasks here...

Lots of new things here!

We target the local machine by using localhost as the hosts pattern.
We "connect" locally by using the local connection plugin.
The become: yes line indicates that the play will run as root, which we need to be able to install the wireguard package. Ansible will effectively run sudo apt-get install wireguard, rather than just apt-get install wireguard (which would fail). Because of this setting, we'll need to run the playbook with the --ask-become-pass flag. We didn't need this line for the server setup play, because we're already connecting as root via the ansible_user=root connection variable.
With the ansible_python_interpreter var, we tell Ansible to use the system python (which includes the apt python package). Alternatively, we could install that package for our current python 3.9.2 installation. If you get a No such file or directory error, you may need to change the line from python to python3.

Client Setup Tasks and Config

Writing the Ansible tasks for the client-side VPN setup is similar to the server side.



---
- name: setup vpn clients
  hosts: localhost
  connection: local
  become: yes
  vars:
    # Use system python so apt package is available
    ansible_python_interpreter: "/usr/bin/env python"
  tasks:
  - name: install wireguard package
    apt:
      name: wireguard
      state: present
      update_cache: yes

  - name: generate private key
    shell:
      cmd: umask 077 && wg genkey | tee privatekey | wg pubkey > publickey
      chdir: /etc/wireguard
      creates: /etc/wireguard/publickey

  - name: get public key
    command: cat /etc/wireguard/publickey
    register: publickey_contents
    changed_when: False

  # Save pubkey as a fact, so we can use it to template wg0.conf for the server
  - name: set public key fact
    set_fact:
      pubkey: "{{ publickey_contents.stdout }}"

  - name: create client wireguard config
    template:
      dest: /etc/wireguard/wg0.conf
      src: client_wg0.conf.j2
      owner: root
      group: root
      mode: '0600'

- name: setup vpn server
  hosts: vpn_server
  # Rest of server vars/tasks here...

Breaking this down:

Installing the wireguard package should look very familiar!
We generate keys with the shell module so we can use pipes and file redirection. The keys are only generated if the publickey file doesn't already exist, thanks to the creates parameter.
Next, we need to save the public key so we can add it as a [Peer] section in the server config. Normally, we'd use {{ lookup('file', '/etc/wireguard/publickey') }} to read a value from a file, but lookup plugins run on the control node as the user running Ansible, so they don't respect become: yes; the lookup tries to read the file without escalating to root privileges and fails as a result. So, we instead cat the file and save the resulting output as a fact.
Finally, template the client config file. Its contents closely match the previous tutorial's, but we use the ansible_host IP address of the VPN server from inventory.ini to set the server's endpoint.



[Interface]
# The address your computer will use on the VPN
Address = 10.0.0.8/32

# Load your privatekey from file
PostUp = wg set %i private-key /etc/wireguard/privatekey
# Also ping the vpn server to ensure the tunnel is initialized
PostUp = ping -c1 10.0.0.1

[Peer]
# VPN server's wireguard public key
PublicKey = {{ server_pubkey }}

# Public IP address of your VPN server (USE YOURS!)
# Use the floating IP address if you created one for your VPN server
Endpoint = {{ hostvars['vpn_server'].ansible_host }}:51820

# 10.0.0.0/24 is the VPN subnet
AllowedIPs = 10.0.0.0/24

# To also accept and send traffic to a VPC subnet at 10.110.0.0/20
# AllowedIPs = 10.0.0.0/24,10.110.0.0/20

# To accept traffic from and send traffic to any IP address through the VPN
# AllowedIPs = 0.0.0.0/0

# To keep a connection open from the server to this client
# (Use if you're behind a NAT, e.g. on a home network, and
# want peers to be able to connect to you.)
# PersistentKeepalive = 25
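The creates parameter and the umask 077 prefix on the key-generation task are what make it safe to re-run: existing keys are never overwritten, and new key files are readable only by their owner. A rough Python sketch of that behavior follows; run_with_creates and write_placeholder_keys are hypothetical helpers, and the placeholder strings are not real WireGuard keys.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def run_with_creates(creates: Path, action: Callable[[], None]) -> bool:
    """Run `action` only if the `creates` path does not exist yet.

    Mimics the `creates` parameter of Ansible's shell/command modules, plus the
    `umask 077` prefix of the key-generation command. Returns True when the
    action ran ("changed") and False when it was skipped.
    """
    if creates.exists():
        return False  # keys already exist: never overwrite them
    old_umask = os.umask(0o077)  # files created with default permissions get 0600
    try:
        action()
    finally:
        os.umask(old_umask)  # restore the previous (process-wide) umask
    return True


def write_placeholder_keys(directory: Path) -> None:
    # Hypothetical stand-in for `wg genkey | tee privatekey | wg pubkey > publickey`.
    # These strings are NOT real WireGuard keys.
    (directory / "privatekey").write_text("PLACEHOLDER-PRIVATE-KEY\n")
    (directory / "publickey").write_text("PLACEHOLDER-PUBLIC-KEY\n")


def demo_keygen() -> tuple[bool, bool, int]:
    """Run the guarded action twice in a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        keydir = Path(tmp)
        marker = keydir / "publickey"
        first = run_with_creates(marker, lambda: write_placeholder_keys(keydir))
        second = run_with_creates(marker, lambda: write_placeholder_keys(keydir))
        mode = os.stat(keydir / "privatekey").st_mode & 0o777
        return first, second, mode


first, second, mode = demo_keygen()
print(first, second, oct(mode))  # True False 0o600
```

Because a umask is process-wide, this sketch restores it immediately; in the playbook, the umask only affects the shell that runs the key-generation command.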

Managing Variables

If we run the playbook now, it will fail with a 'server_pubkey' is undefined error. That's because server_pubkey is defined in the play that targets the server, so it's not available to the play targeting the client. We need to move the variable somewhere the entire playbook can read it. Ansible looks for YAML files in a group_vars/ folder whose filenames match group names in the inventory file. So, we could create a group_vars/vpn.yml file and declare variables in it, which would be directly usable when running a play against any servers in the vpn group. However, localhost isn't a host in the vpn group (though we could add it). We'll instead use the special group_vars/all.yml file, which makes variables available to all hosts.
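A simplified Python model of how these files combine for a given host: values from all.yml apply everywhere, and a group file such as vpn.yml is layered on top, so group-specific values override those from all.yml. Ansible's real variable precedence has many more levels; resolve_vars and the listen_port variable are hypothetical.

```python
from typing import Any


def resolve_vars(host_groups: list[str],
                 group_vars: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge group_vars for one host (simplified precedence).

    Variables from `all` are applied first, then each group the host belongs
    to, so a more specific group file overrides values from all.yml.
    """
    merged: dict[str, Any] = dict(group_vars.get("all", {}))
    for group in host_groups:
        merged.update(group_vars.get(group, {}))
    return merged


example_group_vars = {
    "all": {"server_pubkey": "SERVER-PUBKEY"},  # from group_vars/all.yml (placeholder)
    "vpn": {"listen_port": 51820},              # hypothetical group_vars/vpn.yml
}

# localhost isn't in the vpn group, so it only sees all.yml
print(resolve_vars([], example_group_vars))
# vpn_server is in the vpn group, so it sees both files
print(resolve_vars(["vpn"], example_group_vars))
```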

Move the server keys' variables from playbook.yml to group_vars/all.yml:



---
server_privkey: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      646438636565343063343631326136386239623935393637336539653636386135363
      663386639393232346534643163656363316234306439306566306534610a31326664
      363763663139383034636632343230376365333130333230373866353033326563303
      5636138373830633534373033303536303566663166616539360a3936353033663263
      336662663034376661616631343661333164363134373061343739633637623739306
      465653532383838393662396333623966343165366635353132396332313762343534
      65313761623964653532623839356633343838
server_pubkey: 7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=

Your directory should now look like this:



.
├── group_vars
│   └── all.yml
├── inventory.ini
├── playbook.yml
└── templates
    ├── client_wg0.conf.j2
    └── server_wg0.conf.j2

Run the playbook and the client should run all its tasks successfully:



ansible-playbook -i inventory.ini --ask-vault-password --ask-become-pass playbook.yml

The VPN client is now set up. The only remaining step for the client is to start the VPN after the server is running and configured to accept connections from the client (so the client's PostUp ping will succeed).

Adding a Peer to the Server Config

Add a [Peer] section to the server template at templates/server_wg0.conf.j2:



# {{ ansible_managed }}
[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = {{ server_privkey }}

[Peer]
PublicKey = {{ hostvars['localhost'].pubkey }}
AllowedIPs = 10.0.0.8

We read the {{ server_privkey }} from group_vars/all.yml and we read {{ hostvars['localhost'].pubkey }} from the set_fact module that runs during the client-targeted play in the playbook.

Reloading the Server Config

If we run the playbook, the config file on the server will be updated with the new [Peer] section, but the WireGuard interface is already running and configured based on the old file contents. We need to reload the configuration when it changes. Handlers are the Ansible-provided mechanism for this: a handler runs only when a task that notifies it reports a change. Handlers run at the end of the play in which they're notified, so many tasks could notify a "reload config" handler, but the handler would only run once at the end. Let's add a handlers list after the tasks list of each play in playbook.yml, and notify the handlers from the create client wireguard config and create server wireguard config tasks:



  # ...
  - name: create client wireguard config
    template:
      dest: /etc/wireguard/wg0.conf
      src: client_wg0.conf.j2
      owner: root
      group: root
      mode: '0600'
    notify: restart wireguard

  handlers:
  # Restarts WireGuard interface, loading any new config and running PostUp
  # commands in the process. Notify this handler on client config changes.
  - name: restart wireguard
    shell: wg-quick down wg0; wg-quick up wg0
    args:
      executable: /bin/bash

- name: setup vpn server
  hosts: vpn_server
  tasks:
  # ...
  - name: create server wireguard config
    template:
      dest: /etc/wireguard/wg0.conf
      src: server_wg0.conf.j2
      owner: root
      group: root
      mode: '0600'
    notify: reload wireguard config
  # ...

  handlers:
  # Reloads config without disrupting current peer sessions, but does not
  # re-run PostUp commands. Notify this handler on server config changes.
  - name: reload wireguard config
    shell: wg syncconf wg0 <(wg-quick strip wg0)
    args:
      executable: /bin/bash
# ...
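The handler rules described above (a task notifies only when it reports changed, and each notified handler runs once at the end of the play) can be modeled in a few lines of Python. The Play class is a hypothetical simplification, not Ansible's implementation; one detail it does mirror is that Ansible runs notified handlers in the order they are defined under handlers:, not the order in which they were notified.

```python
from typing import Optional


class Play:
    """Simplified model of how Ansible handlers behave within one play."""

    def __init__(self, handlers: list[str]) -> None:
        self.handlers = handlers  # handler names, in the order defined under `handlers:`
        self.notified: set[str] = set()

    def run_task(self, changed: bool, notify: Optional[str] = None) -> None:
        # A task only notifies its handler when it reports "changed".
        if changed and notify is not None:
            self.notified.add(notify)

    def end_of_play(self) -> list[str]:
        # Each notified handler runs once, in definition order,
        # no matter how many tasks notified it.
        ran = [name for name in self.handlers if name in self.notified]
        self.notified.clear()
        return ran


play = Play(handlers=["reload wireguard config"])
play.run_task(changed=True, notify="reload wireguard config")   # hypothetical task 1: changed
play.run_task(changed=True, notify="reload wireguard config")   # hypothetical task 2: changed
play.run_task(changed=False, notify="reload wireguard config")  # unchanged: no notification
print(play.end_of_play())  # ['reload wireguard config']
```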

The template Ansible module only performs an action and marks the task as changed if the rendered file differs from the one already on the host; in other words, it is idempotent. Idempotence is valuable when used with handlers, because the handler will only run when the task changes. Notifying a handler from a task that isn't idempotent may result in the handler always running (e.g. a service is unnecessarily restarted every time the playbook is run).
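The change detection that makes template idempotent boils down to comparing the newly rendered content with what is already on disk. A minimal Python sketch, assuming a hypothetical template_file helper (Ansible itself compares file checksums and also manages owner, group, and mode):

```python
import tempfile
from pathlib import Path


def template_file(dest: Path, rendered: str) -> bool:
    """Write `rendered` to `dest` only if the content would change.

    Returns True ("changed") when the file was written and False ("ok") when
    the existing content already matches, so nothing is touched.
    """
    if dest.exists() and dest.read_text() == rendered:
        return False  # already up to date: no write, so no handler notification
    dest.write_text(rendered)
    return True


def demo_template() -> tuple[bool, bool, bool]:
    """Render the same config twice, then a modified one."""
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "wg0.conf"
        original = "[Interface]\nListenPort = 51820\n"
        edited = "[Interface]\nListenPort = 51821\n"  # hypothetical change
        return (
            template_file(conf, original),  # new file: changed
            template_file(conf, original),  # same content: ok
            template_file(conf, edited),    # content differs: changed
        )


print(demo_template())  # (True, False, True)
```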

Start the VPN Client

Add one final play to the end of the playbook to start the client VPN now that the server is configured to accept its connection:



# ...
- name: start vpn on clients
  hosts: localhost
  connection: local
  become: yes
  tasks:
  - name: start vpn
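    command: wg-quick up wg0
    args:
      # Skip when the wg0 interface already exists, so re-running the
      # playbook doesn't fail because the VPN is already up
      creates: /sys/class/net/wg0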

Automation Complete!

Now we can run the whole playbook and — whether the server and client are brand-new or in some intermediate state — this single command will set up a WireGuard VPN server and client!



ansible-playbook -i inventory.ini --ask-vault-password --ask-become-pass playbook.yml

The complete Ansible code can be found at: https://gitlab.com/tangram-vision-oss/tangram-visions-blog

There are many improvements that could be made:

Provision a cloud server automatically, using an Ansible module such as community.digitalocean.digital_ocean_droplet.
Automatically update a floating IP address when provisioning a new cloud VPN server.
Configure multiple clients automatically. One approach is to add a vpn_clients group to the inventory, define VPN IPs in the inventory (e.g. vpn_ip=10.0.0.8), and use those host variables in the config templates. When templating the server config, loop over hostnames in the vpn_clients group, adding a new [Peer] block for each.
Organize the playbook as roles, one for the server and one for the client. Roles are more reusable and shareable than playbooks.
Test and lint with Molecule and ansible-lint.
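For the multi-client improvement, the server template would loop over the client hosts and emit a [Peer] block for each, reading each client's pubkey fact and vpn_ip host variable. In Jinja2 that would be a for loop over groups['vpn_clients'] that looks up hostvars for each host; the Python sketch below shows the same rendering logic, with hypothetical host names (laptop, desktop) and placeholder keys.

```python
from typing import Any

SERVER_INTERFACE = """[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
PrivateKey = {server_privkey}
"""

PEER_BLOCK = """
[Peer]
PublicKey = {pubkey}
AllowedIPs = {vpn_ip}
"""


def render_server_config(server_privkey: str,
                         hostvars: dict[str, dict[str, Any]],
                         vpn_clients: list[str]) -> str:
    """Render the server's wg0.conf with one [Peer] block per client host."""
    config = SERVER_INTERFACE.format(server_privkey=server_privkey)
    for host in vpn_clients:
        client = hostvars[host]  # that client's facts and host variables
        config += PEER_BLOCK.format(pubkey=client["pubkey"], vpn_ip=client["vpn_ip"])
    return config


# Hypothetical inventory data: two clients with placeholder keys
example_hostvars = {
    "laptop": {"pubkey": "LAPTOP-PUBKEY", "vpn_ip": "10.0.0.8"},
    "desktop": {"pubkey": "DESKTOP-PUBKEY", "vpn_ip": "10.0.0.9"},
}
print(render_server_config("SERVER-PRIVKEY", example_hostvars, ["laptop", "desktop"]))
```

Each client needs a distinct vpn_ip, since the server uses AllowedIPs to decide which peer owns which address.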

Thanks for joining me on this Ansible-learning journey! If you have any suggestions or corrections, please let me know or send us a tweet, and if you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

Making Remote Work Work With GitLab

Jeremy Steward — Thu, 18 Feb 2021 21:10:57 +0000

Tangram Vision started our mission to make integrating and deploying multi-sensor applications easy in 2020. While starting a new venture is invariably hard, the pandemic has changed many of the ways in which we all work. Notably, our engineering team has been working fully remote from the very beginning. There are a lot of ways in which remote collaboration is made possible, from various tools to the way one’s team interacts. In particular, our engineering team has placed GitLab at the core of our remote workflow, because it reinforces our values and perspectives around working well remotely.

Photo by Nail Gilfanov on Unsplash

Aspects of Success When Working Remote

“Working remotely” is a pretty broad topic. At Tangram Vision, we wanted to list some of the practices we believe help people work remotely successfully. In the context of writing software, these are some of the most important high-level considerations for our remote workflow:

Mutual sense of trust and understanding
Asynchronous work-cycle, and
Division of responsibility

Fostering a mutual sense of trust and understanding is a crucial component to any team. Especially when dealing with large or complex software systems, it is not possible for a single person to keep the entirety of the project in their head at one time. Being able to bring others up to a shared level of understanding about parts of the project they may be unfamiliar with is one way to help foster trust among your teammates. Trust and understanding go hand-in-hand, and lead to a culture where positive collaboration is the default.

Working asynchronously can seem nerve-wracking if you’re not used to working remotely. However, one of the strongest advantages of working remotely is the flexibility. Schedules are far less constrained (for better or worse) than when working at a fixed location and time. In order to minimize the dependency on needing others to be available to accomplish our goals, we structure our workflow in such a way that information is not lost if something cannot be addressed immediately. This means documenting everything ruthlessly and building a “digital paper trail” of all our work. Through transparency building on top of trust, we are able to make more informed decisions on the urgency and importance of different tasks, and we’re able to operate without blocking on every task.

Division of responsibility may seem a bit out of place on this list, but it is a crucial part of working remotely and encourages an asynchronous feedback loop. Dividing responsibility allows our team to operate independently, but more importantly, it discourages us from building silos. With strong trust as a base, we divide responsibility for documenting, reviewing, and ensuring the quality and spirit of work on the team. While these are all smaller examples of a greater process, the division of responsibility is a crucial part in building sustainable processes.

In the following sections, we’ll try to demonstrate which features of GitLab contribute towards this remote strategy.

Why GitLab? Why Not X?

There are many providers out there for Git hosting. Picking a provider can be difficult since there’s a lot of feature overlap between different providers. However, GitLab as a company operates entirely remotely, and they publish some great guides to working all-remote. This can be seen in a lot of the tools and workflows they promote. More importantly, GitLab’s own messaging around what aspects and behaviors encourage a strong remote environment echoes our own values with regards to remote work.

Many of the features in the standard GitLab workflow help us express the three aspects of success we listed above.

What’s in a Workflow, Anyway?

There are probably a million different Git workflows out there. Git is extremely flexible and can accommodate a variety of ways to organize work across branches, tags, commits, etc. In many cases, a workflow that works well for one team may not work for another. This can be because of any number of these factors:

Team size
Team distribution
- Do you have separate teams for development & operations, or does your organization rely on some kind of DevOps roles?
Project requirements
- Safety critical code vs. end-user application
- Device firmware vs. Web application
- Monorepo vs. many small repositories
Public vs. private work
- Are you building an open-source project?
- Do you incorporate open-source alongside proprietary code?
Programming language
- Your choice in language might dictate your continuous integration pipeline, or how you structure the files in your repository

Many of these questions can dictate different aspects of the process. Unfortunately, there’s no way to immediately make recommendations without knowing your team, but at a high level many of the steps are the same for typical workflows.

A GitLab Workflow That Works Remotely

At Tangram Vision, we’ve settled into a pretty typical workflow using GitLab’s free offering. Work tends to follow this formula:

Make a new branch to address a specific feature or issue
- This branch is often named according to what you’re working on, usually with a prefix. For example, if you’re adding a feature to a sub-module in the code, you might call your branch <submodule>/<feature-name>.
💡 This is mostly a convention, so experiment with what works for your team!
Create a new merge request for that branch, with its title prefixed by “WIP: ”
- The WIP at the beginning stands for “work in progress.” This signals to others that this branch isn’t ready yet, so don’t start reviewing the branch as the work is incomplete.
- Making the MR early ensures that all your work will have continuous integration (CI) pipelines run on every push.
Work on the branch, committing and pushing code as you go!
When the branch is ready for review, we add a reviewer, remove “WIP” from the name of the merge request, and wait for our reviewer to review our code.
- The reviewer is typically a maintainer on the project, or if you are the only maintainer, someone else on the project who can give your code a second pass.
Once the code is reviewed and approved, a maintainer needs to merge it into the default branch!

And that’s it. From a surface level, we’re really not doing anything particularly special here. At each step along the way though, there’s a lot to appreciate when working remotely. Let’s get into what that means.

Using Branches

Having everybody commit directly to your default branch is generally a mistake. Using branches breaks up your work into atomic components, and allows for the use of merge requests (discussed in the next section). But what is a default branch anyway? Well, this may be a branch named master, main, develop, or something else entirely. This is usually a branch that either needs to always be working (e.g. what you ship if you're doing continuous deployment), or is a branch that you cut other releases off of.

Again, it is generally a mistake to have everyone in your org committing to the same branch at the same time. Doing so quickly devolves into chaos as changes from one team member start propagating to every other team member, and soon others will get frustrated that nothing ever builds.

In the short term, team members lose trust and confidence in their work since they can’t be sure what changes they’re running every time they pull and then build. Additionally, this hurts understanding, as team members are less likely to review a single commit in isolation, compared to reviewing an entire branch when it comes time for a unit of work to be merged back in. Instead, opt for smaller branches that attempt to solve a single issue (or at least, a group of related issues) at a time. This divides responsibility across the team to be considerate of others working, and helps keep work asynchronous, since changes pushed to branches aren’t going to derail work for the entire team if they cause issues or break builds.

By using branches we can leverage merge requests, which are an excellent means to collaborate with others. One consequence of remote work is that it can sometimes encourage “lone-wolf” behaviour, where members of the team do not communicate or collaborate at all, and just push more and more work to the repository. Branches and merge requests are at the core of our workflow because they allow us to both enforce and contribute to healthy collaboration among team members.

How can we enforce this then? Fortunately, GitLab comes with some built-in tools to prevent your team from committing willy-nilly to any branch. The feature is called:

Protected Branches

At Tangram Vision, we use GitLab’s protected branches feature, which means we prevent others from pushing directly to our default branch. You can find this setting in your repository under Settings → Repository → Protected Branches.

Example of protecting the default (main) branch so that only Maintainers can directly push to the main branch, but Developer and Maintainer roles can "merge" a merge request. These permissions have to be set per-repository, but group / organization settings can create defaults.

Protecting branches is one way in which we divide responsibility, by enforcing the use of merge requests to integrate code into the default branch. It’s important to understand roles here, as it is not typically desirable to have everyone in your organization be a maintainer on every repository. As the old saying goes, “the fastest way to starve a horse is to assign two people to feed it.”

Merge Requests

Merge requests are one of the most integral tools to our workflow. Merge requests are where we:

Review code
Run automated tests

Merge requests gate code from being merged into our default branch. This effectively means that all code needs to be reviewed and all tests need to be run (and pass) before code makes it back into any kind of release. Let’s look at a few ways that these properties of merge requests help us work better remotely, starting with code review!

Code Review

All code at Tangram has to undergo a code review. Responsibility for getting code reviewed is divided across the team. While we usually ask the maintainer of a repository to participate in the code review, maintainers themselves also have to have their code reviewed. This is beneficial for a number of reasons:

Reviews are a mechanism for sharing knowledge and fostering understanding between teammates working on the same code base. As the author of a merge request, you have the opportunity to teach others about what approach you took, and why. As the reviewer, you can help contribute to the context surrounding a change, helping build understanding around the decisions made.
They allow a second pair of eyes to check your work, or suggest next steps.
They’re one of many mechanisms for seeking dissent. This means understanding concerns and conflicting viewpoints from others affected by the decision. Receiving dissent does not mean that a choice shouldn’t be made; it just needs to be made with consideration.

In direct contrast, code reviews:

Are not for criticizing code style. Formatters and linters exist to do this job consistently for an entire codebase; these tools should be relied on and deployed automatically, rather than relying on a reviewer’s subjective style preferences.
Are not a tool for rejecting changes. The focus should always be on an attitude of finding solutions and improving the end result of the code. A healthy attitude can go a long way towards empowering others and making them feel like their contributions are not only welcome but desired. The purpose is to build trust and understanding, not to shoot down ideas for improvement.

Code reviews are one of the primary ways in which a remote team gets to collaborate, so the spirit of our reviews is really important. When done correctly, code reviews help promote understanding of the code being reviewed, while building stronger trust in the systems we’re designing since we always have (at least) a second pair of eyes on every change. They also are one way in which we divide responsibility, since everyone on the team has to participate in having their code reviewed.

Lastly, by doing all code reviews in the merge request itself, we build a paper trail of documentation for the future. This means that we can look back at exactly when a change was introduced, and find any discussion about potential trade-offs next to a change. This gives the team confidence in understanding the context behind a change, months or years after it has been introduced. Moreover, it encourages team members to be able to work asynchronously, as that context is not held in any single individual’s head, but instead written and made explicit.

Automated CI and Testing

There’s enough here to write a whole blog post on its own, but suffice it to say that we set up our own GitLab CI runners, which can be hosted on DigitalOcean or Amazon infrastructure. The number of CI minutes included in GitLab’s free offering is pretty limited, so having our own runners is crucial to having CI.

Code reviews do not need to focus on determining which tests are broken as a result of some work (if any), because the CI pipeline does this for us. Responsibility for ensuring that the pipeline is in a good state is divided across the team. By keeping the pipeline in a good state, we can build trust in the state of the code-base, which is essential when beginning new work.

One thing that isn’t required, but is helpful, is making work-in-progress (WIP) merge requests before a branch is complete and ready for review. The advantage is that our CI pipeline runs in the background on every push to GitLab. This is useful even if you’re not working remotely: you get early feedback on your changes from CI before you submit them for full review. This helps decouple the feedback process from development a little bit, making the whole workflow a bit more asynchronous.

Committing Work to Your Branch

There’s little to say here since this will be largely dictated by the individual contributors themselves. One thing to keep in mind is keeping a clean and consistent history through your commit messages. This excellent article by Chris Beams offers more on the topic than we can add ourselves. Here at Tangram Vision, this article helps define our own goalposts for what makes a good commit message.

Commit messages are another great way to help communicate intent across a remote team, and serve as a form of documentation on “why” changes were made. This is another reason we don’t mandate that changes are squashed before merging, as the history helps contextualize how the code grows and changes over time. Overall, this is another way in which we make our context explicit through documenting the process. This builds trust and understanding across the team, and encourages a more asynchronous workflow as team members are encouraged to read the git log to understand changes made to the system!

Bringing This All Together

Our general workflow isn’t terribly complex, and represents a fairly standard workflow on GitLab’s free offering. Despite this, we can see how GitLab’s default workflow and many of the settings available are geared towards encouraging remote work and making it easier.

GitLab is one way in which Tangram is building a remote-first workplace. We lean on trust, an asynchronous workflow, and dividing responsibility across the organization to make remote software development a success. Whether it is protecting branches, encouraging the use of merge requests, running CI pipelines or just keeping a collaborative spirit around code review, GitLab helps us do our jobs every day.

As always, feel free to reach out to us through our website or on Twitter if you have something to add! What are some ways in which you or your organization use GitLab to help foster better remote work?

RSBadges: Create Code Badges in Rust

Brandon Minor — Thu, 11 Feb 2021 19:46:07 +0000

We've just launched our first open source project at Tangram Vision.

RSBadges is a Rust-friendly badge generator. The interface strives to be minimal while still providing a feature-rich API. Both the label (the left side) and the message (the right side) of the badge can be customized fully, with the ability to:

Set text
Set color using any valid CSS color code
Embed a link into each side or a link for the whole badge
Add a logo (in SVG format) from a local source or a URL
Embed that logo's data into the badge directly
Set the badge style, as described in Shields.io

RSBadges can be used as an API or a command line interface (CLI).

Check it out on GitLab here

Got feedback? We'd love to hear it!

Rotate, Scale, Translate: Coordinate frames for multi-sensor systems - Part 1

Jeremy Steward — Wed, 20 Jan 2021 22:42:29 +0000

Update! You can find Part 2, where we explore coordinate frames for 3D systems, here.

Multi-sensor systems are becoming ever more common as we seek to automate robotics and navigation, or to create a "smart" world. Any assortment of sensors could be sensing any number of quantities, from barometric pressure to temperature, with even more complex sensors (like LiDAR) scanning the world around us to produce 3D maps. These multi-sensor systems often carry a great deal of complexity, because they aim to learn about the world to the same degree that we can just by listening, feeling, and seeing our environment.

An important aspect of multi-sensor systems is how we relate these assorted sensing platforms together. If we are reading temperature, how do we use that value to make decisions? If we have multiple cameras, how can we tell if an object moves from the view of one camera to the next? Individual sensors on their own may be insufficient for making decisions, and therefore need to be linked together. There are a handful of ways we can correlate data, but one of the most powerful ways is to do so spatially. In this post, we're going to explore some of the language and tools we use for doing so.

Location, location, location

It's all about location. No, really. All sensors have some spatial property, and it is by relating sensors together in this way that we can produce useful results! Knowing that a sensor measured 30° C isn't particularly useful on its own. Knowing that a sensor in your thermostat measured 30° C is much more useful. The same holds true for more sophisticated sensors such as cameras, LiDAR, accelerometers, etc. These "advanced" sensors directly measure spatial quantities, which are the backbone of modern robotics and automation.

Many of the useful aspects of multi-sensor systems are derived from a spatial understanding of the world around us. Location helps us decide how related any two quantities might be; indeed, it is by relating things spatially that our sensors can derive the context of the world around us. Many of the problems in integrating new sensors into a robotics or automation system are therefore coordinate system problems. To understand this, we first need to talk about coordinate frames and then discuss how to relate any two coordinate frames.

Sensors and coordinate frames

A coordinate frame or coordinate system is a way for us to label the positions of some points using a set of coordinates, relative to the system's origin. A common type of coordinate frame is a Cartesian coordinate frame, which labels positions along a set of perpendicular axes. Consider the following Cartesian grid, defining a coordinate frame:

An example of a Cartesian grid with three points (p, q, r) plotted within. This grid represents a coordinate frame or coordinate system with an origin O and two perpendicular axes x and y.

For a Cartesian frame, we denote the position of our points as a tuple with the offset from the origin O along each of the axes. In the above example, our point p has a coordinate of (2, 2), while q has a coordinate of (3, 4). For any point in the system, you can describe its coordinate as (x_p, y_p) for a point p, where x_p is the offset from the origin O along the direction of the x-axis and y_p is the offset from the origin O along the direction of the y-axis.
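Because coordinates are just offsets from the origin along each axis, measuring the same point from a different origin is a per-axis subtraction. A small Python sketch using the points from the figure, assuming the new origin keeps the same axis directions (only the origin moves); the offset helper is hypothetical:

```python
Point = tuple[float, float]


def offset(point: Point, origin: Point) -> Point:
    """Coordinates of `point` measured from `origin`, where both are given in
    the same Cartesian frame: the difference along each axis."""
    return (point[0] - origin[0], point[1] - origin[1])


O = (0, 0)
p = (2, 2)
q = (3, 4)
print(offset(q, O))  # (3, 4): q measured from the frame's origin O
print(offset(q, p))  # (1, 2): q measured from p instead
```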

Other types of coordinate frames exist as well, including polar coordinate systems and spherical coordinate systems. Map projections such as Mercator or gnomonic projections also define a type of coordinate system, a projected coordinate system, which offers a handy way to plot and interpret the same data in different ways. For now, however, we will focus on Cartesian coordinate frames, as they tend to be the most commonly used coordinate frames to represent and interpret spatial data.

From the example above, we can pick out a few salient components of our Cartesian frame:

O: The origin of our coordinate frame, i.e. the point that every other point is relative to. The origin itself can be defined as the point that has zero offset relative to itself.
x- and y-axes: Every coordinate frame has a number of axes that describe the dimensions of the coordinate system. For now, we're working with 2-dimensional (2D) data, so we'll label our axes as the x- and y-axes. These could be called anything, like the a- and b-axes, but the typical convention for Cartesian coordinate frames is to name them x and y.
Axes order: oftentimes you'll hear about left-handed vs. right-handed coordinate systems. There are ways to distinguish the difference, but we're choosing to gloss over that for now in this primer.

Typically, a Cartesian frame is represented as a "right-handed" coordinate frame. The distinction isn't super important here, and the idea of left- vs. right-handed frames can be extremely confusing at first. More often than not, you won't see a left-handed Cartesian coordinate system that uses x / y / z terminology, and right-handed frames are more or less the standard for conventional robotics sensing. All you need to know is that, throughout this article, we represent all of our math in right-handed Cartesian frames.

To spatially relate our sensors together, we need to give each sensor its own local coordinate frame. For a camera, this might be the point where the camera lens is centered over the image plane, which we call the principal point. For an accelerometer, this may be the center of where accelerations are read. Regardless of where each frame is defined, every sensor needs a "local" frame so that we can relate the sensors to one another.

The "world frame"

When relating multiple coordinate frames together, it is often helpful to visualize a "world" frame. A world frame is a coordinate frame that describes the "world," which is a space that contains all other coordinate frames we care about. The world frame is typically most useful in the context of spatial data, where our "world" may be the real space we inhabit. Some common examples of "world" frames that get used in practice:

Your automated vehicle platform might reference all cameras (local systems) to an on-board inertial measurement unit (IMU). The IMU may represent the "world" frame for that vehicle.
A mobile-mapping system may incorporate cameras and LiDAR to collect street-view imagery. The LiDAR system, which directly measures points, may be treated as your world frame.

In the world frame, we can co-locate and relate points from different local systems. The figure below shows how a point p in the world frame can be related between multiple other frames, e.g. A and B. While we may only know the coordinates of a point in one frame, we eventually want to be able to express its coordinates in other frames as well.

A world frame, denoted by the orange and teal axes with origin O_W. This world frame contains two "local" coordinate frames A and B, with origins O_A and O_B. A common point p is also plotted. This point exists in all three coordinate frames, but is only labeled with its world-frame coordinates, p_W.

The world frame shown in the figure above is useful because it gives us a coordinate frame in which we can reference the coordinates of O_A and O_B. All coordinate systems are relative to some position; however, if O_A and O_B were only relative to themselves, we would have no framework with which to relate them. Instead, we relate them within our world frame, which helps us visualize the differences between coordinate systems A and B.

Relating two frames together

In the previous figure, we showed two local frames, A and B, inside our world frame W. An important part of discussing coordinate frames is establishing a consistent notation for how we refer to coordinate frames and their relationships. Fortunately, mathematics comes to the rescue here, as it provides tools for a concise way of relating coordinate frames together.

Let's suppose we still have two coordinate frames we care about, namely coordinate frame A and coordinate frame B. We have a point in frame A, p_A, whose location in frame B we want to know (i.e. we are searching for p_B). Since this point exists in both frames, there is a functional transformation that gives us p_B from p_A. In mathematics, we express this relation as a function:

p_B = f(p_A)

We don't know what this function f is yet, but we will define it soon! It's enough to know that we are getting a point in coordinate frame B from coordinate frame A. The function f is what we typically call a transform. However, saying "the coordinate frame B from coordinate frame A transform" is a bit wordy. We tend to lean towards referring to this transform as the B from A transform, or B←A transform for brevity.

Notice the direction here is not A to B, but rather B from A. While this is a bit of a semantic distinction, the latter is preferred because of how it is later expressed mathematically. We will eventually define a transformation matrix, Γ^B_A, that describes the transform f above. The superscript B and subscript A are conventionally written this way to denote the B←A relationship. Keeping many coordinate frames straight in your head can be difficult enough, so consistent notation and language will help us stay organized.

Now that we have a notation and some language to describe the relationship between two coordinate frames, we just need to understand what kinds of transforms we can apply between them. Fortunately, there are only three categories of transformations that we need to care about: translation, rotation, and scale!

Translation

Translations are the easiest type of coordinate transform to understand. In fact, we've already taken translation for granted when defining points in a Cartesian plane as offsets from the origin. Given two coordinate systems A and B, a translation between the two might look like this:

A world frame that shows two local frames, A and B, which are related to each other by a translation. This translation, T^B_A, is just a set of linear offsets from one coordinate frame to the other.

Mathematically, we might represent this as:
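Written out under the usual convention (assumed here), where T^B_A holds the offsets (t_x, t_y) of A's origin as seen from B:

```latex
p_B = p_A + T^{B}_{A},
\qquad
\begin{bmatrix} x_B \\ y_B \end{bmatrix}
=
\begin{bmatrix} x_A \\ y_A \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
```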

Pretty simple addition, which makes this type of transformation easy!

Rotation

Rotations are probably the most complex type of transform to deal with. Unlike translations, they are not fixed offsets: how far a rotation moves a point depends on that point's distance from the origin of the coordinate frame. Given two coordinate frames A and B, a rotation might look like so:

A world frame that shows two local frames, A and B, which are related to each other by a rotation. This rotation, R^B_A, is a linear transformation that depends on the angle of rotation, θ.

In the 2D scenario shown above, we only have a single plane (the XY-plane), so we only need to worry about a single rotation. Mathematically, we would represent this as:
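This uses the standard 2D rotation matrix. The sign convention for θ is assumed here; measuring the angle in the opposite direction flips the signs of the sin θ terms:

```latex
R^{B}_{A} =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
```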

This rotation matrix assumes that positive rotations are counter-clockwise. This is what is often referred to as the right-hand rule.

To get a point p_B from p_A, we then do:
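Under the same assumptions (counter-clockwise positive θ), applying the rotation is a single matrix-vector product, which expands component by component to:

```latex
p_B = R^{B}_{A}\,p_A
\quad\Longrightarrow\quad
\begin{bmatrix} x_B \\ y_B \end{bmatrix}
=
\begin{bmatrix}
x_A\cos\theta - y_A\sin\theta \\
x_A\sin\theta + y_A\cos\theta
\end{bmatrix}
```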

The matrix multiplication is a bit more involved this time, but remains quite straightforward. Fortunately, this is the most difficult transformation to formulate, so we're almost there!

Scale

The final type of transformation we need to consider is scale. Consider the following diagram, similar to the ones we've shown thus far:

A world frame that shows two local frames, A and B, which are related to each other by both a translation and scale. The translation exists to compare A and B side-by-side, but the difference in axes lengths between the A and B frames is a visual trick used to demonstrate what a scale factor might look like, assuming that two coordinate frames are compared in reference to a world frame.

The two frames are again translated, but that isn't important for what we're looking at here. Notice that the axes of A are a different length than the axes of B. This is a visual trick to demonstrate what scale transformations do between two coordinate frames. A real-world example of a scale difference is a unit conversion: B might be in units of meters, while A is in units of millimeters. This scale difference changes the coordinates of any point p_B relative to p_A by a multiplicative factor. If we have a uniform (isotropic) scale, we might represent this mathematically as:
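This uses a single scale factor s and reuses the unit-conversion example: B is in meters and A is in millimeters, so s = 1/1000. The sample point below is made up for illustration:

```latex
p_B = s\,p_A,
\qquad
s = \frac{1}{1000}:\quad
\begin{bmatrix} 1500 \\ 250 \end{bmatrix}_{\text{mm}}
\;\longmapsto\;
\begin{bmatrix} 1.5 \\ 0.25 \end{bmatrix}_{\text{m}}
```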

Here we use a single scalar value to represent a uniform scale across both the x- and y-axes. This is not always the case, as sometimes our scale is not uniform. In robotics, we typically treat most of our sensors as having a uniform scale, but it is worth showing how one might generalize the math when the scale in x (s_x) differs from the scale in y (s_y):
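Here the scalar is generalized to a diagonal matrix, so that each axis gets its own factor. The name S^B_A is chosen here for illustration:

```latex
p_B = S^{B}_{A}\,p_A,
\qquad
S^{B}_{A} =
\begin{bmatrix}
s_x & 0 \\
0 & s_y
\end{bmatrix}
```

With s_x = s_y = s, this reduces to the scalar form p_B = s p_A.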

Using this more general matrix equation instead of the scalar form above makes it easy to move between uniform scales (where s_x = s_y) and non-uniform scales. Fortunately, it's often easy to get away with assuming that our scale is uniform.

Putting it all together

Now that we know the three types of transforms, how do we put it all together? Any two coordinate frames A and B could have any number of translations, rotations, and scale factors between them. A transform in the general sense incorporates all three of these operations, and so we need a more formal way to represent them. Using our previous notation, we can formulate it as follows:
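The combined form below assumes that S^B_A is a diagonal scale matrix (s times the identity in the uniform case). It applies the operations in the order listed below:

```latex
p_B = S^{B}_{A}\,R^{B}_{A}\,p_A + T^{B}_{A}
```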

Keep in mind the order here:

Rotation
Scale
Translation

With all the multiplications going on, this can get very confusing very quickly! Moreover, remembering the order every time can be difficult. To make this more consistent, mathematicians and engineers often represent the whole transform as a single matrix multiplication, so that the order is never confusing. We call this single matrix Γ^B_A. In plain English, we typically call it the B from A transformation matrix, or just the B←A transform. Unfortunately, not every operation in our equation above is a matrix multiplication, so the following doesn't work!
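Concretely, the attempt would use a single 2×2 matrix. That cannot work once there is a translation. Any matrix sends the origin to itself, but the combined transform sends p_A = 0 to T^B_A:

```latex
p_B \overset{?}{=} \Gamma^{B}_{A}\,p_A,
\quad \Gamma^{B}_{A} \in \mathbb{R}^{2\times 2}
\qquad\text{fails for } p_A = \mathbf{0}:\quad
\Gamma^{B}_{A}\,\mathbf{0} = \mathbf{0} \neq T^{B}_{A}
```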

This would be fine for rotation and scale, but it doesn't allow us to do anything about translation. Fortunately, we can work around this by leveraging a small mathematical trick: increasing the dimensionality of our problem. If we change some of our definitions around, we can create a nuisance or dummy dimension that allows us to formulate Γ^B_A as:
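In block form, S^B_A is the diagonal scale matrix, R^B_A the rotation, and T^B_A the translation. The expanded form on the right assumes a uniform scale s and a counter-clockwise θ:

```latex
\begin{bmatrix} p_B \\ 1 \end{bmatrix}
= \Gamma^{B}_{A}
\begin{bmatrix} p_A \\ 1 \end{bmatrix},
\qquad
\Gamma^{B}_{A} =
\begin{bmatrix}
S^{B}_{A} R^{B}_{A} & T^{B}_{A} \\
\mathbf{0}^{\top} & 1
\end{bmatrix}
=
\begin{bmatrix}
s\cos\theta & -s\sin\theta & t_x \\
s\sin\theta & s\cos\theta & t_y \\
0 & 0 & 1
\end{bmatrix}
```

Multiplying out the top block gives S^B_A R^B_A p_A + T^B_A, and the bottom row gives 1.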

Notice that the last dimension of each point remains equal to 1 on both sides of the transformation (this is our nuisance dimension)! Additionally, the last row of Γ^B_A is always zeros, except for the value in the bottom-right corner of the matrix, which is also a 1!
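To make the homogeneous form concrete, here is a minimal pure-Python sketch. It is not the set of functions from the Tangram Vision repository. It builds Γ^B_A from a rotation θ in radians (counter-clockwise positive, as assumed above), per-axis scales s_x and s_y, and a translation (t_x, t_y), applied in the rotate, scale, translate order listed earlier. The function names are illustrative.

```python
import math

Matrix = list[list[float]]


def transform_matrix(theta: float, sx: float, sy: float, tx: float, ty: float) -> Matrix:
    """Build a 3x3 homogeneous B<-A matrix: rotate by theta (radians,
    counter-clockwise), then scale by (sx, sy), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [sx * c, -sx * s, tx],  # top-left 2x2 block is diag(sx, sy) @ R
        [sy * s, sy * c, ty],   # right-hand column is the translation
        [0.0, 0.0, 1.0],        # last row: zeros, then a 1
    ]


def apply(gamma: Matrix, point: tuple[float, float]) -> tuple[float, float]:
    """Map a 2D point from frame A into frame B with one matrix multiplication."""
    homogeneous = (point[0], point[1], 1.0)  # append the nuisance dimension
    x, y, w = (sum(row[i] * homogeneous[i] for i in range(3)) for row in gamma)
    # w stays 1 because the last row of gamma is (0, 0, 1), so we drop it.
    return (x, y)


if __name__ == "__main__":
    # Rotate 90 degrees, scale by 2, then shift by (1, 1).
    gamma_ba = transform_matrix(math.pi / 2, 2.0, 2.0, 1.0, 1.0)
    print(apply(gamma_ba, (1.0, 0.0)))  # approximately (1.0, 3.0)
```

You can check the result by hand. Rotating (1, 0) by 90° gives (0, 1), scaling by 2 gives (0, 2), and translating by (1, 1) gives (1, 3), which matches the printed result up to floating-point rounding.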

If you want to learn more about the trick we're applying above, search for homogeneous coordinates or projective coordinates!

Try it for yourself with the Python functions in our Tangram Visions Blog repository, which also contains the code used to generate the figures above.

Tangram-Vision / Tangram-Vision-Blog

Code pertaining to posts made on our official blog!

Tangram Visions Blog

This repo holds code to generate the assets used in the company blog for Tangram Vision! The main page of the blog can be found here.

See the README in each directory for instructions on installation, operation, and output.

One to Many Sensors

Code here
2020.11.30 post: One To Many Sensors, Part I
2020.12.04 post: One To Many Sensors, Part II

Coordinate Frames

Code here
2021.01.21 post: Coordinate systems, and how to relate multiple coordinate frames together Part I

Exploring Ansible via Setting Up a WireGuard VPN

Code here
2021.03.04 post: Exploring Ansible via Setting Up a WireGuard VPN

Color (Or Not)

Code here
2021.03.?? post: Color (Or Not)

Contributing to this project

This is an open-source project; if you'd like to contribute a feature or adapt the code, feel free! We suggest you check out our contributing guidelines. After that, make…


What was it all for?

Knowing how to express relationships between coordinate frames both in plain English (I want the B←A, or B from A, transformation) and in mathematics (I want Γ^B_A) helps bridge the gap in how we relate sensors to each other. As a more concrete example, suppose we have two cameras: depth and color. We might want depth←color, so that we can fuse semantic information from the color camera with spatial information from the depth camera. Eventually, we then want to relate that information back to real-world coordinates (e.g. I want world←depth).
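Chaining frames is where the B←A notation pays off: world←color = (world←depth)(depth←color), and the inner "depth" frames line up like cancelling terms. The sketch below uses rotation-and-translation-only transforms with made-up offsets for hypothetical world, depth, and color frames. It is an illustration, not calibration data or Tangram Vision SDK code.

```python
import math

Matrix = list[list[float]]


def rigid(theta: float, tx: float, ty: float) -> Matrix:
    """3x3 homogeneous transform: rotate by theta (radians), then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def compose(c_from_b: Matrix, b_from_a: Matrix) -> Matrix:
    """Return C<-A as the matrix product (C<-B)(B<-A)."""
    return [
        [sum(c_from_b[i][k] * b_from_a[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def apply(gamma: Matrix, point: tuple[float, float]) -> tuple[float, float]:
    """Map a 2D point through a homogeneous transform."""
    h = (point[0], point[1], 1.0)
    x, y, _ = (sum(row[i] * h[i] for i in range(3)) for row in gamma)
    return (x, y)


# Hypothetical extrinsics: color sits 0.05 units along depth's x-axis;
# depth is rotated 90 degrees and offset by (1, 2) in the world frame.
depth_from_color = rigid(0.0, 0.05, 0.0)
world_from_depth = rigid(math.pi / 2, 1.0, 2.0)
world_from_color = compose(world_from_depth, depth_from_color)

p_color = (1.0, 0.0)
print(apply(world_from_color, p_color))  # one combined transform
print(apply(world_from_depth, apply(depth_from_color, p_color)))  # step by step
```

Both lines print approximately (1.0, 3.05). Naming variables `world_from_depth` rather than `depth_to_world` keeps the matching frames adjacent in `compose(world_from_depth, depth_from_color)`, which makes a mismatched chain easy to spot.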

Coordinate frames and coordinate systems are a key component of integrating multi-sensor frameworks into robotics and automation projects. Location is of the utmost importance, all the more so when we consider that many robotics sensors are spatial in nature. Becoming an expert in coordinate systems is a path toward a stable, fully integrated sensor suite on board your automated platform of choice.

While these can be fascinating challenges to solve when building a multi-sensor-equipped system like a robot, they can also become unpredictably time-consuming, which can delay product launches and feature updates. The Tangram Vision SDK includes tools and systems to make this kind of work more streamlined and predictable, and it's free to trial, too.

We hope you found this article helpful—if you've got any feedback, comments or questions, be sure to tweet at us!

What They Don’t Tell You About Setting Up A WireGuard VPN

Greg Schafer — Tue, 12 Jan 2021 19:34:03 +0000

WireGuard is a relatively new VPN implementation that was added to the Linux 5.6 kernel in 2020 and is faster and simpler than other popular VPN options like IPsec and OpenVPN.

We'll walk through setting up an IPv4-only WireGuard VPN server on DigitalOcean, and I'll highlight tips and tricks and educational asides that should help you build a deeper understanding and, ultimately, save you time compared to "just copy these code blocks" WireGuard tutorials.

Let's get a server!

To set up a VPN, we need two computers that we want to connect. One of these is typically a desktop/laptop/phone in your possession. If you're looking to remotely access company intranet sites and services, the other computer would be a server in an office or on a company cloud network. If you're looking to remotely access your own home network, privately network with family/friends, or encrypt all of your internet traffic, then the other computer would be a personal server on a cloud provider like DigitalOcean or AWS.

VPN connectivity overview. CC BY-SA 4.0, Image attribution: Creative Commons License

For this walkthrough, we'll use a new Ubuntu 20.04 server on DigitalOcean, though you could follow similar steps using any cloud provider. To create a new DigitalOcean server, follow their guide to creating a droplet. A "droplet" is the term DigitalOcean uses for a "server" or a "VM" or an "instance".

VPCs and Private Networks

DigitalOcean servers are automatically created in a Virtual Private Cloud aka VPC (most cloud providers have VPC or private networking functionality), meaning they have an additional network interface (eth1 in addition to eth0) and an additional private IP address. All servers, databases, and load balancers created in the same VPC can communicate with each other via their private IP addresses, which is a boost to security because all inbound traffic from the public internet (on eth0) can be blocked with a firewall.

You can use your VPN server as a sort of bastion host to access other resources inside your VPC using their private IP addresses. That is, your VPN server can route traffic to any IP address in the VPC and all the servers in your VPC can accept traffic only to their private IP addresses (to eth1), which protects those servers and the services they run from all sorts of attacks. The server configuration section below will mention how to set up this sort of architecture.

How can I keep my VPN server up?

Given the importance of VPN uptime — especially if it serves as the only way to access important servers in a VPC or remote company network — it's worth considering how to handle or avoid downtime. There is a range of options and tradeoffs to consider, ordered below in increasing complexity/effort:

Do nothing! If you set up a server on DigitalOcean, install and configure the VPN, and take no further actions, then your VPN will go down when the server does. It's not uncommon for DigitalOcean to migrate droplets between physical machines due to hardware issues, and the VPN will be unavailable if the migration can't be performed without downtime. If a more serious issue causes downtime (e.g. accidental rm -rf /, networking misconfiguration, or a successful attack), then you'll need to set up and configure a new server from scratch to bring your VPN back up. If you didn't save the VPN server's private key offline, you'll need to generate a new private key and reconfigure all VPN clients to be able to connect to the new VPN server.
Enable droplet backups. For an extra 20% of the droplet price, DigitalOcean will take weekly snapshots of the server. If the droplet ends up horribly broken or unresponsive, you can restore the latest backup and your VPN will be working again (in about 1 minute for a 1 GB droplet).
Set up manual failover. Set up the VPN server and take a snapshot, then restore the snapshot to a new droplet. Point a floating IP to one of the servers and use that IP address when connecting to the VPN. When the primary/active VPN server goes down for any reason, you can update the floating IP to point to the secondary/standby VPN server and your VPN will work again!
Set up automatic failover / high-availability. The next step up in sophistication is to either:
- detect when the VPN server goes down and automatically switch (point a floating IP address) to a healthy standby using something like Pacemaker, or
- put a UDP load balancer in front of multiple VPN servers, but... you might need some network trickery to allow multiple active VPN servers with the same IP address and you might also need sticky sessions, which breaks down for roaming clients without some protocol-level changes like Cloudflare made for WARP.

Set up a WireGuard server

With your shiny new server running, let's install and configure WireGuard. For non-Linux platforms, follow the WireGuard website's instructions and links. For this walkthrough, I'll show instructions for Ubuntu 20.04, starting with installing the wireguard package:



sudo apt update
sudo apt install wireguard

The wireguard package installs two binaries:

wg — a tool for managing configuration of WireGuard interfaces
wg-quick — a convenience script for easily starting and stopping WireGuard interfaces

I encourage reading the manpages (man wg and man wg-quick), because they are concise, well-written, and contain a lot of information that is glossed over in most WireGuard tutorials!

To encrypt and decrypt packets, we need keys. 🔑



# Change to the root user
sudo -s

# Make sure files created after this point are accessible only to the root user
umask 077

# Generate keys in /etc/wireguard
cd /etc/wireguard
wg genkey | tee privatekey | wg pubkey > publickey

Now we have a private key (which only the server should possess and know about) and a public key (which should be shared to all VPN clients that will connect to this server).
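WireGuard keys are 32-byte Curve25519 keys, stored and shared as base64 text (44 characters ending in "="). When you copy keys between machines, a quick format check can catch truncated or mangled pastes. The helper below is a hypothetical sanity check, not part of the wg tooling. It only validates the encoding and cannot tell whether a public key actually matches a private key; use wg pubkey for that.

```python
import base64
import binascii


def looks_like_wireguard_key(text: str) -> bool:
    """Return True if text is valid base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(text.strip(), validate=True)
    except binascii.Error:  # non-base64 characters or incorrect padding
        return False
    return len(raw) == 32


if __name__ == "__main__":
    print(looks_like_wireguard_key("WCzcoJZaxurBVM/wO1ogMZgg5O5W12ON94p38ci+zG4="))  # True
    print(looks_like_wireguard_key("WCzcoJZaxurBVM/wO1ogMZgg5O5W12ON94p38ci+zG"))  # False
```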

Next, create a configuration file at /etc/wireguard/wg0.conf.

If we use wg-quick (spoiler: we will) to start/stop the VPN interface, it will create the interface with wg0 as the name. You can create other interface config files with other names, such as wg1.conf, my-company-vpn.conf, or us_east_1.conf. The wg-quick script will create interfaces with names that match the config filename (minus the .conf part), as long as the name fits the regex tested in /usr/bin/wg-quick.
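The sketch below previews which config filenames wg-quick would accept and what interface name it would derive. It assumes that the check in recent wg-quick releases is equivalent to the pattern below: a name of 1 to 15 characters drawn from letters, digits, and _=+.- (Linux caps interface names at 15 characters). This is a re-implementation rather than wg-quick itself, so confirm the pattern against your own /usr/bin/wg-quick.

```python
import re
from typing import Optional

# Assumed to mirror wg-quick's filename check; verify against /usr/bin/wg-quick.
CONFIG_NAME = re.compile(r"(?:^|/)([a-zA-Z0-9_=+.-]{1,15})\.conf$")


def interface_name(config_path: str) -> Optional[str]:
    """Return the interface name wg-quick would derive, or None if invalid."""
    match = CONFIG_NAME.search(config_path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path in [
        "/etc/wireguard/wg0.conf",
        "my-company-vpn.conf",
        "us_east_1.conf",
        "/etc/wireguard/this-name-is-too-long.conf",
    ]:
        print(path, "->", interface_name(path))
```

The last path is rejected because "this-name-is-too-long" is 21 characters, which exceeds the 15-character interface name limit.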

Print out your private key with cat /etc/wireguard/privatekey and then add the following to the configuration file:



# /etc/wireguard/wg0.conf on the server
[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
# Use your own private key, from /etc/wireguard/privatekey
PrivateKey = WCzcoJZaxurBVM/wO1ogMZgg5O5W12ON94p38ci+zG4=

We'll add the public keys of clients that are allowed to connect to the VPN later, but the above is all you need to run the VPN server for now. Here's what it means:

Address = 10.0.0.1/24 — The server will have an IP address in the VPN of 10.0.0.1. The /24 at the end of the IP address is a CIDR mask and means that the server will relay other traffic in the 10.0.0.1-10.0.0.254 range to peers in the VPN.
ListenPort = 51820 — The port that WireGuard will listen to for inbound UDP packets.
PrivateKey = ... — The private key of the VPN server, used for encryption/decryption.
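Python's standard ipaddress module is a convenient way to check what an Address line like 10.0.0.1/24 covers: the interface address, the subnet it implies, and the usable host range. The /32 line is included to show a single-address prefix.

```python
import ipaddress

server = ipaddress.ip_interface("10.0.0.1/24")  # Address line on the server
client = ipaddress.ip_interface("10.0.0.8/32")  # a /32 covers a single address

hosts = list(server.network.hosts())
print(server.ip, server.network)         # 10.0.0.1 10.0.0.0/24
print(hosts[0], hosts[-1], len(hosts))   # 10.0.0.1 10.0.0.254 254
print(ipaddress.ip_address("10.0.0.14") in server.network)  # True
print(client.network.num_addresses)      # 1
```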

At this point, you can start the VPN!



# This will run a few commands with "ip" and "wg" to
# create the interface and configure it
wg-quick up wg0

# To see the WireGuard-specific details of the interface
wg

# To start the VPN on boot
systemctl enable wg-quick@wg0

Find more example commands for inspecting the interface at https://github.com/pirate/wireguard-docs#inspect.

Relaying traffic

Recall from above that Address = 10.0.0.1/24 means the server will relay traffic to peers in the subnet. That is, if you connect to the VPN and ping 10.0.0.14 (and a server exists on the VPN at that address), then your ping will go to the VPN server at 10.0.0.1 and be forwarded on to the machine at 10.0.0.14. However, this won't work without one additional piece of configuration: IP Forwarding.

To enable IP Forwarding, open /etc/sysctl.conf and uncomment or add the line:



net.ipv4.ip_forward=1

Then apply the settings by running:



sysctl -p

Now, the VPN server should be able to relay traffic to other VPN hosts. From my understanding, running ping 10.0.0.14 will follow the left-to-right path shown in the diagram below. The diagram doesn't show the ping response from Peer C to Peer A, but you can mentally reverse all the arrows to see what the returning response path would look like.

The path of network packets from a ping command on Peer A to the destination server, Peer C. The packets enter the VPN at Peer A and route to the VPN server (Peer B), which relays the packets to Peer C via the VPN.

Troubleshooting relayed traffic

There are many places where something could go wrong, especially when relaying traffic between multiple servers as in the diagram above. When network requests are failing, tcpdump is a great tool for finding the source of failures and misconfigurations. If you wanted a complete view of the flow in the diagram above, you could run the following tcpdump commands on each machine:



sudo tcpdump -nn -i wg0
sudo tcpdump -nn -i eth0 udp and port 51820

Just be aware that clocks on servers might be slightly out-of-sync, so comparing timestamps in tcpdump output between servers could be misleading!

If you're debugging network packets on a machine with a display like your desktop or laptop, you can use Wireshark, which is a graphical, user-friendly alternative to tcpdump.

For more insight into WireGuard itself, you can enable debug logging by following the instructions at https://www.wireguard.com/quickstart/#debug-info and then running tail -f /var/log/syslog to see the log messages.

Relaying traffic to a VPC or the internet

In addition to using a VPN server to relay traffic between VPN clients, you can use a VPN server as a way to access servers in a VPC (on DigitalOcean or AWS, for example) that are firewalled off from the public internet. This approach requires no change in WireGuard configuration on the server, but you will need to enable masquerading so that responses on one network (e.g. the VPC) can be mapped to the requesting machine on the other network (e.g. the VPN). If you're unfamiliar with masquerading, check out this brief explanation. Assuming your VPN server is connected to the VPC on its eth1 interface, you can enable masquerading on the VPN server with:



iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth1 -j MASQUERADE

Now, a VPN client such as your laptop should be able to ping servers in the VPC, as in the diagram below.

The path of network packets from a ping command on Peer A to the destination server, Peer C. The packets enter the VPN at Peer A and route to the VPN server (Peer B), which terminates the VPN connection and relays the packets to Peer C via the VPC.

If you want to relay traffic through the VPN server to the internet (in which case, the VPN server is often labeled a bounce server), enable masquerading on the public-internet-facing interface (e.g. eth0) of the VPN server:



iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE

Now, a VPN client such as your laptop can visit public internet sites via your VPN — if you're on an unsecured coffeeshop wifi connection or you don't trust your ISP, all they'll see is an encrypted VPN connection.

The path of network packets from a ping command on Peer A to the destination server on the internet. The packets enter the VPN at Peer A and route to the VPN server (Peer B), which terminates the VPN connection and relays the packets over the public internet to the destination server.

Firewall rules

We've used iptables above for masquerading, but iptables is also important for managing the VPN server's firewall. You can use ufw instead, but learn and use iptables if you have the time — iptables is more foundational and powerful. Regardless of how you manage your firewall (I like this sort of approach), you'll need to:

allow UDP traffic to the WireGuard ListenPort (51820 in the sample server config above)
allow traffic forwarded to or from the WireGuard interface wg0

The iptables commands for those changes are:



iptables -A INPUT -p udp -m udp --dport 51820 -j ACCEPT

iptables -A FORWARD -i wg0 -j ACCEPT
iptables -A FORWARD -o wg0 -j ACCEPT

Many WireGuard tutorials suggest putting these iptables commands in the PostUp lines of the server WireGuard configuration, meaning the commands will be run when the wg0 interface is created. Be warned that, depending on how you manage your firewall, you may end up erasing these commands if you restart your firewall while the WireGuard interface is running, thereby making the VPN unreachable. Consider managing WireGuard firewall rules in the same place and with the same tool that you manage all your other firewall rules.

Set up a WireGuard client

Similar to the server setup, install WireGuard (follow the WireGuard website's instructions and links for non-Linux platforms):



sudo apt update
sudo apt install wireguard

Generate keys, similar to server setup:



# Change to the root user
sudo -s

# Make sure files created after this point are accessible only to the root user
umask 077

# Generate keys in /etc/wireguard
cd /etc/wireguard
wg genkey | tee privatekey | wg pubkey > publickey

Next, create a configuration file at /etc/wireguard/wg0.conf with the following content:



# /etc/wireguard/wg0.conf on the client
[Interface]
# The address your computer will use on the VPN
Address = 10.0.0.8/32

# Load your privatekey from file
PostUp = wg set %i private-key /etc/wireguard/privatekey
# Also ping the vpn server to ensure the tunnel is initialized
PostUp = ping -c1 10.0.0.1

[Peer]
# VPN server's wireguard public key (USE YOURS!)
PublicKey = CcZHeaO08z55/x3FXdsSGmOQvZG32SvHlrwHnsWlGTs=

# Public IP address of your VPN server (USE YOURS!)
# Use the floating IP address if you created one for your VPN server
Endpoint = 123.123.123.123:51820

# 10.0.0.0/24 is the VPN subnet
AllowedIPs = 10.0.0.0/24

# To also accept and send traffic to a VPC subnet at 10.110.0.0/20
# AllowedIPs = 10.0.0.0/24,10.110.0.0/20

# To accept traffic from and send traffic to any IP address through the VPN
# AllowedIPs = 0.0.0.0/0

# To keep a connection open from the server to this client
# (Use if you're behind a NAT, e.g. on a home network, and
# want peers to be able to connect to you.)
# PersistentKeepalive = 25

There's lots to talk about here!

Address = ... — Set the IP address of this client in the VPN. Packets sent to the VPN server with a destination of this address will be sent to whatever public IP address (endpoint) this client was last seen at.
PostUp = wg set %i private-key ... — Load the private key from the file after the wg0 interface is up. You can copy-paste the contents of the private key file into a PrivateKey line directly (as in the server config) if you prefer. I suggest not loading the private key via PostUp in the VPN server config however, because reloading the config (e.g. after adding a new client/peer) does not re-run PostUp commands, so the VPN will no longer know its private key and the VPN won't work as a result.
PostUp = ping -c1 10.0.0.1 — Ping the VPN server after the wg0 interface is up to test that the VPN connection was successful. If the ping fails, wg-quick will take the interface back down. In my testing, sending traffic from the VPN server to the client didn't work until something was sent from the client to the server — sending 1 ping packet to the server with PostUp does the trick.
[Peer] — There can be multiple peer sections in the config, one for each VPN peer you wish to connect directly to. Often, the VPN server will be the only peer in a client's config file. Lines under the [Peer] header define how and where the client will connect to the peer.
PublicKey = ... — The public key of the VPN server.
Endpoint = ... — The (usually publicly accessible) IP address and port of your VPN server. This could be a floating IP address if you're using a cloud provider like DigitalOcean or AWS.
AllowedIPs = ... — For incoming packets from the VPN server, their source IP address must match the addresses or ranges in AllowedIPs. For outgoing packets, the AllowedIPs is the mapping that tells WireGuard what peer (specifically their public key and endpoint) should be used when encrypting and sending. The last example (AllowedIPs = 0.0.0.0/0) would enable WireGuard to send traffic destined for any IP address to the VPN server. With AllowedIPs = 0.0.0.0/0, wg-quick up will conveniently run ip route and ip rule commands to route all your traffic through the VPN (useful in the aforementioned unsecured coffeeshop wifi or malicious ISP scenarios). For more info on how AllowedIPs works, check out WireGuard's documentation.
PersistentKeepalive = 25 — Send a packet to the VPN server every 25 seconds, to ensure that the server can successfully route traffic to the client when the client doesn't have a public or stable IP address. Without this setting, the client can still send traffic to the VPN server and receive responses, but routers between the client and the server only keep their NAT/masquerade mapping for a few dozen seconds. After the mapping expires, the server won't be able to send anything to the client until the client sends something first. You typically won't enable this setting, unless you want to allow new connections from other devices on the VPN — for example, you would enable this on your home desktop if you wanted to connect to it from your laptop or phone while traveling.
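The AllowedIPs behavior described above is often called cryptokey routing. The sketch below models it with Python's ipaddress module. For an outgoing packet, the peer whose AllowedIPs contains the destination with the longest prefix wins. For an incoming packet, the source address must map back to the peer that sent it. The peer names and the extra 10.0.0.5/32 peer are hypothetical, and this is a simplified model rather than WireGuard's implementation.

```python
import ipaddress
from typing import Optional

# Hypothetical [Peer] sections: peer name -> its AllowedIPs.
PEERS = {
    "vpn-server": ["10.0.0.0/24", "10.110.0.0/20"],
    "home-desktop": ["10.0.0.5/32"],
}

# Flatten into (network, peer) pairs for lookup.
TABLE = [
    (ipaddress.ip_network(cidr), peer)
    for peer, cidrs in PEERS.items()
    for cidr in cidrs
]


def peer_for(address: str) -> Optional[str]:
    """Outgoing: choose the peer whose matching AllowedIPs prefix is longest."""
    ip = ipaddress.ip_address(address)
    matches = [(net.prefixlen, peer) for net, peer in TABLE if ip in net]
    if not matches:
        return None  # no peer claims this address, so there is nowhere to send it
    return max(matches, key=lambda m: m[0])[1]


def accept_from(peer: str, source: str) -> bool:
    """Incoming: accept a decrypted packet only if its source maps back to the sender."""
    return peer_for(source) == peer


if __name__ == "__main__":
    print(peer_for("10.0.0.1"))    # vpn-server
    print(peer_for("10.0.0.5"))    # home-desktop (the /32 beats the /24)
    print(peer_for("10.110.3.4"))  # vpn-server (inside 10.110.0.0/20)
    print(peer_for("8.8.8.8"))     # None
    print(accept_from("vpn-server", "10.0.0.5"))  # False
```

Adding "0.0.0.0/0" to the vpn-server list would make it the fallback for every other destination, while the more specific /24 and /32 entries still win within their ranges. This mirrors the AllowedIPs = 0.0.0.0/0 option in the client config above.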

Before starting the VPN on the client, the VPN server needs to be configured to allow connections from the client. Open /etc/wireguard/wg0.conf on the VPN server again and update the contents to match:



# /etc/wireguard/wg0.conf on the server
[Interface]
Address = 10.0.0.1/24
ListenPort = 51820
# Use your own private key, from /etc/wireguard/privatekey
PrivateKey = WCzcoJZaxurBVM/wO1ogMZgg5O5W12ON94p38ci+zG4=

[Peer]
# VPN client's public key
PublicKey = lIINA9aXWqLzbkApDsg3cpQ3m4LnPS0OXogSasNW5RY=
# VPN client's IP address in the VPN
AllowedIPs = 10.0.0.8/32

The added [Peer] section enables the VPN server to coordinate encryption keys with the client and validate that traffic from and to the client is allowed. To apply these changes, you can restart the WireGuard interface on the server:



wg-quick down wg0 && wg-quick up wg0

If you want to avoid disrupting or dropping active VPN connections, reload the config with:



wg syncconf wg0 <(wg-quick strip wg0)

At this point, you can start the VPN on the client!



# This will run a few commands with "ip" and "wg" to
# create the interface and configure it
wg-quick up wg0

# To see the WireGuard-specific details of the interface
wg

Connecting from a Chromebook

If you're connecting to a WireGuard VPN from a Chromebook, I suggest using the official Android WireGuard app. My efforts to run WireGuard under crouton failed, because crouton uses a chroot, so I was stuck with the Chromebook's old Linux kernel (4.19) and unable to add kernel modules or network interfaces from within crouton. Similarly, crostini doesn't allow updating or using custom kernel modules, but it does provide a great way to SSH into VPN-accessible servers while the Android WireGuard app is active.

Connecting from other devices

If you want to connect to a VPN from devices where you don't have root access, you can try installing a userspace implementation of WireGuard such as wireguard-go.

If you want to connect to a VPN from devices you don't control (e.g. smart TVs, IoT sensors), look into setting up WireGuard on your router (e.g. instructions for OpenWRT), so you can route all those devices' outbound traffic through a VPN.

Thanks for reading! Hopefully, I’ve saved you time by passing on some of the insights and tips that I learned while digging deeper into the many facets of setting up a WireGuard VPN. If you have any suggestions or corrections, please let me know or send us a tweet, and if you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

If you're setting up multiple VPNs or multiple VPN clients — or if you're interested in learning about infrastructure and configuration automation — check out the next tutorial I wrote: Exploring Ansible via Setting Up a WireGuard VPN.

Corrections

2021-01-13: Previously, my explanation of what AllowedIPs does and how to route all traffic through the VPN was incomplete/misleading. Thanks to Chris Siebenmann on Twitter for catching that!


The Current State of Event Cameras

Adam Rodnitzky — Wed, 30 Dec 2020 21:22:27 +0000

We took one last moment before 2020 disappears to reflect on why event cameras have yet to take the sensor world by storm.

Event Cameras: Where Are They Now? (Medium, 8 min read)

One-to-Many Sensor Trouble, Part 2

Adam Rodnitzky — Mon, 07 Dec 2020 21:48:14 +0000

Phase 2: Updating Your Prediction

From State to Sensors

While we were busy predicting, the GPS on our RC car was giving us positional data updates. These sensor measurements $z_{t}$ give us valuable information about our world.

However, $z_{t}$ and our state vector $x_{t+1}$ may not actually correspond; our measurements might be in one space, and our state in another! For instance, what if we’re measuring our state in meters, but all of our measurements are in feet? We need some way to remedy this.

Let’s handle this by converting our state vector into our measurement space using an observation matrix H:

$\mu_{\text{exp}} = H x_{t+1}, \qquad \Sigma_{\text{exp}} = H P_{t+1} H^{T}$

These equations represent the mean $\mu_{\text{exp}}$ and covariance $\Sigma_{\text{exp}}$ of our predicted measurements.

For our RC car, we’re going from meters to feet, in both position and velocity. 1 meter is around 3.28 ft, so we would shape H to reflect this:

$H = \begin{bmatrix} 3.28 & 0 \\ 0 & 3.28 \end{bmatrix}$

…leaving us with predicted measurements that we can now compare to our sensor measurements $z_{t}$. Note that H is entirely dependent on what’s in your state and what’s being measured, so it can change from problem to problem.
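The projection into measurement space is just two matrix products. The sketch below uses NumPy and assumes the RC car's state is [position, velocity] in meters, with the sensor reporting both in feet; the predicted state and covariance values are made up for illustration. The covariance transforms as $H P H^{T}$ because that is how a covariance matrix changes under a linear map.

```python
import numpy as np

METERS_TO_FEET = 3.28  # approximate conversion factor, as in the text

# Assumed state layout: [position (m), velocity (m/s)]. Values are hypothetical.
x_pred = np.array([10.0, 2.0])          # predicted state x_{t+1}
P_pred = np.array([[0.5, 0.1],
                   [0.1, 0.2]])         # predicted covariance P_{t+1}

# Observation matrix: scale both position and velocity from meters to feet.
H = np.diag([METERS_TO_FEET, METERS_TO_FEET])

mu_exp = H @ x_pred                     # predicted measurement mean
Sigma_exp = H @ P_pred @ H.T            # predicted measurement covariance

print(mu_exp)
print(Sigma_exp)
```

Since this H is just a scalar times the identity, $\Sigma_{\text{exp}}$ works out to $3.28^2$ times the original covariance; a sensor that measured only position would instead use a 1×2 H such as `[[3.28, 0.0]]`.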

Fig. 1: Moving our state from state space (meters, bottom left PDF) to measurement space (feet, top right PDF).

Our RC car example is a little simplistic, but this ability to translate our predicted state into predicted measurements is a big part of what makes Kalman filters so powerful. We can effectively compare our state with any and all sensor measurements, from any sensor. That’s powerful stuff!

An aside: The behavior of H is important. Vanilla Kalman filters use a linear Fₜ and H; that is, there is only one set of equations relating the estimated state to the predicted state (Fₜ), and predicted state to predicted measurement (H).

If the system is non-linear, then this assumption doesn’t hold. Fₜ and H might change every time our state does! This is where innovations like the Extended Kalman Filter (EKF) and the Unscented Kalman Filter come into play. EKFs are the de facto standard in sensor fusion for this reason.

Let’s add one more term for good measure: $R_{t}$, our sensor measurement covariance. This represents the noise from our measurements. Everything is uncertain, right? It never ends.

The Beauty of PDFs

Since we converted our state space into the measurement space, we now have 2 comparable Gaussian PDFs:

$\mu_{\text{exp}}$ and $\Sigma_{\text{exp}}$, which make up the Gaussian PDF for our predicted measurements
$z_{t}$ and $R_{t}$, which make up the Gaussian PDF for our sensor measurements

The strongest probability for our future state is the overlap between these two PDFs. How do we get this overlap?

Fig. 2: Our predicted state in measurement space is in red. Our measurements z are in blue. Notice how our measurements z have a much smaller covariance; this is going to come in handy.

We multiply them together! The product of two Gaussian functions is just another Gaussian function. Even better, this common Gaussian has a smaller covariance than either the predicted PDF or the sensor PDF, meaning that our state is now much more certain.

ISN’T THAT NEAT.

Update Step, Solved.

We’re not out of the woods yet; we still need to derive the math! Suffice it to say… it’s a lot. The basic gist is that multiplying two Gaussian functions results in another Gaussian function. Let’s do this with two 1D PDFs now, Gaussian functions with means μ₁, μ₂ and variances σ₁², σ₂². Multiplying these two PDFs (and renormalizing) leaves us with our final Gaussian PDF, and thus our final mean and covariance terms:

$\mu' = \frac{\mu_1 \sigma_2^2 + \mu_2 \sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad \sigma'^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}$

When we substitute in our derived state and covariance matrices, these equations represent the update step of a Kalman filter.

See reference (Bromiley, P.A.) for a good explanation on how to multiply two PDFs and derive the above. It takes a good page to write out; you’ve been warned.
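If you'd rather check the product formula than take it on faith, the sketch below multiplies two 1D Gaussian PDFs point by point on a fine grid, renormalizes, and compares the resulting mean and variance with the closed-form expressions $\mu' = (\mu_1\sigma_2^2 + \mu_2\sigma_1^2)/(\sigma_1^2+\sigma_2^2)$ and $\sigma'^2 = \sigma_1^2\sigma_2^2/(\sigma_1^2+\sigma_2^2)$. It is plain Python, and the input numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fuse(mu1, var1, mu2, var2):
    """Closed-form mean and variance of the renormalized product of two 1D Gaussians."""
    mu = (mu1 * var2 + mu2 * var1) / (var1 + var2)
    var = (var1 * var2) / (var1 + var2)
    return mu, var


def fuse_numerically(mu1, var1, mu2, var2, lo=-50.0, hi=50.0, n=200_001):
    """Multiply the two PDFs on a grid, then measure the mean and variance
    of the renormalized product (a simple Riemann-sum approximation)."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    ws = [gaussian_pdf(x, mu1, var1) * gaussian_pdf(x, mu2, var2) for x in xs]
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    return mean, var


# Hypothetical inputs: a wide prediction N(30, 4) and a tighter measurement N(33, 1).
print(fuse(30.0, 4.0, 33.0, 1.0))              # (32.4, 0.8)
print(fuse_numerically(30.0, 4.0, 33.0, 1.0))  # very close to (32.4, 0.8)
```

Note that the fused variance (0.8) is smaller than both inputs (4 and 1), and the fused mean lands closer to the more certain measurement.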

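This is admittedly pretty painful to read as-is. We can simplify this by defining the Kalman gain $K_{t}$. In 1D it is $k = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$; carrying it over to our matrices ($\mu_1 = H x_{t+1}$, $\sigma_1^2 = H P_{t+1} H^{T}$, $\mu_2 = z_{t}$, $\sigma_2^2 = R_{t}$) and expressing the result back in state space gives:

$K_{t} = P_{t+1} H^{T} \left( H P_{t+1} H^{T} + R_{t} \right)^{-1}$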

With $K_{t}$ in the mix, we find that our equations are much kinder on the eyes:

$x = x_{t+1} + K_{t} \left( z_{t} - H x_{t+1} \right), \qquad P = P_{t+1} - K_{t} H P_{t+1}$

We’ve done it! Combined, the new $x$ and $P$ create our final Gaussian distribution, the one that crunches all of that data to give us our updated state. See how our updated state spikes in probability (the blue spike in Fig. 3). That’s the power of Kalman filters!
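Here is a minimal NumPy sketch of that update step, using the standard linear Kalman filter equations $K_{t} = P_{t+1} H^{T} (H P_{t+1} H^{T} + R_{t})^{-1}$, $x = x_{t+1} + K_{t}(z_{t} - H x_{t+1})$, and $P = P_{t+1} - K_{t} H P_{t+1}$. It is not the code from our repository; the function name and all numbers are illustrative. The first call checks that with $H = 1$ it reproduces the 1D product-of-Gaussians result, and the second shows the covariance shrinking for a hypothetical RC car state.

```python
import numpy as np


def kalman_update(x_pred, P_pred, z, H, R):
    """Standard linear Kalman update. Returns the updated (x, P)."""
    S = H @ P_pred @ H.T + R                # Sigma_exp + R_t (innovation covariance)
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain K_t
    x = x_pred + K @ (z - H @ x_pred)       # updated state mean
    P = P_pred - K @ H @ P_pred             # updated state covariance
    return x, P


# 1D sanity check: prediction N(30, 4), measurement N(33, 1), H = 1.
x1, P1 = kalman_update(np.array([30.0]), np.array([[4.0]]),
                       np.array([33.0]), np.array([[1.0]]), np.array([[1.0]]))
print(x1, P1)  # approximately [32.4] [[0.8]]

# Hypothetical RC car: state [position, velocity] in meters, readings in feet.
H = np.diag([3.28, 3.28])
x_pred = np.array([10.0, 2.0])
P_pred = np.array([[0.5, 0.1], [0.1, 0.2]])
R = np.diag([1.0, 1.0])                 # assumed sensor noise, ft^2
z = np.array([34.0, 6.0])               # made-up readings, ft
x_new, P_new = kalman_update(x_pred, P_pred, z, H, R)
print(x_new)
print(np.trace(P_new) < np.trace(P_pred))  # True: the update shrinks uncertainty
```

We use `np.linalg.inv` for readability; `np.linalg.solve` is the more numerically stable way to apply $(H P H^{T} + R)^{-1}$ in real code.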

Fig. 3: Original state in green. Predicted state in red. Updated final state in blue. Look at how certain we are now! Such. Wow.

What Now?

Well… do it again! The Kalman filter is recursive, meaning that it uses its output as the next input to the cycle. In other words, the final $x$ is your new $x_{t}$! The cycle continues with the next prediction step.

Kalman filters are great for all sorts of reasons:

They extend to anything that can be modeled and measured: automotive systems, touchscreens, econometrics, and more.
Your sensor data can come from anywhere. As long as there’s a connection between state and measurement, new data can be folded in.
Kalman filters also allow for the selective use of data. Did your sensor fail? Don’t count that sensor for this iteration! Easy.
They are a good way to model prediction. After completing one pass of the prediction step, just do it again (and again, and again) for an easy temporal model.
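Putting the two phases together, one cycle is predict, then update, and the loop simply feeds each output back in as the next input. This sketch assumes a constant-velocity motion model with no control input ($x_{t+1} = F x_{t}$, $P_{t+1} = F P_{t} F^{T} + Q$), a hypothetical GPS that reports position only (in feet), and made-up readings. A `None` reading stands in for a failed sensor: we skip the update for that cycle and keep the prediction.

```python
import numpy as np


def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle. Pass z=None to skip the update
    (e.g. the sensor failed), keeping only the prediction."""
    # Predict (assumes no control input)
    x = F @ x
    P = F @ P @ F.T + Q
    if z is None:
        return x, P
    # Update with measurement z_t and its noise covariance R_t
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = P - K @ H @ P
    return x, P


dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model (assumed)
Q = 0.01 * np.eye(2)                     # process noise (assumed)
H = np.array([[3.28, 0.0]])              # hypothetical GPS: position only, in feet
R = np.array([[4.0]])                    # measurement noise in ft^2 (assumed)

x = np.array([0.0, 1.0])                 # initial guess: 0 m, 1 m/s
P = np.eye(2)

readings = [3.3, 6.5, None, 13.2, 16.4]  # made-up readings in feet; None = dropout
history = []
for reading in readings:
    z = None if reading is None else np.array([reading])
    x, P = kalman_step(x, P, z, F, Q, H, R)  # output becomes the next input
    history.append((x.copy(), P.copy()))
    print(x)
```

During the dropout cycle only the predict step runs, so nothing shrinks the covariance that iteration; the next real reading pulls the estimate back in.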

Of course, if this is too much and you’d rather do… anything else, Tangram Vision is developing solutions to these very same problems! We’re creating tools that help any vision-enabled system operate to its best. If you like this article, you’ll love seeing what we’re up to.

Code Examples and Graphics

The code used to render these graphs and figures is hosted on Tangram Vision’s public repository for the blog. Head down, check it out, and play around with the math yourself! If you can improve on our implementations, even better; we might put it in here. And be sure to tweet at us with your improvements.

Check it out here:

Tangram-Vision / Tangram-Vision-Blog

Code pertaining to posts made on our official blog!

Tangram Visions Blog

This repo holds code to generate the assets used in Tangram Visions, the company blog of @Tangram-Vision!

The Blog

Found here in its native Notion format
Found here on Medium
Found here on Dev

Find us wherever fine publications are syndicated.

Demo and Contribute

We take pride in our posts, but we realize that they aren't perfect. If you have an improvement or interesting modification in mind, write it up and submit a merge request! We might incorporate it here, and better yet, into the blog.

Examples in different programming languages
Improved rendering
Improved mathematics and numerical techniques
etc, etc, etc
