Forem: Brian Carroll

Porting Elm to WebAssembly

Brian Carroll — Tue, 28 Sep 2021 19:28:58 +0000

For a few years now, on and off, I've been working on an unofficial port of the Elm language to WebAssembly. It's not production ready but I'm at the stage now where I have some good working demos, and things are taking shape.

For the past year or so I've been mainly working on robustness, doing a lot of debugging, which also led to some rewriting and architecture changes. I rewrote a large part of the GC, fixed lots of edge cases in the language implementation, and made the Wasm/JavaScript interop a lot more efficient.

After all that I've managed to reach my goal of being able to run Richard Feldman's Elm SPA Example in my system! 😃 Here's a working implementation compiled to WebAssembly. And for comparison, you can also check out the same code compiled to JavaScript. (Unfortunately the publicly available APIs don't seem to be returning very much data at the moment but there's not much I can do about that!)

Robustness

My early attempts to get the SPA example running failed pretty badly. There were just too many compiler and core library bugs to be able to disentangle everything. I realised I needed to be patient and work on robustness. So I started by writing lots of unit tests for the low-level C code. You can run the tests in the browser.

And there was a specific part of the GC that was always throwing up hard-to-find bugs and was too complicated and hard to understand. It's the system that tracks references from the stack to the heap. I decided to just throw it out and keep trying different approaches until I found something that just felt stable and obvious and robust. I ended up rewriting it 4 times. The end result is something that's much more straightforward and a lot less scary, and I haven't had any bugs there since.

Once all that handwritten C code was solid, I needed to make sure the C generated from Elm was working properly. I found the source for the core library's unit tests and decided to port them into my project and add some of my own tests. You can run the tests in WebAssembly in your browser too. (Funnily enough, one of the biggest challenges was getting the Elm Test framework itself to run! The framework is more complex than the tests themselves. I still need to come back to the fuzzer tests!)

Then finally, with a bit more debugging, the SPA example came together. That has a lot of code in it and I figure if I can get it to run, I can get most things to run.

Performance

I haven't really focused on performance yet, but I did a quick analysis using Lighthouse from Chrome devtools. Already, without having optimised the JS/Wasm interface for each kernel module, it looks like the SPA example has similar performance to the official compiler's JS output with --optimize and uglify-js. That's a great place to be! There is a lot of encoding and decoding to optimise away to get more performance, which suggests the app code is running a lot faster in WebAssembly. And VirtualDom should get a lot faster.

Still canned demos only!

So overall I think the project is in a pretty good place! But it's not ready for general use. Currently it's only set up to run on the "canned" demo apps in my repo, which all have their own build scripts with minor variations. And there's no solution for package management, so you can't have two apps with different versions of Kernel code.

How does it work?

The system breaks down into a few different areas:

Compiler: I chose to use C as an intermediate language. My forked Elm compiler generates C, then I use Emscripten/Clang to go from C to WebAssembly & JS. (I can also compile C to native code, which is a much better debugging experience.)

Kernel code: Elm's runtime and its main data structures are implemented in handwritten JavaScript, so most of that had to be ported to C. (Again, having both the compiled code and kernel code in C is very helpful).

Garbage Collector: The Elm language expects its target platform to automatically manage memory for it. Browsers don't implement garbage collection for WebAssembly so I built a mark/sweep garbage collector in C. My measurements estimate it only adds 7kB of Wasm to the bundle.

Elm/JS interop: This is actually the toughest part of the project! Let's get into it a bit more below, because it isn't obvious.

Targeting two languages

The single most important thing to know about targeting WebAssembly for browsers is: WebAssembly doesn't have access to any of the Web APIs yet. That means that if you want to do anything useful, you need to produce both WebAssembly and JavaScript. This one crucial fact drives a lot of the system design.

"Web API" here means things like document.createElement, XMLHttpRequest and so on. They are the interfaces between user code and the browser's internal functionality, often with an underlying implementation written in C++.

This is obviously a major drawback and the WebAssembly project has several proposals to work towards better host integration. One of the key issues is how to manage reference lifetimes - if a Wasm module is holding a reference to a DOM node, then it can't be garbage-collected. And if it is just a number in Wasm then it can be copied, which makes it hard to keep track of the copies. These issues are addressed in the GC proposal, which has been at "stage 1" since I first looked at it in 2018.

So our app gets compiled to a mix of WebAssembly and JS, and they need to talk to each other. That in turn means we need to encode and decode values between WebAssembly and JavaScript representations of the same values. For plain numbers, that's easy. For more complex structures it means serialising and deserialising to a binary format.

This turns out to be a huge deal!

Sure, WebAssembly "makes things fast"... but lots of serialising and deserialising "makes things slow"!

In practice, what I've found is that the performance of the system depends almost entirely on how you design this interface between WebAssembly and JavaScript. In my initial versions, there was a lot of unnecessary converting back and forth, and the WebAssembly+JS versions of apps were much slower than the plain JS versions. That traffic was reduced a lot when I ported the Platform and Scheduler Kernel code to C.

We know the kernel code has to be split between C and JS, but exactly where do you draw that line? It's fast for Elm code to call into Kernel C code, but slow to call into Kernel JS code. So for performance, you want most of it to be in C. But on the other hand, the more Kernel code we decide to leave in JavaScript, the less work we have to do to port it.

Obviously when the kernel libraries were designed, there was no slow "barrier" in between Elm code and Kernel code, or between two different parts of the Kernel code. I wonder if that constraint might have resulted in slightly different designs for some libraries? In practice, I want existing Elm apps to run in my system without modification, so I need all ported core libraries to at least retain the same APIs.

Targeting two memory managers

The two target languages are running in two isolated memory management zones. WebAssembly doesn't have access to the browser's main garbage collector.

Unfortunately there are cases where WebAssembly code may want to hold a long-lived reference to some JS value, and vice versa. We need to make sure we don't end up with stale references to values after they've been collected.

For example if we pass a value from external JS code through a port, it will appear in Elm as a Json.Decode.Value. The Elm app could decide to store it in the model and keep it forever. But this is a general JavaScript value that could be unserialisable! We have to keep a long-lived reference to it in Wasm somehow. To do that, we push it into a dedicated JS array, and just pass the array index to Wasm. So the Wasm representation of Json.Decode.Value is just an array index that tells it where to find the value in that array. When our Wasm Garbage Collector does a collection, it also calls out to JavaScript to remove any references that are no longer needed.
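
The index-based scheme described above can be modelled as a small handle table. In the real system the table is a JavaScript array; the sketch below models it in C purely to show the bookkeeping. The names `jsref_store`, `jsref_get`, `jsref_mark` and `jsref_sweep`, and the fixed capacity, are hypothetical and not taken from the project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 8  /* arbitrary capacity for this sketch */

static const void* js_values[MAX_REFS];  /* stand-in for the dedicated JS array */
static bool marked[MAX_REFS];            /* set while the Wasm GC traces live values */

/* Store a value and return its index, which is all the Wasm side keeps.
   Returns -1 if the table is full. */
int jsref_store(const void* value) {
    assert(value != NULL);  /* NULL means "empty slot" in this sketch */
    for (int i = 0; i < MAX_REFS; ++i) {
        if (js_values[i] == NULL) {
            js_values[i] = value;
            return i;
        }
    }
    return -1;
}

/* Look up the value behind an index. */
const void* jsref_get(int index) {
    assert(index >= 0 && index < MAX_REFS);
    return js_values[index];
}

/* Called during marking for each index still reachable from the Wasm heap. */
void jsref_mark(int index) {
    assert(index >= 0 && index < MAX_REFS);
    marked[index] = true;
}

/* Called at the end of a collection: drop every reference that was not marked. */
void jsref_sweep(void) {
    for (int i = 0; i < MAX_REFS; ++i) {
        if (!marked[i]) js_values[i] = NULL;
        marked[i] = false;
    }
}
```

Freed slots are reused by later calls to `jsref_store`, so the table only grows as large as the number of values that are live at the same time.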

Going the other direction is harder and we have to find workarounds. Most references from JS code to Wasm values are user-defined callbacks, sending a message back to the app from an effect module. Wasm functions themselves live at fixed addresses and don't get moved around by the Garbage Collector. But those functions may contain partially-applied arguments or closed-over values, which could get moved around in a major GC, so they're not safe. The current solution is to synchronously deserialise those values to JavaScript, and avoid having a JS-to-Wasm reference altogether. The values are serialised back to Wasm whenever the function is called.

What's next?

Probably the most important practical issues are usability and scalability. I'd like to make the build system general enough and usable enough for people to try out the system on their own apps. And, related to that, I'd like to come up with a more general and scalable way to deal with packages so that all apps don't have to use the same package versions! Maybe we can get some real apps running.

There are also lots of performance ideas I'd like to try out:

Set up a benchmark for some more focused performance work (perhaps this one)
Port more kernel modules to C/Wasm. It looks like this could be one of the key performance drivers but there's a lot of code to port.
Finish building a VirtualDom implementation in C using cache-friendly "data-oriented design" techniques and an arena allocator
Remove the Emscripten layer and just use clang. Emscripten was handy to get going but it bloats code size a lot.
Implement some optimisations that should make function calls faster

Elm in Wasm: Custom types and extensible Records

Brian Carroll — Mon, 02 Aug 2021 15:41:40 +0000

In the last post we discussed Elm's built-in types: Int, Float, Char, String, List, and tuples. This time let's look at user-defined "custom" types and extensible Records!

This is part of a series about porting the Elm language to WebAssembly. It's a project I am doing as a hobbyist, not an official project by the core team.

Custom types

Elm Custom types are collections that the Elm programmer can define. They can have different variants, each with a "constructor" to identify it. And each constructor can contain some collection of values.

Let's take a simple example where one constructor takes no parameters and the other takes two. We'll then look at how to implement this as a byte-level data structure.

type MyCustomType
  = Ctor0
  | Ctor1 Int Char

myCtor1 = Ctor1 42 'x'

Each value of this type will need

A value header with some runtime metadata (which helps with GC and some built-in operators like ==)
Some way of identifying the associated constructor
The contained values, if any

An implementation is shown in the diagram below. (Each small box represents a 32-bit value.)

The ctor field identifies which constructor variant the value was built with. Its value needs to be unique within a given type so that we can implement pattern matching. It doesn't need to be unique within the program because the Elm compiler ensures that we can never compare or pattern-match values of different types. (Though with 32 bits, we get around 4 billion constructors. So it's easy to make them globally unique, and it can be handy for debugging).

Variants that take parameters will have an associated constructor function, generated by the compiler.

Variants that take no parameters are static constants in the program, without a constructor function. There only needs to be one instance of the value per program. For example the list [Ctor0, Ctor0, Ctor0] would just contain three pointers to the same memory address where Ctor0 is located.
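
As a rough sketch, the layout and the two kinds of constructor could look like this in C. Several things here are assumptions made for illustration: the tag value `TAG_CUSTOM`, the convention that the header size counts the ctor field plus the pointers, the names `Ctor0` and `ctor_Ctor1`, and the use of `malloc` in place of the project's own GC allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32;

enum { TAG_CUSTOM = 7 };          /* assumed tag value for custom types */
enum { CTOR_Ctor0, CTOR_Ctor1 };  /* constructor IDs, unique within MyCustomType */

typedef struct {
    u32 size : 28;  /* payload size in 32-bit words */
    u32 tag : 4;    /* runtime type tag */
} Header;

typedef struct {
    Header header;
    u32 ctor;        /* which variant this value is */
    void* values[];  /* pointers to the contained values, if any */
} Custom;

/* Payload size in 32-bit words: the ctor field plus n pointers.
   On wasm32 each pointer is one word; on a 64-bit native build it is two. */
static u32 custom_words(u32 n_values) {
    return (u32)((sizeof(u32) + n_values * sizeof(void*)) / sizeof(u32));
}

/* Ctor0 takes no parameters, so one static instance serves the whole program. */
static Custom Ctor0 = {
    .header = { .size = 1, .tag = TAG_CUSTOM },
    .ctor = CTOR_Ctor0,
};

/* Constructor function for Ctor1, which takes two parameters. */
Custom* ctor_Ctor1(void* arg0, void* arg1) {
    Custom* c = malloc(sizeof(Custom) + 2 * sizeof(void*));  /* stand-in for GC allocation */
    assert(c != NULL);
    c->header.size = custom_words(2);
    c->header.tag = TAG_CUSTOM;
    c->ctor = CTOR_Ctor1;
    c->values[0] = arg0;
    c->values[1] = arg1;
    return c;
}
```

A list like [Ctor0, Ctor0, Ctor0] would then hold the address `&Ctor0` three times, while each call to `ctor_Ctor1` produces a fresh value.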

Unit

The Unit type is just like a custom type with a single constructor. It has its own special symbol () in Elm source code but it's otherwise equivalent to the definition below.

type Unit = Unit

Bool

Bool follows the same structure as custom types. It has two constructors:

type Bool
    = True
    | False

True and False are constructors without any parameters, so they can be global constant values, defined once per program at a fixed memory location.

An alternative way to implement Bool would be to just use the integers 1 and 0, and that would make a lot of code more efficient. However a complication arises when we want to create something like a List Bool. Normally a Cons cell in a List contains a pointer to its head value. But now we're saying that True is not a pointer, it's just a literal number! That means the head of a Cons cell might be a pointer or it might be just an integer, it depends! We need infrastructure in the compiler and runtime to resolve that ambiguity. Almost all mature language runtimes do this, but my project is not quite that mature yet.

Extensible Records

Records are one of the most interesting parts of Elm's type system. In the following code, each function takes an extensible record type, which allows us to pass it a value of either type Rec1 or Rec2. But values are either definitely of type Rec1, or definitely of type Rec2.

type alias Rec1 = { myField : Int }
type alias Rec2 = { myField : Int, otherField : Bool }

sumMyField : List { r | myField : Int } -> Int
sumMyField recList =
    List.sum (List.map .myField recList)   -- accessor .myField is an Elm function

incrementMyField : { r | myField : Int } -> { r | myField : Int }
incrementMyField r =
    { r | myField = r.myField + 1 }  -- record update expression

The basic operators that work on extensible record types are accessor functions and update expressions. In both cases we need to find the relevant field in a particular record before we can do anything with it. So there needs to be some mechanism to look up the position of a field within a record.

Field IDs as integers

In Elm source code, fields are human-readable labels for parameters of a Record. But they can be represented in more efficient ways. For example the 0.19 compiler in --optimize mode converts field names to unique shortened names in the generated JavaScript.

For a lower-level target like Wasm, we can transform field names to integer IDs rather than shortened names.

There are two ways to do this

Copy and modify the compiler code for generating short names and generate integers instead
If we're targeting an intermediate language, let it generate integer values for us. For example my C implementation emits an enum like this:

enum field {
  FIELD_init,
  FIELD_subscriptions,
  FIELD_update,
  FIELD_view,
};

All C code refers to Record fields using the enum members, which the C compiler later transforms to numbers. This approach helps with readability of the C code when debugging the compiler.

Record implementation

Let's see how we can represent records, using the following value as an example

type alias ExampleRecordType =
    { field123 : Int
    , field456 : Char
    }

myRecord : ExampleRecordType
myRecord =
    { field123 = 42
    , field456 = 'x'
    }

This can be represented by the collection of low-level structures below. For illustration, we assume the compiler has converted the field name field123 to the integer 123, and field456 to 456. Integers and pointers are 32 bits and so take up 1 size unit; Floats are 64 bits and take up 2 size units.

The FieldGroup data structure is an array of integers with a size. It is a static piece of metadata about the record type ExampleRecordType. My fork of the Elm compiler generates one instance for each record type, and populates it with the relevant integer field IDs. All records of the same type point to a single shared FieldGroup. The FieldGroup does not need a header field since it can never be confused with any other type. It can only be accessed through a Record and is not garbage-collected since it's static data.

The Record itself is a collection of pointers, referencing its FieldGroup and its parameter values. The value pointers are arranged in the same order as the field IDs in the FieldGroup, so that accessor functions and update expressions can easily find the value corresponding to a particular field ID.

Field access

To implement an expression like myRecord.field123, the algorithm is as follows

Look at the fieldgroup property of myRecord
Follow the pointer to the FieldGroup instance
Search for the field ID 123, finding it at position 0
Look up the value at position 0 in myRecord itself
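
The four steps above can be sketched in C as follows. This is an illustration under stated assumptions rather than the project's actual code: the struct layouts are simplified (the header is a placeholder word), the search is assumed to be a plain linear scan, the helpers `fieldgroup_new`, `record_new` and `record_access` are hypothetical names, and `malloc` stands in for static data and GC allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t u32;

typedef struct {
    u32 size;      /* number of fields */
    u32 fields[];  /* integer field IDs */
} FieldGroup;

typedef struct {
    u32 header;              /* placeholder for the real value header */
    FieldGroup* fieldgroup;  /* shared metadata for this record type */
    void* values[];          /* same order as fieldgroup->fields */
} Record;

/* In the real system FieldGroups are static data; malloc is used here only
   so that the sketch is easy to run. */
FieldGroup* fieldgroup_new(u32 size, const u32 ids[]) {
    FieldGroup* fg = malloc(sizeof(FieldGroup) + size * sizeof(u32));
    assert(fg != NULL);
    fg->size = size;
    memcpy(fg->fields, ids, size * sizeof(u32));
    return fg;
}

Record* record_new(FieldGroup* fg, void* const values[]) {
    Record* r = malloc(sizeof(Record) + fg->size * sizeof(void*));
    assert(r != NULL);
    r->header = 0;
    r->fieldgroup = fg;
    memcpy(r->values, values, fg->size * sizeof(void*));
    return r;
}

/* Step 3: search for a field ID. Returns its position, or -1 if absent. */
int fieldgroup_search(const FieldGroup* fg, u32 field_id) {
    for (u32 i = 0; i < fg->size; ++i) {
        if (fg->fields[i] == field_id) return (int)i;
    }
    return -1;
}

/* Steps 1 to 4: follow the fieldgroup pointer, find the position, read the value. */
void* record_access(const Record* r, u32 field_id) {
    int pos = fieldgroup_search(r->fieldgroup, field_id);
    assert(pos >= 0);  /* the Elm type checker guarantees the field exists */
    return r->values[pos];
}
```

With the example record, `record_access(myRecord, 123)` finds ID 123 at position 0 and returns the pointer stored in `values[0]`.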

Accessor functions

An Elm accessor function is a special function that looks up a particular field name.

.field123 myRecord -- 42

Here, .field123 is a function that will access field123 of any record that contains it.

The simplest way to implement this is to define a kernel function that can access any Record field by ID, taking the field ID as the first parameter. Then we can just use partial application to specialise it to a specific field in the generated code.

If you're interested in more details, check out the full source, and perhaps read my previous post on Elm functions in Wasm to see how partial application is implemented.

Update expressions

Elm update expressions look like this:

updatedRecord =
    { originalRecord
          | updatedField1 = newValue1
          , updatedField2 = newValue2
    }

In Elm 0.19 this is implemented by a JavaScript function that clones the old record, and then updates each of the selected fields in the new record.

function _Utils_update(oldRecord, updatedFields) {
    var newRecord = {};
    for (var key in oldRecord) {
        newRecord[key] = oldRecord[key];
    }
    for (var key in updatedFields) {
        newRecord[key] = updatedFields[key];
    }
    return newRecord;
}

We can do something similar in C as follows:

Record* Utils_update(Record* r, int n_updates, int fields[], void* values[]) {
    Record* r_new = clone(r);
    for (int i=0; i < n_updates; ++i) {
        int field_pos = fieldgroup_search(r_new->fieldgroup, fields[i]);
        r_new->values[field_pos] = values[i];
    }
    return r_new;
}

I've left out the details of clone and fieldgroup_search but they pretty much do what you'd expect. Feel free to take a look at the full source code, which includes tests that mimic generated code from the compiler.
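
For readers who want something runnable, here is one way the omitted helpers might look. It is a guess at their shape, not a copy of the project's source: `record_clone` plays the role of `clone`, the record layout is simplified, the search is assumed to be linear, and `malloc` stands in for the GC allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t u32;

typedef struct {
    u32 size;      /* number of fields */
    u32 fields[];  /* integer field IDs */
} FieldGroup;

typedef struct {
    u32 header;              /* placeholder for the real value header */
    FieldGroup* fieldgroup;
    void* values[];          /* same order as fieldgroup->fields */
} Record;

/* Shallow copy: the value pointers are copied, the values are shared. */
Record* record_clone(const Record* r) {
    size_t bytes = sizeof(Record) + r->fieldgroup->size * sizeof(void*);
    Record* r_new = malloc(bytes);  /* stand-in for GC allocation */
    assert(r_new != NULL);
    memcpy(r_new, r, bytes);
    return r_new;
}

/* Returns the position of field_id in the FieldGroup, or -1 if absent. */
int fieldgroup_search(const FieldGroup* fg, int field_id) {
    for (u32 i = 0; i < fg->size; ++i) {
        if (fg->fields[i] == (u32)field_id) return (int)i;
    }
    return -1;
}

/* Same logic as Utils_update above, with the helpers filled in. */
Record* Utils_update(Record* r, int n_updates, int fields[], void* values[]) {
    Record* r_new = record_clone(r);
    for (int i = 0; i < n_updates; ++i) {
        int field_pos = fieldgroup_search(r_new->fieldgroup, fields[i]);
        assert(field_pos >= 0);  /* type checking guarantees the field exists */
        r_new->values[field_pos] = values[i];
    }
    return r_new;
}
```

The original record is never modified, and fields that are not updated keep pointing at the same values in both records.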

Records in similar languages

OCaml has records, but not extensible record types. That means a given field always refers to the same position in a record type, so there's no need to search for it at runtime. All field names can safely be transformed into position offsets at compile time.

Haskell has extensible records, and the original paper on them is here. The focus is very much on trying to make the record system backwards-compatible with Haskell's pre-existing types, which were all positional rather than named. Unfortunately this means that most of their design decisions were driven by a constraint that Elm just doesn't have, so I didn't find it directly useful.

However the FieldGroup concept is very much inspired by the InfoTable that is generated for every type in a Haskell program.

Elm in Wasm: Built-in typeclasses

Brian Carroll — Mon, 02 Aug 2021 11:51:39 +0000

In my last post, I proposed some ideas for how Elm's first-class functions could work in WebAssembly.

This time, let's start looking at some of the other value types in Elm. What do the most fundamental value types look like?

This is part of a series about porting the Elm language to WebAssembly. It's a project I am doing as a hobbyist, not an official project by the core team.

Comparables, Appendables and Numbers

Let's start with the fundamentals: Int, Float, Char, String, List and Tuple. It's fairly straightforward to design binary representations for these, but there are also some subtleties!

The trickiest aspect of these types in Elm is that they are all members of constrained type variables. This is the mechanism that allows some functions like ++, + and > to work on more than one value type, but not on all of them.

The table below lists the four constrained type variables, and which functions from the core libraries use them.

Type variable	Core library functions
`appendable`	`++`
`number`	`+`, `-`, `*`, `/`, `^`, `negate`, `abs`, `clamp`
`comparable`	`compare`, `<`, `>`, `<=`, `>=`, `max`, `min`, `Dict.`, `Set.`
`compappend`	(Internal compiler use only)

Here's a breakdown of which types belong to which type variables

	number	comparable	appendable	compappend
`Int`	✓	✓
`Float`	✓	✓
`Char`		✓
`String`		✓	✓	✓
`List a`		✓*	✓	✓*
`(a, b)`		✓*
`(a, b, c)`		✓*

* Lists and Tuples are comparable only if their contents are comparable

Low-level functions that operate on these type variables need to be able to look at an Elm value and decide which concrete type it is. For example the compare function (which is the basis for <, >, <=, and >=) can accept five different types, and needs to run different low-level code for each.

There's no syntax to do that in Elm code - it's deliberately restricted to Kernel code. Let's look at the JavaScript implementation, and then think about how a WebAssembly version might work. We'll focus on comparable, since it covers the most types.

Comparable values in JavaScript

Well Elm is open source, so we can just take a peek at the Kernel code for compare to see how it's done. For the purposes of this article, we only care about how it tells the difference between different Elm types, so I've commented out everything else below.

function _Utils_cmp(x, y, ord) // x and y will always have the same Elm type in a compiled program
{
    if (typeof x !== 'object')
    {
        // Elm Int, Float or String. Compare using `===` and `<`
    }

    if (x instanceof String)
    {
        // Elm Char. Take x.valueOf() and y.valueOf(), then compare.
    }

    if (x.$[0] === '#')
    {
        // Elm Tuples ('#2' or '#3'). Recursively compare contents.
    }

    //  ... Elm List (the only remaining comparable type). Recursively compare elements.
}

Elm's Int, Float and String values correspond directly to JavaScript primitives and can be identified using JavaScript's typeof operator. This is not something we'll have available in WebAssembly, so we'll have to find another way to get the same kind of information.

The other Elm types are all represented as different object types. Char values are represented as String objects and can be identified using the instanceof operator. Again, instanceof is not available in WebAssembly, and we need something else.

In the next part of the function we get a clue that when Elm values are represented as JS objects, they normally have a $ property. This is set to different values for different types. It's #2 or #3 for Tuples, [] or :: for Lists, and can take on various other values for custom types and records. In --optimize mode it becomes a number.

Now this is something we can do in WebAssembly. The $ property is just an extra piece of data that's bundled along with the value itself. We can add a "header" of extra bytes in front of the runtime representation of every value to carry the type information we need.

Value Headers

Many languages add a "header" to their value representations to carry metadata that's only for the runtime, not for the application developer. We can use that technique to distinguish the different types. All Elm types can be covered with the 10 tags in the enum below, which fit in 4 bits.

It's also helpful to add a size parameter to the header, indicating the size in memory of the value in a way that is independent of its type. This is useful for memory operations like cloning and garbage collection, as well as for testing equality of strings, custom type values, and records.

In my project I've chosen the following bit assignments for the header. They add up to 32 bits in total, which is convenient for memory layout.

	Bits	Description
Tag	4	Elm value type. See enum definition below
Size	28	Payload size in 32-bit words. Maximum is 2²⁸-1 units = (2²⁸-1) * 4 bytes, just under 1GB

The following C code represents the header.

#include <stdint.h>

typedef uint32_t u32;

typedef enum {
    Tag_Int,
    Tag_Float,
    Tag_Char,
    Tag_String,
    Tag_List,
    Tag_Tuple2,
    Tag_Tuple3,
    Tag_Custom,
    Tag_Record,
    Tag_Closure,
} Tag;

typedef struct {
  u32 size : 28;  // payload size in 32-bit words (28 bits => <1GB)
  Tag tag : 4;    // runtime type tag (4 bits)
} Header;

Comparable values in WebAssembly

Using these representations, we can distinguish between any of the values that are members of comparable, appendable, or number.
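
To make that concrete, here is a minimal sketch of a tag-dispatching compare in C, mirroring the structure of the JavaScript `_Utils_cmp` shown earlier. The value layouts (`ElmInt`, `ElmFloat`, `ElmChar`, `Tuple2`), the `Order` type and the name `Utils_compare` are assumptions made for this example, and only four of the comparable types are handled.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

typedef enum {
    Tag_Int, Tag_Float, Tag_Char, Tag_String, Tag_List,
    Tag_Tuple2, Tag_Tuple3, Tag_Custom, Tag_Record, Tag_Closure,
} Tag;

typedef struct {
    u32 size : 28;  /* payload size in 32-bit words */
    u32 tag : 4;    /* runtime type tag */
} Header;

/* Assumed layouts: every value starts with a Header. */
typedef struct { Header header; int32_t value; } ElmInt;
typedef struct { Header header; double value; } ElmFloat;
typedef struct { Header header; u32 value; } ElmChar;
typedef struct { Header header; void* a; void* b; } Tuple2;

typedef enum { LT = -1, EQ = 0, GT = 1 } Order;

#define CMP(a, b) ((a) < (b) ? LT : (a) > (b) ? GT : EQ)

/* x and y always have the same Elm type in a compiled program,
   so it is enough to inspect the tag of x. */
Order Utils_compare(const void* x, const void* y) {
    const Header* hx = x;
    switch (hx->tag) {
        case Tag_Int:
            return CMP(((const ElmInt*)x)->value, ((const ElmInt*)y)->value);
        case Tag_Float:
            return CMP(((const ElmFloat*)x)->value, ((const ElmFloat*)y)->value);
        case Tag_Char:
            return CMP(((const ElmChar*)x)->value, ((const ElmChar*)y)->value);
        case Tag_Tuple2: {
            /* Recursively compare contents, first element first. */
            const Tuple2* tx = x;
            const Tuple2* ty = y;
            Order first = Utils_compare(tx->a, ty->a);
            return first != EQ ? first : Utils_compare(tx->b, ty->b);
        }
        default:
            assert(0 && "type not handled in this sketch");
            return EQ;
    }
}
```

Where the JavaScript version relies on `typeof` and `instanceof`, this version reads the same information from the header.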

For example, to add two Elm number values, the algorithm would be:

If tag is Float (1)
- Do floating-point addition
else
- Do integer addition

We need this information because in WebAssembly, integer and floating-point addition are different instructions. We're not allowed to be ambiguous about it like in JavaScript.
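
In C, the same dispatch might be sketched like this. The `ElmInt` and `ElmFloat` layouts and the name `Basics_add` are assumptions for illustration, and `malloc` stands in for the GC allocator. The comments note which Wasm instruction each branch is expected to end up as.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32;

typedef enum { Tag_Int, Tag_Float } Tag;  /* only the two number tags are needed here */

typedef struct {
    u32 size : 28;  /* payload size in 32-bit words */
    u32 tag : 4;    /* runtime type tag */
} Header;

typedef struct { Header header; int32_t value; } ElmInt;
typedef struct { Header header; double value; } ElmFloat;

/* Add two Elm `number` values. Both arguments have the same Elm type,
   so the tag of x decides which kind of addition to perform. */
void* Basics_add(const void* x, const void* y) {
    const Header* h = x;
    if (h->tag == Tag_Float) {
        ElmFloat* result = malloc(sizeof *result);  /* stand-in for GC allocation */
        assert(result != NULL);
        result->header = *h;
        /* floating-point addition (f64.add in Wasm) */
        result->value = ((const ElmFloat*)x)->value + ((const ElmFloat*)y)->value;
        return result;
    } else {
        ElmInt* result = malloc(sizeof *result);
        assert(result != NULL);
        result->header = *h;
        /* integer addition (i32.add in Wasm); done on unsigned values so that
           overflow wraps around instead of being undefined behaviour in C */
        u32 sum = (u32)((const ElmInt*)x)->value + (u32)((const ElmInt*)y)->value;
        result->value = (int32_t)sum;
        return result;
    }
}
```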

Functions operating on appendable values can use similar techniques to distinguish String (3) from List (4) and execute different code branches for each.

Structural sharing

To have efficient immutable data structures, it's important that we do as much structural sharing as possible. The above implementations of List and Tuple allow for that by using pointers. For example when we copy a List, we'll just do a "shallow" copy, without recursively following pointers. The pointer is copied literally, so we get a second pointer to the same value.
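
A shallow copy of a single list cell might look like the sketch below. The `Cons` layout is an assumption (the value header is left out for brevity, and NULL stands for the empty list), `cons_shallow_copy` is a hypothetical name, and `malloc` stands in for the GC allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Cons {
    void* head;         /* pointer to the element */
    struct Cons* tail;  /* pointer to the rest of the list (NULL = empty in this sketch) */
} Cons;

/* Copy one cell without following its pointers. The copy shares
   both the head value and the whole tail with the original. */
Cons* cons_shallow_copy(const Cons* cell) {
    Cons* copy = malloc(sizeof *copy);  /* stand-in for GC allocation */
    assert(copy != NULL);
    *copy = *cell;
    return copy;
}
```

Because values are immutable, sharing the head and tail this way is safe: nothing can change them behind the copy's back.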

Summary

I've outlined some possible byte-level representations for the most basic Elm data types. We haven't discussed Custom types or Records yet. That's for the next post!

We discussed some of the challenges presented by Elm's "constrained type variables" comparable, appendable, and number needing some type information at runtime. We came up with a way of dealing with this using "boxed" values with headers.

If you like you can check out some of my GitHub repos

A fork of the Elm compiler that generates Wasm (from my Elm AST test data, not from real apps!)
Some of the Elm kernel libraries in C, compiled to Wasm.

Custom types and extensible Records

WebAssembly compiler update

Brian Carroll — Sat, 02 Jan 2021 15:31:17 +0000

This is a repost of a Discourse post from June 2020

I've posted a few times before about my project to compile Elm to WebAssembly.

The project consists of two GitHub repos, one for the compiler and one for the core libraries.

None of this is official. I'm not part of the core team and, as far as I know, they have no plans to move to WebAssembly any time soon. This is a hobby project driven by my own curiosity.

Summary of previous posts

In this post, I described the custom Garbage Collector, and the parts of the core libraries I'd ported to Wasm.

In my last post I described the system architecture. The Elm runtime remains in JavaScript because WebAssembly doesn't have Web APIs yet. The Wasm app talks to the runtime through a JS wrapper.

I also showed a demo of a very basic working app. It was "hand compiled" as I didn't have a working compiler yet.

Latest news

My latest demo is actually fully compiled code!

It's a WebAssembly port of Evan's TodoMVC example from a few years ago (here's his original repo)

Compiler changes

The forked compiler accepts --output elm.c as a command-line option as well as --output elm.js and --output elm.html. Once I have the C file, I use Emscripten to further compile it to WebAssembly. There are a few build steps that I coordinate using GNU make.

I ran into a few challenges with type information. The compiler has several different stages. I only worked on the last stage, code generation, to limit the scope. But all type information has been dropped from the AST by then, and that created some challenges.

Currently it's unsafe to use a Float parameter in an app-level Msg type. I have no way to tell Int from Float when passing messages from the JS runtime to the Wasm app.
The Time module doesn't work because it uses Int for timestamps. Realistic values require at least 42 bits but I'm using 32 bits. Some low level details work out nicely that way, because Wasm pointers are 32 bits. And the Json and Bitwise libraries require 32-bit integers as well.
I need to be able to distinguish custom types from tuples and lists. I'm using runtime type detection, but I'd prefer not to.
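
To make the timestamp problem concrete: Elm's Time module works in milliseconds since the Unix epoch. The small C sketch below counts the bits such a value needs. The helper names and the sample timestamp (1609601477000, a moment in early January 2021) are illustrative choices, not taken from the project.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed for the magnitude of a non-negative value
   (a sign bit would be one more). */
int bits_needed(int64_t n) {
    assert(n >= 0);
    int bits = 0;
    while (n != 0) {
        ++bits;
        n >>= 1;
    }
    return bits;
}

/* Whether a value can be stored in a signed 32-bit Int without loss. */
int fits_in_i32(int64_t n) {
    return n >= INT32_MIN && n <= INT32_MAX;
}
```

The sample timestamp lies between 2⁴⁰ and 2⁴¹, so `bits_needed` gives 41 for it, or 42 bits once a sign bit is included, and `fits_in_i32` returns 0.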

More detail here: https://github.com/brian-carroll/elm-compiler#architecture-challenges

Development Status

So is that it? Is it all working? Can I use it in production right now? Is it really fast? OMG!

Nope! Sorry!

I'm still working through lots of implementation issues. For example I have not yet managed to get Richard Feldman's elm-spa-example working. It's a great test-case because it's complex enough that if I have any bugs, it's bound to show them up!

I haven't done any performance work yet. Before I can focus on that, I need to debug it and sort out some issues with the architecture (see "current focus" below).

Current focus

A lot of the work I'm currently doing is on the JS/Wasm interface. Since I have the runtime in JS and the app in Wasm, the interface between the two is a major focus.

Two of the topics I'm thinking about:

Some of the objects passed from the JS runtime to the app are unserialisable. For example, DOM events are not serialisable because they contain cyclical references. It's all to do with how the Json library is implemented. I have something that works most of the time! But I'm working on something more reliable.

Currently the app's Model is stored in JS but the update function is in Wasm. That means the model has to get passed from JS to Wasm and back again on every update cycle, getting serialised and deserialised along the way. The only reason it works this way is that it was quicker to get up and running, because I didn't need to change anything in the JS runtime.

String encoding

The original post suggesting this project specifically mentions string encoding, and UTF-8 in particular. And there was some discussion of this in my last post. I suggested that UTF-16 might have advantages, due to better compatibility with JS and most of the browser APIs.

I did some benchmarking on both encodings, to get an idea of the performance implications.

There's not much performance difference. Based on the results, I initially wanted to go with UTF-8. But then I realised that every time I pick an app to test the compiler on, I would also have to migrate its Elm code to using a new String library as well. Otherwise things like URL parsing might break, and who knows what else? It just makes things too complicated. So I'm sticking with UTF-16 for this project. UTF-8 is a separate project.

Asynchronous initialisation

WebAssembly modules are normally compiled asynchronously once loaded into the browser. We have to wait until the compilation is finished before we can call Elm.Main.init.

I created a new function Elm.onReady to help with this. You just put your app's normal setup code in a callback, and Elm.onReady will execute it at the right time.

For my WebAssembly version of the TodoMVC example, it looks like this:

<script type="text/javascript">
  Elm.onReady(function () {
    var storedState = localStorage.getItem('elm-todo-save');
    var startingState = storedState ? JSON.parse(storedState) : null;
    var app = Elm.Main.init({ flags: startingState });
    app.ports.setStorage.subscribe(function (state) {
      localStorage.setItem('elm-todo-save', JSON.stringify(state));
    });
  });
</script>

Summary

We can now compile some Elm apps to WebAssembly, including the TodoMVC demo

There are some architecture issues to work out, there's no performance work done yet, and there's lots of kernel code unwritten.

Wasm enables UTF-8 but it's a separate project

There are some changes in the setup API due to async compilation

Elm functions in WebAssembly

Brian Carroll — Sat, 28 Jul 2018 20:15:18 +0000

I’ve been pretty fascinated for the past few months with trying to understand how the Elm compiler might be able to target WebAssembly in the future. What are the major differences from generating JavaScript? What are the hard parts, what approaches would make sense?

I think one of the most interesting questions is: how do you implement first-class functions in WebAssembly? JavaScript has them built in, but WebAssembly doesn’t. Treating functions as values is a pretty high level of abstraction, and WebAssembly is a very low-level language.

Contents
Elm and WebAssembly
Elm’s first-class functions
Key WebAssembly concepts
Representing closures as bytes
Function application
Lexical closure
Code generation
Summary
What’s next?
References

Elm and WebAssembly

Before we get started, I just want to mention that from what I’ve heard from the core team, there is a general expectation that Elm will compile to WebAssembly some day, but currently no concrete plan. WebAssembly is still an MVP and won’t really be ready for Elm until it has garbage collection, and probably also access to the DOM and other Web APIs. The GC extension is still in "feature proposal" stage so it'll be quite a while before it's available.

But... it will get released at some point, and WebAssembly is one of the suggested research projects for community members, and well, it's just really interesting! So let’s have a think about what Elm in WebAssembly could look like!

Now... how do you go about implementing first-class functions in a low-level language like WebAssembly? WebAssembly is all just low-level machine instructions, and machine instructions aren’t something you can "pass around"! And what about partial function application? And isn’t there something about "closing over" values from outside the function scope?

Let’s break this down.

Elm’s first-class functions

Let's start by looking at some example Elm code, then list all the features of Elm functions that we’ll need to implement.

module ElmFunctionsDemo exposing (..)

outerFunc : Int -> (Int -> Int -> Int)
outerFunc closedOver =
    let
        innerFunc arg1 arg2 =
            closedOver + arg1 + arg2
    in
        innerFunc

myClosure : Int -> Int -> Int
myClosure =
    outerFunc 1

curried : Int -> Int
curried =
    myClosure 2

higherOrder : (Int -> Int) -> Int -> Int
higherOrder function value =
    function value

answer : Int
answer =
    higherOrder curried 3

In case you're wondering, the answer is 1+2+3=6. This is definitely not the simplest way to write this calculation, but it does illustrate all the most important features of Elm functions!

Three key features

Firstly, Elm functions are first-class, meaning they are values that can be returned from other functions (like outerFunc) and passed into other functions (like higherOrder).

Secondly, they support lexical closure. innerFunc "captures" a value from its parent's scope, called closedOver. This means that myClosure "remembers" the value of closedOver that it was created with, which in this case is 1.

Finally, Elm functions support partial application. myClosure is a function that takes two arguments, but in the body of curried, we only apply one argument to it. As a result, we get a new function that is waiting for one more argument before it can actually run. This new function "remembers" the value that was partially applied, as well as the closed-over value.

Clues in the code

Note that we now have several Elm functions that will all end up executing the same line of code when they actually get executed! That's this expression:

closedOver + arg1 + arg2

If somebody calls curried with one more argument, this is the expression that will calculate the return value. Same thing if somebody calls myClosure with two arguments.

This gives us a clue how to start implementing this. All of the function values we’re passing around will need to have a reference to the same WebAssembly function, which evaluates the body expression.

In WebAssembly, we can’t pass functions around, only data. But maybe we can create a data structure that represents an Elm function value, keeping track of the curried arguments and closed-over values. When we finally have all the arguments and we’re ready to evaluate the body expression, we can execute a WebAssembly function to produce a return value.

There are still lots of details missing at this stage. In order to fill in the gaps, we’re going to need a bit of background knowledge on some of WebAssembly’s language features.

Key WebAssembly concepts

Linear memory

WebAssembly modules have access to a block of "linear memory" that they can use to store and load data. It’s a linear array of bytes, indexed by a 32-bit integer. WebAssembly has built-in instructions to store and load integers and floats, but anything more complex has to be built up from raw bytes.

The fact that everything is built up from raw bytes means that WebAssembly can be a compile target for lots of different languages. Different data structures will make sense for different languages, but they’re all just bytes in the end. It’s up to each compiler and runtime to define how those bytes are manipulated.
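
To make this concrete, here is a small sketch that pokes at a linear memory from the JavaScript side. WebAssembly.Memory and DataView are standard APIs; the writePair and readPair helpers and the chosen addresses are made up for illustration.

```javascript
// One page of WebAssembly linear memory is 64 KiB.
const memory = new WebAssembly.Memory({ initial: 1 });
const view = new DataView(memory.buffer);

// WebAssembly is little-endian, so pass `true` for the littleEndian flag.
view.setInt32(0, 42, true); // comparable to i32.store at address 0
view.setFloat64(8, 3.5, true); // comparable to f64.store at address 8

const storedInt = view.getInt32(0, true); // comparable to i32.load
const storedFloat = view.getFloat64(8, true); // comparable to f64.load

// Anything more complex is just a convention over bytes.
// Here a "pair" is two consecutive 32-bit integers (hypothetical layout).
function writePair(addr, first, second) {
  view.setInt32(addr, first, true);
  view.setInt32(addr + 4, second, true);
}

function readPair(addr) {
  return [view.getInt32(addr, true), view.getInt32(addr + 4, true)];
}

writePair(16, 7, -1);
const pair = readPair(16);
```

The pair layout is an arbitrary choice, which is the point: the bytes carry no structure of their own, so the compiler and runtime have to agree on one.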

Tables

WebAssembly has a feature called "tables" which it uses to implement "indirect calls". Indirect calls are a feature of almost every high-level language, but what are they?

When a machine executes a function call, it obviously needs some reference to know which function to invoke. In a direct call, that function reference is simply hardcoded, so it invokes the same function every time. In an indirect call, however, the function reference is provided by a runtime value instead. This is a very handy thing to be able to do, because it means the caller doesn’t need to know in advance the full list of functions it might have to call. Because of this, most languages have some version of this. C and C++ have function pointers, Java has class-based polymorphism, and Elm has first-class functions.

A WebAssembly table is an array of functions, each indexed by a 32-bit integer. There’s a special call_indirect instruction that takes the index of the function to be called, with a list of arguments, and executes it. The program statically declares which functions are elements of the table, and call_indirect only works on those functions. (Incidentally, there’s also a call instruction for direct calls, but we won’t be focusing on that too much for now.)

By the way, WebAssembly has this design for safety reasons. If functions were stored in linear memory, it would be possible for code to inspect or corrupt other code, which is not good for web security. But with an indexed function table, that’s impossible. In the MVP, the only instruction that can even access the table is call_indirect, which is safe.
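
As a rough mental model, the difference between direct and indirect calls can be sketched in plain JavaScript, with an array standing in for the table. This is a model only: a real table lives inside the Wasm module, and the callIndirect helper below is a hypothetical stand-in for the call_indirect instruction.

```javascript
function add(a, b) {
  return a + b;
}

function mul(a, b) {
  return a * b;
}

// The "table": functions addressed by integer index.
const table = [add, mul];

// Direct call: the callee is fixed in the code.
const direct = add(3, 4);

// Indirect call: the callee is chosen by a runtime integer.
function callIndirect(index, a, b) {
  if (!Number.isInteger(index) || index < 0 || index >= table.length) {
    // WebAssembly traps on an out-of-bounds table index.
    throw new RangeError("table index out of bounds");
  }
  return table[index](a, b);
}

const sum = callIndirect(0, 3, 4);
const product = callIndirect(1, 3, 4);
```

Because the index is just an integer, it can be stored in linear memory and passed around like any other data.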

If you’re interested in some further reading, I recommend Mozilla’s article on Understanding the Text Format, and the design document on WebAssembly Semantics.

But for now, we already have enough knowledge to discuss how to implement first-class functions.

Representing closures as bytes

As mentioned earlier, to represent an Elm function in WebAssembly we’ll need a function and a data structure. We’ll use the term "closure" to refer to the data structure, and "evaluator function" to refer to the WebAssembly function that will evaluate the body expression and produce a return value.

One way of representing a closure in binary is the following, where each box represents an integer (4 bytes).

`fn_index`	`arity`	`mem_ptr0`	`mem_ptr1`	`mem_ptr2`	...

fn_index is an integer index into the function table where the evaluator function for this closure can be found. At runtime, once all of the arguments have been applied to the closure, we can invoke the call_indirect instruction to look up the table, call the evaluator function, and return a result.

arity is the remaining number of parameters to be applied to the closure. Every time we apply another argument, we insert a pointer to that argument, and decrement the arity. When it reaches zero, we’re ready to call the evaluator function.

mem_ptr* are pointers representing the addresses in linear memory of the arguments and closed-over values. They all start off "empty" (zero), and are filled in reverse order as arguments are applied. So if the closure has an arity of 2, then mem_ptr0 and mem_ptr1 will be "empty". When we apply the next argument, mem_ptr1 will be filled with the address of the argument value, and arity will be decremented from 2 to 1, with mem_ptr0 still being empty.
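
Here is a minimal sketch of that layout, using a DataView over an ArrayBuffer to stand in for linear memory. The bump allocator (nextFree) and the newClosure and readClosure helpers are assumptions made for illustration, not part of any real runtime.

```javascript
const WORD = 4; // each field is a 32-bit integer
const memory = new DataView(new ArrayBuffer(1024));
let nextFree = WORD; // hypothetical bump allocator; address 0 is kept as null

// Write a closure: [fn_index, arity, mem_ptr0, mem_ptr1, ...]
function newClosure(fnIndex, arity) {
  const addr = nextFree;
  nextFree += (2 + arity) * WORD;
  memory.setUint32(addr, fnIndex, true);
  memory.setUint32(addr + WORD, arity, true);
  for (let i = 0; i < arity; i++) {
    memory.setUint32(addr + (2 + i) * WORD, 0, true); // "empty" pointer
  }
  return addr;
}

// Read the fields back as a plain object, for inspection only.
function readClosure(addr, slots) {
  const ptrs = [];
  for (let i = 0; i < slots; i++) {
    ptrs.push(memory.getUint32(addr + (2 + i) * WORD, true));
  }
  return {
    fnIndex: memory.getUint32(addr, true),
    arity: memory.getUint32(addr + WORD, true),
    ptrs,
  };
}

const closure = newClosure(123, 2);
const fields = readClosure(closure, 2);
```

Note that readClosure has to be told how many pointer slots there are, because the layout described here stores only the remaining arity and not the total size.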

Function application

We’ve already mentioned some of the things that need to happen when a closure is applied to some arguments, but here's the algorithm in full:

Make a new copy of the closure
For each applied argument
- Let a be the remaining arity of the closure
- Write the address of the argument into the mem_ptr at position a-1
- Decrement the arity a
If remaining arity is greater than 0
- return the new closure
else
- Use call_indirect to execute the function referenced by fn_index, passing the closure as its argument

Let's work through an example, applying two arguments to a closure of arity 2.

Here's what the data structure looks like before we apply any arguments. All of the pointers are set to zero (the null pointer).

`fn_index`	`arity`	`mem_ptr0`	`mem_ptr1`
`123`	`2`	`null`	`null`

Before applying the closure, we need to create a new copy of it, so that the old closure is still available for other code to use. All Elm values are immutable, and the closure is no exception.

Now let's apply an argument, arg0. Our algorithm says that for arity 2, we should put the argument address into the mem_ptr at position 2-1=1. In other words, mem_ptr1. Let's see what that looks like.

`fn_index`	`arity`	`mem_ptr0`	`mem_ptr1`
`123`	`1`	`null`	`arg0`

Notice that we're filling the argument pointers in reverse. This is just an efficiency trick. If we filled them in ascending order, we'd need to know how many had already been applied so that we could skip over them and put the next argument in the next free position. That information would have to be stored as an extra field in the closure, taking up extra space.

But if we fill the arguments in reverse, we only need to know the current arity. If the current arity is 2 then the first two positions are free, regardless of whether this is a simple two-parameter function, or a five-parameter function that has already had 3 other arguments applied.

Let's apply one more argument, arg1. As before, we'll put the address of the argument into the highest available mem_ptr, which is mem_ptr0, and decrement the arity.

`fn_index`	`arity`	`mem_ptr0`	`mem_ptr1`
`123`	`0`	`arg1`	`arg0`

Having applied all of the arguments we've got, we check the remaining arity. If it's non-zero, this must be a partial application, and we can just return the closure. But if it’s zero, that means all arguments have been applied. In that case, it's time to call the evaluator function, and return the value it gives us.

Note that the evaluator function takes the closure structure as its only argument. It contains all of the necessary data, because that’s exactly what it was designed for!
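
The whole algorithm can be sketched in JavaScript, with a DataView standing in for linear memory and an array standing in for the function table. Everything named here (alloc, boxInt, slotCount, apply) is hypothetical. In particular, this sketch assumes the total number of pointer slots can be looked up per fn_index, since the layout above doesn't store it, and it doesn't handle applying more arguments than the remaining arity.

```javascript
const WORD = 4;
const mem = new DataView(new ArrayBuffer(4096)); // zero-filled "linear memory"
let nextFree = WORD; // address 0 stays unused so it can act as null

function alloc(words) {
  const addr = nextFree;
  nextFree += words * WORD;
  return addr;
}
const load = (addr) => mem.getUint32(addr, true);
const store = (addr, value) => mem.setUint32(addr, value, true);

// Simplified boxed integer: one word in memory, no header.
function boxInt(n) {
  const addr = alloc(1);
  mem.setInt32(addr, n, true);
  return addr;
}
const unboxInt = (addr) => mem.getInt32(addr, true);

// Function table: each evaluator takes the closure address as its only argument.
const table = [
  function evalSubtract(closure) {
    // Slots fill in reverse, so the first applied argument is in mem_ptr1.
    const first = unboxInt(load(closure + 3 * WORD)); // mem_ptr1
    const second = unboxInt(load(closure + 2 * WORD)); // mem_ptr0
    return boxInt(first - second);
  },
];
const slotCount = [2]; // assumption: slots per evaluator are known statically

function newClosure(fnIndex) {
  const addr = alloc(2 + slotCount[fnIndex]); // pointer slots start as zero
  store(addr, fnIndex);
  store(addr + WORD, slotCount[fnIndex]);
  return addr;
}

function apply(closure, args) {
  const fnIndex = load(closure);
  const words = 2 + slotCount[fnIndex];
  // Make a new copy, so the original closure stays unchanged.
  const copy = alloc(words);
  for (let i = 0; i < words; i++) {
    store(copy + i * WORD, load(closure + i * WORD));
  }
  // For each argument: write it at position a-1, then decrement the arity.
  for (const arg of args) {
    const a = load(copy + WORD);
    store(copy + (2 + a - 1) * WORD, arg);
    store(copy + WORD, a - 1);
  }
  // Partial application returns the closure; otherwise call the evaluator.
  if (load(copy + WORD) > 0) return copy;
  return table[fnIndex](copy);
}

const subtract = newClosure(0);
const partial = apply(subtract, [boxInt(10)]); // arity 2 -> 1, returns a closure
const result = unboxInt(apply(partial, [boxInt(3)])); // evaluates 10 - 3
```

Subtraction is used instead of addition so that the argument order matters, which makes the reverse filling visible in the evaluator.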

Lexical closure

Let’s look again at our example of closing over values from an outer scope.

outerFunc : Int -> (Int -> Int -> Int)
outerFunc closedOver =
    let
        innerFunc arg1 arg2 =
            closedOver + arg1 + arg2
    in
        innerFunc

To help us think about how to generate WebAssembly for innerFunc, let’s first refactor the source code to the equivalent version below.

outerFunc : Int -> (Int -> Int -> Int)
outerFunc closedOver =
    let
        -- Replace inner function definition with partial application
        innerFunc =
            transformedInnerFunc closedOver
    in
        innerFunc


-- Move definition to top level, inserting a new first argument
transformedInnerFunc closedOver arg1 arg2 =
    closedOver + arg1 + arg2

Here we’ve moved the definition of the inner function to the top level, and inserted closedOver as a new first argument, instead of actually closing over it. This doesn’t make any difference to anyone who calls outerFunc - it still creates an innerFunc that remembers the value of closedOver it was created with.

The big win here is that we no longer have nested function definitions. Instead, they’re all defined at top level. This is useful because we need to put all of our evaluator functions into one global WebAssembly function table. Remember, the table is WebAssembly’s way of supporting indirect function calls. So we’ll need the compiler to do this transformation on all nested function definitions.
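
The same transformation can be illustrated in JavaScript, which makes it easy to check that the two versions behave identically. Function.prototype.bind is used here only as a stand-in for partial application, and the names outerFuncNested and outerFuncLifted are made up for the comparison.

```javascript
// Before: a nested function that closes over `closedOver`.
function outerFuncNested(closedOver) {
  function innerFunc(arg1, arg2) {
    return closedOver + arg1 + arg2;
  }
  return innerFunc;
}

// After: lifted to top level, with the captured value as a new first parameter.
function transformedInnerFunc(closedOver, arg1, arg2) {
  return closedOver + arg1 + arg2;
}

// The outer function now only performs a partial application.
function outerFuncLifted(closedOver) {
  return transformedInnerFunc.bind(null, closedOver);
}

const nested = outerFuncNested(1)(2, 3);
const lifted = outerFuncLifted(1)(2, 3);
```

Both calls compute 1 + 2 + 3, so callers can't tell the two versions apart.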

Code generation

We’re now ready to look at the steps the compiler needs to take to generate code for an Elm function.

Generate the body expression, keeping track of all of the local names referenced in the body (we can ignore top-level names).
From the set of local names, remove the argument names and any names defined in let subexpressions. Only the closed-over names will remain.
Prepend the list of the closed-over names to the list of function arguments, to get the argument list for the evaluator function.
Generate the evaluator function
Declare the evaluator function as an element of the function table
Insert code into the parent scope that does the following
- Create a new closure structure in memory
- Partially apply the closed-over values from the parent scope
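
The first three steps boil down to a small set calculation. Here is a sketch, assuming the code generator has already collected the names as plain arrays of strings. The evaluatorParams helper and its input shape are hypothetical.

```javascript
// Hypothetical helper: given the names gathered while generating a function
// body, work out the closed-over names and the evaluator's parameter list.
function evaluatorParams({ referencedLocals, argNames, letNames }) {
  const notClosedOver = new Set([...argNames, ...letNames]);
  // Drop duplicates (keeping first-seen order), then remove args and let names.
  const closedOver = [...new Set(referencedLocals)].filter(
    (name) => !notClosedOver.has(name)
  );
  // Closed-over names go first, matching the lifted function's signature.
  return { closedOver, params: [...closedOver, ...argNames] };
}

// innerFunc from the example: its body references three local names.
const info = evaluatorParams({
  referencedLocals: ["closedOver", "arg1", "arg2"],
  argNames: ["arg1", "arg2"],
  letNames: [],
});
// info.closedOver is ["closedOver"]
// info.params is ["closedOver", "arg1", "arg2"]
```

The resulting parameter list matches transformedInnerFunc from the previous section.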

Summary

At the top of the post, we started by noting that one of the interesting challenges in compiling Elm to WebAssembly is how to implement first-class functions.

Elm functions have a lot of advanced features that are not directly available in WebAssembly. They behave like values, they can be partially applied, and they can capture values from outer scopes.

Although WebAssembly doesn’t have these features natively, it does provide the foundations to build them. WebAssembly supports indirect function calls using a function table, allowing us to pass around references to WebAssembly functions in the form of a table index.

We can represent an Elm function using a WebAssembly function and a data structure. We saw what the byte level representation of the data structure could look like. The data structure is what gets passed around the program, keeping track of partially-applied arguments and closed-over values. It also contains the table index of the evaluator function, which is what will eventually produce a return value.

We discussed a way to implement lexical closure. It involves automatically transforming Elm code, flattening nested function definitions so that they can be inserted into the WebAssembly function table. This transformation turns lexical closure into partial function application.

Finally we outlined some of the steps the compiler’s code generator needs to take, and looked at the runtime algorithm for function application.

What’s next?

I’m working on a prototype code generator to prove out these ideas. I’m making reasonable progress, and there don’t appear to be any major blockers, but it needs some more work to get it working. I’ll probably share something more if/when I get that far!

I’ve also got some ideas for more blog posts around the topic of Elm in WebAssembly:

Byte-level representations of the other Elm data structures (Extensible records, union types, numbers, comparables, appendables...)
Code generation architecture (WebAssembly AST, Is it reasonable to generate Wasm from Haskell? What about Rust?)
The Elm runtime in WebAssembly (Platform, Scheduler, Task, Process, Effect Managers...)
DOM, HTTP, and ports. Differences between Wasm MVP and post-MVP.
Strings and Unicode
Tail-Call Elimination with trampolines

Let me know in the comments if you’d like to see any of these!

References

Haskell's implementation of closures
Wikipedia Closure article
Wikipedia Lambda Lifting article

Thanks for reading!
