L-System Studio is an environment for creating and exploring L-Systems. It was built with Elm by Hunor Geréd. I learnt about the project when he decided to submit it to Built with Elm back in April 2023. It's a fun application for exploring L-Systems but it can be quite resource intensive. My browser has even crashed on several occasions. Definitely not a good look for Elm. Could the performance be improved whilst sticking with Elm, or do we have to resort to mutable generation in JavaScript? That was the nagging question.
What's the problem?
The first thing that caught my attention was the huge number of SVG nodes being generated in the DOM. Those SVG nodes are created by looping over the array of characters returned by LSys.generateSequence. The mapping isn't one-to-one, but the number of SVG nodes added to the DOM can be very close to the number of characters in the array. And that array gets really large really quickly.
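To get a feel for the growth, here's a quick back-of-the-envelope count. This is a sketch I wrote for this post, not part of the app, and `expectedLength` is a name I made up. The rule F → F+F-F-FF+F+F-F turns each F into 8 F's and 6 terminal symbols, and the axiom F+F+F+F starts with 4 F's and 3 terminals.

```elm
-- Expected sequence length after n iterations, assuming the rule
-- F -> "F+F-F-FF+F+F-F" (8 F's, 6 terminals) and the axiom
-- "F+F+F+F" (4 F's, 3 terminals). Terminals have no rule, so:
--   fs' = 8 * fs
--   terminals' = terminals + 6 * fs
expectedLength : Int -> Int
expectedLength n =
    let
        go i fs terminals =
            if i <= 0 then
                fs + terminals

            else
                go (i - 1) (8 * fs) (terminals + 6 * fs)
    in
    go n 4 3

-- expectedLength 4 == 30427 and expectedLength 5 == 243419,
-- matching the REPL session.
```

So the sequence grows by a factor of roughly 8 per iteration, which is why things fall over so quickly.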
> import LSys
> import Array
> LSys.generateSequence 4 "F+F+F+F" [('F', String.toList "F+F-F-FF+F+F-F")] |> Array.length
30427 : Int
> LSys.generateSequence 5 "F+F+F+F" [('F', String.toList "F+F-F-FF+F+F-F")] |> Array.length
243419 : Int
> LSys.generateSequence 6 "F+F+F+F" [('F', String.toList "F+F-F-FF+F+F-F")] |> Array.length
1947355 : Int
> LSys.generateSequence 7 "F+F+F+F" [('F', String.toList "F+F-F-FF+F+F-F")] |> Array.length
<--- Last few GCs --->
[17075:0x2724f250] 11351 ms: Scavenge 2022.8 (2062.0) -> 2022.6 (2073.0) MB, 3.52 / 0.00 ms (average mu = 0.173, current mu = 0.111) allocation failure;
[17075:0x2724f250] 11359 ms: Scavenge 2029.6 (2073.0) -> 2030.4 (2073.7) MB, 5.33 / 0.00 ms (average mu = 0.173, current mu = 0.111) allocation failure;
[17075:0x2724f250] 11668 ms: Scavenge 2030.4 (2073.7) -> 2029.6 (2096.7) MB, 309.40 / 0.00 ms (average mu = 0.173, current mu = 0.111) allocation failure;
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
1: 0xab10e4 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
2: 0xe6de60 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
3: 0xe6e244 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
4: 0x109d907 [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
5: 0x10b6479 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
6: 0x108f0e7 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
7: 0x108fd24 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
8: 0x106f04e v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
9: 0x14d8d70 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/nix/store/dcdc33kdjdhjnzg6rkmd0cx4kpwl8cac-nodejs-20.17.0/bin/node]
10: 0x719fe2699ef6
N.B. I used my fork of the repository which I shared as an example in GitHub Actions, Devbox, and Elm.
I can't even check the length of the array after 7 iterations because the algorithm used for generation uses too much memory.
How does LSys.generateSequence work?
generateSequence : Float -> String -> List ( Char, List Char ) -> Array Char
generateSequence iterations axiom rules =
    let
        expandSymbol symbol =
            case Extra.find (\( s, _ ) -> s == symbol) rules of
                Just ( _, replacement ) ->
                    Array.fromList replacement

                Nothing ->
                    Array.fromList [ symbol ]
    in
    List.foldl
        (\_ seq ->
            Array.map expandSymbol seq
                |> Array.toList
                |> List.map Array.toList
                |> List.concat
                |> Array.fromList
        )
        (String.toList axiom |> Array.fromList)
        (List.range 1 (round iterations))
- iterations - The number of iterations
- axiom - The initial string
- rules - The rewrite rules
Recall List.foldl : (a -> b -> b) -> b -> List a -> b.
For each character in the array we apply expandSymbol to it. expandSymbol attempts to find a rewrite string (represented as a list of characters) for a given character. A new array of lists is built which is eventually transformed back into an array of characters. A lot of intermediate lists and arrays are built up along the way. This is done round iterations times over arrays of increasing length.
expandSymbol searches an association list multiple times for the same symbol, and each time it creates a new array from the replacement list of characters it finds.
It seems then that a poor choice of data structures and algorithms has led to performance and memory issues.
Thoughts after the investigation
- Why is iterations even a Float?
- rules would be easier to enter via elm repl if it had the type List ( Char, String ).
- rules should be processed into a data structure that has good lookup performance.
- The replacements returned by a rule lookup should be created exactly once.
- An array or a list is not the right data structure for storing the characters generated by a given L-System, since when stored in memory it takes up too much space. We need to be able to represent the whole value that could be generated but only generate what's needed at any given point in time. Could lazy evaluation work here?
- Maybe I could repurpose Logic.Stream from my The Reasoned Schemer (2nd Edition) project to use as the data structure for the in-memory representation of the entire L-System.
The new API
This is the new API I came up with:
module LSystem exposing (generate)
generate : Int -> List ( Char, String ) -> String -> Sequence Char
module Rules exposing
    ( Rules
      -- Create
    , fromList
      -- Query
    , lookup
    )
type Rules
fromList : List ( Char, String ) -> Rules
lookup : Char -> Rules -> Sequence Char
module Sequence exposing
    ( Sequence
      -- Create
    , singleton, fromString
      -- Combine
    , concat, concatMap
      -- Query
    , length
      -- Convert
    , uncons, toList
    )
type Sequence a
singleton : a -> Sequence a
fromString : String -> Sequence Char
concat : Sequence a -> Sequence a -> Sequence a
concatMap : (a -> Sequence b) -> Sequence a -> Sequence b
length : Sequence a -> Int
uncons : Sequence a -> Maybe ( a, Sequence a )
toList : Sequence a -> List a
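To make the API concrete, here's a tiny worked example, assuming the modules above are in scope (the `example` binding is just for illustration):

```elm
import LSystem
import Sequence

-- Expand the toy rule F -> "F+F" once, starting from the axiom "F":
-- every 'F' is rewritten and every other character is left alone.
example : List Char
example =
    LSystem.generate 1 [ ( 'F', "F+F" ) ] "F"
        |> Sequence.toList

-- example == [ 'F', '+', 'F' ]
```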
LSystem
module LSystem exposing (generate)

import Rules exposing (Rules)
import Sequence exposing (Sequence)

generate : Int -> List ( Char, String ) -> String -> Sequence Char
generate n rules axiom =
    generateHelper n (Rules.fromList rules) (Sequence.fromString axiom)

generateHelper : Int -> Rules -> Sequence Char -> Sequence Char
generateHelper n rules current =
    if n <= 0 then
        current

    else
        let
            next =
                Sequence.concatMap (flip Rules.lookup rules) current
        in
        generateHelper (n - 1) rules next

flip : (a -> b -> c) -> b -> a -> c
flip f b a =
    f a b
The number of iterations, n, is represented as an Int. The rules input has changed to List ( Char, String ), which should make it easier to enter in the elm repl or in tests, but that's not the way rules are used internally. They are processed, via Rules.fromList rules, into a data structure that has better lookup performance than an association list.
The looping is done by generateHelper, which is implemented tail-recursively to ensure we don't blow the stack. That may not have been necessary since n doesn't need to be very large; for example, 50 iterations is probably overdoing it. Either way, the implementation can handle millions of iterations now.
The workhorse of the algorithm is Sequence.concatMap. It is used to transform the sequence from one iteration to the next.
Rules
module Rules exposing (Rules, fromList, lookup)

import Dict exposing (Dict)
import Sequence exposing (Sequence)

type Rules
    = Rules (Dict Char (Sequence Char))

fromList : List ( Char, String ) -> Rules
fromList =
    Rules << Dict.fromList << List.map (Tuple.mapSecond Sequence.fromString)

lookup : Char -> Rules -> Sequence Char
lookup ch (Rules table) =
    Dict.get ch table
        |> Maybe.withDefault (Sequence.singleton ch)
Rules is an opaque type that is implemented as a dictionary. The keys are characters and the values are preprocessed, via List.map (Tuple.mapSecond Sequence.fromString), into sequences that get reused each time a key is looked up. lookup finds the expansion for a given character.
N.B. The only issue I have with lookup is that for missing keys it creates a new singleton sequence each time. Any ideas to avoid doing that would be appreciated.
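One possible direction, sketched here rather than proposed as the fix: pre-insert a shared singleton entry for every symbol that appears in a replacement string but has no rule of its own, so repeated misses for those symbols reuse one sequence. `fromListSeeded` is a name I made up; it would live inside the Rules module since it uses the Rules constructor. Symbols that only ever appear in the axiom would still fall through to the default.

```elm
fromListSeeded : List ( Char, String ) -> Rules
fromListSeeded rules =
    let
        table =
            Dict.fromList (List.map (Tuple.mapSecond Sequence.fromString) rules)

        -- every character mentioned in any replacement string
        mentioned =
            List.concatMap (Tuple.second >> String.toList) rules

        -- give rule-less characters a shared identity entry
        seed ch acc =
            if Dict.member ch acc then
                acc

            else
                Dict.insert ch (Sequence.singleton ch) acc
    in
    Rules (List.foldl seed table mentioned)
```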
Sequence
module Sequence exposing
    ( Sequence
    , singleton, fromString
    , concat, concatMap
    , length
    , uncons, toList
    )

type Sequence a
    = Empty
    | Cons a (Sequence a)
    | Thunk (() -> Sequence a)
A sequence is a combination of a list and a lazy list, an idea I stole from The Reasoned Schemer (2nd Edition).
If you remove the Thunk constructor then you have a list.
type List a
    = Empty
    | Cons a (List a)
If you combine the Cons and Thunk constructors then you have a lazy list.
type LazyList a
    = Empty
    | Cons a (() -> LazyList a)
Create
singleton : a -> Sequence a
singleton x =
    Cons x Empty

fromString : String -> Sequence Char
fromString s =
    Thunk (\_ -> String.foldr Cons Empty s)
Combine
concat : Sequence a -> Sequence a -> Sequence a
concat a b =
    case a of
        Empty ->
            b

        Cons head tail ->
            Cons head (concat tail b)

        Thunk t ->
            Thunk (\_ -> concat (t ()) b)

concatMap : (a -> Sequence b) -> Sequence a -> Sequence b
concatMap f s =
    case s of
        Empty ->
            Empty

        Cons head tail ->
            concat (f head) (concatMap f tail)

        Thunk t ->
            Thunk (\_ -> concatMap f (t ()))
The major difference between this implementation and Logic.Stream is that Sequence.concat, which corresponds to Logic.Stream.append, doesn't interleave between sequences.
Query
length : Sequence a -> Int
length =
    lengthHelper 0

lengthHelper : Int -> Sequence a -> Int
lengthHelper n s =
    case s of
        Empty ->
            n

        Cons _ tail ->
            lengthHelper (n + 1) tail

        Thunk t ->
            lengthHelper n (t ())
Sequences can get very long after relatively few iterations, longer than an Int can even represent. So this length is really only useful for debugging purposes, and it has to be implemented tail-recursively to ensure it gets optimized into a loop. Otherwise, it won't even be helpful for debugging.
Convert
uncons : Sequence a -> Maybe ( a, Sequence a )
uncons s =
    case s of
        Empty ->
            Nothing

        Cons head tail ->
            Just ( head, tail )

        Thunk t ->
            uncons (t ())

toList : Sequence a -> List a
toList s =
    case s of
        Empty ->
            []

        Cons head tail ->
            head :: toList tail

        Thunk t ->
            toList (t ())
uncons allows you to process the sequence one character at a time. It's not used for generation but my plan is to use it for rendering.
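For instance, a take built on top of uncons would force only as much of the sequence as a renderer asks for. This is a sketch of a helper that isn't part of the module, written as if it lived inside Sequence:

```elm
-- Force and collect only the first n elements of a sequence.
-- Everything past the n-th element stays an unevaluated Thunk.
take : Int -> Sequence a -> List a
take n s =
    if n <= 0 then
        []

    else
        case uncons s of
            Nothing ->
                []

            Just ( head, tail ) ->
                head :: take (n - 1) tail
```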
toList is only useful for debugging purposes on small sequences. I didn't even bother to make it tail-recursive.
Results
> import LSystem
> import Sequence
> Sequence.length <| LSystem.generate 6 [('F', "F+F-F-FF+F+F-F")] "F+F+F+F"
1947355 : Int
> Sequence.length <| LSystem.generate 7 [('F', "F+F-F-FF+F+F-F")] "F+F+F+F"
15578843 : Int
> Sequence.length <| LSystem.generate 8 [('F', "F+F-F-FF+F+F-F")] "F+F+F+F"
124630747 : Int
> Sequence.length <| LSystem.generate 9 [('F', "F+F-F-FF+F+F-F")] "F+F+F+F"
997045979 : Int
> Sequence.length <| LSystem.generate 10 [('F', "F+F-F-FF+F+F-F")] "F+F+F+F"
7976367835 : Int
> Sequence.uncons <| LSystem.generate 100 [('F', "F+F-F-FF+F+F-F")] "F+F+F+F"
Just ('F',Cons '+' (Cons 'F' (Cons '-' (Cons 'F' (Cons '-' (Cons 'F' (Cons 'F' (Cons '+' (Cons 'F' (Cons '+' (Cons 'F' (Cons '-' (Cons 'F' (Cons '+' (Thunk <function>)))))))))))))))
Iteration 10 of the L-System we started off with has over 7 billion characters.
We can now generate and store the result of 100 iterations of the L-System and process it one character at a time.
How are we going to render the L-System if we can't store it in a list or put billions of SVG nodes in the browser? These are questions best left for another day.
Interesting Related Resources
- Why Functional Programming Matters by John Hughes at Functional Conf 2016
  - Word-at-a-time programming vs whole value programming
  - Lazy evaluation implied whole value programming was possible
  - John Hughes was interested in separating the decision of how much to compute from the code that does the computing
- Lazy Evaluation
- Recursive Programming Techniques
  - Section 3.10 - Sequences, Coroutines, and Streams
Subscribe to my newsletter
If you're interested in improving your skills with Elm then I invite you to subscribe to my newsletter, Elm with Dwayne. To learn more about it, please read this announcement.