In the previous post, I mentioned that:

> The only issue I have with `lookup` is that for missing keys it creates a new singleton sequence each time.

Jeroen also pointed out, and rightly so, that:

> It's even worse than that: you always create this singleton, because `Maybe.withDefault` is eager. The fix is simple for this part: use a case expression instead.
```elm
lookup : Char -> Rules -> Sequence Char
lookup ch (Rules table) =
    case Dict.get ch table of
        Just replacement ->
            replacement

        Nothing ->
            Sequence.singleton ch
```
But now we're back to square one and I still don't have a fix for my original issue with `lookup`.
## What are my options?
I considered two options:

- Have `lookup` return a `Maybe (Sequence Char)` and deal with it downstream.
- Have `lookup` return `( Sequence Char, Rules )` so that we can update the rules with the missing replacement, such that the next time we'd find it in the rules and return it.
But there were problems with both approaches.

The first approach just puts off the inevitable: we'd still be recreating the singleton every time, only somewhere else downstream.

With the second approach, we'd have to keep track of the changes to the rules within `Sequence.concatMap`. It could kinda work, but I didn't like the API changes that needed to occur for it to happen, so I abandoned the idea.
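As a rough sketch (these signatures are my reconstruction of the two rejected designs, not code from the actual library), the options would have looked something like:

```elm
-- Option 1: push the decision downstream; some caller still has to
-- materialize a singleton for each missing key, so nothing is saved.
lookup : Char -> Rules -> Maybe (Sequence Char)

-- Option 2: thread the possibly-updated rules back to the caller so a
-- missing key is cached after its first lookup. This is what forces
-- Sequence.concatMap to track the evolving Rules value.
lookup : Char -> Rules -> ( Sequence Char, Rules )
```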
After taking a break and a nice long walk, another option revealed itself. Why not determine all the characters that could be looked up ahead of time? L-system generation uses both the rules and the axiom, and every character that appears in the final generated sequence comes from one of these two values.
## Enter `Rules.build`

`Rules.build` replaces `Rules.fromList`. It takes both the rules and the axiom and builds a dictionary that contains a key for every possible character that could occur in the final generated sequence.
```elm
import Dict exposing (Dict)
import Sequence exposing (Sequence)
import Set


build : List ( Char, String ) -> String -> Rules
build rules axiom =
    let
        identity =
            buildIdentity rules axiom
    in
    rules
        |> List.map (Tuple.mapSecond Sequence.fromString)
        |> Dict.fromList
        |> Dict.union identity
        |> Rules


buildIdentity : List ( Char, String ) -> String -> Dict Char (Sequence Char)
buildIdentity rules axiom =
    let
        axiomSet =
            axiom
                |> String.toList
                |> Set.fromList

        ( keySet, valueSet ) =
            rules
                |> List.foldl
                    (\( ch, replacement ) ( prevKeySet, prevValueSet ) ->
                        ( Set.insert ch prevKeySet
                        , replacement
                            |> String.toList
                            |> Set.fromList
                            |> Set.union prevValueSet
                        )
                    )
                    ( Set.empty, Set.empty )

        identityKeySet =
            Set.diff (Set.union axiomSet valueSet) keySet
    in
    identityKeySet
        |> Set.foldl
            (\ch ->
                Dict.insert ch (Sequence.singleton ch)
            )
            Dict.empty
```
`buildIdentity` is where the main action happens. It builds a dictionary of characters mapped to their singleton sequences. It's easier to understand how it works via an example.
## An example
Input:

```elm
rules = [ ( 'F', "F+F-F-FF+F+F-F" ) ]
axiom = "F+F+F+F"
```
Then,

```elm
axiomSet = Set.fromList [ 'F', '+' ]
keySet = Set.fromList [ 'F' ]
valueSet = Set.fromList [ 'F', '+', '-' ]
```

Since `identityKeySet = Set.diff (Set.union axiomSet valueSet) keySet`, we have:

```elm
identityKeySet = Set.fromList [ '+', '-' ]
```
Therefore, the dictionary that `buildIdentity` returns is:

```elm
Dict.fromList
    [ ( '+', Sequence.singleton '+' )
    , ( '-', Sequence.singleton '-' )
    ]
```
This dictionary is united with the dictionary we build from the rules:

```elm
Dict.fromList
    [ ( 'F', Sequence.fromString "F+F-F-FF+F+F-F" )
    ]
```

to produce the final dictionary. This allows us to change our implementation of `lookup`.
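For concreteness: since `identityKeySet` excludes the rule keys by construction, the two dictionaries are disjoint, and the union is equivalent to:

```elm
Dict.fromList
    [ ( '+', Sequence.singleton '+' )
    , ( '-', Sequence.singleton '-' )
    , ( 'F', Sequence.fromString "F+F-F-FF+F+F-F" )
    ]
```

Every character that can ever appear during generation now has an entry, with each sequence allocated exactly once up front.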
## The new `lookup`

`lookup` doesn't need to create singleton sequences anymore because of how the `Rules` data structure is built and subsequently used. So it changes to:
```elm
lookup : Char -> Rules -> Sequence Char
lookup ch (Rules table) =
    case Dict.get ch table of
        Just replacement ->
            replacement

        Nothing ->
            --
            -- This branch would actually never be reached by the generation
            -- algorithm, but since Elm requires total functions we have to
            -- include it.
            --
            Sequence.empty
```
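To illustrate (using the example rules from above; this usage sketch is mine, assuming `Rules.build` is called as described):

```elm
rules : Rules
rules =
    Rules.build [ ( 'F', "F+F-F-FF+F+F-F" ) ] "F+F+F+F"

-- Every lookup during generation now hits the Just branch and returns
-- a sequence that was built once, up front:
--
--   lookup 'F' rules  -- the prebuilt sequence for "F+F-F-FF+F+F-F"
--   lookup '+' rules  -- the singleton built once in Rules.build
--   lookup '-' rules  -- likewise, no allocation at lookup time
```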
## Reflections

It's interesting that to improve `lookup` we didn't actually have to change it much. But it makes sense: `build` is used once, while `lookup` is used for every character in the sequence, which could number in the millions or billions. So pushing more processing into `build` in order to make `lookup` ever so slightly cheaper is a good tradeoff in hindsight.