In the previous post, I mentioned that:

> The only issue I have with `lookup` is that for missing keys it creates a new singleton sequence each time.

Jeroen also pointed out, and rightly so, that:

> It's even worse than that: you always create this singleton, because `Maybe.withDefault` is eager. The fix is simple for this part: use a case expression instead.
```elm
lookup : Char -> Rules -> Sequence Char
lookup ch (Rules table) =
    case Dict.get ch table of
        Just replacement ->
            replacement

        Nothing ->
            Sequence.singleton ch
```
But now we're back to square one and I still don't have a fix for my original issue with `lookup`.
## What are my options?
I considered two options:

- Have `lookup` return a `Maybe (Sequence Char)` and deal with it downstream.
- Have `lookup` return `( Sequence Char, Rules )` so that we can update the rules with the missing replacement, such that the next time we'd find it in the rules and return it.
But there were problems with both approaches.

The first approach just puts off the inevitable: we'd still be recreating the singleton every time, only somewhere else downstream.

With the second approach, we'd have to keep track of the changes to the rules within `Sequence.concatMap`. It could kinda work, but I didn't like the API changes that needed to occur for it to happen, so I abandoned the idea.
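As a rough sketch (these signatures are my reconstruction of the two rejected designs, not code from the actual library), the options would have looked something like:

```elm
-- Option 1: push the decision downstream; some caller still has to
-- materialize a singleton for each missing key, so nothing is saved.
lookup : Char -> Rules -> Maybe (Sequence Char)

-- Option 2: thread the possibly-updated rules back to the caller so a
-- missing key is cached after its first lookup. This is what forces
-- Sequence.concatMap to track the evolving Rules value.
lookup : Char -> Rules -> ( Sequence Char, Rules )
```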
After taking a break and a nice long walk, another option revealed itself. Why not determine all the characters that could be looked up ahead of time? L-system generation uses both the rules and the axiom, and every character that appears in the final generated sequence comes from one of these two values.
## Enter `Rules.build`

`Rules.build` replaces `Rules.fromList`. It takes both the rules and the axiom and builds a dictionary that contains a key for every possible character that could occur in the final generated sequence.
```elm
import Dict exposing (Dict)
import Sequence exposing (Sequence)
import Set


build : List ( Char, String ) -> String -> Rules
build rules axiom =
    let
        identity =
            buildIdentity rules axiom
    in
    rules
        |> List.map (Tuple.mapSecond Sequence.fromString)
        |> Dict.fromList
        |> Dict.union identity
        |> Rules


buildIdentity : List ( Char, String ) -> String -> Dict Char (Sequence Char)
buildIdentity rules axiom =
    let
        axiomSet =
            axiom
                |> String.toList
                |> Set.fromList

        ( keySet, valueSet ) =
            rules
                |> List.foldl
                    (\( ch, replacement ) ( prevKeySet, prevValueSet ) ->
                        ( Set.insert ch prevKeySet
                        , replacement
                            |> String.toList
                            |> Set.fromList
                            |> Set.union prevValueSet
                        )
                    )
                    ( Set.empty, Set.empty )

        identityKeySet =
            Set.diff (Set.union axiomSet valueSet) keySet
    in
    identityKeySet
        |> Set.foldl
            (\ch ->
                Dict.insert ch (Sequence.singleton ch)
            )
            Dict.empty
```
`buildIdentity` is where the main action happens. It builds a dictionary of characters mapped to their singleton sequences. It's easier to understand how it works via an example.
## An example
Input:

```elm
rules = [ ( 'F', "F+F-F-FF+F+F-F" ) ]
axiom = "F+F+F+F"
```
Then,

```elm
axiomSet = Set.fromList [ 'F', '+' ]
keySet = Set.fromList [ 'F' ]
valueSet = Set.fromList [ 'F', '+', '-' ]
```

Since `identityKeySet = Set.diff (Set.union axiomSet valueSet) keySet`, we have:

```elm
identityKeySet = Set.fromList [ '+', '-' ]
```
Therefore, the dictionary that `buildIdentity` returns is:

```elm
Dict.fromList
    [ ( '+', Sequence.singleton '+' )
    , ( '-', Sequence.singleton '-' )
    ]
```
This dictionary is united with the dictionary we build from the rules:

```elm
Dict.fromList
    [ ( 'F', Sequence.fromString "F+F-F-FF+F+F-F" )
    ]
```

to produce the final dictionary. This allows us to change our implementation of `lookup`.
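For concreteness: since `identityKeySet` excludes the rule keys by construction, the two dictionaries are disjoint, and the union is equivalent to:

```elm
Dict.fromList
    [ ( '+', Sequence.singleton '+' )
    , ( '-', Sequence.singleton '-' )
    , ( 'F', Sequence.fromString "F+F-F-FF+F+F-F" )
    ]
```

Every character that can ever appear during generation now has an entry, with each sequence allocated exactly once up front.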
## The new `lookup`

`lookup` doesn't need to create singleton sequences anymore because of how the `Rules` data structure is built and subsequently used. So it changes to:
```elm
lookup : Char -> Rules -> Sequence Char
lookup ch (Rules table) =
    case Dict.get ch table of
        Just replacement ->
            replacement

        Nothing ->
            --
            -- This branch would actually never be reached by the generation
            -- algorithm, but since Elm requires total functions we have to
            -- include it.
            --
            Sequence.empty
```
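To illustrate (using the example rules from above; this usage sketch is mine, assuming `Rules.build` is called as described):

```elm
rules : Rules
rules =
    Rules.build [ ( 'F', "F+F-F-FF+F+F-F" ) ] "F+F+F+F"

-- Every lookup during generation now hits the Just branch and returns
-- a sequence that was built once, up front:
--
--   lookup 'F' rules  -- the prebuilt sequence for "F+F-F-FF+F+F-F"
--   lookup '+' rules  -- the singleton built once in Rules.build
--   lookup '-' rules  -- likewise, no allocation at lookup time
```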
## Reflections

It's interesting that to improve `lookup` we didn't actually have to change it much. But it makes sense: `build` is used once, while `lookup` is used for every character in the sequence, which could number in the millions or billions. So pushing more processing into `build` in order to make `lookup` ever so slightly cheaper is a good tradeoff in hindsight.