<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kemal Deniz Teket</title>
    <description>The latest articles on Forem by Kemal Deniz Teket (@kadetr).</description>
    <link>https://forem.com/kadetr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3877354%2Fa42364a2-0771-407c-82d9-16dab6185bab.png</url>
      <title>Forem: Kemal Deniz Teket</title>
      <link>https://forem.com/kadetr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kadetr"/>
    <language>en</language>
    <item>
      <title>Formalized, Reviewed, Triaged — A Practitioner's Account, Part II</title>
      <dc:creator>Kemal Deniz Teket</dc:creator>
      <pubDate>Sun, 03 May 2026 18:00:21 +0000</pubDate>
      <link>https://forem.com/kadetr/formalized-reviewed-triaged-a-practitioners-account-part-ii-gk1</link>
      <guid>https://forem.com/kadetr/formalized-reviewed-triaged-a-practitioners-account-part-ii-gk1</guid>
      <description>&lt;h2&gt;
  
  
  §0 — Hook
&lt;/h2&gt;

&lt;p&gt;The work-pool schema that runs the paragraf project names three work types:&lt;br&gt;
&lt;code&gt;spec&lt;/code&gt;, &lt;code&gt;package&lt;/code&gt;, and &lt;code&gt;issue-bucket&lt;/code&gt;. Only two of the three have a defined&lt;br&gt;
process. The title word is earned — the findings were triaged. The process that &lt;br&gt;
did the triaging is not yet written down.&lt;/p&gt;

&lt;p&gt;The first article introduced a methodology that produced a working library —&lt;br&gt;
four layers, twelve packages, over thirty thousand lines of code — in two weeks.&lt;br&gt;
It also described the gaps that needed to close.&lt;/p&gt;

&lt;p&gt;Two parallel improvements happened in the one week that followed. The first was&lt;br&gt;
formalization: the practice that lived in one operator's head became a document&lt;br&gt;
set, a machine-consumable instruction set, a work registry with an explicit state&lt;br&gt;
machine, four context files per package, a decision lifecycle, archive procedures.&lt;br&gt;
The methodology stopped being a discipline and became an apparatus — a structured&lt;br&gt;
set of documents and protocols that an LLM can read and follow without the operator&lt;br&gt;
present for every step.&lt;/p&gt;

&lt;p&gt;The second improvement was a sprint. Two new color-related packages shipped under&lt;br&gt;
the formalized process, then several review passes returned more than a hundred&lt;br&gt;
findings. None broke the build. None suggested the methodology had failed. They&lt;br&gt;
were the kind of findings that only appear when a codebase is finished enough to&lt;br&gt;
be read back to its author by an external instrument.&lt;/p&gt;

&lt;p&gt;These are not two separate stories. The methodology was formalized to handle&lt;br&gt;
forward work properly. The formalization surfaced its own boundary. The findings&lt;br&gt;
sprint ran inside that boundary informally. Only one of the three work types — the one&lt;br&gt;
that drove an entire week of review — has no documented process. That gap is not&lt;br&gt;
an oversight. It is the thing the formalization revealed about itself.&lt;/p&gt;


&lt;h2&gt;
  
  
  §1 — What Got Formalized
&lt;/h2&gt;

&lt;p&gt;In article-1, the methodology was a practice. One week later, it is a document&lt;br&gt;
set with specific roles at specific phase boundaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docs/
├── methodology.md              # phases, gates, ask-human triggers
├── methodology-reference.md    # archive procedures, anti-patterns
├── outer-context.md            # project-level consistency checker
├── work-pool.md                # work registry + state machine
├── glossary.md                 # defined terms, hierarchy summary
├── dependency.md               # project-level dependency map
├── io-schemas.md               # project-level I/O navigation
├── roadmap.md                  # strategy and milestones
├── AI-PRIMER.md                # minimal session bootstrap
│
├── inner-context/[package]/
│   ├── inner-context.md        # role, constraints, package rules
│   ├── io-schema.md            # public types, exported functions
│   ├── dependency.md           # package imports
│   └── decisions.md            # active draft decisions only
│
├── plan/
│   └── workId-package-spec-plan-[YYMMDD-HHMM].md
│
└── archive/
    ├── plan/done/
    ├── plan/cancelled/
    └── decision/[package]-decisions-archive.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each document has one role. &lt;code&gt;methodology.md&lt;/code&gt; is the instruction set — four phases&lt;br&gt;
(Define, Specify, Implement, Revise), a list of Ask-Human triggers, consistency&lt;br&gt;
controls that produce visible output at every phase boundary. &lt;code&gt;outer-context.md&lt;/code&gt;&lt;br&gt;
is the project-level checker, run before and after every inner loop, edited only&lt;br&gt;
at outer-loop review gates. &lt;code&gt;work-pool.md&lt;/code&gt; is the registry with an explicit state&lt;br&gt;
machine: draft → planned → in-progress → done, with branches for deferred and&lt;br&gt;
cancelled. Plans come in three shapes — package-spec-plan, root-spec-plan for&lt;br&gt;
multi-package work, issue-bucket-plan for grouped issues. Decisions live in the&lt;br&gt;
package's &lt;code&gt;decisions.md&lt;/code&gt; while in flight, then graduate to a one-line constraint&lt;br&gt;
in inner-context and a full archive entry when locked.&lt;/p&gt;
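
&lt;p&gt;The registry's state machine is small enough to sketch in code. The sketch below is&lt;br&gt;
illustrative only: the real registry is a markdown document, and the transitions out of&lt;br&gt;
the deferred and cancelled branches are assumptions beyond the main path named above.&lt;/p&gt;

```typescript
// Illustrative sketch of the work-pool state machine; the actual registry is
// a markdown document. Branch transitions beyond the main path are assumed.
type WorkState = "draft" | "planned" | "in-progress" | "done" | "deferred" | "cancelled";

const transitions: { [state: string]: WorkState[] } = {
  draft: ["planned", "cancelled"],
  planned: ["in-progress", "deferred", "cancelled"],
  "in-progress": ["done", "deferred", "cancelled"],
  deferred: ["planned", "cancelled"],
  done: [],      // terminal: the plan archives to plan/done/
  cancelled: [], // terminal: the plan archives to plan/cancelled/
};

function canTransition(from: WorkState, to: WorkState): boolean {
  return transitions[from].includes(to);
}
```

&lt;p&gt;The value of making the states explicit is that a session arriving mid-work can read&lt;br&gt;
the current state from the registry instead of asking the operator.&lt;/p&gt;
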

&lt;p&gt;The motivating observation is simple. In article-1, the methodology survived&lt;br&gt;
because one operator carried the context across every session. One week later, two&lt;br&gt;
new packages had to be deliverable by sessions that didn't have weeks of context.&lt;br&gt;
Formalization is what makes a methodology survive the session boundary. The model&lt;br&gt;
cannot read intent. It can read documents.&lt;/p&gt;


&lt;h2&gt;
  
  
  §2 — Two Packages Through the Formalized Process
&lt;/h2&gt;

&lt;p&gt;color and color-wasm shipped under the new process. What that looked like in&lt;br&gt;
practice was not dramatic — which is the point.&lt;/p&gt;

&lt;p&gt;The packages had blocking relationships: render-pdf could not import &lt;code&gt;OutputIntent&lt;/code&gt;&lt;br&gt;
until the color API was stable, and compile could not enforce compliance until&lt;br&gt;
render-pdf could embed the ICC profile. Multi-package work used a root-spec-plan&lt;br&gt;
to orchestrate package-level plans rather than treating it as one large change.&lt;br&gt;
Each workId had a state machine behind it and a plan document that archived on&lt;br&gt;
completion. A session that arrived mid-work could read the plan, see what had been&lt;br&gt;
verified and what was pending, and continue without operator narration. That is&lt;br&gt;
the formalization working.&lt;/p&gt;

&lt;p&gt;One design reversal happened during this work, and it is the more instructive&lt;br&gt;
story. The color packages were originally planned as optional dependencies at every&lt;br&gt;
layer, on the assumption that flexibility was always preferable. Implementation&lt;br&gt;
surfaced the opposite: optional-everywhere produced more integration friction than&lt;br&gt;
it saved, and made the dependency direction unclear when render-pdf and compile&lt;br&gt;
both needed the same types. The decision was reversed mid-flight — imports became&lt;br&gt;
fixed at the layer they belonged to, and only the user-facing exports stayed&lt;br&gt;
optional.&lt;/p&gt;

&lt;p&gt;The decision lifecycle handled this without ceremony. The original choice was a&lt;br&gt;
draft decision in &lt;code&gt;decisions.md&lt;/code&gt;. The reversal was a new draft decision that&lt;br&gt;
superseded it. When it shipped, the one-line constraint graduated to inner-context&lt;br&gt;
Package Constraints, and the full entry — both the original and the supersession —&lt;br&gt;
moved to the package's archive file. The change is traceable in the documents.&lt;br&gt;
It is not something the operator has to remember.&lt;/p&gt;

&lt;p&gt;This is the formalization paying off. The architecture survived a reversal without&lt;br&gt;
becoming undocumented, and a future session reading the inner-context files sees&lt;br&gt;
the constraint, sees the archive reference, and can reconstruct why the dependency&lt;br&gt;
direction is the way it is.&lt;/p&gt;

&lt;p&gt;Tests passed. End-to-end runs went green. Article-1 described an audit moment&lt;br&gt;
named &lt;code&gt;excuse-me-kemal-I-forked-up.md&lt;/code&gt; — a file created when unit tests were&lt;br&gt;
passing across all packages while end-to-end tests were failing across all of them,&lt;br&gt;
and a full audit was the only way forward. There was no equivalent moment this&lt;br&gt;
time. The methodology that article-1 reconstructed from a crisis was running as&lt;br&gt;
designed when color and color-wasm shipped.&lt;/p&gt;

&lt;p&gt;Then the review started.&lt;/p&gt;


&lt;h2&gt;
  
  
  §3 — Then Review Returned 100+ Findings
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The four categories&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before describing how the findings were gathered, it helps to name what kinds of&lt;br&gt;
things they were. More than a hundred items resolved into four structurally&lt;br&gt;
different categories that a single priority column could not distinguish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inert fixes&lt;/strong&gt; — README accuracy, broken links, version alignment, stale test&lt;br&gt;
headers. Zero code risk, zero architectural implication. Safe to batch and ship&lt;br&gt;
without review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Surgical code corrections&lt;/strong&gt; — narrow, traceable to a specific line, no&lt;br&gt;
behavioral side effects. The &lt;code&gt;GTS_PDFA1&lt;/code&gt; hardcoding that mislabelled OutputIntent&lt;br&gt;
subtypes. The CSS font-weight matching that returned the wrong face when an&lt;br&gt;
exact-weight descriptor wasn't registered. Each had a clear before-and-after state&lt;br&gt;
and a test to update alongside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behaviour-changing refactors&lt;/strong&gt; — formally correct, but they change which outputs the&lt;br&gt;
algorithm produces. One item — fixing a prefix-sum off-by-one in the Knuth-Plass&lt;br&gt;
ratio computation — was correctly identified as something that "changes which&lt;br&gt;
breakpoints the algorithm chooses." Not a bug fix. A refactor that produces&lt;br&gt;
different paragraph shapes. It cannot travel in the same pull request as surgical&lt;br&gt;
fixes. Merging them makes the change untraceable and the revert scope unclear.&lt;/p&gt;
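
&lt;p&gt;Why a prefix-sum off-by-one is behaviour-changing rather than cosmetic is easiest to&lt;br&gt;
see in code. The sketch below is illustrative, not paragraf's implementation: in a&lt;br&gt;
Knuth-Plass-style setter, the natural width of a candidate line is a difference of&lt;br&gt;
prefix sums, and the adjustment ratio derived from it feeds the badness score that&lt;br&gt;
decides which breakpoints win.&lt;/p&gt;

```typescript
// Illustrative sketch, not paragraf's code. sumW[i] holds the total width of
// items 0..i-1, so the natural width of a line from break a to break b is a
// difference of prefix sums. An off-by-one in either index (e.g. sumW[b + 1])
// silently shifts every line width, and with it every adjustment ratio.
function lineWidth(sumW: number[], a: number, b: number): number {
  return sumW[b] - sumW[a];
}

// The adjustment ratio says how far a line must stretch or shrink to hit the
// target measure; the algorithm turns it into badness and picks the break
// sequence with the lowest total. A shifted ratio changes which sequence wins.
function adjustmentRatio(natural: number, target: number, stretch: number, shrink: number): number {
  const diff = target - natural;
  if (diff === 0) return 0;
  if (diff > 0) return stretch > 0 ? diff / stretch : Infinity;
  return shrink > 0 ? diff / shrink : -Infinity;
}

// prefix sums for three items of width 2, 3, 4
const sumW = [0, 2, 5, 9];
```

&lt;p&gt;With &lt;code&gt;sumW&lt;/code&gt; as above, the line spanning the first two items has natural width 5; against&lt;br&gt;
a target of 6 with stretchability 2 its ratio is 0.5. An index shifted by one would report&lt;br&gt;
width 9 and a negative ratio instead, the optimizer would prefer different breakpoints,&lt;br&gt;
and nothing would crash.&lt;/p&gt;
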

&lt;p&gt;&lt;strong&gt;False-assurance tests&lt;/strong&gt; — the most dangerous category, and the one that deserves&lt;br&gt;
the most attention. These are tests that passed and never exercised the constraint&lt;br&gt;
they were written to verify. A widow/orphan test where both branches produced a&lt;br&gt;
three-word last line regardless of whether the penalty was active. A&lt;br&gt;
consecutive-hyphen-limit test where the fixture happened to produce the limit&lt;br&gt;
naturally, so the cap was never the binding constraint. A looseness test that&lt;br&gt;
produced eleven lines on every setting from −2 to +1. All passed. None provided&lt;br&gt;
assurance about anything. They were identified only because the review read the&lt;br&gt;
test fixtures carefully enough to notice that the constrained and unconstrained&lt;br&gt;
branches produced identical output. A green test suite is not evidence of a correct&lt;br&gt;
test suite.&lt;/p&gt;
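
&lt;p&gt;The pattern is easier to see in a schematic example than in prose. The code below is a&lt;br&gt;
deliberately simplified illustration, not the actual paragraf test: the widow-penalty&lt;br&gt;
branch is real code, but the fixture never produces a one-word last line, so the test&lt;br&gt;
passes identically whether the penalty is on or off.&lt;/p&gt;

```typescript
// Schematic illustration of a false-assurance test; not paragraf's test code.
function chunk(words: string[], size: number): string[][] {
  const lines: string[][] = [];
  let rest = words.slice();
  while (rest.length > 0) {
    lines.push(rest.slice(0, size));
    rest = rest.slice(size);
  }
  return lines;
}

function breakLines(words: string[], perLine: number, widowPenalty: boolean): string[][] {
  const lines = chunk(words, perLine);
  if (widowPenalty) {
    if (lines.length > 1) {
      const last = lines[lines.length - 1];
      if (last.length === 1) {
        // the penalty merges a one-word widow line into the previous line
        lines.pop();
        lines[lines.length - 1].push(last[0]);
      }
    }
  }
  return lines;
}

// 6 words, 3 per line: the last line has 3 words, so the penalty never binds.
const fixture = ["a", "b", "c", "d", "e", "f"];
const withPenalty = breakLines(fixture, 3, true);
const withoutPenalty = breakLines(fixture, 3, false);
// Identical output either way: a green assertion on this fixture verifies nothing.

// A fixture where the constraint actually binds makes the branches diverge:
const binding = ["a", "b", "c", "d"]; // 3 + 1 words: a one-word widow appears
```

&lt;p&gt;The fix in such cases is not a better assertion. It is a fixture in which the constraint&lt;br&gt;
is the binding one, so that the constrained and unconstrained branches produce different&lt;br&gt;
output and the assertion means something.&lt;/p&gt;
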

&lt;p&gt;&lt;strong&gt;The five steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The review that surfaced these categories ran in five steps, in a specific order&lt;br&gt;
that emerged from experience rather than from design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Whole-codebase review by Claude Opus.&lt;/strong&gt; The codebase was uploaded to a&lt;br&gt;
fresh Opus session as a zip archive. claude.ai reads from the main branch, and the&lt;br&gt;
new packages were on a feature branch — the upload was the only way to get the&lt;br&gt;
full state into a fresh session. Opus produced a structured pass across all layers&lt;br&gt;
and packages: consistency gaps, accuracy problems, decisions made during the build&lt;br&gt;
that didn't survive contact with the wider codebase. This became &lt;code&gt;todo-list.md&lt;/code&gt; —&lt;br&gt;
73 items.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Crosscheck and addition by VS Code Copilot.&lt;/strong&gt; Two requests in the same&lt;br&gt;
session, with Sonnet in high-reasoning chat mode. First: &lt;em&gt;"see Claude Opus findings&lt;br&gt;
in &lt;code&gt;todo-list.md&lt;/code&gt; — don't take any action yet, create a table with issues,&lt;br&gt;
priorities, severity, validity, and write them to &lt;code&gt;todo-list-copilot.md&lt;/code&gt;."&lt;/em&gt; Then:&lt;br&gt;
&lt;em&gt;"additionally, provide findings based on your own review and add them to&lt;br&gt;
&lt;code&gt;todo-list-copilot.md&lt;/code&gt;."&lt;/em&gt; The crosscheck corrected priorities, narrowed scope on&lt;br&gt;
several items, and flagged behaviour-changing refactors that had been listed as&lt;br&gt;
cleanups. Copilot's own pass added items the structural review had not surfaced.&lt;br&gt;
The list grew from 73 to 81.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Batched fixes by VS Code Copilot.&lt;/strong&gt; Group the items and fix the&lt;br&gt;
critical and high findings in batches. This is the work that the methodology&lt;br&gt;
documents had no name for — issue-bucket execution. The work-pool schema had the&lt;br&gt;
type. There was no Phase 1–3 process for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 — GitHub Copilot review on diffs, then Copilot fixes.&lt;/strong&gt; Run three times&lt;br&gt;
until critical and high issues stopped appearing. GitHub Copilot review operates on&lt;br&gt;
the pull request diff rather than on the whole codebase, so it catches a different&lt;br&gt;
class of problem than the Opus structural pass. Findings were fetched via the&lt;br&gt;
GitHub CLI after each review pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh &lt;span class="nb"&gt;pr &lt;/span&gt;view &lt;span class="nt"&gt;--json&lt;/span&gt; reviews,comments &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--jq&lt;/span&gt; &lt;span class="s1"&gt;'.reviews[].body, .comments[].body'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; docs/findings-I.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three iterations produced three findings files — &lt;code&gt;findings-I.md&lt;/code&gt;, &lt;code&gt;findings-II.md&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;findings-III.md&lt;/code&gt; — with comment counts of seven, eleven, and six. For each, the&lt;br&gt;
same prompt to VS Code Copilot: &lt;em&gt;"see GitHub Copilot review in &lt;code&gt;findings-X.md&lt;/code&gt; —&lt;br&gt;
don't take any action yet, create a table with issues, priorities, severity,&lt;br&gt;
validity, completed."&lt;/em&gt; Then: fix the issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Final structural pass by Opus.&lt;/strong&gt; The updated codebase went back to the&lt;br&gt;
same Opus session for a final review on remaining critical and high items. The&lt;br&gt;
session memory carried the original review context, so the second pass could focus&lt;br&gt;
on what had changed rather than re-deriving the codebase from scratch.&lt;/p&gt;

&lt;p&gt;Two reviewers, two registers: Opus for structural and architectural review across&lt;br&gt;
the whole codebase, GitHub Copilot for diff-level scrutiny on the pull request. VS&lt;br&gt;
Code Copilot synthesizing both into actionable batches and executing the fixes. The&lt;br&gt;
sequence wasn't documented before it was run. It worked.&lt;/p&gt;

&lt;p&gt;The total — around 105 findings — came from two sources: 81 items from the Opus&lt;br&gt;
and Copilot crosscheck in steps 1 and 2, and 24 comments from the three GitHub&lt;br&gt;
Copilot pull request review passes in step 4. These overlap in coverage but not in&lt;br&gt;
scope: the structural pass finds architectural drift that the diff reviewer never&lt;br&gt;
sees, and the diff reviewer finds interface-level issues that the structural pass&lt;br&gt;
glosses over. Both are necessary. Together they mapped onto those four categories.&lt;br&gt;
The category that mattered most was the last one — and the reason it exists&lt;br&gt;
connects directly to what the formalization revealed about itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  §4 — What the Formalization Revealed About Itself
&lt;/h2&gt;

&lt;p&gt;A methodology that has been written down can be checked against the work it&lt;br&gt;
actually governs. This is the property that makes formalization more than&lt;br&gt;
documentation hygiene — it produces a surface that the work can be measured&lt;br&gt;
against, and the gap becomes visible.&lt;/p&gt;

&lt;p&gt;The gap here is &lt;code&gt;issue-bucket&lt;/code&gt;. The work-pool schema names three work types:&lt;br&gt;
&lt;code&gt;spec&lt;/code&gt;, &lt;code&gt;package&lt;/code&gt;, and &lt;code&gt;issue-bucket&lt;/code&gt;. &lt;code&gt;spec&lt;/code&gt; work and &lt;code&gt;package&lt;/code&gt; work are&lt;br&gt;
documented end-to-end. Both have a Phase 1 mandatory-read list, a plan template,&lt;br&gt;
ownership verification, human gates, consistency controls, and an archive&lt;br&gt;
procedure. A new session can run either type by following the documented steps.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;issue-bucket&lt;/code&gt; work has none of this. The type exists as a first-class entry in&lt;br&gt;
the schema. There is no Phase 1 process, no plan template that fits its shape, no&lt;br&gt;
ownership verification rule, no defined gate between triage and execution. The&lt;br&gt;
five-step sequence in §3 is the process that ran. It is not yet a document. It&lt;br&gt;
produced correct results because one operator carried the context across every step&lt;br&gt;
— the same condition article-1 described as the situation the methodology was&lt;br&gt;
supposed to graduate from.&lt;/p&gt;

&lt;p&gt;One concrete example makes this precise. During the review, a finding was produced&lt;br&gt;
about the relationship between &lt;code&gt;@paragraf/render-pdf&lt;/code&gt; and &lt;code&gt;@paragraf/compile&lt;/code&gt;. The&lt;br&gt;
finding read the code correctly but drew an inverted conclusion — it described&lt;br&gt;
render-pdf as depending on compile, when the actual direction is the reverse:&lt;br&gt;
compile is the top-level orchestration layer and depends on render-pdf to produce&lt;br&gt;
PDF output, not the other way around. The finding would have looked plausible to&lt;br&gt;
anyone without context. It was caught during manual testing, when the explanation&lt;br&gt;
didn't match expected behavior. The detection mechanism was not a test. It was&lt;br&gt;
familiarity with the layer dependency structure documented in the inner-context&lt;br&gt;
files.&lt;/p&gt;

&lt;p&gt;That catch happened in the operator's head, not in the apparatus.&lt;/p&gt;

&lt;p&gt;This connects directly to the difference between typesetting and typography as&lt;br&gt;
disciplines. Typesetting is measurable: column width, leading, glyph spacing, grid&lt;br&gt;
alignment. Typography is judgment — pattern recognition built from sustained&lt;br&gt;
exposure to well-set text. A typographer sees paragraph colour as a unified&lt;br&gt;
impression before they can name what is wrong with any single line. The criterion&lt;br&gt;
is real. The instrument is human. You cannot replace the typographer with a&lt;br&gt;
checklist, because the checklist can only encode what the typographer already knew&lt;br&gt;
how to measure.&lt;/p&gt;

&lt;p&gt;The false-assurance tests in §3 are the software version of this same problem.&lt;br&gt;
The test author could see what the constraint was supposed to guarantee. They could&lt;br&gt;
not encode that criterion in a way the test framework would verify. So a measurable&lt;br&gt;
proxy stood in — run the test, check the output matches — and the proxy passed&lt;br&gt;
while the criterion was never checked. The apparatus ran correctly. The apparatus&lt;br&gt;
was checking the wrong thing. That distinction is not visible from inside the&lt;br&gt;
system. It is only visible to the operator.&lt;/p&gt;

&lt;p&gt;This is not a failure of formalization. It is the honest limit of what&lt;br&gt;
formalization can achieve. The methodology can document the apparatus that supports&lt;br&gt;
the operator. It cannot replace the operator. The right measure of a methodology's&lt;br&gt;
maturity is not how few gaps it has, but whether the gaps that remain are the right&lt;br&gt;
gaps — the ones where human judgment genuinely adds value that instruction cannot&lt;br&gt;
replicate.&lt;/p&gt;

&lt;p&gt;Twenty-two items from the original 81 remain open. They are open by intent, not&lt;br&gt;
neglect. One is a&lt;br&gt;
behaviour-changing refactor deferred until the Rust side can be updated in&lt;br&gt;
lockstep. Several are typed-only features requiring a decision rather than an&lt;br&gt;
implementation. One is a latent multi-span RTL bug that doesn't trigger today but&lt;br&gt;
is a known risk for when span support arrives. Each is a different kind of open&lt;br&gt;
item, and the current work-pool format does not distinguish between them. Those 22&lt;br&gt;
items are the live test bed for the issue-bucket process when it gets defined.&lt;/p&gt;




&lt;h2&gt;
  
  
  §5 — Close
&lt;/h2&gt;

&lt;p&gt;Two improvements in one week. Two new packages shipped under a methodology that&lt;br&gt;
was held in one operator's head in article-1 and exists as a document set now.&lt;br&gt;
More than a hundred review findings processed through a five-step sequence that&lt;br&gt;
worked, produced correct results, and is not yet documented. Twenty-two items&lt;br&gt;
deliberately left open as the test bed for the next iteration.&lt;/p&gt;

&lt;p&gt;Article-1 showed the methodology working. This article shows it made explicit, and&lt;br&gt;
shows where making it explicit revealed what was still implicit.&lt;/p&gt;

&lt;p&gt;The lesson is not specific to paragraf. Any LLM-assisted system that survives long&lt;br&gt;
enough will eventually formalize. Formalization is not the end state — it is the&lt;br&gt;
precondition for seeing clearly where the system ends and the operator begins. The&lt;br&gt;
value of writing down the apparatus is not that it removes the operator. It is that&lt;br&gt;
it shows you exactly where the operator is still necessary, and separates that from&lt;br&gt;
where they were simply compensating for missing documents.&lt;/p&gt;

&lt;p&gt;The next methodology article in this series comes when the issue-bucket loop has&lt;br&gt;
been observed in defined form rather than in informal practice. The open questions&lt;br&gt;
are concrete: What distinguishes a finding that goes straight to execution from one&lt;br&gt;
requiring triage? Who owns a behaviour-changing refactor that spans packages? How&lt;br&gt;
does a multi-reviewer sequence open and close without the operator as coordinator?&lt;br&gt;
The five-step process in §3 answered all of these by being run once. The next&lt;br&gt;
version answers them before the process runs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;paragraf is open source. The repository, the live demo, and the article series&lt;br&gt;
are at &lt;a href="https://github.com/kadetr/paragraf" rel="noopener noreferrer"&gt;github.com/kadetr/paragraf&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>typescript</category>
      <category>markdown</category>
    </item>
    <item>
      <title>Three Gaps, One Platform</title>
      <dc:creator>Kemal Deniz Teket</dc:creator>
      <pubDate>Sat, 25 Apr 2026 19:21:40 +0000</pubDate>
      <link>https://forem.com/kadetr/three-gaps-one-platform-3pp6</link>
      <guid>https://forem.com/kadetr/three-gaps-one-platform-3pp6</guid>
      <description>&lt;h2&gt;
  
  
  The terminology, perceptual, and accessibility gaps between typographers and developers — and where paragraf sits
&lt;/h2&gt;




&lt;h2&gt;
  
  
  §0 — Hook
&lt;/h2&gt;

&lt;p&gt;A typographer says: "the colour is unbalanced." A developer opens the style sheet and adjusts the colors on the page. The paragraph is still wrong. Neither person made a mistake.&lt;/p&gt;

&lt;p&gt;"Colour" in typesetting means the visual density of a paragraph (also called &lt;em&gt;grey&lt;/em&gt;) — how evenly ink is distributed across the lines. It has nothing to do with hue. The developer heard the correct English word and acted on a reasonable interpretation. The typographer used the correct technical term and assumed it would be understood. The conversation failed before it began.&lt;/p&gt;

&lt;p&gt;This is the first gap: terminology. Two disciplines developed precise vocabularies for the same domain, independently, and the words do not always map to each other.&lt;/p&gt;

&lt;p&gt;The second gap is deeper. A typographer looks at a paragraph and perceives something wrong before they can name it — the trained eye reads texture, rhythm, and density as a unified impression. A developer looks at the same paragraph and sees output that matches the specification. The code is correct. The output is wrong. These are not contradictory statements. They describe two different instruments measuring the same object.&lt;/p&gt;

&lt;p&gt;The third gap is structural. The tools that produce publication-quality typographic output are desktop applications and command-line systems. They are not callable from a Node.js function. They do not run in a CI pipeline. They do not return a PDF buffer from a single API call.&lt;/p&gt;

&lt;p&gt;This article names all three gaps, maps the terminology where it overlaps, and describes where &lt;strong&gt;paragraf&lt;/strong&gt; sits in relation to them.&lt;/p&gt;




&lt;h2&gt;
  
  
  §1 — The Terminology Gap
&lt;/h2&gt;

&lt;p&gt;Both digital typesetting and software development have spent decades solving the problem of text on a page. They arrived at precise solutions — from different directions, with different tools, and with different names for what they found.&lt;/p&gt;

&lt;p&gt;If you have spent years working with type, the terms on the left are familiar. If you have spent years writing code, the terms on the right are familiar. The intersection of the two is not very crowded.&lt;/p&gt;

&lt;p&gt;This is not a translation from one language to a simpler one. It is a map between two equally precise vocabularies.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Typesetting&lt;/th&gt;
&lt;th&gt;Development&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Paragraph colour / grey&lt;/td&gt;
&lt;td&gt;Whitespace distribution&lt;/td&gt;
&lt;td&gt;How evenly ink is distributed across a paragraph. Tight lines and loose lines produce uneven grey — visible before you can name it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rivers&lt;/td&gt;
&lt;td&gt;Whitespace clustering&lt;/td&gt;
&lt;td&gt;Vertical gaps that form when loose lines stack. The eye catches them as white channels running through the paragraph.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Justified text&lt;/td&gt;
&lt;td&gt;Line-breaking algorithm&lt;/td&gt;
&lt;td&gt;Justification is the goal. The algorithm — greedy or Knuth-Plass — determines the quality of the result.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Leading&lt;/td&gt;
&lt;td&gt;Line height&lt;/td&gt;
&lt;td&gt;Distance between baselines. Named after the lead strips compositors placed between rows of metal type.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optical margin alignment&lt;/td&gt;
&lt;td&gt;Margin protrusion&lt;/td&gt;
&lt;td&gt;Punctuation and soft glyph edges pushed slightly outside the column boundary so the margin reads as visually straight.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09crzm4x7tc2pfqobgre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09crzm4x7tc2pfqobgre.png" alt="KP vs Greedy" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Knuth-Plass (left) produces even paragraph grey with no rivers — spacing is distributed optimally across all lines simultaneously. Greedy (right) fills each line independently, producing uneven grey and visible rivers of white space.&lt;/em&gt;&lt;/p&gt;
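
&lt;p&gt;The difference in the two samples comes from lookahead. A greedy breaker, sketched&lt;br&gt;
below in deliberately simplified TypeScript, commits to each line independently and never&lt;br&gt;
revisits a decision; Knuth-Plass scores all candidate break sequences and picks the one&lt;br&gt;
with the lowest total badness, which is what evens out the grey.&lt;/p&gt;

```typescript
// Minimal greedy line-breaker, for illustration only: each line is filled
// independently, with no lookahead, so a tight early choice can force a loose
// later line. Measuring in characters stands in for real glyph widths.
function greedyBreak(words: string[], maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length > maxChars) {
      lines.push(current); // commit the line and never reconsider it
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}
```

&lt;p&gt;Nothing in that loop can trade a decision on one line against a decision on another;&lt;br&gt;
the global optimizer does exactly that, across the whole paragraph at once.&lt;/p&gt;
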

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10qmr7es6hz7jv06z5hk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10qmr7es6hz7jv06z5hk.png" alt="Optical Margin Alignment" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Optical margin alignment (top): punctuation protrudes slightly outside the column boundary, producing a visually straight margin edge. Without it (bottom), the margin reads as ragged even when mathematically flush.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §2 — The Perceptual Gap
&lt;/h2&gt;

&lt;p&gt;The terminology gap is solvable with a shared reference. The perceptual gap is harder.&lt;/p&gt;

&lt;p&gt;A typographer's judgment is trained over years of looking at printed pages — recognising rivers in justified text before knowing their name, feeling when leading is too tight before measuring it, seeing when a paragraph's colour is wrong before identifying which lines are causing it. This is not intuition. It is pattern recognition built from sustained exposure to a specific domain. It cannot be transferred as a specification.&lt;/p&gt;

&lt;p&gt;A developer's judgment is trained differently — recognising when code is brittle, when an abstraction leaks, when a system will fail under load. These are also not intuitions. They are patterns learned from building and breaking systems. A developer looking at a paragraph of text has no trained apparatus for reading its typographic quality. The code produced correct output. That is the only signal available.&lt;/p&gt;

&lt;p&gt;The inverse is equally true. A typographer looking at a codebase has no trained apparatus for reading its structural quality. The document looks correct on screen. That is the only signal available.&lt;/p&gt;

&lt;p&gt;Neither gap represents a failure of intelligence or effort. They represent two disciplines that developed different perceptual instruments for different problems, and are now being asked to collaborate on the same artefact.&lt;/p&gt;

&lt;p&gt;What changes this is not more training — it is a shared surface. When the typographer can adjust &lt;code&gt;tolerance&lt;/code&gt; and see the paragraph grey change in real time, the parameter becomes a perceptual instrument. When the developer can see that the typographer's &lt;code&gt;tolerance: 1.5&lt;/code&gt; approval maps to a specific output quality, the visual judgment becomes a reusable specification. The gap does not close. But it becomes navigable.&lt;/p&gt;




&lt;h2&gt;
  
  
  §3 — The Accessibility Gap
&lt;/h2&gt;

&lt;p&gt;InDesign is a desktop application. TeX is a command-line system. Both require installation, licensing or configuration, and specialist knowledge to operate. Neither is callable from a Node.js function. Neither runs in a CI pipeline. Neither returns a PDF buffer from a single API call.&lt;/p&gt;

&lt;p&gt;This is not a criticism — both tools are mature, reliable, and used in production publishing worldwide. It is a statement about what they are: authoring environments designed for human operators, not pipeline components designed for programmatic integration.&lt;/p&gt;

&lt;p&gt;The gap this creates is visible in how publishing automation projects get built. A catalog of ten thousand products needs a PDF for each one. A report needs to be generated on demand from live data. A personalised document needs to be assembled from a template and a data record at request time. These are engineering problems. The tools that produce acceptable typographic output are not designed for them. The tools designed for them do not produce acceptable typographic output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;paragraf&lt;/strong&gt; is an attempt to occupy that gap: publication-quality typesetting as a Node.js library, callable from TypeScript, with no external processes and no GUI dependency. The same algorithms that production typesetting tools use — Knuth-Plass, OpenType shaping, optical margin alignment — running inside a standard npm install.&lt;/p&gt;

&lt;p&gt;That is the claim. Not that &lt;strong&gt;paragraf&lt;/strong&gt; matches the output quality of tools refined over decades. But that the algorithms are the same, the parameters are exposed, and the pipeline is programmable.&lt;/p&gt;

&lt;p&gt;The pipeline side is paragraf's current focus. The typographer-facing side — a visual authoring environment built on the same engine — is studio, currently in design. That is the subject of a future article in this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  §4 — What paragraf Implements
&lt;/h2&gt;

&lt;p&gt;The terminology map is one thing. What &lt;strong&gt;paragraf&lt;/strong&gt; actually exposes is another. This section shows the implementation — with the actual parameters, and with honesty about what is not yet there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paragraph colour / whitespace distribution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Controlled by &lt;code&gt;tolerance&lt;/code&gt; and &lt;code&gt;looseness&lt;/code&gt; in the paragraph composer. &lt;code&gt;tolerance&lt;/code&gt; sets how much deviation from ideal spacing is acceptable before a line is considered bad. &lt;code&gt;looseness&lt;/code&gt; nudges the algorithm toward fewer or more lines. Together they give the typographer direct control over the grey of a paragraph.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;composer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compose&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;font&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;body&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;normal&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stretch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;normal&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;lineWidth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;396&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tolerance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;looseness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;alignment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;justified&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-us&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
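&lt;p&gt;The tolerance threshold can be made concrete with a little arithmetic. The sketch below is not paragraf code; it is a self-contained illustration of the TeX-style measure that Knuth-Plass implementations use: an adjustment ratio computed from a line's stretch or shrink budget, and a badness score that grows with the cube of that ratio.&lt;/p&gt;

```typescript
// Self-contained illustration (not the paragraf API): how a tolerance
// threshold classifies candidate lines in a Knuth-Plass style composer.

interface LineMetrics {
  natural: number; // natural width of the line's content, in points
  stretch: number; // total stretchability of its interword spaces
  shrink: number;  // total shrinkability of its interword spaces
}

// Adjustment ratio r: how far the spaces must stretch (positive r)
// or shrink (negative r) to hit the target width.
function adjustmentRatio(line: LineMetrics, targetWidth: number): number {
  const excess = targetWidth - line.natural;
  if (excess === 0) return 0;
  return excess > 0 ? excess / line.stretch : excess / line.shrink;
}

// TeX-style badness grows with the cube of the adjustment ratio.
function badness(r: number): number {
  return 100 * Math.pow(Math.abs(r), 3);
}

// A line is feasible when it shrinks no further than its shrink budget
// allows (r no smaller than -1) and stretches no more than tolerance permits.
function isFeasible(r: number, tolerance: number): boolean {
  if (r >= -1) {
    return tolerance >= r;
  }
  return false;
}

const line: LineMetrics = { natural: 380, stretch: 12, shrink: 6 };
const r = adjustmentRatio(line, 396); // (396 - 380) / 12, roughly 1.33
console.log(r.toFixed(2), badness(r).toFixed(0), isFeasible(r, 2.0));
```

With this framing, &lt;code&gt;tolerance: 2.0&lt;/code&gt; reads as a ceiling on how far any line's spaces may stretch relative to their budget before the break sequence is rejected.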



&lt;p&gt;&lt;strong&gt;Rivers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rivers are the visual consequence of a greedy line-breaking algorithm — not a parameter to set, but a problem paragraf avoids by using Knuth-Plass. The algorithm considers all lines simultaneously rather than filling each line independently, which eliminates the whitespace clustering that produces rivers in justified text.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Justified text / line-breaking algorithm&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;paragraf&lt;/strong&gt; implements Knuth-Plass with the full parameter set: tolerance, looseness, consecutive hyphen limit, and runt-line penalties. The demo runs Knuth-Plass and greedy side by side — the difference is visible immediately on any paragraph of justified prose.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;composer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compose&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;alignment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;justified&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tolerance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;consecutiveHyphenLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Leading / line height&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set per style in the paragraf style system. Applied baseline to baseline, consistent with typesetting convention rather than CSS convention (which measures from the top of the line box).&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defineTemplate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;styles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;font&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LiberationSerif&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;alignment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;justified&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lineHeight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// baseline to baseline, in points&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
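&lt;p&gt;The baseline-to-baseline convention has a simple consequence for frame arithmetic, sketched below as an illustration rather than paragraf internals: every baseline sits at a fixed multiple of the leading, independent of any line-box model or the glyphs on the line.&lt;/p&gt;

```typescript
// Baseline positions under typesetting convention (illustrative sketch,
// not paragraf internals): each baseline is exactly one leading below
// the previous one, regardless of the content of the line.
function baselineY(
  firstBaseline: number, // distance from frame top to the first baseline
  lineHeight: number,    // leading, baseline to baseline, in points
  lineIndex: number,     // zero-based line number
): number {
  return firstBaseline + lineIndex * lineHeight;
}

// With 16pt leading and the first baseline 14pt below the frame top:
console.log(baselineY(14, 16, 0)); // 14
console.log(baselineY(14, 16, 3)); // 62
```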



&lt;p&gt;&lt;strong&gt;Optical margin alignment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Implemented as a two-pass recomposition. The first pass composes normally. The second pass identifies punctuation and soft glyph edges at line starts and ends, applies protrusion values, and recomposes affected lines. The result is a visually straight margin edge on justified text.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;composer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compose&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;alignment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;justified&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;opticalMargins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
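&lt;p&gt;The protrusion arithmetic in that second pass can be sketched in isolation. The factors below are illustrative assumptions, not paragraf's actual protrusion tables: each eligible character is allowed to hang a fraction of its advance width into the margin, so the character body rather than the punctuation defines the visual edge.&lt;/p&gt;

```typescript
// Self-contained sketch of margin protrusion (illustrative values,
// not paragraf's actual protrusion tables).

// Fraction of the glyph's advance width allowed to hang into the
// left margin when the character starts a line.
const protrusionLeft: { [ch: string]: number } = {
  '"': 1.0, "'": 1.0, '-': 0.5, ',': 0.7, '.': 0.7,
};

// Given the first character of a line and its advance width in points,
// return how far the line origin shifts left of the column edge.
function leftHang(firstChar: string, advanceWidth: number): number {
  const factor = protrusionLeft[firstChar];
  if (factor === undefined) return 0;
  return factor * advanceWidth;
}

// A line opening with a 3pt-wide quotation mark hangs it fully;
// a line opening with a letter stays flush with the column edge.
console.log(leftHang('"', 3)); // 3
console.log(leftHang('T', 6)); // 0
```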



&lt;p&gt;&lt;strong&gt;What is not yet implemented:&lt;/strong&gt;&lt;br&gt;
Frame-level widow/orphan control and PDF/X output are in progress. These affect press-ready output but not the typesetting parameters above.&lt;/p&gt;




&lt;h2&gt;
  
  
  §5 — Close
&lt;/h2&gt;

&lt;p&gt;Three gaps. One platform pursuing all three.&lt;/p&gt;

&lt;p&gt;The terminology gap is addressed by exposed, named parameters that map directly to typesetting concepts. The perceptual gap is navigated by making those parameters adjustable and their output immediately visible — so that a typographer's judgment and a developer's integration can operate on the same artefact. The accessibility gap is addressed by building the pipeline as a Node.js library rather than a desktop application or command-line system.&lt;/p&gt;

&lt;p&gt;None of these gaps are fully closed. &lt;strong&gt;paragraf&lt;/strong&gt; is pre-1.0, in active development, and honest about what is not yet implemented. What it offers at this stage is a surface — a place where typographic judgment and engineering decisions can meet, with a shared vocabulary for what they are doing.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;paragraf series&lt;/em&gt; continues when studio is ready to demonstrate. A parallel series on AI-assisted development continues with a second article — the v0.1 documentation system is the subject.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;paragraf is open source. The repository, the live demo, and the article series are at &lt;a href="https://github.com/kadetr/paragraf" rel="noopener noreferrer"&gt;github.com/kadetr/paragraf&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>typography</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Spec-Driven, AI-Assisted, Test-Validated — A Practitioner's Account</title>
      <dc:creator>Kemal Deniz Teket</dc:creator>
      <pubDate>Thu, 16 Apr 2026 23:43:19 +0000</pubDate>
      <link>https://forem.com/kadetr/spec-driven-ai-assisted-test-validated-a-practitioners-account-3gd4</link>
      <guid>https://forem.com/kadetr/spec-driven-ai-assisted-test-validated-a-practitioners-account-3gd4</guid>
      <description>&lt;p&gt;&lt;strong&gt;What made a two-week typesetting library possible, and what the methodology still lacks&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §0 — Hook
&lt;/h2&gt;

&lt;p&gt;Most accounts of AI-assisted development describe tools and workflows. Very few show primary sources: the actual specification documents, the actual errors caught, the actual decisions revised under implementation pressure. Without those, the account is not evaluable and not replicable. You cannot tell whether the methodology produced the result or whether the result happened despite the methodology.&lt;/p&gt;

&lt;p&gt;This article shows the sources. &lt;a href="https://github.com/kadetr/paragraf" rel="noopener noreferrer"&gt;paragraf&lt;/a&gt; — an open source typesetting library built in two weeks — is the case study. 12 packages, Rust/WASM and TypeScript, covering a complete pipeline from font shaping to PDF output. The methodology is the subject. Every claim below is either demonstrated by an artifact or flagged explicitly as opinion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffl0o20zm408woronvgrm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffl0o20zm408woronvgrm.png" alt="paragraf demo" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The paragraf demo — live in browser.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §1 — The Prerequisite Nobody Mentions
&lt;/h2&gt;

&lt;p&gt;Every AI-assisted development article eventually says some version of "give the AI precise specifications and you get better output." Almost none of them explain where precise specifications come from.&lt;/p&gt;

&lt;p&gt;They do not come from knowing the algorithms. paragraf implements the Knuth-Plass line-breaking algorithm, optical margin alignment, rustybuzz OpenType shaping, and Unicode BiDi. None of those were known in detail before the project started. They were researched, trusted to AI implementation, and verified through tests.&lt;/p&gt;

&lt;p&gt;What domain knowledge actually contributes is different and harder to acquire: knowledge of failure modes in the target environment.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;onMissing&lt;/code&gt; design in &lt;code&gt;@paragraf/compile&lt;/code&gt; — skip, fallback, placeholder, each with defined behavior — does not appear in the typesetting literature. It comes from having seen a real product information management export fail a batch job at 2am because three records out of ten thousand had a missing field. The &lt;code&gt;normalize()&lt;/code&gt; hook that maps raw data to template bindings comes from knowing that every enterprise data source has a different field naming convention and no library adapter ever covers a specific customer's exact schema. The strict layer dependency rules — each package imports only from layers below, no exceptions — come from having debugged circular dependency failures in InDesign automation pipelines.&lt;/p&gt;
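&lt;p&gt;The three &lt;code&gt;onMissing&lt;/code&gt; behaviours named above can be sketched as a small resolver. This is an illustration of the strategies, not the &lt;code&gt;@paragraf/compile&lt;/code&gt; implementation; the resolver signature and field names are assumptions.&lt;/p&gt;

```typescript
// Illustrative sketch of skip / fallback / placeholder handling for a
// missing field (not the actual @paragraf/compile implementation).
type MissingStrategy =
  | { kind: 'skip' }                        // drop the bound element
  | { kind: 'fallback'; value: string }     // substitute a known value
  | { kind: 'placeholder'; text: string };  // render a visible marker

function resolveField(
  record: { [field: string]: string | undefined },
  field: string,
  onMissing: MissingStrategy,
): string | null {
  const value = record[field];
  if (value !== undefined) return value;
  switch (onMissing.kind) {
    case 'skip':        return null; // caller omits the element entirely
    case 'fallback':    return onMissing.value;
    case 'placeholder': return onMissing.text;
  }
}

// Three records out of ten thousand with a missing 'price' no longer
// fail the batch: each record resolves to a defined behaviour.
const record = { sku: 'A-1001' }; // 'price' is missing
console.log(resolveField(record, 'price', { kind: 'placeholder', text: '[missing]' }));
```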

&lt;p&gt;An AI given the same algorithm knowledge and no production context would have built something technically correct that fails the first time it touches real data. The specifications were precise because the failure modes were known in advance. That precision is the prerequisite.&lt;/p&gt;

&lt;p&gt;This maps closely to what practitioners in the &lt;a href="https://aicoding.leaflet.pub/3miwhqqvwxc2x" rel="noopener noreferrer"&gt;AI coding literature&lt;/a&gt; describe as spec inputs: markdown documents, conversations, diagrams, domain models, and existing code feeding into requirements that the AI can act on reliably. The taxonomy is accurate. What the literature underemphasises is that the quality of those inputs depends almost entirely on what the human already knows — not about the AI, but about the problem domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjxcoxrnpi1pgo1tciko7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjxcoxrnpi1pgo1tciko7.png" alt="specification document" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The specification document that governed package structure and testing strategy&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §2 — The Two-Loop Process
&lt;/h2&gt;

&lt;p&gt;The development process followed two nested loops. Understanding the distinction between them is the core of the methodology.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;outer loop&lt;/strong&gt; covers the project. It produces: a problem definition, scope constraints, a high-level layer architecture, an architecture diagram, and a versioned roadmap. Outer loop documents are updated when implementation reality forces a revision. They are not fixed contracts — they are living records of current understanding. This is what &lt;a href="https://martinfowler.com/articles/reduce-friction-ai/design-first-collaboration.html" rel="noopener noreferrer"&gt;Fowler calls design-first collaboration&lt;/a&gt;: the human owns the architecture, the AI executes within it.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;inner loop&lt;/strong&gt; covers each package. It produces: a scope definition, input/output schemas, a step-by-step implementation plan with defined subtasks and edge cases, unit tests written before implementation, then implementation against that specification, closing with integration and end-to-end tests validating every contract.&lt;/p&gt;

&lt;p&gt;Here is what the inner loop looks like in practice, from the &lt;code&gt;@paragraf/linebreak&lt;/code&gt; package plan:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9at1gfcq0s2vqmpasj9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9at1gfcq0s2vqmpasj9.png" alt="inner loop plan" width="800" height="886"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;an inner loop plan sample&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The two loops are not independent. An issue discovered mid-package feeds back into the outer loop and may revise the project scope, the architecture, or the roadmap. A concrete example: the layer numbering in the original plan had &lt;code&gt;1c-font-engine&lt;/code&gt; and &lt;code&gt;1b-shaping-wasm&lt;/code&gt;. During extraction, the numbering was revised to &lt;code&gt;1b-font-engine&lt;/code&gt; and &lt;code&gt;2a-shaping-wasm&lt;/code&gt; to better reflect the actual dependency structure. Small decision, visible consequence: the outer loop was updated in response to implementation reality rather than preserved as a false contract. The architecture documents show that evolution directly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8kdajukf0vt3ujoecej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8kdajukf0vt3ujoecej.png" alt="layers &amp;amp; packages" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Architecture at the start of the project (left) and after several outer loop iterations (right).&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §3 — Trust, But Verify
&lt;/h2&gt;

&lt;p&gt;Trust, but verify — I first heard the phrase from my director, Bob Bair, when we met in Greece nearly ten years ago. I assumed it was professional jargon. It turned out to be a Russian proverb, and the most accurate description of what makes AI-assisted development work at this level.&lt;/p&gt;

&lt;p&gt;The phrase "AI-assisted" covers an enormous range of practices, from fully autonomous code generation to a human using AI as an implementation engine operating inside precisely defined contracts. The methodology here is firmly the second. No agentic framework was used — existing frameworks are built around task delegation and autonomy, which is the opposite of what this methodology requires. Every step involved a human decision. The AI tools were implementation engines and discussion partners, not architects.&lt;/p&gt;

&lt;p&gt;The tooling split was deliberate. VS Code Copilot and Claude Sonnet/Haiku handled code generation inside the inner loop — writing implementations against pre-defined schemas and tests. Claude Opus and Gemini handled architecture discussions, document synthesis, and the outer loop decisions where the question was "what, why &amp;amp; how should this be" rather than "implement this." A third role — code review and audit — ran across both: Claude connected to GitHub, Copilot code review, and manual review at integration boundaries. The key practice was using multiple models as a quality control mechanism: when two models diverge on an assessment of the same code or decision, that disagreement is a signal worth investigating. The human resolves it. This is what the emerging &lt;a href="https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents" rel="noopener noreferrer"&gt;harness engineering&lt;/a&gt; &lt;a href="https://www.ignorance.ai/p/the-emerging-harness-engineering" rel="noopener noreferrer"&gt;literature&lt;/a&gt; is beginning to formalise — the scaffolding around AI tools matters as much as the tools themselves.&lt;/p&gt;

&lt;p&gt;The control mechanism that makes this work at the code level is test-first development — but not in the conventional sense of "write tests alongside your code." The distinction matters: tests written before implementation define what correct means before asking for anything. They are specifications expressed as assertions. The AI implements against them. Errors are caught at the unit boundary, not at integration time.&lt;/p&gt;
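&lt;p&gt;What "specifications expressed as assertions" means in practice: the assertions exist before the function body does. The example below is illustrative, not from the paragraf test suite; the contract for a hypothetical break-opportunity function is pinned down first, and the implementation is then asked to satisfy it.&lt;/p&gt;

```typescript
// Illustrative test-first contract (not from the paragraf test suite).
// The assertions were written first; they define what "correct" means
// before any implementation exists.

function expectEqual(actual: number[], expected: number[]): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// Contract: breakOpportunities returns the indices at which a string
// may be broken -- after every space, never inside a word.
// The implementation below was written only after that contract was fixed.
function breakOpportunities(text: string): number[] {
  const indices: number[] = [];
  for (let i = 0; text.length > i; i++) {
    if (text[i] === ' ') indices.push(i + 1); // break after the space
  }
  return indices;
}

// The specification, expressed as assertions the AI implements against:
expectEqual(breakOpportunities('to be or'), [3, 6]);
expectEqual(breakOpportunities('word'), []);
console.log('contract satisfied');
```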

&lt;p&gt;During the first week, unit tests were passing across all packages and end-to-end tests were failing across all packages. The response was to stop, run a full audit, classify every issue by severity, and fix in order before continuing. The audit document was named &lt;code&gt;excuse-me-kemal-I-forked-up.md&lt;/code&gt;. The name is the emotional record of the moment. The contents are the professional response to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F039zifhsjlx95maepemf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F039zifhsjlx95maepemf.png" alt="fork" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Six critical issues. Two high. Three medium. All code files, tests, and documents fixed before shipping. The table is not evidence that the process prevents errors — it is evidence that the review step is real and not ceremonial. Errors that compound into the next stage are significantly more expensive than errors caught at their origin. The audit caught them at their origin.&lt;/p&gt;




&lt;h2&gt;
  
  
  §4 — What It Produced and What It Lacks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it produced:&lt;/strong&gt; 12 published packages covering a complete typesetting pipeline. 906 unit tests across all packages, 70 end-to-end tests, and 23 manual test scripts producing real PDF and SVG output. A live demo running the full WASM shaping pipeline in the browser. A complete compile API that takes a template and a data record and returns a PDF buffer in a single function call. The visible output of a correctly specified system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1jk10rc4uc21gvg8flm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1jk10rc4uc21gvg8flm.png" alt="Knuth-Plass" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Knuth-Plass (left) distributes spacing evenly across all lines simultaneously. Greedy (right) fills each line independently, producing uneven spacing and a stretched final line.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The left column was produced by Knuth-Plass with real OpenType metrics. The right column was produced by the greedy algorithm used by every JavaScript PDF library. The difference is the consequence of specification precision applied at every layer of the stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it lacks:&lt;/strong&gt; three gaps worth naming honestly.&lt;/p&gt;

&lt;p&gt;Documentation was not versioned alongside code. Inner loop documents were overwritten as understanding evolved. The dependency reference document changed significantly during the project but only its current state is preserved. Git tracked every code change with line-level precision while documentation changes were invisible. The discipline was there. The infrastructure to preserve the evidence of that discipline was not. These are not the same failure.&lt;/p&gt;

&lt;p&gt;Session continuity works for one person. The session handoff document — a structured snapshot of current state, architectural decisions with reasoning, known bugs classified by severity, and next steps — is what allows AI-assisted development to resume coherently across sessions. &lt;a href="https://martinfowler.com/articles/reduce-friction-ai/context-anchoring.html" rel="noopener noreferrer"&gt;Fowler's context-anchoring&lt;/a&gt; describes this problem precisely: without deliberate anchoring, each session starts from a degraded understanding of the project's current state. The handoff document is a manual solution to that problem. It works. It does not yet scale to a team without additional tooling design.&lt;/p&gt;

&lt;p&gt;The process is disciplined but not yet systematic. Disciplined means: there was a defined structure, it was followed consistently, it produced results. Systematic means: the process itself is observable and reproducible from its own records. The audit document was produced because an audit was run manually at a moment of failure. A systematic process would have tooling that made that audit continuous rather than reactive.&lt;/p&gt;




&lt;h2&gt;
  
  
  §5 — What the Next Version Looks Like
&lt;/h2&gt;

&lt;p&gt;Three concrete improvements follow directly from the gaps above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation commits alongside code commits.&lt;/strong&gt; When a specification changes in response to an implementation discovery, that change should be recorded at the same moment as the code change that triggered it, with the reasoning attached. Not a git diff of a prose document — a dated entry written by the person who made the decision, capturing intent rather than just state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inner loop documents versioned rather than overwritten.&lt;/strong&gt; Each package plan should have a version history showing what changed between drafts and why. The delta between versions is where the feedback loop between inner and outer is visible. Without it you have outcomes but not reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session handoff documents dated and immutable.&lt;/strong&gt; The handoff document that allows a session to resume coherently should be treated as a changelog entry, not a mutable working document. Write it at the end of a session, date it, do not overwrite it. The history of those documents is the history of how the project's understanding evolved.&lt;/p&gt;

&lt;p&gt;The quality of AI-assisted output is determined by the precision of the specification, the discipline of the review, and the honesty of the record. All three are learnable. None of them require a particular tool. The methodology described here is not finished — it is a working version that produced a working result and has visible room to improve. That is a more useful account than a polished success story, and it is the only kind worth writing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The next article in this series covers the problem space in more depth — the typographic and algorithmic reasons why existing JavaScript document libraries fall short of publication quality, and what it takes to close that gap.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;paragraf is open source. The repository, the live demo, and the article series are at &lt;a href="https://github.com/kadetr/paragraf" rel="noopener noreferrer"&gt;github.com/kadetr/paragraf&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>testing</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Towards an Open Source Print-Ready Publication Library in JavaScript</title>
      <dc:creator>Kemal Deniz Teket</dc:creator>
      <pubDate>Mon, 13 Apr 2026 21:47:44 +0000</pubDate>
      <link>https://forem.com/kadetr/towards-an-open-source-print-ready-publication-library-in-javascript-19ba</link>
      <guid>https://forem.com/kadetr/towards-an-open-source-print-ready-publication-library-in-javascript-19ba</guid>
      <description>&lt;p&gt;&lt;strong&gt;Building paragraf: a typesetter with industry-standard methods for print-ready publication quality documents in Node.js&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §0 — Introduction
&lt;/h2&gt;

&lt;p&gt;When you open a professionally printed book, a luxury product catalog, or a pharmaceutical package insert, the text feels different. Words are evenly spaced. Paragraphs have a calm, consistent density. Punctuation sits exactly where it should. The page looks deliberate.&lt;/p&gt;

&lt;p&gt;When you generate the same content programmatically — using the JavaScript libraries available today — something is lost. Word spacing is uneven. Lines break at awkward places. Certain letter combinations look slightly wrong. The output is functional, but it does not look typeset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;paragraf&lt;/strong&gt; is an attempt to close that gap. Twelve packages are complete; three more are planned for the coming weeks. The typesetting core — line breaking, font shaping, optical margins, bidirectional text, hyphenation, styles, layout, and the compile pipeline — is production-ready. The print production layer — color management, color separations, and print-ready PDF output — and the visual editor layer are in progress. This article is an overview of what paragraf is, how it is built, and where it is going.&lt;/p&gt;




&lt;h2&gt;
  
  
  §1 — The Problem: Why does text look different in books vs documents generated by code?
&lt;/h2&gt;

&lt;p&gt;Professional typesetting solves several distinct problems simultaneously. Most JavaScript PDF pipelines solve none of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Line Breaking
&lt;/h3&gt;

&lt;p&gt;The greedy algorithm — used by every JavaScript PDF library and every browser — fills each line as fully as possible and breaks there, making each decision in isolation. The Knuth-Plass algorithm, developed by Donald Knuth and Michael Plass in 1981 and used by TeX and Adobe InDesign since then, treats the entire paragraph as a single optimisation problem and finds the break sequence that minimises total spacing deviation across all lines simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnzapwfb48jtmx3c6n08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnzapwfb48jtmx3c6n08.png" alt="Knuth-Plass Algorithm" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The demo above shows the same text set with Knuth-Plass (left) and the greedy algorithm (right), at identical column width and font size.&lt;/em&gt;&lt;/p&gt;
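&lt;p&gt;The core idea can be sketched in a few lines. The dynamic program below minimises total squared leftover space across all break choices, which is the essence of Knuth-Plass without its glue stretch/shrink model, penalties, or demerits; the character-count width measure is a simplification for illustration.&lt;/p&gt;

```typescript
// Simplified optimal line breaking: unlike greedy, it scores every
// feasible break sequence and keeps the one minimising total "badness"
// (squared leftover space) over the whole paragraph. The last line
// carries no badness, as in TeX.
function breakOptimal(words: string[], width: number): string[][] {
  const n = words.length;
  const cost = new Array<number>(n + 1).fill(Infinity); // cost[i] = best badness for words[i..]
  const next = new Array<number>(n + 1).fill(n);        // next[i] = start of the following line
  cost[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let len = -1; // running line length, counting one space between words
    for (let j = i; j < n; j++) {
      len += words[j].length + 1;
      if (len > width) break; // line overfull, no further j is feasible
      const bad = j === n - 1 ? 0 : (width - len) ** 2;
      if (bad + cost[j + 1] < cost[i]) {
        cost[i] = bad + cost[j + 1];
        next[i] = j + 1;
      }
    }
  }
  // Reconstruct the chosen break sequence
  const lines: string[][] = [];
  for (let i = 0; i < n; i = next[i]) lines.push(words.slice(i, next[i]));
  return lines;
}
```

&lt;p&gt;On inputs like this, the optimiser often takes an earlier break than greedy would, accepting a slightly looser first line to avoid a badly loose line further down.&lt;/p&gt;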

&lt;h3&gt;
  
  
  Font Shaping
&lt;/h3&gt;

&lt;p&gt;Correct line breaking requires correct word widths. Correct word widths require processing the actual glyph outlines in the actual font file: rules that combine letter pairs into designed ligature forms (the Glyph Substitution, or &lt;a href="https://learn.microsoft.com/en-us/typography/opentype/spec/gsub" rel="noopener noreferrer"&gt;GSUB&lt;/a&gt; table) and rules that adjust spacing between specific character combinations (the Glyph Positioning, or &lt;a href="https://learn.microsoft.com/en-us/typography/opentype/spec/gpos" rel="noopener noreferrer"&gt;GPOS&lt;/a&gt; table). Most libraries get this wrong — they approximate from pre-computed tables — and the errors compound directly into Knuth-Plass spacing calculations. paragraf uses &lt;a href="https://github.com/RazrFalcon/rustybuzz" rel="noopener noreferrer"&gt;rustybuzz&lt;/a&gt; — a Rust port of &lt;a href="https://harfbuzz.github.io/" rel="noopener noreferrer"&gt;HarfBuzz&lt;/a&gt;, the shaping engine used by Firefox, Chrome, and LibreOffice — compiled to WebAssembly.&lt;/p&gt;
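&lt;p&gt;A toy illustration of why shaping changes measured widths (the glyph widths, ligature rule, and kerning pair below are invented, not taken from any real font): a GSUB-style pass substitutes the "fi" ligature before a GPOS-style pass sums advances and pair kerning.&lt;/p&gt;

```typescript
// Made-up font metrics in font units; real shaping reads these from
// the font file's GSUB/GPOS tables via a shaper such as rustybuzz.
const glyphWidths: Record<string, number> = { A: 700, V: 700, f: 350, i: 280, fi: 560 };
const kerning: Record<string, number> = { AV: -80 }; // GPOS-style pair adjustment

function shapedWidth(text: string): number {
  // GSUB-style pass: substitute the "fi" ligature glyph
  const glyphs: string[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text.slice(i, i + 2) === "fi") { glyphs.push("fi"); i++; }
    else glyphs.push(text[i]);
  }
  // GPOS-style pass: sum advance widths plus pair kerning
  let width = 0;
  for (let i = 0; i < glyphs.length; i++) {
    width += glyphWidths[glyphs[i]] ?? 500; // fallback width for unknown glyphs
    if (i + 1 < glyphs.length) width += kerning[glyphs[i] + glyphs[i + 1]] ?? 0;
  }
  return width;
}
```

&lt;p&gt;A library that sums per-character widths from a pre-computed table would measure "fi" as 630 units here instead of 560, and "AV" as 1400 instead of 1320 — exactly the kind of error that compounds into the line breaker's spacing calculations.&lt;/p&gt;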

&lt;h3&gt;
  
  
  Optical Margin Alignment
&lt;/h3&gt;

&lt;p&gt;A line beginning with a quotation mark or hyphen creates a visual indent even when geometrically flush, because those glyphs are narrower than full characters. The text block looks ragged at the margin even though every line starts at the same x coordinate. Optical margin alignment corrects this by allowing punctuation to protrude fractionally into the margin — typically 0.3 to 0.7 times the font size depending on the character — so the text block appears visually straight.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzqo4im8mexoifkpgnhh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzqo4im8mexoifkpgnhh.png" alt="OMA comparison: with optical margins (top) vs without (bottom). The right edge of the top block reads as visually straight; the bottom block shows punctuation sitting flush with letter glyphs, which reads as uneven." width="800" height="624"&gt;&lt;/a&gt;&lt;/p&gt;
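&lt;p&gt;The correction itself is a small computation. The sketch below shifts a line's start position left by a per-character fraction of the font size when the line opens with protruding punctuation; the factors are illustrative values in the 0.3 to 0.7 range mentioned above.&lt;/p&gt;

```typescript
// Per-character protrusion factors (fraction of font size). These are
// illustrative; production values are tuned per font and character.
const protrusion: Record<string, number> = { '"': 0.7, "'": 0.7, "-": 0.5, ",": 0.3, ".": 0.3 };

// Returns the x coordinate at which to start drawing the line so that
// its first letter glyph, not its leading punctuation, sits on the margin.
function lineStartX(line: string, marginX: number, fontSize: number): number {
  const factor = protrusion[line[0]] ?? 0;
  return marginX - factor * fontSize; // shift left into the margin
}
```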

&lt;h3&gt;
  
  
  Bidirectional Text
&lt;/h3&gt;

&lt;p&gt;Arabic and Hebrew run right-to-left (RTL). Latin text runs left-to-right (LTR). Each Arabic letter takes a different form depending on its position within a word — the letter ع (ayn) has four distinct forms: isolated, initial, medial, and final. Mixed LTR and RTL paragraphs require the &lt;a href="https://www.unicode.org/reports/tr9/" rel="noopener noreferrer"&gt;Unicode Bidirectional Algorithm&lt;/a&gt; to resolve the correct visual order of characters. This is not a bolt-on feature — it requires the GSUB shaping pipeline to already exist. You cannot add BiDi to a library that approximates font metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqh6ijff51tys0oqw28d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqh6ijff51tys0oqw28d.png" alt="Arabic paragraph rendered RTL in the paragraf demo, with direction controls showing Auto / Force LTR / Force RTL." width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;
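&lt;p&gt;As a drastically simplified picture of what the algorithm produces, the sketch below splits a string into direction runs and reverses the right-to-left runs for display. The real UAX #9 algorithm adds embedding levels, neutral resolution, and glyph mirroring; this shows only the reordering step.&lt;/p&gt;

```typescript
// Crude direction test: Hebrew (U+0590–05FF) and Arabic (U+0600–06FF) blocks.
const isRTL = (ch: string) => /[\u0590-\u05FF\u0600-\u06FF]/.test(ch);

// Group characters into same-direction runs, then reverse each RTL run
// so the string reads correctly when drawn left to right.
function toVisualOrder(logical: string): string {
  const runs: { rtl: boolean; text: string }[] = [];
  for (const ch of logical) {
    const rtl = isRTL(ch);
    const last = runs[runs.length - 1];
    if (last && last.rtl === rtl) last.text += ch;
    else runs.push({ rtl, text: ch });
  }
  return runs.map(r => (r.rtl ? [...r.text].reverse().join("") : r.text)).join("");
}
```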

&lt;h3&gt;
  
  
  Hyphenation
&lt;/h3&gt;

&lt;p&gt;Without correct hyphenation patterns, Knuth-Plass has fewer valid break points and is forced toward wider spacing or tighter compression. Hyphenation rules vary significantly by language and cannot be derived from simple character patterns: English breaks "rec-ord" differently as a noun versus a verb; German compounds stack hyphenation boundaries in ways that require dictionary knowledge; Turkish has vowel harmony rules that affect syllabification. paragraf uses the Liang algorithm implemented via the &lt;a href="https://www.npmjs.com/package/hyphen" rel="noopener noreferrer"&gt;hyphen&lt;/a&gt; npm package, covering 22 languages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4tggho2yguc3xd0rufa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4tggho2yguc3xd0rufa.png" alt="English paragraph with hyphenation enabled, showing mid-word breaks at syllable boundaries. Minimum word length to hyphenate is 5 — shorter words are never broken." width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In the example above, the minimum word length to hyphenate is 5, so words with fewer than 5 characters are not hyphenated.&lt;/em&gt;&lt;/p&gt;
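&lt;p&gt;The minimum-length rule and soft-hyphen insertion can be sketched as follows. Real hyphenation derives break points from Liang's language-specific pattern tables, as the &lt;code&gt;hyphen&lt;/code&gt; package does; the lookup table here is a stand-in with hand-picked positions.&lt;/p&gt;

```typescript
// Hand-picked break positions standing in for Liang pattern output.
const breakPoints: Record<string, number[]> = {
  hyphenation: [2, 6], // hy-phen-ation
  record: [3],         // rec-ord (noun form)
};
const MIN_LENGTH = 5;  // words shorter than this are never hyphenated

// Insert soft hyphens (U+00AD) at the known break points, giving the
// line breaker extra feasible break candidates inside long words.
function softHyphenate(word: string): string {
  if (word.length < MIN_LENGTH) return word;
  const points = breakPoints[word];
  if (!points) return word;
  let out = "";
  let prev = 0;
  for (const p of points) { out += word.slice(prev, p) + "\u00AD"; prev = p; }
  return out + word.slice(prev);
}
```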

&lt;h3&gt;
  
  
  Styles, Layout, and Assembly
&lt;/h3&gt;

&lt;p&gt;Change the body font size in pdfmake and you change one paragraph, not the document. Without a style system with inheritance, every derived style is an independent manual update. Without a layout model, page geometry and unit conversions are the caller's responsibility on every project. Without a template schema, binding data fields to content slots — and handling missing fields gracefully — requires custom code per integration. &lt;strong&gt;paragraf&lt;/strong&gt; provides all of it as a coherent stack: a style system with inheritance, a layout model handling units and page geometry, a template schema with defined missing-field behaviour, and a compiler assembling everything into a single function call.&lt;/p&gt;
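&lt;p&gt;Style inheritance is the piece that makes a single change propagate. A minimal sketch, assuming a simple &lt;code&gt;basedOn&lt;/code&gt; chain (the property names and registry shape are illustrative, not the actual &lt;code&gt;@paragraf/style&lt;/code&gt; schema):&lt;/p&gt;

```typescript
interface Style {
  basedOn?: string;                         // parent style to inherit from
  props: Record<string, string | number>;   // this style's own overrides
}

// Walk the basedOn chain and merge properties, nearest definition winning.
function resolveStyle(name: string, registry: Record<string, Style>): Record<string, string | number> {
  const style = registry[name];
  if (!style) throw new Error(`unknown style: ${name}`);
  const base = style.basedOn ? resolveStyle(style.basedOn, registry) : {};
  return { ...base, ...style.props }; // own props override inherited ones
}

// Changing body's font or size now propagates to every derived style
// that does not explicitly override it.
const registry: Record<string, Style> = {
  body:    { props: { fontSize: 10, font: "serif" } },
  caption: { basedOn: "body", props: { fontSize: 8 } },
};
```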




&lt;h2&gt;
  
  
  §2 — The Architecture: Monorepo, Layered, Modular
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;paragraf&lt;/strong&gt; is a monorepo of 12 published packages organised in strict layers. Each layer imports only from layers below it. No exceptions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ork1598gpcfooq1y6hm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ork1598gpcfooq1y6hm.png" alt="paragraf architecture diagram showing Layer 0 (types, color in-progress), Layer 1 (linebreak, font-engine, layout, style), Layer 2 (shaping-wasm, render-core, color-wasm in-progress), Layer 3 (typography, render-pdf), Layer 4 (compile, template), and App (demo, studio in-progress)." width="800" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For most developers, &lt;code&gt;@paragraf/compile&lt;/code&gt; is the only package that matters directly. It takes a template and a data record and returns a PDF buffer. The packages beneath it are there for developers who need to work at lower levels of the stack — custom renderers, browser-side line breaking, integration with existing font pipelines.&lt;/p&gt;

&lt;p&gt;Layer 0 carries zero-dependency shared interfaces. Layer 1 covers the algorithm and measurement packages — line breaking, font engine, page layout, style registry — all browser-safe. Layer 2 adds the Rust/WebAssembly shaper and the SVG/Canvas renderer. Layer 3 is Node-only: the paragraph compositor with OMA and BiDi, and the PDF renderer. Layer 4 is the user-facing API: the template schema and the compile pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  §3 — The Approach: Spec-Driven, AI-Assisted, Test-Validated
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;paragraf&lt;/strong&gt; was designed and built through a disciplined two-loop process. The outer loop covers the project: problem, scope, specifications, constraints, high-level layer architecture, diagrams and models, and a versioned roadmap with future work. The inner loop covers each package: scope definition, input/output schemas, diagrams and models, a full implementation plan with defined steps and edge cases, unit tests written before implementation, then implementation against that specification, closing with integration and end-to-end tests validating every contract.&lt;/p&gt;

&lt;p&gt;The two loops are not independent. An issue discovered mid-package — an edge case that invalidates an assumption, a schema that cannot express a required input — feeds back into the outer loop and may revise the project scope, the architecture, or the roadmap. The project artifacts are living documents, not a fixed contract.&lt;/p&gt;

&lt;p&gt;The implementation was AI-assisted. Claude and GitHub Copilot were used as implementation engines operating against precise specifications. No agentic framework was used — every step involved a human decision. The specifications, architectural decisions, and constraint definitions came from domain knowledge: years of experience in software engineering, project management, InDesign, publishing automation, and multi-agent systems research.&lt;/p&gt;

&lt;p&gt;The quality of AI-assisted output is directly proportional to the precision of the specification the human provides. A developer without this domain background asking an AI to build a typesetting engine would get something that looks like one but breaks in ways they would not recognise.&lt;/p&gt;

&lt;p&gt;A separate article covering this process in detail will be published — the artifacts, the feedback loops, and the specific ways domain knowledge shaped the specifications.&lt;/p&gt;




&lt;h2&gt;
  
  
  §4 — The Technique: TypeScript, Node, WASM, Rust
&lt;/h2&gt;

&lt;p&gt;The core packages are TypeScript targeting Node.js 18+. Browser-safe packages also run in modern browsers — the live demo executes the full shaping and line-breaking pipeline client-side without a server. The OpenType shaper is Rust compiled to WebAssembly via wasm-pack: Rust was chosen for access to the rustybuzz shaping library, which would have taken months (if not years) to reimplement correctly in TypeScript. When the WASM binary is unavailable, the pipeline falls back automatically to a fontkit TypeScript path. PDF output uses pdfkit. The monorepo uses npm workspaces with tsup for package builds and Vitest for testing across all packages.&lt;/p&gt;
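&lt;p&gt;The fallback can be sketched as a generic pattern (paragraf's actual loader is asynchronous and its function names differ; everything here is illustrative): try the WASM initialiser, and degrade to the pure-TypeScript path if it fails.&lt;/p&gt;

```typescript
type ShapeFn = (text: string) => number[]; // text in, glyph advances out

// Prefer the WASM shaper; fall back to a pure-TypeScript path when the
// binary is missing or fails to instantiate. The caller never needs to
// know which path it got.
function pickShaper(wasmInit: () => ShapeFn, jsFallback: ShapeFn): ShapeFn {
  try {
    return wasmInit(); // would wrap a wasm-pack generated init in practice
  } catch {
    return jsFallback; // e.g. a fontkit-based TypeScript path
  }
}
```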




&lt;h2&gt;
  
  
  §5 — The Live Demo: See It Yourself
&lt;/h2&gt;

&lt;p&gt;The live demo runs entirely in the browser — the WASM shaper, the Knuth-Plass algorithm, and the SVG renderer all execute client-side. Type your own text, adjust the tolerance and looseness sliders, and watch the line breaks recalculate in real time. Four interactive pages: line breaking with side-by-side KP vs greedy comparison, layout controls, typography showcase, and multilingual rendering across 6 scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://kadetr.github.io/paragraf/" rel="noopener noreferrer"&gt;→ Live demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9b9rd7s9lztvac8ekay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9b9rd7s9lztvac8ekay.png" alt="The line breaking page — editable text, tolerance/looseness/letter-spacing/alignment controls on the left; Knuth-Plass and greedy results side by side on the right." width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The line breaking page — edit text, adjust tolerance, and see Knuth-Plass and greedy results recalculate side by side.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §6 — Future Work
&lt;/h2&gt;

&lt;p&gt;paragraf's package architecture is stable and deliberate, though still open to revision as the project evolves. The 12 published packages cover the full typesetting pipeline from algorithm to PDF output. Of the three remaining planned packages, &lt;code&gt;@paragraf/color&lt;/code&gt; and &lt;code&gt;@paragraf/color-wasm&lt;/code&gt; address color management and print production, and &lt;code&gt;@paragraf/studio&lt;/code&gt; is the browser-based visual template editor. Every feature below fits within the existing package structure as an enhancement, not a structural addition. That is a deliberate outcome of the layered architecture.&lt;/p&gt;

&lt;p&gt;Color and print production are the most significant items ahead. ICC color profile support, CMYK color spaces, and PDF/X compliance — the ISO standard for print-ready PDF exchange — are the bridge between paragraf's current typesetting quality and full print production readiness. These depend on &lt;code&gt;@paragraf/color-wasm&lt;/code&gt;, a Rust/LCMS2 WebAssembly package following the same pattern as the existing shaping layer.&lt;/p&gt;

&lt;p&gt;Typographic quality features planned within existing packages include micro-typography (per-line letter-spacing adjustments as an additional optimisation degree of freedom alongside word spacing), font expansion (horizontal glyph scaling, another Knuth-Plass optimisation dimension), drop caps, small caps, optical sizes, and balanced ragged lines.&lt;/p&gt;

&lt;p&gt;Layout and composition additions include page-level widow and orphan control (the paragraph-level penalty system exists; full page reflow is a separate pass in &lt;code&gt;@paragraf/compile&lt;/code&gt;), vertical justification, cross-column baseline grid alignment, inline figures with text runaround, and anchored objects.&lt;/p&gt;

&lt;p&gt;The Studio is a browser-based visual template editor — a web application in the monorepo, not a published npm package — that writes the same JSON format &lt;code&gt;@paragraf/template&lt;/code&gt; defines. It is the non-developer interface to the compile pipeline: drag-and-drop frame layout, point-and-click style definitions, and a live PDF preview powered by the same WASM stack that runs in the demo today.&lt;/p&gt;

&lt;p&gt;Contributions, issues, and discussions are open on &lt;a href="https://github.com/kadetr/paragraf" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. If you are working on a document pipeline and running into the limitations described in this article, opening an issue is the best place to start.&lt;/p&gt;




&lt;h2&gt;
  
  
  Annex: Terminology
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bleed&lt;/strong&gt; — artwork that extends beyond the trim edge of a page to prevent white borders after cutting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BiDi&lt;/strong&gt; — Unicode Bidirectional Algorithm. Resolves display order for mixed left-to-right and right-to-left text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Greedy line breaking&lt;/strong&gt; — fills each line as fully as possible, one line at a time, with no lookahead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ICC profile&lt;/strong&gt; — International Color Consortium profile. Defines the color characteristics of a device for accurate color reproduction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knuth-Plass&lt;/strong&gt; — optimal paragraph line-breaking algorithm (Knuth &amp;amp; Plass, 1981). Minimises total spacing deviation across all lines simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenType shaping&lt;/strong&gt; — processing font files to apply ligatures, kerning, and contextual letter forms defined in GSUB/GPOS tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optical margin alignment (OMA)&lt;/strong&gt; — allowing punctuation to protrude fractionally into the margin for a visually straight text edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PDF/X&lt;/strong&gt; — ISO standard for print-ready PDF exchange. Guarantees color, font embedding, and other production requirements are met.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article is the entry point to a series. Each section above corresponds to a dedicated article in this series, published as they become available. The algorithm, the WASM architecture, the compile pipeline, the AI-assisted development process, and the print production roadmap are each covered separately.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>typescript</category>
      <category>opensource</category>
      <category>node</category>
    </item>
  </channel>
</rss>
