<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jun Suzuki</title>
    <description>The latest articles on Forem by Jun Suzuki (@szkjn).</description>
    <link>https://forem.com/szkjn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2934994%2Fa0551a53-ff3f-4b64-9813-be7a63800ebd.jpeg</url>
      <title>Forem: Jun Suzuki</title>
      <link>https://forem.com/szkjn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/szkjn"/>
    <language>en</language>
    <item>
      <title>Generative AI and its singularity zone (CNC Lab paper)</title>
      <dc:creator>Jun Suzuki</dc:creator>
      <pubDate>Wed, 29 Apr 2026 10:02:55 +0000</pubDate>
      <link>https://forem.com/szkjn/generative-ai-and-its-singularity-zone-cnc-lab-paper-3i1f</link>
      <guid>https://forem.com/szkjn/generative-ai-and-its-singularity-zone-cnc-lab-paper-3i1f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2stlkyzxho0mgedp2pj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2stlkyzxho0mgedp2pj.png" alt="A ghostly black-and-white close-up of a face seen through a surface covered in water droplets. Two eyes are faintly visible through the distortion, the rest of the features blur into the texture." width="768" height="320"&gt;&lt;/a&gt;&lt;br&gt;
  From our image experiment with Stable Diffusion 1.5. More on this one later.&lt;/p&gt;

&lt;p&gt;(ﾉ´ヮ´)ﾉ*:･ﾟ✧&lt;/p&gt;

&lt;p&gt;Earlier this month, a research paper I co-wrote with &lt;a href="https://ninon-io.github.io/" rel="noopener noreferrer"&gt;Ninon Devis Salvy&lt;/a&gt; for the CNC Lab was published. 30+ pages, in French, academic register. Not expecting many people to actually read it... so this post stands as the TL;DR.&lt;/p&gt;

&lt;p&gt;The CNC Lab is the R&amp;amp;D branch of the Centre national du cinéma et de l'image animée, France's public agency for film and moving images. They put out &lt;a href="https://www.cnc.fr/professionnels/actualites/deuxieme-appel-a-contributions-du-cnc-lab_2167679" rel="noopener noreferrer"&gt;a call for contributions&lt;/a&gt; last year on two themes: AI and creation, and children's relationship to image consumption. We answered the first one.&lt;/p&gt;

&lt;p&gt;Our paper ("Singularity and Creation in the Age of AI") chases a single question: where does originality live when the model producing it averages everything it has seen? Philosophy has its angles: &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-leveau-vallier" rel="noopener noreferrer"&gt;intuition under algorithms&lt;/a&gt;, &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-somaini" rel="noopener noreferrer"&gt;the algorithmic image&lt;/a&gt;, &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-zylinska" rel="noopener noreferrer"&gt;art past aesthetics&lt;/a&gt;. We took the empirical route: two experiments, one on text and one on images.&lt;/p&gt;

&lt;h2&gt;
  How do you push generative AI past the average?
&lt;/h2&gt;

&lt;p&gt;Generative models (the ones behind ChatGPT or Stable Diffusion) are trained on huge piles of existing content. By default, they produce the statistical average of that pile. Text that reads like text. Images that look like images. Nothing wrong with that for summarizing documents, answering questions, or analyzing data, where reproducibility is what you want. For more creative use cases though, this becomes a ceiling. So how do you push past it without tipping the model into incoherent noise?&lt;/p&gt;

&lt;h2&gt;
  How hot can language get before it breaks?
&lt;/h2&gt;

&lt;p&gt;Every language model has a parameter called &lt;em&gt;temperature&lt;/em&gt;. Low temperature means the model plays it safe and picks the most likely next word. The output looks smooth and predictable. High temperature means the model gets bolder and picks less likely words. The output gets weirder. At extreme temperature, it stops making sense and becomes pure gibberish.&lt;/p&gt;
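&lt;p&gt;Mechanically, temperature just rescales the model's next-token scores before they're turned into probabilities. A minimal sketch of that mechanism in plain Python (toy logits, not GPT-4.1's internals):&lt;/p&gt;

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T, then normalize into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                     # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token scores: one safe candidate, three long shots.
logits = [4.0, 2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # near-greedy: mass piles on the top token
hot = softmax_with_temperature(logits, 2.0)   # flattened: long shots become plausible
```

&lt;p&gt;At 0.2 the top token takes nearly all the probability mass; at 2.0 the distribution flattens and the long shots start getting sampled. That flattening is the whole dial.&lt;/p&gt;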

&lt;p&gt;GPT-4.1 exposes temperature from 0 to 2. We swept the full range in 0.1 steps across three input prompts (a synopsis, a dialogue, a storyboard description), with ten outputs per setting. For each output we measured three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;linguistic integrity&lt;/strong&gt; (is the output still valid language?),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;lexical diversity&lt;/strong&gt; (how varied is the vocabulary?),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;semantic alignment&lt;/strong&gt; (how close does the output stay to the prompt's meaning?).&lt;/li&gt;
&lt;/ul&gt;
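&lt;p&gt;The sweep protocol itself is a small loop. Here is a sketch with a stubbed-out generator standing in for the GPT-4.1 API call, and type-token ratio as a crude stand-in for the lexical-diversity metric (the paper's actual metrics are more involved; every name here is invented for illustration):&lt;/p&gt;

```python
import random

def generate(prompt, temperature, seed):
    """Hypothetical stub for the GPT-4.1 call: higher temperature
    widens the pool of words the output draws from."""
    rng = random.Random(f"{prompt}-{temperature}-{seed}")
    vocab = [f"word{i}" for i in range(60)]
    pool = vocab[: 6 + int(temperature * 25)]   # hotter means wider vocabulary
    return " ".join(rng.choice(pool) for _ in range(50))

def type_token_ratio(text):
    """Crude lexical-diversity proxy: unique words / total words."""
    words = text.split()
    return len(set(words)) / len(words)

prompts = ["a synopsis", "a dialogue", "a storyboard description"]
results = {}
for prompt in prompts:
    for tenth in range(21):                     # temperature 0.0 to 2.0 in 0.1 steps
        temp = tenth / 10
        scores = [type_token_ratio(generate(prompt, temp, seed))
                  for seed in range(10)]        # ten outputs per setting
        results[(prompt, temp)] = sum(scores) / len(scores)
```

&lt;p&gt;Swap the stub for a real API call plus the three measurements and you have the experiment: 3 prompts × 21 settings × 10 samples.&lt;/p&gt;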

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn8xp45o2ri94xdg20td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjn8xp45o2ri94xdg20td.png" alt="Three charts showing linguistic integrity, lexical diversity and semantic similarity of GPT-4.1 outputs as temperature rises from 0.6 to 1.6. A green band marks the narrow zone of diverse but still coherent outputs between 1.2 and 1.4, a red band marks the cliff from 1.5 onward where linguistic integrity drops sharply." width="800" height="661"&gt;&lt;/a&gt;&lt;br&gt;
  Temperature sweep on GPT-4.1. Green: the singularity zone where outputs are diverse and still coherent. Red: the rupture regime where coherence drops.&lt;/p&gt;

&lt;p&gt;Three regimes emerged. Below 1.2 is the &lt;strong&gt;plateau regime&lt;/strong&gt;, where outputs are coherent but predictable. Most of the variation the model can produce goes unexploited. Between 1.2 and 1.4 is the &lt;strong&gt;singularity zone&lt;/strong&gt;: outputs diverge from the default while staying readable. That's the zone worth experimenting with if you're using the model for anything creative. From 1.5 onward the model enters the &lt;strong&gt;rupture regime&lt;/strong&gt; and derails: mixed languages, invented tokens, semantic drift. Outputs there are barely usable unless &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-suzuki-alien" rel="noopener noreferrer"&gt;weirdness is the point&lt;/a&gt;. To give you an idea, here are three outputs from the same opening line (translated from French):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input query:&lt;/strong&gt; &lt;em&gt;"In the Year 2125, three-quarters of the world's population have lost their sight due to…"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;T = 0.0&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;…a swift pandemic, dubbed "the White Shadow", which spread in less than six months. Scientists were never able to determine whether its origin was natural or the product of human manipulation. The first symptoms were mild: slight eye discomfort, halos of light, then, suddenly, total blindness.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;T = 1.4&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;…a luminous wave of unknown origin that struck the planet during the "Great Opal Flash". Within minutes, humanity was plunged into permanent darkness. Faced with this new era, everything had to be rethought: cities adapted, infrastructure was reinvented to operate without visual cues, and human exchanges evolved toward new sensory modes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;T = 2.0&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;…a mutation encoded by the famous "Copper Genome", spread through the atmosphere after the地astra-halopoly(headstormkrift الم produjo koll gebeur acabadou oft pegasesabay暨린어 জল Prathanator supera which myster выращ bevorzug럼 coronFillLTERstenen giàiiedades desenvol виде největ Mutó điều: Wochenendesoil…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The singularity zone is narrow and sits immediately before the rupture. It's surprisingly specific. Its exact location depends on the model. Our experiments focused on GPT-4.1. Other models have their own zone.&lt;/p&gt;

&lt;p&gt;Newer models (like GPT-5) don't even expose the dial anymore. Every hidden dial is one less lever to push the model past its defaults.&lt;/p&gt;

&lt;p&gt;What generalizes is not the values but the shape: &lt;strong&gt;a narrow band before the rupture&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  Images and the Pareto frontier
&lt;/h2&gt;

&lt;p&gt;For images the question is the same but the controls are different. With Stable Diffusion 1.5 we generated several thousand images from the same sci-fi opening line as above. We varied three kinds of dials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;guidance scale&lt;/strong&gt; (how strictly the model sticks to the prompt),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;latent noise injection&lt;/strong&gt; (how much random noise we inject during generation),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;controlled semantic drift&lt;/strong&gt; (how much weight we put on specific stylistic keywords).&lt;/li&gt;
&lt;/ul&gt;
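&lt;p&gt;The middle dial is the easiest to picture. A toy version of noise injection in plain Python (a conceptual sketch, not the actual diffusers API): blend fresh Gaussian noise into a latent vector, with a strength knob running from untouched (0.0) to pure noise (1.0).&lt;/p&gt;

```python
import random

def inject_noise(latent, strength, seed=0):
    """Blend fresh Gaussian noise into a latent vector.
    strength=0.0 returns it untouched; strength=1.0 replaces it with noise."""
    rng = random.Random(seed)
    return [(1 - strength) * x + strength * rng.gauss(0, 1) for x in latent]

def drift(a, b):
    """Euclidean distance between two latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

latent = [0.5] * 8                  # stand-in for a partially denoised latent
mild = inject_noise(latent, 0.2)    # small divergence from the default image
wild = inject_noise(latent, 1.0)    # pure noise: the chaos end of the dial
```

&lt;p&gt;The other two dials behave the same way in spirit: each is a scalar that trades fidelity to the default against divergence from it.&lt;/p&gt;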

&lt;p&gt;Then for each image we asked two things. Does it actually illustrate the brief, or would it fit any prompt? Does it look different from the default the model wants to give you? These two goals work against each other. Stick close to the prompt and the image looks average. Push for something different and it drifts off the prompt. The interesting images are the ones that do both. There's only a narrow zone where that's possible. &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-pareto-wiki" rel="noopener noreferrer"&gt;In optimization theory&lt;/a&gt;, &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-pareto-marwala" rel="noopener noreferrer"&gt;that shape is called&lt;/a&gt; &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-pareto-tunkelang" rel="noopener noreferrer"&gt;a Pareto frontier&lt;/a&gt;. Again, here are snippets from our experiments:&lt;/p&gt;
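&lt;p&gt;Once each image has a score on both axes, the frontier itself is cheap to compute: keep the images no other image beats on both specificity and singularity at once. A toy sketch with made-up scores:&lt;/p&gt;

```python
def pareto_front(points):
    """Keep the (specificity, singularity) points no other point dominates,
    i.e. matches or beats on both axes."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]

# Hypothetical scores for six generated images.
images = [
    (0.9, 0.1),   # baseline: on-prompt but generic
    (0.1, 0.9),   # chaos: singular but off-prompt
    (0.6, 0.6),   # sweet-spot candidate
    (0.5, 0.5),   # dominated by (0.6, 0.6)
    (0.8, 0.3),
    (0.2, 0.2),   # dominated
]

front = pareto_front(images)
```

&lt;p&gt;The front contains both extremes plus the balanced points between them; picking among those trade-offs is the aesthetic call.&lt;/p&gt;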

&lt;p&gt;&lt;strong&gt;Baseline:&lt;/strong&gt; what the model produces when you don't push it. Literal, competent, expected, redundant over iterations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94sdowk5kuhd8wif38d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94sdowk5kuhd8wif38d5.png" alt="A photorealistic sci-fi portrait of a man with a serious expression and a futuristic cityscape behind him. Competent and completely expected." width="768" height="320"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpqvyq5u08fad7hngasl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpqvyq5u08fad7hngasl.png" alt="Three masked figures walking down a foggy road at night toward a glowing white orb. Cinematic, cold, literal post-apocalypse imagery." width="768" height="320"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa367ty9iax02z54s1515.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa367ty9iax02z54s1515.png" alt="A cloaked figure in profile facing a futuristic city pierced by vertical blue light beams. Competent sci-fi establishing shot." width="768" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chaos:&lt;/strong&gt; what "crank divergence to the max" actually produces. An image in the sense that it has pixels. Pure noise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft66utz707ah17hu4d4eo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft66utz707ah17hu4d4eo.png" alt="A rectangle of dense multi-colored RGB noise. No recognizable content." width="768" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sweet spot:&lt;/strong&gt; the middle ground between the default and chaos. Evoked, not illustrated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuagkxeefthb11vsddpdz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuagkxeefthb11vsddpdz.png" alt="Blurred monochrome silhouettes of hands raising oversized sunglasses, the lenses rendered as two bright white discs. Eyes replaced by the object meant to protect them." width="768" height="320"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flz5sjbz3d312yoo2o7ri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flz5sjbz3d312yoo2o7ri.png" alt="Two silhouetted figures in profile facing each other against pale backlit domes, their eyes rendered as small red points of light." width="768" height="320"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2stlkyzxho0mgedp2pj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2stlkyzxho0mgedp2pj.png" alt="A ghostly black-and-white close-up of a face seen through a surface covered in water droplets. Two eyes are faintly visible through the distortion, the rest of the features blur into the texture." width="768" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;baseline&lt;/strong&gt; illustrates the prompt. &lt;strong&gt;Chaos&lt;/strong&gt; is unusable. The &lt;strong&gt;sweet spot&lt;/strong&gt; sits between these two regimes. It's a choice. You only reach it by accepting that there isn't a single setting that produces "good". The Pareto frontier is a set of trade-offs between specificity and singularity. The work is deciding which trade-off to make.&lt;/p&gt;

&lt;p&gt;NB: one honest limit. The frontier is measured, but the labels (&lt;strong&gt;baseline&lt;/strong&gt;, &lt;strong&gt;chaos&lt;/strong&gt;, &lt;strong&gt;sweet spot&lt;/strong&gt;) are aesthetic calls. LPIPS, the metric we use, captures how different two images look, not whether they're good. To judge composition, emotional impact, or fine-grained narrative coherence, we would have needed to run a qualitative human study.&lt;/p&gt;

&lt;h2&gt;
  What artists working with AI actually do
&lt;/h2&gt;

&lt;p&gt;Both experiments converge on the same finding, which we call &lt;strong&gt;controlled divergence&lt;/strong&gt;. The point here is that singularity in AI creation lives in a narrow band where the model diverges from its statistical center without losing coherence. The band is small. Push too gently and the output stays average. Push too hard and it collapses into noise.&lt;/p&gt;

&lt;p&gt;This changes what the artist is doing. They're not writing the prompt and then picking their favorite of four outputs. They're shaping the parameter space from which outputs emerge: which dials, at what range, with which constraints, seeded how. The final image or paragraph is downstream of that shaping. In the paper we call this being an &lt;strong&gt;architect of generative conditions&lt;/strong&gt; rather than an author of generated results.&lt;/p&gt;

&lt;p&gt;This shift takes different shapes across the field. Jennifer Walshe puts it this way: &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-walshe" rel="noopener noreferrer"&gt;"the gap between what is described and what is produced, between intent and result, is now the domain of the network."&lt;/a&gt; For Grégory Chatonsky, the latent space itself is &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-chatonsky" rel="noopener noreferrer"&gt;"the new domain of technical imagination."&lt;/a&gt; Beth Coleman calls for &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-coleman" rel="noopener noreferrer"&gt;"a generatively wild AI that exceeds the framework of predictive machine learning,"&lt;/a&gt; not the reproductive default. &lt;strong&gt;Authorship has moved upstream of the output.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  Against frictionless iteration
&lt;/h2&gt;

&lt;p&gt;Which brings me to the part of the paper I'm most attached to. If the zone is that narrow, you don't land in it through endless iteration. You have to commit.&lt;/p&gt;

&lt;p&gt;When iterating is free, every choice becomes trivial. If I can regenerate this image at zero cost, my decision to keep it is barely a decision. And a workflow made of barely-decisions produces work that feels like nobody made it.&lt;/p&gt;

&lt;p&gt;The paper's proposal against this is what we call &lt;strong&gt;methodical friction&lt;/strong&gt;. Deliberately constrain the loop. Fixed iteration budgets. Locked seeds. Irreversible choices. Rules you actually have to live with. The counterintuitive claim is that &lt;strong&gt;friction is what makes a gesture a gesture&lt;/strong&gt;. Remove all of it and what's left is curation at best.&lt;/p&gt;

&lt;p&gt;In January 2025, Ninon and I took part in &lt;a href="https://www.ctm-festival.de/festival-2025/programme/practise/wilding-ai-lab" rel="noopener noreferrer"&gt;Wilding AI&lt;/a&gt;, a research residency Beth Coleman co-leads in Berlin ahead of CTM Festival. The brief was to hack, misuse and break generative AI tools. A full week of practicing friction and getting as far as possible from frictionlessness.&lt;/p&gt;

&lt;p&gt;Of course, this is not specific to AI creation. It's a useful rule for any creative process where the marginal cost of trying one more thing has collapsed. But it's especially urgent here, because the industry's whole direction of travel is toward more &lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-farman" rel="noopener noreferrer"&gt;frictionless design&lt;/a&gt;. Cheaper, faster, fewer parameters, more automation. If that's all there is, we end up with interchangeable options and no authorship.&lt;br&gt;
&lt;a href="https://blog.junsuzuki.xyz/blog/singularity-controlled-divergence#ref-becker" rel="noopener noreferrer"&gt;"If the point is to imitate human work at cheap cost, that's dull. If it is to explore new forms, that's when it gets interesting."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A gesture only counts when trying a different one would have cost you something.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Voilà! That's roughly the spine of the paper. You can &lt;a href="https://www.cnc.fr/documents/36995/2568943/CNC+Lab+-+Singularit%C3%A9+et+Cr%C3%A9ation+%C3%A0+l%E2%80%99%C3%88re+de+l%E2%80%99Intelligence+Artificielle+-+Ninon+Devis+Salvy%2C+Jun+Suzuki.pdf/feeb200e-2502-4512-fe27-91ec422befc0?t=1775143018736" rel="noopener noreferrer"&gt;download&lt;/a&gt; the full (French) version, or find it alongside &lt;a href="https://www.cnc.fr/professionnels/actualites/deuxieme-appel-a-contributions-du-cnc-lab_2167679" rel="noopener noreferrer"&gt;other contributions from the same call&lt;/a&gt; on the CNC Lab website. Happy to talk about any of it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Co-authored with my dear friend and collaborator Ninon Devis Salvy. Special thanks to &lt;a href="https://pollinations.ai/" rel="noopener noreferrer"&gt;Pollinations.AI&lt;/a&gt; for the GPU compute they provided us for our experiments.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.vrin.fr/livre/9791026712084/ia" rel="noopener noreferrer"&gt;IA. L'intuition et la création à l'épreuve des algorithmes. Alban Leveau-Vallier, Éditions Champ Vallon, 2023.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://greyroom.org/issues/93/229/algorithmic-images-artificial-intelligence-and-visual-culture/" rel="noopener noreferrer"&gt;Algorithmic Images: Artificial Intelligence and Visual Culture. Antonio Somaini, &lt;em&gt;Grey Room&lt;/em&gt;, no. 93, Fall 2023.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.openhumanitiespress.org/books/titles/ai-art/" rel="noopener noreferrer"&gt;AI Art: Machine Visions and Warped Dreams. Joanna Zylinska, Open Humanities Press, 2020.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://junsuzuki.xyz/#life-took-a-turn-and-you-vanished-in-the-white-2023-09-15" rel="noopener noreferrer"&gt;Life took a turn and you vanished in the white. Jun Suzuki, &lt;em&gt;Soft Eis Magazine #3&lt;/em&gt;, 15 September 2023.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Pareto_front" rel="noopener noreferrer"&gt;Pareto front. Wikipedia.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://unu.edu/article/greatest-good-exists-not-extremes-through-exploration-middle-ground-pareto" rel="noopener noreferrer"&gt;'Greatest Good' Exists Not at the Extremes but Through Exploration of the Middle Ground — Pareto. Tshilidzi Marwala, United Nations University, 15 February 2024.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dtunkelang.medium.com/navigating-the-pareto-frontier-d68182e72c6c" rel="noopener noreferrer"&gt;Navigating the Pareto Frontier. Daniel Tunkelang, Medium.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://unsoundfestival.substack.com/p/unsound-dispatch-13-ways-of-looking" rel="noopener noreferrer"&gt;13 Ways of Looking at AI, Art &amp;amp; Music. Jennifer Walshe, &lt;em&gt;Unsound Dispatch&lt;/em&gt;, 15 December 2023.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://chatonsky.net/de-deux-regimes-possibles-dans-lia-generative/" rel="noopener noreferrer"&gt;De deux régimes possibles dans l'IA générative. Grégory Chatonsky, September 2025.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://catalystjournal.org/index.php/catalyst/article/view/35973" rel="noopener noreferrer"&gt;Technology of the Surround. Beth Coleman, &lt;em&gt;Catalyst: Feminism, Theory, Technoscience&lt;/em&gt;, Vol. 7 No. 2, 26 October 2021.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slate.com/technology/2026/04/usability-complexity-apple-iphone-facebook-donald-norman.html" rel="noopener noreferrer"&gt;It's a Design Principle That Has Transformed the World and Even Saved Lives. It's Gone Way Too Far. Jason Farman, &lt;em&gt;Slate&lt;/em&gt;, April 2026.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.radiofrance.fr/franceculture/podcasts/les-masterclasses/nicolas-becker-sound-designer-j-aime-collecter-et-travailler-des-sons-qui-ont-une-histoire-7613489" rel="noopener noreferrer"&gt;Nicolas Becker, sound designer. Nicolas Becker, &lt;em&gt;Les Masterclasses&lt;/em&gt;, France Culture, 23 April 2026.&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>generativeai</category>
      <category>stablediffusion</category>
      <category>llmtemperature</category>
      <category>aiandart</category>
    </item>
    <item>
      <title>Anthropic's skills playbook vs our custom knowledge layer</title>
      <dc:creator>Jun Suzuki</dc:creator>
      <pubDate>Sat, 04 Apr 2026 21:54:06 +0000</pubDate>
      <link>https://forem.com/szkjn/anthropics-skills-playbook-vs-our-custom-knowledge-layer-29g3</link>
      <guid>https://forem.com/szkjn/anthropics-skills-playbook-vs-our-custom-knowledge-layer-29g3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dr7gsfxd68ksfg9kszg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dr7gsfxd68ksfg9kszg.jpg" alt="The Location of Lines by Sol LeWitt, 1975" width="800" height="802"&gt;&lt;/a&gt;&lt;br&gt;
        Sol LeWitt, The Location of Lines, 1975&lt;/p&gt;

&lt;p&gt;Thariq Shihipar from Anthropic's Claude Code team recently published a &lt;a href="https://x.com/trq212/status/2033949937936085378" rel="noopener noreferrer"&gt;thread&lt;/a&gt; on how they use skills internally. Hundreds of them in active use, clustering into nine categories once cataloged, from library references to runbooks to CI/CD automation. I read the thread right after publishing a &lt;a href="https://blog.junsuzuki.xyz/blog/beyond-claude-md-repo-scoped-knowledge-layer" rel="noopener noreferrer"&gt;post about building a knowledge layer&lt;/a&gt; on top of &lt;code&gt;CLAUDE.md&lt;/code&gt; to capture repo-scoped domain context. The thread answered a question I'd been sitting with: where does deep domain knowledge go when Anthropic themselves recommend keeping &lt;code&gt;CLAUDE.md&lt;/code&gt; under 200 lines?&lt;/p&gt;

&lt;p&gt;Turns out their team uses skills for exactly that.&lt;/p&gt;

&lt;h2&gt;
  Same architecture, different packaging
&lt;/h2&gt;

&lt;p&gt;If we look at their nine categories, at least three are primarily knowledge containers. The action wrapper (a slash command, a trigger description) makes them discoverable. But the core content is &lt;strong&gt;context, not automation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Their "Build a Gotchas Section" tip makes it explicit: the highest-signal content in any skill is the gotchas section. Add a line each time Claude trips on something. Day 1, the billing-lib skill says "How to use the internal billing library." Month 3, it has four gotchas covering proration rounding, test-mode webhook gaps, idempotency key expiration, refund ID semantics. Knowledge accumulating over time. Exactly what our &lt;code&gt;.claude/knowledge/&lt;/code&gt; files do.&lt;/p&gt;

&lt;p&gt;Their queue-debugging skill uses a &lt;a href="https://en.wikipedia.org/wiki/Spoke%E2%80%93hub_distribution_paradigm" rel="noopener noreferrer"&gt;hub and spoke&lt;/a&gt; structure: a 30-line &lt;code&gt;SKILL.md&lt;/code&gt; with a symptom-to-file routing table, and spoke files (&lt;code&gt;stuck-jobs.md&lt;/code&gt;, &lt;code&gt;dead-letters.md&lt;/code&gt;, &lt;code&gt;retry-storms.md&lt;/code&gt;) for the detail. This is structurally identical to our &lt;code&gt;CLAUDE.md&lt;/code&gt; knowledge table pointing to &lt;code&gt;.claude/knowledge/*.md&lt;/code&gt; files. Same progressive disclosure. Same "keep the hub lean, push detail to the spokes" principle.&lt;/p&gt;
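&lt;p&gt;A hub file in that shape might look like this (a hypothetical reconstruction; the names and symptoms are invented, not Anthropic's actual skill):&lt;/p&gt;

```markdown
---
name: queue-debugging
description: Debug stuck or failing background jobs. Use when jobs are not
  draining, messages land in the dead-letter queue, or retries are exploding.
---

# Queue debugging (hub)

| Symptom                       | Read            |
| ----------------------------- | --------------- |
| Jobs enqueued but not running | stuck-jobs.md   |
| Messages in dead-letter queue | dead-letters.md |
| Retry count exploding         | retry-storms.md |
```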

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3m84j6u4ifuipr459vof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3m84j6u4ifuipr459vof.png" alt="Side-by-side comparison: their SKILL.md routing to spoke files vs our CLAUDE.md routing to knowledge files" width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  One primitive or two
&lt;/h2&gt;

&lt;p&gt;Tbh, if this thread had existed two months ago, we might not have built a separate knowledge layer at all. Their approach handles the same problem. One extension point, one concept to learn. Simpler.&lt;/p&gt;

&lt;p&gt;What we ended up building adds one seam: &lt;strong&gt;knowledge files hold domain context (what Claude should know), skills hold executable actions (what Claude should do)&lt;/strong&gt;. We do use skills with scripts and hooks. The separation just means the context lives in its own files.&lt;/p&gt;

&lt;p&gt;For example, we have a knowledge file documenting a data pipeline: 7 jobs in sequence, ontology structure, S3 path conventions, Elasticsearch gotchas. That file is read by four different generic skills for running jobs, querying the database, querying ES, and accessing S3. The knowledge changes depending on which pipeline you're working on. The skills don't. In a skill-only setup, you'd either duplicate that context across skills or wrap it in a dedicated pipeline-knowledge skill, which gets you to roughly the same place.&lt;/p&gt;
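&lt;p&gt;Concretely, the layout looks something like this (file names invented for illustration):&lt;/p&gt;

```text
.claude/
  knowledge/
    pipeline.md          # the 7 jobs, ontology, S3 path conventions, ES gotchas
  skills/
    run-job/SKILL.md     # generic: points at the relevant knowledge file
    query-db/SKILL.md
    query-es/SKILL.md
    access-s3/SKILL.md
```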

&lt;p&gt;Another example: a skill that runs our evaluation suite. 44 lines: dataset names, CLI commands, timeouts. Clean and procedural. But interpreting results requires a separate knowledge file explaining what each dataset tests, known weak spots, and the outcomes of recent eval sessions. When someone reads that knowledge file months later, they skip repeating the same investigation. They don't need to invoke the eval skill to get there.&lt;/p&gt;

&lt;h2&gt;
  Where we land
&lt;/h2&gt;

&lt;p&gt;Going back to their nine categories. Library &amp;amp; API Reference is edge cases and code snippets for internal libraries. Incident Runbooks map symptoms to investigation steps. Both are knowledge containers wearing a skill wrapper. On the other end, CI/CD &amp;amp; Deployment and Scaffolding &amp;amp; Templates are pure automation. Data &amp;amp; Analysis and Code Quality &amp;amp; Review sit somewhere in between: reference data plus scripts, style rules plus enforcement. No one designed that split. It just showed up in the taxonomy. The knowledge/action boundary seems to surface whether you formalize it or not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy9a3qs8d0cbozrtu9f0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy9a3qs8d0cbozrtu9f0.png" alt="Thariq's nine skill categories annotated: Library &amp;amp; API Reference and Incident Runbooks are knowledge, Data &amp;amp; Analysis and Code Quality &amp;amp; Review are both, the remaining five are action" width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One last thought. Skills didn't exist a year ago. Claude Code launched with commands, then skills arrived, the two overlapped for a couple of months, and eventually commands folded into skills. The core abstraction for extending Claude Code changed twice in twelve months.&lt;/p&gt;

&lt;p&gt;That's the environment we're all building in. The extra seam we added (knowledge separate from skills) is a bet on adaptability. When the next abstraction shift comes, the knowledge stays put. The wiring around it can change.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>claudecodeskills</category>
      <category>contextengineering</category>
      <category>knowledgemanagement</category>
    </item>
    <item>
      <title>We outgrew CLAUDE.md: building a knowledge layer that compounds</title>
      <dc:creator>Jun Suzuki</dc:creator>
      <pubDate>Thu, 19 Mar 2026 18:12:43 +0000</pubDate>
      <link>https://forem.com/szkjn/we-outgrew-claudemd-building-a-knowledge-layer-that-compounds-33f</link>
      <guid>https://forem.com/szkjn/we-outgrew-claudemd-building-a-knowledge-layer-that-compounds-33f</guid>
      <description>&lt;p&gt;Earlier this year, Boris Cherny, the creator of Claude Code, published a &lt;a href="https://nitter.net/bcherny/status/2007179832300581177" rel="noopener noreferrer"&gt;thread&lt;/a&gt; on how he and his team use the CLI they built. A dozen tips covering everything from running parallel sessions to slash commands to subagents. The one I kept circling back to: the shared &lt;code&gt;CLAUDE.md&lt;/code&gt; that their entire team feeds into (what he calls &lt;a href="https://every.to/chain-of-thought/compound-engineering-how-every-codes-with-agents" rel="noopener noreferrer"&gt;compound engineering&lt;/a&gt;, borrowing from Dan Shipper).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our team shares a single CLAUDE.md for the Claude Code repo. We check it into git, and the whole team contributes multiple times a week. Anytime we see Claude do something incorrectly we add it to the CLAUDE.md, so Claude knows not to do it next time.&lt;/p&gt;

&lt;p&gt;— &lt;a href="https://x.com/bcherny/status/2007179832300581177" rel="noopener noreferrer"&gt;Boris Cherny&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've been running with that idea at &lt;a href="https://wisetax.fr/" rel="noopener noreferrer"&gt;Wisetax&lt;/a&gt;, and ended up extending it into a broader &lt;strong&gt;knowledge layer&lt;/strong&gt;. Here's why, what we built, and how it compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  README.md, CLAUDE.md, then what?
&lt;/h2&gt;

&lt;p&gt;Every repo already has a &lt;code&gt;README.md&lt;/code&gt;. That's for us humans to read: onboarding, setup, contributing.&lt;/p&gt;

&lt;p&gt;Then there's &lt;code&gt;CLAUDE.md&lt;/code&gt;. Essentially agent literature. It gets loaded into every Claude Code session: repo-wide conventions, common commands, guardrails, environment assumptions. Following Cherny's idea, the whole team updates it constantly. In practice, that means it grows. Fast.&lt;/p&gt;

&lt;p&gt;In some of our repos, it hit 700+ lines. Ok but. Turns out the official docs recommend keeping &lt;code&gt;CLAUDE.md&lt;/code&gt; under ~200 lines. Anthropic's &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;context engineering guide&lt;/a&gt; frames it this way: context is a "finite resource with diminishing marginal returns". The more low-signal tokens you load, the less reliable the agent becomes. Stuffing everything into one auto-loaded file works against that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Like humans, who have limited working memory capacity, LLMs have an "attention budget" that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount.&lt;/p&gt;

&lt;p&gt;— &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;Effective context engineering for AI agents&lt;/a&gt;, Anthropic&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our experience, domain-specific context, design rationale for particular subsystems: all of that is too detailed and too volatile for a single file. So we created &lt;code&gt;.claude/knowledge/&lt;/code&gt;. Inside: one markdown file per topic, versioned in the repo. A place to capture &lt;a href="https://en.wikipedia.org/wiki/Tribal_knowledge" rel="noopener noreferrer"&gt;tribal knowledge&lt;/a&gt;, the stuff that lives in developers' heads and gets lost between sessions.&lt;/p&gt;

&lt;p&gt;Each file covers a slice of the system: a specific piece of the data pipeline, the intricacies of a retrieval layer, the motivation behind the design of the evaluation framework, etc.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;CLAUDE.md&lt;/code&gt;, we point to them so Claude knows when to read what:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
...

&lt;span class="gu"&gt;## Knowledge&lt;/span&gt;

| File                                  | When to read                                               |
|---------------------------------------|------------------------------------------------------------|
| &lt;span class="sb"&gt;`.claude/knowledge/architecture.md`&lt;/span&gt;   | Overall system design, endpoints, routing, streaming       |
| &lt;span class="sb"&gt;`.claude/knowledge/agent-framework.md`&lt;/span&gt;| Building/modifying agents with WisebrainAgent              |
| &lt;span class="sb"&gt;`.claude/knowledge/autonomous-agent.md`&lt;/span&gt;| Working on the main chat agent, its tools, or prompts     |
| &lt;span class="sb"&gt;`.claude/knowledge/retrieval.md`&lt;/span&gt;      | Search system, semantic/keyword retrieval, corpus taxonomy |
| &lt;span class="sb"&gt;`.claude/knowledge/elasticsearch.md`&lt;/span&gt;  | ES indices, query builders, document structure             |
| &lt;span class="sb"&gt;`.claude/knowledge/plan-navigation.md`&lt;/span&gt;| BOFIP/LEGI plan traversal, plan controllers                |
| &lt;span class="sb"&gt;`.claude/knowledge/testing.md`&lt;/span&gt;        | Writing or running tests, fixtures, evaluation scripts     |
| &lt;span class="sb"&gt;`.claude/knowledge/evaluations.md`&lt;/span&gt;    | Agent evaluation datasets, evaluators, LangSmith setup     |
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, Claude only reads the files it needs to. If the current PR is about evaluations, it pulls in the &lt;code&gt;evaluations.md&lt;/code&gt; knowledge file and ignores the rest. This is what makes the approach scale: each file can go into specifics that genuinely help the agent, because only relevant files get loaded into context.&lt;/p&gt;
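&lt;p&gt;The lookup itself is nothing exotic; it amounts to keyword routing over the table. A minimal sketch of the idea in Python (illustrative only, not how Claude Code is implemented):&lt;/p&gt;

```python
# Hypothetical sketch of the lookup, not Claude Code internals: the agent
# reads the "When to read" column and pulls in files whose hints overlap
# the task at hand. Paths and hints mirror a few rows of the table above.

KNOWLEDGE_TABLE = """
| `.claude/knowledge/retrieval.md`   | Search system, semantic/keyword retrieval, corpus taxonomy |
| `.claude/knowledge/evaluations.md` | Agent evaluation datasets, evaluators, LangSmith setup |
| `.claude/knowledge/elasticsearch.md` | ES indices, query builders, document structure |
"""

def relevant_files(task):
    """Return knowledge files whose 'when to read' hints share words with the task."""
    task_words = set(task.lower().split())
    hits = []
    for row in KNOWLEDGE_TABLE.strip().splitlines():
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        path = cells[0].strip("`")
        hint_words = set(cells[1].lower().replace(",", " ").replace("/", " ").split())
        if task_words.intersection(hint_words):
            hits.append(path)
    return hits

print(relevant_files("tweak the evaluation datasets"))
# prints ['.claude/knowledge/evaluations.md']
```

&lt;p&gt;The real mechanism is fuzzier (Claude reads the table in natural language), but the effect is the same: only the matching files enter the context window.&lt;/p&gt;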

&lt;h2&gt;
  
  
  What about Claude's auto memory?
&lt;/h2&gt;

&lt;p&gt;Claude Code has an &lt;a href="https://code.claude.com/docs/en/memory#auto-memory" rel="noopener noreferrer"&gt;auto memory&lt;/a&gt; feature: notes Claude writes for itself based on corrections and preferences. They are stored locally under &lt;code&gt;~/.claude/projects/&amp;lt;project&amp;gt;/memory/&lt;/code&gt;. It's per-machine and auto-managed. Not versioned, not shared. If a collaborator starts a session, they don't get my memory.&lt;/p&gt;

&lt;p&gt;Repo-scoped knowledge is the opposite. It's checked in. It goes through PRs. Every engineer on the team gets the same context. Every session starts from the same baseline regardless of who's running it. The name is deliberate: "knowledge", not "memory", to keep the two separate in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to store in knowledge files
&lt;/h2&gt;

&lt;p&gt;Knowledge files hold two kinds of things: &lt;strong&gt;know-why&lt;/strong&gt; and &lt;strong&gt;know-how&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Know-why&lt;/strong&gt; is arguably the more obvious one. I learn something about the codebase I didn't know, or had forgotten. The moment I document it, that's the last time Claude figures it out from scratch. I might forget next time. Claude won't.&lt;/p&gt;

&lt;p&gt;At Wisetax, we curate an enriched Elasticsearch index built from a large corpus of French legal texts. At retrieval, our search system splits queries across corpus groups to mitigate language-level differences in the embedding model. Without that context, Claude doesn't know why the retrieval code fans out into three separate queries instead of one. It would figure it out eventually, after burning time and tokens reading through the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
...

## Partitioned retrieval

Splits corpus into 3 groups to handle language style differences in embeddings:
- `law`: CGI, LPF, CIBS, CCOM, CSS, CTRAV, CMF, CCIV, CJA
- `guidelines`: BOI, BOSS, COMPTA
- `other`: EUR, INT, DOUANE, CADF, JADE, INCA, CASS, CAPP, CGIANX*, PLF, PLFSS, EXTERNAL, NOTICE

Runs each group as a parallel batch with retry (3 attempts).
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Know-how&lt;/strong&gt; is procedure: the sequence of jobs, the commands to run, the filenames that matter. At Wisetax, we ingest legal texts from multiple sources, each with its own pipeline. Take "BOSS" (for &lt;em&gt;Bulletin officiel de la sécurité sociale&lt;/em&gt;), a French government publication that goes through seven distinct jobs before it's indexed and searchable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
...

## Pipeline order

1. **Crawler** (manual, docker compose in `scripts/boss_crawler/`) -&amp;gt; `boss-raw` ES staging index
2. **SOURCE BOSS** (pace5/3d) -&amp;gt; compares `boss-raw` latest timestamp vs last batch, creates new batch
3. **REGISTER BOSS** (pace4/24h) -&amp;gt; fetches HTML, parses versions, uploads to S3, inserts docs in PG
4. **INDEX** (pace0/1min, shared) -&amp;gt; indexes docs to ES index `name-of-index`
5. **PLAN BOSS** (pace5/3d) -&amp;gt; builds static plan from ontology config, indexes to ES
6. **VERSION BOSS** (pace4/24h) -&amp;gt; sets VIGUEUR/MODIFIE on versioned docs
7. **SELECT CHUNKABLE / CHUNK / EMBED / INDEX CHUNK** (shared, frequent)
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full file covers about a hundred lines: section codes, S3 path conventions, Elasticsearch gotchas, differences from our other main pipeline. This wasn't born from a single session. It's the accumulated map of a dense subsystem, built up over time.&lt;/p&gt;

&lt;p&gt;Before it existed, every BOSS PR started the same way: find the relevant Notion tickets, skim through past PRs for context, paste a summary into Claude to catch it up. Now the context is sitting in a knowledge file, ready for Claude to pull in the moment BOSS comes up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to leave out
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What files were changed (git knows this)&lt;/li&gt;
&lt;li&gt;Config values, function signatures (read the code)&lt;/li&gt;
&lt;li&gt;Session logs or step-by-step accounts&lt;/li&gt;
&lt;li&gt;Deprecated features (delete from knowledge files)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The heuristic we use: "Would this help Claude understand the system 3 months from now?" If not, cut it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The knowledge update loop
&lt;/h2&gt;

&lt;p&gt;None of this works if the knowledge goes stale. What makes it compound over time: at the end of each session, we call &lt;code&gt;/update-knowledge&lt;/code&gt;, a &lt;a href="https://docs.anthropic.com/en/docs/claude-code/skills" rel="noopener noreferrer"&gt;custom skill&lt;/a&gt; that reviews the conversation, checks existing knowledge files, and decides whether the repo's instruction surface needs updating.&lt;/p&gt;

&lt;p&gt;No friction; we just run it. Half of the time, nothing changes. When it does, it's almost always a targeted edit to a knowledge file. &lt;code&gt;CLAUDE.md&lt;/code&gt; moves rarely, maybe once every few weeks when a repo-wide rule shifts. Either way, it goes through a PR like any other change.&lt;/p&gt;

&lt;p&gt;Here's what the skill looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;update-knowledge&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Update project knowledge base after a session&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="s"&gt;Review the current conversation and update project knowledge.&lt;/span&gt;

&lt;span class="c1"&gt;## Steps&lt;/span&gt;

&lt;span class="s"&gt;1. Read `CLAUDE.md` to see the knowledge table and existing file list&lt;/span&gt;
&lt;span class="s"&gt;2. Read all files in `.claude/knowledge/`&lt;/span&gt;
&lt;span class="s"&gt;3. Review the conversation and recent git diff&lt;/span&gt;
&lt;span class="na"&gt;4. Update knowledge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;**Existing file needs update**&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edit the relevant `.md`&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;**New feature/topic**&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create a new `.md` and add a row&lt;/span&gt;
     &lt;span class="s"&gt;to the Knowledge table in `CLAUDE.md`&lt;/span&gt;
   &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;**General pattern discovered**&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add to `CLAUDE.md` directly&lt;/span&gt;
&lt;span class="na"&gt;5. Cleanup pass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;check all existing knowledge files for stale&lt;/span&gt;
   &lt;span class="s"&gt;content. Trim or delete as needed.&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude decides what's worth keeping, and what to cut: stale content gets trimmed, not archived.&lt;/p&gt;

&lt;p&gt;After a few weeks of running this, the effect is hard to miss. Claude kickstarts sessions with context that used to require manual pasting. No need to tag a file or invoke a command. Claude reads the knowledge table, sees what's relevant, and loads it. The knowledge files fill in gradually, effortlessly.&lt;/p&gt;

&lt;p&gt;Plus, the workflow changes how we think about sessions. We used to keep a single session alive as long as possible to preserve context. Now we work on a piece of the problem, run &lt;code&gt;/update-knowledge&lt;/code&gt;, and start fresh. Shorter sessions mean a leaner context window. A leaner context means a more reliable Claude.&lt;/p&gt;

&lt;p&gt;Every session leaves the next one a little better equipped.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>claudemd</category>
      <category>contextengineering</category>
      <category>aiagents</category>
    </item>
  </channel>
</rss>
