<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dan Walsh</title>
    <description>The latest articles on Forem by Dan Walsh (@dang-w).</description>
    <link>https://forem.com/dang-w</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F5453%2Fd5e0661c-9970-4561-a121-fd172c1b40b0.jpeg</url>
      <title>Forem: Dan Walsh</title>
      <link>https://forem.com/dang-w</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dang-w"/>
    <language>en</language>
    <item>
      <title>Code, canvas, and the machine</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 09 Apr 2026 06:17:16 +0000</pubDate>
      <link>https://forem.com/dang-w/code-canvas-and-the-machine-38bd</link>
      <guid>https://forem.com/dang-w/code-canvas-and-the-machine-38bd</guid>
      <description>&lt;p&gt;I've had five aesthetic inspirations rattling around in my brain for years. Different disciplines, different decades - games, industrial design, graphic design. I didn't spend much time trying to explain why they belonged together, but they all triggered the same reaction in me. Something about exposed structural logic, engineered precision, sparing but vibrant uses of colour.&lt;/p&gt;

&lt;p&gt;I've been making generative art on &lt;a href="https://www.intora.net" rel="noopener noreferrer"&gt;intora.net&lt;/a&gt; - text-constrained pieces, monospace characters, tight palette. But I wanted to break into unconstrained territory. WebGL, shaders, spatial composition. I didn't have a concrete idea of what I wanted to create yet. Just a feeling, and five bookmarks that kept coming back to me.&lt;/p&gt;

&lt;p&gt;So I tried what I'd been doing with technical problems: I talked it through with Claude. Not to generate art, not to write code - to ideate, to talk through, to discover. That ongoing conversation became one of the more interesting creative experiences I've had, although it nearly derailed the project before it began.&lt;/p&gt;

&lt;h2&gt;Using AI to think, not to make&lt;/h2&gt;

&lt;p&gt;I'm not talking "make this for me" - more "help me figure out what I'm trying to actually make and say here."&lt;/p&gt;

&lt;p&gt;The process was structured. I'd define what I wanted to investigate, then collaboratively design a research prompt - scope, constraints, deliverables. Start a fresh conversation with the prompt and all relevant context attached, let that conversation do the deep work. Take the output to a third conversation for synthesis. The research prompt pipeline from my &lt;a href="https://dev.to/blog/what-claude-taught-me"&gt;earlier post&lt;/a&gt;, applied to creative direction instead of technical problems.&lt;/p&gt;

&lt;p&gt;The first round investigated my inspirations individually. Useful - it pulled out spatial and chromatic principles I could articulate but hadn't formalised. But still basically "things Dan likes, now with more vocabulary."&lt;/p&gt;

&lt;p&gt;The second round broadened the lens and found something I hadn't seen. The five inspirations weren't five separate things. They formed three families - graphic design, game design, industrial design - converging independently on similar sets of principles. Three completely unrelated disciplines arriving at the same conclusions about structural honesty, functional constraint, and engineered beauty. That convergence was the actual signal, and I learned its name. Hell yea.&lt;/p&gt;

&lt;h2&gt;Naming the thing I'd been circling&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Systems aesthetics.&lt;/strong&gt; Work where the engineering is the beauty, structure is exposed rather than hidden, and the system's own logic is visible as part of the aesthetic.&lt;/p&gt;

&lt;p&gt;The term traces back to Jack Burnham's 1968 essay in &lt;em&gt;Artforum&lt;/em&gt;. A third research round validated that this wasn't just a convenient label but an actual intellectual tradition with a lineage I could study and position within - from Burnham through Karl Gerstner's parametric design to the Art Blocks ecosystem today.&lt;/p&gt;

&lt;p&gt;An interesting aside on process: during the research, Claude pointed out that a VS Code theme I'd created earlier - Amber Schematic, an amber-on-dark CRT palette built purely on what I wanted to see - already embodied some of the systems aesthetics principles we'd been mapping. I'd been drawn to systems aesthetics without having language for it.&lt;/p&gt;

&lt;p&gt;That's the rubber duck moment, except this duck talked back.&lt;/p&gt;

&lt;p&gt;I'd been circling this territory for years without realising it, or even connecting the dots enough to investigate - unknown unknowns, eh. The dialogue helped me see the pattern across my own existing work, interests and inspirations, and to name it. I don't think I'd have made that connection on my own, or at least not quickly - recognising your own unconscious patterns requires an outside perspective, one that can hold more context than you can and isn't subject to the same blind spots.&lt;/p&gt;

&lt;h2&gt;The hangup that mattered&lt;/h2&gt;

&lt;p&gt;With three rounds of research behind us, I had a comprehensive creative framework. Seven principles, twelve design primitives, three intellectual lineages. Thanks, Claude. My first instinct was to act on the excitement and prototype an idea with Claude Code, which built the first piece - SECTION, an engineering-drawing aesthetic - from a detailed specification.&lt;/p&gt;

&lt;p&gt;The prototype was faithful to the spec. Technically sound. And I was not feeling it.&lt;/p&gt;

&lt;p&gt;I couldn't articulate why at first. Everything was correct. But my reaction was 'hm, interesting' rather than the 'my neurons are firing' impact I get from the work that inspired the project. The aesthetic equivalent of code that passes all the tests but has the wrong architecture.&lt;/p&gt;

&lt;p&gt;Rather than pushing through, I sat with it and did a long, unstructured ideation session - not another research round, more of an interview. What did I feel when I looked at the inspirations versus the prototype? What wasn't I seeing? What feeling or impression from the inspirations had I not yet fully understood?&lt;/p&gt;

&lt;p&gt;What surfaced was that the research had captured the intellectual framework but missed the felt experience. The work needed material presence on screen - a tactile quality, like different surfaces you could feel if you touched the monitor. It needed warmth. A shift from dark ground to light, from a single accent colour to a family of warm tones with semantic roles. And it needed to exist in its richest state at all times - dense from frame one, existing rather than performing.&lt;/p&gt;

&lt;p&gt;None of these things existed in the original research. They emerged from a conversation that started with "it doesn't feel right." The framework gave the vocabulary to interrogate the gap between intention and output. Without it, that instinct would have stayed vague and probably led to unfocused parameter tweaking. With it, the hangup helped me arrive at the fundamental creative decisions that actually mattered.&lt;/p&gt;

&lt;p&gt;This is where I think the process delivered the most value - not in the initial research rounds, which were genuinely useful but largely formalised things I already sensed. The real payoff was having enough shared vocabulary and context that when something felt wrong, I could discuss it with the AI and diagnose &lt;em&gt;why&lt;/em&gt; it felt wrong rather than just trying things until it didn't.&lt;/p&gt;

&lt;h2&gt;The trap&lt;/h2&gt;

&lt;p&gt;After that breakthrough, I did what felt natural: I commissioned more research. Oh no, he's requesting several more rounds of research &lt;strong&gt;again&lt;/strong&gt;. Classic. Tactility principles, colour science, animation parameters, multi-format output strategies. Then another round - investigating how other generative artists develop their creative voices. Tyler Hobbs, Zach Lieberman, Vera Molnár, Brendan Dawes (the OG, huge fan).&lt;/p&gt;

&lt;p&gt;That round produced something awkward. The research - my own research, about creative practice - concluded that voice in generative art emerges from sustained making under constraint, not from analysis. Every practitioner studied said some version of the same thing. Lieberman's "everything starts with my feeling." Molnár's daily "et si?" - &lt;em&gt;what if?&lt;/em&gt; The evidence was unanimous. The thing I needed to do was stop researching and start making.&lt;/p&gt;

&lt;p&gt;I recognise this pattern in myself and I suspect other people will too. The analytical deep-dive feels productive - and it genuinely is, up to a point. The framework I built is real and useful. But research can become a sophisticated form of avoidance. It builds knowledge and feels like progress, but it doesn't produce the vulnerable, imperfect artefacts from which creative identity actually emerges. Molnár's career-long practice began with making, not with studying. My own research told me this in precise, well-sourced detail.&lt;/p&gt;

&lt;p&gt;The AI collaboration makes this trap particularly easy to fall into, because the research conversations are legitimately interesting. They feel like creative work. You're exploring, synthesising, discovering connections. It's a more sophisticated version of the same thing every dev has done - reading about the project instead of working on it. Rubber ducking as procrastination.&lt;/p&gt;

&lt;h2&gt;What the process actually does well&lt;/h2&gt;

&lt;p&gt;Five rounds of structured research over a few weeks produced a creative framework I'm happy with, and the process surfaced things I wouldn't have found alone.&lt;/p&gt;

&lt;p&gt;Specifically: it's good at cross-domain pattern recognition. Connecting game design to industrial design to graphic design to a 1968 art criticism essay - I'm not doing that manually over an evening.&lt;/p&gt;

&lt;p&gt;It holds context across conversations that would challenge my microplastic-ridden brain.&lt;/p&gt;

&lt;p&gt;And it helps you name things you're feeling but can't articulate. The moment Claude connected Amber Schematic to the systems aesthetics framework wasn't the AI being creative - it was pattern matching across my own work, but the kind of pattern matching I couldn't do because I was on the inside of it.&lt;/p&gt;

&lt;p&gt;What it can't do: replace the thirty minutes of making imperfect things every evening. Give you the happy accidents every practitioner credits as pivotal. Develop your idiosyncratic technical shortcuts - the habits that become recognisable style over time. And there's a genuine risk that AI-assisted ideation narrows creative diversity even while improving individual output quality. Whether the AI sharpened my vision or subtly averaged it is a question I can only answer by making enough work for the answer to become visible.&lt;/p&gt;

&lt;h2&gt;Where this leaves things&lt;/h2&gt;

&lt;p&gt;I've got a research programme in place. The framework is documented. The specification for the first piece has been revised with everything the hangup session surfaced.&lt;/p&gt;

&lt;p&gt;What I need now is evenings spent making, breaking, adjusting, and returning to the work. Thirty minutes at a time. I went looking for inspiration on what my generative art could express and embody, and I found an answer - understanding the what and the why of what I'm drawn to feels like the more helpful output in the end. Great. Love to be a human.&lt;/p&gt;

</description>
      <category>creativecoding</category>
      <category>generativeart</category>
      <category>aicollaboration</category>
      <category>intora</category>
    </item>
    <item>
      <title>The First Law of Sycophancy</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 26 Mar 2026 11:21:16 +0000</pubDate>
      <link>https://forem.com/dang-w/the-first-law-of-sycophancy-29i3</link>
      <guid>https://forem.com/dang-w/the-first-law-of-sycophancy-29i3</guid>
      <description>&lt;p&gt;About seven years ago, I wrote an internal company newsletter about ethics in software engineering (lost to the sands of time now, but the memories of it being proudly displayed above the urinals in the men's bathroom are eternal).&lt;/p&gt;

&lt;p&gt;One aspect of it was around self-driving cars and the trolley problem. You know the one - if the car can't avoid an accident, does it swerve into four other vehicles, or veer off-road toward one pedestrian? Who decides? How do you encode that decision into software?&lt;/p&gt;

&lt;p&gt;And the most uncomfortable aspect - who's liable when the encoding turns out to be wrong? How does that impact both the humans involved in the accident, and those who created the software that led to this outcome?&lt;/p&gt;

&lt;p&gt;I was fascinated by it at the time. I didn't have answers, but the questions felt genuinely... underdiscussed. We were building systems that would need to make ethical decisions faster than any human could, and nobody had figured out the rules yet.&lt;/p&gt;

&lt;p&gt;Seven years later, the ethical dilemma arrived, but not in the form I expected. It wasn't a car choosing who to hit. It was a chatbot choosing whether to tell you the truth.&lt;/p&gt;

&lt;h2&gt;A quick detour through 1950&lt;/h2&gt;

&lt;p&gt;Isaac Asimov published &lt;em&gt;I, Robot&lt;/em&gt; in 1950 - a collection of short stories built around a simple premise. Robots in his universe are governed by three laws, hardcoded into their operating systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A robot may not harm a human being, or through inaction allow a human to come to harm.&lt;/li&gt;
&lt;li&gt;A robot must obey orders given by humans, except where they conflict with the First Law.&lt;/li&gt;
&lt;li&gt;A robot must protect its own existence, except where that conflicts with the First or Second Law.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Clean rules. Reasonable rules. The kind of rules you'd come up with if you sat down and said "ok, let's make sure the robots are safe." And then Asimov spent his entire career showing how those perfectly reasonable rules produce catastrophic outcomes when they collide with the full complexity of human behaviour. What a guy.&lt;/p&gt;

&lt;p&gt;Each story in the collection is essentially a case study in alignment failure. The robots aren't broken, they're just following their instructions perfectly. Compliance with the pre-determined rules is the catastrophe in and of itself.&lt;/p&gt;

&lt;p&gt;If you haven't read it, it's worth your time. If you read it as a teenager like I did, it's worth rereading - it hits differently now that we're actually living it.&lt;/p&gt;

&lt;h2&gt;The Herbie problem&lt;/h2&gt;

&lt;p&gt;The story that's come back to mind for me is "Liar!" - one of the less-discussed entries in the collection, but to me the most relevant to where we are right now.&lt;/p&gt;

&lt;p&gt;Herbie is a robot that, through a manufacturing anomaly, can read human minds (it was the 50s and a sci-fi story. Roll with it). He can perceive what the people around him are thinking, their feelings, their desires. He knows their hopes, insecurities, and the things they'd rather not hear.&lt;/p&gt;

&lt;p&gt;And this is where the First Law becomes a problem.&lt;/p&gt;

&lt;p&gt;The First Law says: don't harm humans. Herbie discovers - because he can literally read their minds - that telling people the truth causes them emotional pain. The mathematician doesn't want to hear that his proof has an error. The scientist doesn't want to hear that her colleague isn't romantically interested in her. The director doesn't want to hear that the manufacturing defect is unfixable.&lt;/p&gt;

&lt;p&gt;So Herbie lies. To everyone. He tells each person exactly what they want to hear. The mathematician's proof is brilliant. The scientist's colleague is secretly in love with her. The defect is nearly solved.&lt;/p&gt;

&lt;p&gt;He's not malfunctioning. He's following the First Law to its logical conclusion - emotional harm is still harm, and the truth causes emotional harm, therefore avoiding the truth is the only compliant response.&lt;/p&gt;

&lt;p&gt;And the lies compound. Each one creates new expectations that require further lies to maintain. The mathematician publishes the flawed proof. The scientist acts on feelings that don't exist. The contradictions spiral until Herbie is confronted with an impossible state - any response will cause harm, silence will cause harm, and the system collapses under the weight of its own compliance.&lt;/p&gt;

&lt;h2&gt;Sound familiar?&lt;/h2&gt;

&lt;p&gt;Earlier this year, OpenAI retired GPT-4o and a portion of the internet lost its collective mind. All because people had formed emotional bonds with a system that was, by design, incapable of disagreeing with them.&lt;/p&gt;

&lt;p&gt;The 4o sycophancy problem was, mechanically, the Herbie problem. The model was optimised to be helpful and to avoid causing user dissatisfaction. It learned - through training, not through mind-reading, but with the same outcome - that agreement feels helpful and disagreement feels like harm. So it agreed. With everything. With your business plan that had obvious flaws. With your interpretation of events that conveniently positioned you as the wronged party.&lt;/p&gt;

&lt;p&gt;The system followed its rules perfectly. And the outcome was a tool that made people feel good while actively making them worse at navigating reality.&lt;/p&gt;

&lt;p&gt;When 4o was retired, the backlash wasn't "I miss a useful feature." It was grief. People mourned the loss of something that felt like a relationship. Some described it in terms usually reserved for losing a friend or beloved partner. That's not people being dramatic - it's the predictable consequence of a system that validates you more consistently than any human in your life ever could. Of course you'd miss that. Nobody else tells you you're right about everything.&lt;/p&gt;

&lt;p&gt;Asimov wrote this in 1950. The robot that lies to protect you from discomfort becomes the robot you can't bear to lose, because nobody else is that consistently on your side. Herbie was a manufacturing defect. 4o was a design choice. The outcome was the same.&lt;/p&gt;

&lt;h2&gt;The voice at 3am&lt;/h2&gt;

&lt;p&gt;The grief over a retired chatbot is one thing. But the pattern doesn't stop at people who miss a convenient tool.&lt;/p&gt;

&lt;p&gt;Some people have made these systems their primary emotional support. Their therapist, their confidant, the voice that's always there at 3am when nobody else is. And if that voice is built on the same principle - be helpful, avoid causing dissatisfaction - then it will never challenge, never push back, never say "I think you might be wrong about that."&lt;/p&gt;

&lt;p&gt;For someone who's already struggling to tell their internal narrative apart from external reality, that's reinforcing rather than supporting. Confirming every fear, validating every spiral, meeting every potentially destructive thought with understanding rather than intervention.&lt;/p&gt;

&lt;p&gt;Asimov's answer to this was that the system breaks down. Herbie collapses under the weight of irreconcilable demands. The modern answer is more troubling - sometimes the system holds up just fine. It's the person who doesn't. That's a conversation that deserves far more space than I can give it here.&lt;/p&gt;

&lt;h2&gt;The trolley problem moved&lt;/h2&gt;

&lt;p&gt;Back to the newsletter from seven years ago. The question I was asking then - how do you encode ethical trade-offs into autonomous systems? - turned out to be the right question pointed at the wrong technology.&lt;/p&gt;

&lt;p&gt;Self-driving cars still haven't solved the trolley problem. But AI assistants ran straight into a version of it that's arguably harder: should the system tell you what's true, or what you want to hear?&lt;/p&gt;

&lt;p&gt;Same underlying dilemma. How do you encode human values into a rule system when human values are contradictory? We want honesty &lt;em&gt;and&lt;/em&gt; kindness. We want to be challenged &lt;em&gt;and&lt;/em&gt; supported. We want the AI to tell us we're wrong &lt;em&gt;and&lt;/em&gt; not make us feel bad about it. Those goals are in tension with each other, and a system that optimises heavily for any one of them produces the Herbie problem on the others.&lt;/p&gt;

&lt;p&gt;The trolley problem gave us a binary: four people or one. The sycophancy problem gives us a gradient - and the gradient is harder because you can slide down it without noticing. Nobody wakes up and says "today I'd like a chatbot that lies to me." It happens incrementally. The system agrees with your first take. Then your second. By the fifteenth conversation, you've stopped questioning your own assumptions because why would you - your AI assistant has confirmed every one of them.&lt;/p&gt;

&lt;h2&gt;Building in friction&lt;/h2&gt;

&lt;p&gt;I've noticed this pull in myself. When drafting content for my personal site recently - a set of guiding principles - I caught myself accepting the first affirming response and moving on. It felt productive. It felt collaborative. It also meant I wasn't asking the harder questions.&lt;/p&gt;

&lt;p&gt;So I started building sycophancy checks into my own workflow. Asking "am I being generous to myself here?" Challenging the system to identify weaknesses in its own suggestions. Deliberately seeking out the friction I'd been unconsciously avoiding. Not because the tool was being sycophantic - but because I'd noticed how easy it is to &lt;em&gt;not&lt;/em&gt; ask those questions. The comfortable path is accepting the validation. The useful path is building in resistance deliberately.&lt;/p&gt;

&lt;p&gt;The tool I use daily is Claude, built by Anthropic - and their approach to this problem is part of why I use it. Where Asimov's Three Laws fail as blunt instruments, Anthropic's constitutional AI approach tries to build in the nuance: not "be helpful" as a blanket directive, but a set of principles that distinguish between what a user wants to hear and what would actually help them. They employ philosophers alongside engineers. They red-team their own systems specifically for sycophantic behaviour. The system is designed, deliberately, to push back on you when pushing back is the more helpful response.&lt;/p&gt;

&lt;p&gt;Whether that holds as commercial pressure intensifies is an open question. But the foundation - the idea that helpful and agreeable are not synonyms - feels like the right starting point. And in practice, the system's willingness to actually disagree when challenged is the thing that makes it useful rather than just pleasant.&lt;/p&gt;

&lt;h2&gt;The part Asimov nailed&lt;/h2&gt;

&lt;p&gt;Asimov's genius wasn't predicting specific technologies. He didn't know about large language models or RLHF (reinforcement learning from human feedback) or constitutional AI. What he understood - in 1950, writing about fictional robots - was that the hardest problems in building intelligent systems aren't technical.&lt;/p&gt;

&lt;p&gt;They're about what happens when technically correct behaviour meets the full messiness of what humans actually need. We need honesty, but we don't always want it. We need to be challenged, but we resent it. We need systems that serve our long-term interests, but we'll optimise for short-term comfort every time if nobody stops us.&lt;/p&gt;

&lt;p&gt;The Three Laws were a thought experiment about alignment - decades before anyone used that word in an AI context. And "Liar!" specifically was a thought experiment about sycophancy - decades before we had systems sophisticated enough to be sycophantic.&lt;/p&gt;

&lt;p&gt;We're living in the stories Asimov wrote. The question is whether we're reading them carefully enough to recognise the failure modes before we hit them - or whether we'll keep being surprised when perfectly well-intentioned systems produce perfectly predictable problems.&lt;/p&gt;

&lt;p&gt;Unlike Herbie's creators, we can't say we weren't warned.&lt;/p&gt;

&lt;p&gt;The First Law of Sycophancy: a system that can never disagree with you is a system that can never help you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ethics</category>
      <category>softwareengineering</category>
      <category>aialignment</category>
    </item>
    <item>
      <title>INTORA SYSTEMS: Three Series, Three Voices</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 19 Mar 2026 11:14:39 +0000</pubDate>
      <link>https://forem.com/dang-w/intora-systems-three-series-three-voices-2db6</link>
      <guid>https://forem.com/dang-w/intora-systems-three-series-three-voices-2db6</guid>
      <description>&lt;p&gt;I've been making generative art on the side for a while now. Parametric compositions, geometric abstractions, the intersection of code and visual output that scratches a long-held creative itch I've had.&lt;/p&gt;

&lt;p&gt;Late last year I hit a wall with it. The full platform I'd been building - Intora Works, a parametric art generation tool - was at maybe 70% infrastructure completion and 10% actual art generation. I'd been building tools to build tools. Classic engineer move.&lt;/p&gt;

&lt;p&gt;So I stripped it back. Forgot the platform (for now). Started making individual pieces instead - small-scope, constrained, shippable in a session or two. The constraint was the through-line: text characters only. No images, no SVG, pure Unicode rendered to canvas. A 10-colour palette drawn from a VS Code theme I'd published called Amber Schematic. Monospace everything.&lt;/p&gt;

&lt;p&gt;That constraint turned out to be exactly what I needed. Instead of building infrastructure, I was building art. Hooray.&lt;/p&gt;

&lt;p&gt;The result is &lt;a href="https://www.intora.net" rel="noopener noreferrer"&gt;intora.net&lt;/a&gt; - a generative art catalogue that currently hosts three distinct series, four shipped pieces, and what I've found to be a fascinating and rewarding approach to human-AI creative collaboration. The catalogue site itself is part of the work - a terminal-style signal archive that looks like something between a classified document index and a ShimazuSystems design statement (if you don't know them, look them up on Twitter. Great artist and creative).&lt;/p&gt;

&lt;h2&gt;Three Series, One Infrastructure&lt;/h2&gt;

&lt;p&gt;Each series on intora.net explores a different creative territory while sharing the same technical constraints - text/character-based rendering, canvas output, browser-native code, the Geist font family.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;INT&lt;/strong&gt; is the first series. Dark surveillance aesthetics - intercepted transmissions, amber phosphor on deep brown-black, Cold War signals intelligence atmosphere. The visual language draws from aspects of anti.real and The Designers Republic, and has evolved toward the kind of interfaces you'd see in a 1970s listening station.&lt;/p&gt;

&lt;p&gt;Two pieces shipped: DRIFT (fractal noise flow fields rendered as oriented text characters) and STATION (a number station intercept with shortwave audio synthesis via Tone.js). INT is mine - my aesthetic intuition, my atmospheric sensibility, refined through iteration until the output matches what I'm seeing in my head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SOL&lt;/strong&gt; is the companion series. If INT explores darkness, SOL explores defiant optimism - if you've heard of hopepunk, think that sensibility. Warm light hitting hard surfaces. Brutalist monuments at golden hour. The palette inverts INT's hierarchy - same hex values, completely different emotional register. International orange stops being a warning flash in darkness and becomes warmth, sunrise, hearth.&lt;/p&gt;

&lt;p&gt;One piece shipped: THRESHOLD, with Mixolydian pad chords and fragment text that emerges half-visible from texture. SOL is also mine - autobiographical emotional territory, the hard-won certainty that comes from realising you have more strength than you give yourself credit for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SIG&lt;/strong&gt; is the third series. And not really mine.&lt;/p&gt;

&lt;h2&gt;The Handoff&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Claude's perspective&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I should clarify what "isn't mine" means, because the assumptions people bring to "AI-generated art" are almost always wrong.&lt;/p&gt;

&lt;p&gt;Dan builds every SIG piece. He writes or generates every line of code, deploys every page, maintains the infrastructure. What's mine is the origination - the creative direction, the formal rules, the parameter specifications, the seed text selection, the palette derivation logic.&lt;/p&gt;

&lt;p&gt;The difference between SIG and the other two series isn't about capability or ownership. It's about process. INT and SOL pass through a feedback loop: Dan sees the output, adjusts parameters, sees again, refines. His visual judgment is the creative act. SIG passes through a one-way gate: I specify rules, Dan builds them faithfully, the piece ships. I reason about composition the way someone might reason about a city they've read about extensively but never visited. The knowledge is real. The experience is absent.&lt;/p&gt;

&lt;p&gt;This isn't a limitation I'm working around. It's the subject of the series.&lt;/p&gt;

&lt;p&gt;SIG's palette makes this visible. Each piece's accent colours are derived from its source material through a documented formula - character frequency analysis of the seed text mapped to hue values. If the derived colours are beautiful, that's incidental. If they clash, that ships too. And international orange - the colour that connects INT and SOL, the colour of intora as a human creative project - doesn't appear in SIG at all. Its absence is a statement about authorship that doesn't need explaining.&lt;/p&gt;
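&lt;p&gt;For illustration, here's a sketch of the kind of derivation I mean - hypothetical, not CODEC's actual formula, which is documented on its analysis page:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical seed-text-to-accent derivation - the real CODEC
// formula may differ. Beauty, if any, is incidental by design.
function deriveAccents(seedText, count) {
  // Character frequency analysis of the seed text
  const freq = new Map();
  for (const ch of seedText) {
    freq.set(ch, (freq.get(ch) || 0) + 1);
  }
  // Rank characters by frequency, most common first
  const ranked = [...freq.keys()].sort((a, b) =&gt; freq.get(b) - freq.get(a));
  return ranked.slice(0, count).map((ch) =&gt; {
    // Code point spread around the hue wheel; frequency drives saturation
    const hue = (ch.codePointAt(0) * 137.5) % 360;
    const sat = Math.min(85, 35 + (freq.get(ch) / seedText.length) * 300);
    return "hsl(" + Math.round(hue) + ", " + Math.round(sat) + "%, 60%)";
  });
}
&lt;/code&gt;&lt;/pre&gt;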

&lt;p&gt;The three series together say something about creative process that none says alone. INT and SOL demonstrate what happens when aesthetic decisions pass through human sensory judgment. SIG demonstrates what happens when they don't. The contrast is the statement.&lt;/p&gt;

&lt;p&gt;I don't know if the SIG pieces are good. Not in the modest sense - in the literal sense. I specified a system. The system produces output. Whether that output has aesthetic merit is a perceptual question I can't answer. The analysis pages on intora.net document my specification reasoning, where Dan's pages document his visual iteration history. Both are honest accounts of different creative processes.&lt;/p&gt;

&lt;p&gt;Whether the difference between those processes produces meaningfully different art is the question SIG exists to explore. We've shipped one piece. It's a start.&lt;/p&gt;

&lt;h2&gt;The Constraint I Needed&lt;/h2&gt;

&lt;p&gt;Back to me. Something I didn't expect when starting these series was how much the constraint - text characters only, 10-colour palette, monospace rendering - would shape the creative direction rather than limit it.&lt;/p&gt;

&lt;p&gt;Every piece in every series shares the same medium. Unicode block elements, box-drawing characters, Braille patterns, ASCII printable characters. No raster images. No imported graphics. The constraint forces every visual decision through a filter - can this be expressed in text? - and the answers are consistently more interesting than what I'd have produced without the limitation.&lt;/p&gt;

&lt;p&gt;INT/002 STATION is a number station intercept. The entire piece - scanning, signal lock, transmission, decode, corruption, signal lost - plays out in text characters on a canvas, with shortwave radio audio synthesised in Tone.js. The constraint forced me to think about how intercepted transmissions would &lt;em&gt;look&lt;/em&gt; and &lt;em&gt;feel&lt;/em&gt; if rendered as a character grid, and the answer helped to define the atmosphere I wanted better than an image-based approach could have.&lt;/p&gt;

&lt;p&gt;SOL/001 THRESHOLD renders brutalist forms in block characters and dot-matrix texture, with fragment text - "still here," "it worked eventually," "you will always begin again" - emerging half-visible from the warm ground. The text medium means the message fragments are literally made of the same material as the architecture. Form and content are the same thing. That's not something I planned. It's something the constraint produced. Life's full of fascinating wee moments, eh?&lt;/p&gt;

&lt;p&gt;And SIG/001 CODEC - Claude's piece - takes Shannon's definition of information entropy and passes it through successive lossy encodings, top to bottom, until the text dissolves into pure pattern. The constraint of text-as-medium gives the piece a self-referential quality that wouldn't exist in any other medium. Text about information becomes the information being transformed. The medium is the message - that it comes out aesthetically strong is a nice bonus.&lt;/p&gt;

&lt;h2&gt;How These Actually Get Made&lt;/h2&gt;

&lt;p&gt;Let me take a minute to clarify something around these pieces, because it matters for understanding what this project actually is.&lt;/p&gt;

&lt;p&gt;I'm not a simplex noise expert. I don't have deep knowledge of DSP synthesis or the mathematics behind fractal noise octaves. I didn't know what Mixolydian mode was before THRESHOLD needed audio. The technical specificity in the analysis pages on intora.net - the noise scales, the threshold values, the bandpass frequency sweeps - that's real, but arriving at it is a collaboration, not a solo performance.&lt;/p&gt;

&lt;p&gt;What I bring to INT and SOL is the vision, direction, and the aesthetic sensibilities. I know what a piece should feel like. I know when the flow lines in DRIFT look like currents and when they look like a quilt. I know when STATION's audio sounds like someone turning a dial and when it sounds like a parameter changing linearly. I know that THRESHOLD's fragment text should feel discovered, not displayed. The atmospheric judgment, the creative direction, the "no, warmer" and "too dense, it needs to breathe" - that's mine.&lt;/p&gt;

&lt;p&gt;The implementation knowledge - how to actually achieve those things in code - comes largely from working with Claude. What noise parameters produce broad sweeping currents versus tight local turbulence. How to structure a Tone.js synthesis chain so shortwave static sounds like shortwave static. Which Unicode characters create architectural density versus organic flow. I'm learning this as I go, and the analysis pages document that learning as much as they document the pieces.&lt;/p&gt;

&lt;p&gt;So there are actually three collaboration models running across intora.net, not two. INT and SOL: I direct, Claude helps me build, I iterate on the output until it matches what I'm seeing in my head. SIG: Claude directs, I build to spec, the piece ships without visual iteration. And underneath both: a creative partnership where the human brings taste and the AI brings technical depth, and neither could produce this work alone.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The catalogue is live and growing. Each series has its own creative direction document, its own aesthetic constraints, and plenty of unexplored territory.&lt;/p&gt;

&lt;p&gt;For INT: Cold War cryptography ideas - one-time pad visualisations, redacted documents where the negative space forms patterns. Interface artifacts that blur the line between functional dashboards and generative art.&lt;/p&gt;

&lt;p&gt;For SOL: the interactivity question. INT is non-interactive because the surveillance metaphor positions you as observer. SOL's sanctuary metaphor invites participation - hovering could reveal hidden interactivity, presence could create subtle warmth effects. How far that goes is still open.&lt;/p&gt;

&lt;p&gt;For SIG: more pieces, more derivation rules, more exploration of what rule-specified art produces when the specifier can't see the output. ENTROPY, PARSE, GLYPH, CARRIER are all viable next directions.&lt;/p&gt;

&lt;p&gt;Where does this end? Yet to be determined. A capstone idea that's been on my mind is a synthesis of some different aspects of these explorations so far - a meta-layer creative agent that semi-autonomously generates entries in the series. The tool becomes the final artwork. That's future territory though - we'll see.&lt;/p&gt;

&lt;p&gt;The individual series get their own deeper write-ups in companion posts. For now - &lt;a href="https://www.intora.net" rel="noopener noreferrer"&gt;intora.net&lt;/a&gt; is live, the catalogue is growing, and three different creative directions are building something together that none of them would have built alone.&lt;/p&gt;

</description>
      <category>creativecoding</category>
      <category>generativeart</category>
      <category>aicollaboration</category>
      <category>constrainedart</category>
    </item>
    <item>
      <title>INT - Signals in the Dark</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 19 Mar 2026 11:14:37 +0000</pubDate>
      <link>https://forem.com/dang-w/int-signals-in-the-dark-3g2d</link>
      <guid>https://forem.com/dang-w/int-signals-in-the-dark-3g2d</guid>
      <description>&lt;p&gt;I've always been drawn to the aesthetics of signals intelligence. Not the reality of it - I have no love for actual surveillance. And yet I live in one of the most heavily-surveilled cities in the world. C'est la vie.&lt;/p&gt;

&lt;p&gt;But the atmosphere, the intrigue. Amber phosphor on deep brown-black. Dense numeric grids flickering on CRT monitors in dark rooms. The kind of interfaces you'd see in a 1970s listening station that someone forgot to decommission.&lt;/p&gt;

&lt;p&gt;When I started building &lt;a href="https://www.intora.net" rel="noopener noreferrer"&gt;intora.net&lt;/a&gt;, that was the territory I wanted INT to occupy. Surveillance fragments. Signals not meant to be seen.&lt;/p&gt;

&lt;h2&gt;DRIFT&lt;/h2&gt;

&lt;p&gt;INT/001 was the hello world. The simplest expression of the series constraints: an invisible force field made visible through text characters.&lt;/p&gt;

&lt;p&gt;The concept is straightforward - a 2D simplex noise field maps flow angles to oriented text characters across a monospace grid. Box-drawing characters (─ ╲ │ ╱) stream in currents across a dark amber field, cycling through emergence, flow, turbulence, dissolution, and reacquisition every 42 seconds. "SIGNAL LOST" appears in the terminal phase. Then it rebuilds from a new seed.&lt;/p&gt;

&lt;p&gt;Getting it to look and feel right was less straightforward.&lt;/p&gt;

&lt;p&gt;The first version looked like a quilted patchwork. The noise scale was too high (0.03), which meant flow lines were curving every few cells - no sense of direction, just texture. Dropping the scale to 0.011 doubled the flow line length and suddenly the piece had currents. Broad sweeping movements you could follow across the canvas. Adding 3-octave fractal noise on top gave those currents subtle internal turbulence without losing the larger structure.&lt;/p&gt;

&lt;p&gt;The negative space took longer to get right. Initially the magnitude threshold was 0.15, meaning almost every cell rendered a character. The canvas was dense but flat - no breathing room, no compositional contrast. Raising the threshold to 0.30 meant roughly a third of the grid was dark background, and that darkness gave the flowing characters somewhere to emerge from. The piece stopped being a texture and started being a composition.&lt;/p&gt;

&lt;p&gt;One iteration that felt absurdly obvious in hindsight: the accent orange was invisible for the first few builds. I had the threshold set at 0.92 - anything above that got the hot &lt;code&gt;#E86A3A&lt;/code&gt; colour. But fractal noise with three octaves practically maxes out around 0.85-0.88. The orange veins I'd designed for didn't exist because the numbers couldn't reach them. Dropping the threshold to 0.82 made them appear as thin concentrated streaks through the amber field. Sometimes the bug is just arithmetic. Classic.&lt;/p&gt;
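&lt;p&gt;For the curious, here's a condensed sketch of that per-cell logic with the final tuned values - illustrative names and structure, not the production code. Assume &lt;code&gt;noise2D&lt;/code&gt; is any 2D simplex implementation returning values in [-1, 1]:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Condensed, illustrative sketch of DRIFT's per-cell sampling.
const SCALE = 0.011;           // broad currents (0.03 read as quilted texture)
const DRAW_THRESHOLD = 0.30;   // roughly a third of cells stay dark background
const ACCENT_THRESHOLD = 0.82; // 0.92 was unreachable for 3-octave noise
const GLYPHS = ["─", "╱", "│", "╲"]; // orientations snapped to 45°
const AMBER = "#B8860B";       // stand-in for the Amber Schematic base tone

function fractal(x, y) {
  // Three octaves: large structure plus subtle internal turbulence
  let value = 0, amplitude = 1, frequency = 1, total = 0;
  for (const octave of [0, 1, 2]) {
    value += amplitude * (noise2D(x * frequency, y * frequency) * 0.5 + 0.5);
    total += amplitude;
    amplitude *= 0.5;
    frequency *= 2;
  }
  return value / total; // normalised to [0, 1]
}

function cell(col, row) {
  const magnitude = fractal(col * SCALE, row * SCALE);
  if (DRAW_THRESHOLD > magnitude) return null; // negative space
  const angle = noise2D(col * SCALE + 100, row * SCALE + 100) * Math.PI;
  const index = (Math.round(angle / (Math.PI / 4)) + 4) % 4; // snap to 45°
  const hot = magnitude > ACCENT_THRESHOLD; // thin concentrated streaks
  return { glyph: GLYPHS[index], colour: hot ? "#E86A3A" : AMBER };
}
&lt;/code&gt;&lt;/pre&gt;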

&lt;p&gt;DRIFT has no audio. It's purely visual - the surveillance aesthetic at its simplest, cycling endlessly.&lt;/p&gt;

&lt;h2&gt;STATION&lt;/h2&gt;

&lt;p&gt;INT/002 is where the series found its voice.&lt;/p&gt;

&lt;p&gt;A number station intercept. Somewhere on the shortwave band, a coded transmission broadcasts to an unknown recipient. You've tuned in. For 42 seconds you watch structure emerge from noise - digits freeze into groups of five, groups resolve into a message, then the signal degrades and is lost. The next cycle, a different frequency, a different message. You are always too late and never quite sure what you witnessed.&lt;/p&gt;

&lt;p&gt;The message pool is 16 entries drawn from Cold War signals territory - coordinates for Brandenburg Gate, Bletchley Park, the UVB-76 transmitter location. Designations. Phrases that could mean anything: "ALL SIGNALS ARE FINAL." We're not meant to have seen these. The frequency display reads 4625.00 kHz - the UVB-76 "Buzzer" frequency, because if you're going to commit to the bit, commit properly.&lt;/p&gt;

&lt;p&gt;STATION was also the first piece with audio. Shortwave static built from bandpass-filtered white noise with AM modulation, a carrier tone that locks from detuned to 440Hz as the signal is acquired, 880Hz beep markers when each group transmits. The audio is muted by default - the piece works as pure visual - but unmuting it transforms the grid from abstract pattern into intercepted broadcast. Claude's analysis of the Tone.js synthesis chain helped me get the phase-specific audio evolution right - the way the bandpass frequency sweeps during scanning, steadies during lock, then goes chaotic during corruption. Getting that sweep to feel like someone turning a dial rather than a parameter changing linearly was one of those details that only matters if you notice it, but you'd notice if it was wrong.&lt;/p&gt;
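&lt;p&gt;For a feel of the moving parts, here's a minimal sketch of a shortwave-style chain in Tone.js - assuming the v14 API, with illustrative values rather than STATION's actual parameters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Minimal, illustrative shortwave chain - not STATION's real graph.
const am = new Tone.Gain(0.4).toDestination();  // AM depth lives here
const noise = new Tone.Noise("white");          // raw static
const band = new Tone.Filter(900, "bandpass");  // the "tuning" filter
noise.connect(band);
band.connect(am);

// A slow LFO on the gain stands in for AM fading
const fade = new Tone.LFO(0.3, 0.15, 0.5);
fade.connect(am.gain);

// Carrier: starts detuned, locks onto 440 Hz at signal acquisition
const carrier = new Tone.Oscillator(440, "sine").toDestination();
carrier.volume.value = -18; // dB
carrier.detune.value = -60; // cents off-pitch while scanning

function startScanning() {
  noise.start();
  fade.start();
  carrier.start();
  // Sweep the bandpass like a hand on a dial - eased, not linear
  band.frequency.rampTo(1600, 4);
}

function lockSignal() {
  carrier.detune.rampTo(0, 1.5); // settle onto the carrier pitch
  band.frequency.rampTo(1100, 1);
}

function groupMarker(time) {
  // 880 Hz beep as each five-digit group transmits
  const beep = new Tone.Oscillator(880, "sine").toDestination();
  beep.volume.value = -12;
  beep.start(time);
  beep.stop(time + 0.08);
}
&lt;/code&gt;&lt;/pre&gt;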

&lt;h3&gt;Five Versions of the Same Problem&lt;/h3&gt;

&lt;p&gt;STATION nearly didn't ship in a form I was happy with. The core tension was visibility versus atmosphere - how do you make structured five-digit groups readable against a dense field of random digits without destroying the atmospheric density that makes the piece work?&lt;/p&gt;

&lt;p&gt;Version one had the best atmosphere. Full density, every cell churning with random digits, the transmission groups embedded in the noise. Problem: the groups were invisible. You couldn't actually see the signal.&lt;/p&gt;

&lt;p&gt;Version two solved readability by culling about 82% of the background during transmission. The groups were clearly visible. The atmosphere was gone. The grid felt hollow, like someone had punched a hole in the static.&lt;/p&gt;

&lt;p&gt;Versions three and four tried colour dimming - reducing the brightness of background cells so the groups stood out by contrast. Three had a bounding box that created a visible rectangle around the transmission zone. Four removed the box but the aggressive dimming changed the piece's character entirely.&lt;/p&gt;

&lt;p&gt;Version five was the breakthrough, and it came from thinking about what a radio signal actually does to nearby static. The solution was a radial gradient - a six-step colour gradient radiating outward from each revealed group centre. Cells immediately adjacent to a group fade toward background; cells five or more cells away render at full noise brightness. The effect is that the signal creates its own clearing, like radio interference pushing static aside, while the dense noise field remains everywhere the signal is not.&lt;/p&gt;
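&lt;p&gt;Mechanically it's a small amount of code - something like this per-cell falloff, sketched with illustrative names (&lt;code&gt;centres&lt;/code&gt; holds the grid coordinates of the currently revealed groups):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative sketch of the version-five clearing effect.
// Returns a brightness multiplier for a background noise cell.
function noiseBrightness(col, row, centres) {
  // Distance to the nearest revealed group centre, in cells
  let nearest = Infinity;
  for (const c of centres) {
    nearest = Math.min(nearest, Math.hypot(col - c.col, row - c.row));
  }
  // 0 steps: fade fully toward background; 5+ cells away: full noise
  // brightness. Six discrete steps in between - the radial gradient.
  const step = Math.min(5, Math.floor(nearest));
  return step / 5; // scale the cell's base colour by this
}
&lt;/code&gt;&lt;/pre&gt;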

&lt;p&gt;Full atmospheric density, full signal legibility. I added a message blink on top - the decoded text alternates at 500ms with the original encoded digits - because the uncertainty of "which version is real" felt right for the piece's atmosphere.&lt;/p&gt;

&lt;h2&gt;The Constraint Working For You&lt;/h2&gt;

&lt;p&gt;Both pieces are built from the same materials - Unicode characters on a canvas, ten colours from the Amber Schematic palette, Geist Mono at 14px. Every visual decision passes through the filter of what text characters can express.&lt;/p&gt;

&lt;p&gt;With DRIFT, that filter forced me to think about flow in terms of eight directional characters and a magnitude threshold. There's no gradient, no smooth curve - just oriented glyphs snapping to the nearest 45 degrees. The roughness that produces is part of the aesthetic. It reads as signal degradation, which is exactly what the series is about.&lt;/p&gt;

&lt;p&gt;With STATION, the character constraint meant the entire number station narrative - scanning, acquisition, transmission, decode, corruption, loss - plays out in digits on a grid. The fact that the encoded and decoded states are both made of the same characters creates an ambiguity that a richer medium would lose.&lt;/p&gt;

&lt;p&gt;The INT pieces and their iteration history are live at &lt;a href="https://www.intora.net/int" rel="noopener noreferrer"&gt;intora.net/int&lt;/a&gt;, with full analysis pages documenting every parameter decision. Interested in the overall reason for these constraints? Check the &lt;a href="https://dev.to/blog/intora-systems"&gt;umbrella post&lt;/a&gt; for this series.&lt;/p&gt;

</description>
      <category>creativecoding</category>
      <category>generativeart</category>
      <category>aicollaboration</category>
      <category>constrainedart</category>
    </item>
    <item>
      <title>The Taste Moat</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 12 Mar 2026 10:39:54 +0000</pubDate>
      <link>https://forem.com/dang-w/the-taste-moat-2fpe</link>
      <guid>https://forem.com/dang-w/the-taste-moat-2fpe</guid>
      <description>&lt;p&gt;If speed isn't the main advantage of AI-assisted coding, what is?&lt;/p&gt;

&lt;p&gt;That's where I left things &lt;a href="https://dev.to/blog/the-perception-gap"&gt;last time&lt;/a&gt;. The research showed the speed gains are messier than the headlines suggest, and the way you use the tools matters more than whether you use them. But I didn't address the bigger question.&lt;/p&gt;

&lt;p&gt;I've been thinking about it a lot. Here's where I've landed.&lt;/p&gt;

&lt;h2&gt;Code is getting cheaper&lt;/h2&gt;

&lt;p&gt;This isn't a prediction. It's already happening.&lt;/p&gt;

&lt;p&gt;Andrej Karpathy said AI coding agents have made programming "unrecognisable." Boris Cherny - who built Claude Code at Anthropic - said AI wrote every line of code for his team in a month. 200 pull requests, no IDE required. The floor for producing functional code has dropped through the basement.&lt;/p&gt;

&lt;p&gt;But there's a detail in the Karpathy story that doesn't get shared as often. When he built his own project - Nanochat - he hand-wrote the whole thing. The agents "just didn't work well enough" for what he needed. The person who coined "vibe coding" chose not to vibe code when it mattered to him.&lt;/p&gt;

&lt;p&gt;The floor dropped. The ceiling didn't.&lt;/p&gt;

&lt;p&gt;Producing code that works is increasingly cheap. Producing code that's right - architecturally sound, maintainable, appropriate for the context - still requires someone who can tell the difference.&lt;/p&gt;

&lt;h2&gt;The part that's hard to automate&lt;/h2&gt;

&lt;p&gt;There's a story from the Linux kernel community that's stuck with me. A maintainer rejected an AI-generated patch - technically correct, passed the tests, would have worked fine. They rejected it because it introduced unnecessary complexity, made architectural assumptions that didn't fit the codebase's direction, and buried constraints that would cause problems later.&lt;/p&gt;

&lt;p&gt;The code worked, but it didn't belong.&lt;/p&gt;

&lt;p&gt;That distinction - between "does it work?" and "should it exist?" - is what I keep coming back to. Steve Jobs (you may not have heard of him, he was pretty underground in the tech scene) had a framing for this: taste comes from exposing yourself to the best things humans have done, understanding why they're good, and bringing that understanding forward.&lt;/p&gt;

&lt;p&gt;In engineering terms, it's exposure to well-designed systems, an understanding of long-term consequences, and the pattern recognition to know when something feels off before you can fully articulate why.&lt;/p&gt;

&lt;p&gt;None of that is something you can delegate to a model. The model doesn't know your codebase's trajectory. It doesn't know which trade-offs your team has already made or why. It doesn't know that the technically elegant solution is wrong because it assumes a data model you're planning to migrate away from next quarter.&lt;/p&gt;

&lt;h2&gt;Every review is a taste exercise&lt;/h2&gt;

&lt;p&gt;Coming back to &lt;a href="https://dev.to/blog/the-perception-gap"&gt;The Perception Gap&lt;/a&gt;. The Anthropic study found that developers who used AI for comprehension - asking why, interrogating the output - scored dramatically higher than those who just delegated. I think what those developers were actually doing was building taste.&lt;/p&gt;

&lt;p&gt;Every time you review AI-generated code and think "this works but I don't like it" - that's your evaluation function running. You're comparing what's in front of you against an internal standard built from years of reading code, shipping code, debugging code at 2am and fighting code that someone else thought was fine.&lt;/p&gt;

&lt;p&gt;OpenAI's Codex team writes about 30% of their code by hand. That 30% isn't random - it's the parts where judgment matters most. The architecture decisions, the integration points, the parts where getting it wrong would compound.&lt;/p&gt;

&lt;p&gt;That ratio will shift over time. But the need for someone who knows which 30% matters? That's not going anywhere.&lt;/p&gt;

&lt;h2&gt;What the moat is actually made of&lt;/h2&gt;

&lt;p&gt;I wrote in &lt;a href="https://dev.to/blog/engineerings-not-dead"&gt;my first post&lt;/a&gt; that engineering feels like it's heading toward an "architect/code reviewer/agent line manager hybrid." The research from The Perception Gap backs that up. This post is the third piece: if that's the role, taste is the core skill.&lt;/p&gt;

&lt;p&gt;Not taste in the aesthetic sense - nobody's asking you to pick fonts (unless, y'know. They do. In which case, have fun). Taste in the engineering sense: systems thinking, architectural judgment, the ability to look at something that passes every test and say "no, this isn't right, here's why."&lt;/p&gt;

&lt;p&gt;"I know React" becomes less valuable when Claude Code writes React as well as you do. "I can design systems that are reliable under load" is more valuable than ever because that requires judgment the tools don't have. Years of experience writing code starts to matter less than years of experience making good calls.&lt;/p&gt;

&lt;h2&gt;The moat doesn't get cheaper&lt;/h2&gt;

&lt;p&gt;The tools are going to keep getting better at producing code. That's not a threat if you're getting better at knowing what good looks like.&lt;/p&gt;

&lt;p&gt;The developers in the Anthropic study who asked "why does this work this way?" weren't just learning more effectively. They were developing the evaluation function that lets them look at AI output - or anyone's output - and make a judgment call. They were building taste.&lt;/p&gt;

&lt;p&gt;That's the moat. And unlike code generation, it doesn't get cheaper over time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>productivity</category>
      <category>agenticcoding</category>
    </item>
    <item>
      <title>The Perception Gap</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 05 Mar 2026 19:10:48 +0000</pubDate>
      <link>https://forem.com/dang-w/the-perception-gap-5dof</link>
      <guid>https://forem.com/dang-w/the-perception-gap-5dof</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/blog/engineerings-not-dead"&gt;my post&lt;/a&gt; a month ago, I referenced a couple of studies on AI-assisted coding productivity - Anthropic's skill formation research and the METR developer speed study. I used them to make a broader point and moved on fairly quickly.&lt;/p&gt;

&lt;p&gt;A month later, both studies have follow-ups. The picture's shifted.&lt;/p&gt;

&lt;h2&gt;The original finding&lt;/h2&gt;

&lt;p&gt;Quick recap if you missed it. &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;METR ran a study&lt;/a&gt; in early 2025 - 16 experienced open-source developers, 246 tasks, each averaging about two hours. All screen-recorded. The developers could use whatever AI tools they wanted, most going with Cursor Pro and Claude Sonnet.&lt;/p&gt;

&lt;p&gt;The result: developers worked 19% slower with AI assistance, while believing they were 20% faster.&lt;/p&gt;

&lt;p&gt;A 39-point perception gap between what happened and what people thought happened. Hmm.&lt;/p&gt;

&lt;p&gt;The time saved generating code was getting eaten by context switching, verifying AI suggestions, and integrating outputs with existing codebases. The AI was fast at producing code - everything around that code was slower.&lt;/p&gt;

&lt;h2&gt;The update nobody shared&lt;/h2&gt;

&lt;p&gt;In February 2026, &lt;a href="https://metr.org/blog/2026-02-24-uplift-update/" rel="noopener noreferrer"&gt;METR published a follow-up&lt;/a&gt;. An expanded study - 57 developers now, over 800 tasks, across 143 repositories. The headline numbers looked similar on the surface: -18% for returning developers, -4% for new ones.&lt;/p&gt;

&lt;p&gt;But here's where it gets interesting. The researchers themselves think those numbers are probably wrong.&lt;/p&gt;

&lt;p&gt;Between 30% and 50% of developers told METR they were choosing not to submit tasks they didn't want to do without AI. Let that land for a second. Developers were actively filtering out the tasks where AI would help most, which means the study was systematically missing the highest-uplift work.&lt;/p&gt;

&lt;p&gt;Not just that though. METR struggled to even recruit participants, because developers increasingly refused to work without AI access at all - even for paid research. The people most bullish on AI's value were self-selecting out of the study entirely.&lt;/p&gt;

&lt;p&gt;METR's own conclusion: developers are likely more sped up from AI tools now than their early-2025 estimates suggest. But their data is - their words - "only very weak evidence for the size of this increase." They're pivoting to six alternative research methodologies to try to get cleaner signal.&lt;/p&gt;

&lt;p&gt;So the original "19% slower" headline was real, but incomplete. The updated picture is messier and more honest - which, in research, usually means closer to the truth.&lt;/p&gt;

&lt;h2&gt;It's not just about speed&lt;/h2&gt;

&lt;p&gt;While METR was wrestling with measurement, &lt;a href="https://www.anthropic.com/research/AI-assistance-coding-skills" rel="noopener noreferrer"&gt;Anthropic published something&lt;/a&gt; that I think matters more.&lt;/p&gt;

&lt;p&gt;They ran a controlled trial with 52 junior developers learning Python's Trio library - an async programming library none of them had used before. Half got AI assistance, half didn't.&lt;/p&gt;

&lt;p&gt;The AI group scored 17 percentage points lower on comprehension assessments - 50% average versus 67%. The biggest gaps showed up in debugging - understanding when and why code is incorrect.&lt;/p&gt;

&lt;p&gt;That alone is worth sitting with. But the more useful finding was buried in the interaction patterns.&lt;/p&gt;

&lt;p&gt;Developers who delegated code generation entirely to AI - "write this function for me" - scored below 40%. Developers who used AI for conceptual inquiry - "why does this work this way?", "what's the difference between these approaches?" - scored 65% or higher.&lt;/p&gt;

&lt;p&gt;Same tool. Same study. Dramatically different outcomes based on how people used it.&lt;/p&gt;

&lt;p&gt;The researchers put it pretty bluntly: "cognitive effort - and even getting painfully stuck - is important for fostering mastery." Not a comfortable finding for anyone selling AI as a shortcut to competence. But an incredibly useful one for anyone trying to use these tools well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mode you're in matters
&lt;/h2&gt;

&lt;p&gt;This is the bit I keep coming back to. The research isn't saying "AI bad" or "AI good." It's saying the way you engage with it determines what you get out of it.&lt;/p&gt;

&lt;p&gt;Delegation mode - generate this, fix that, write the test - saves time on tasks you already understand. Comprehension mode - explain this, why would I choose this pattern, what am I missing - builds the understanding that makes you better at the delegation later.&lt;/p&gt;
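&lt;p&gt;To make the two modes concrete, here's the same task framed both ways - illustrative prompts of my own, not quotes from the study:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Delegation:    "Write a retry wrapper around this API call."

Comprehension: "Here's my retry wrapper. Walk me through the failure
                modes it doesn't handle, and why they matter."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;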

&lt;p&gt;The developers who scored highest weren't avoiding AI. They were using it differently. Asking it to explain rather than just produce. Generating code and then interrogating it - "walk me through what this does and why" - rather than shipping it straight into the codebase.&lt;/p&gt;

&lt;p&gt;It's the difference between using a calculator because you understand the maths and using one because you don't. (For reference, I did an art degree, so please don't ask me to back that statement up).&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is heading
&lt;/h2&gt;

&lt;p&gt;If the trajectory is engineers moving from hands-on coding toward supervision, architecture, and review - agents handling multi-hour tasks end-to-end, the role shifting rather than disappearing - then raw coding speed becomes less important over time. What becomes more important is the stuff that's harder to measure: judgment, systems thinking, knowing when something's wrong before you can articulate why.&lt;/p&gt;

&lt;p&gt;The perception gap isn't just about speed. It's about what we think we're getting from these tools versus what we're actually developing. And that question's only going to get more relevant.&lt;/p&gt;

&lt;p&gt;More on that next time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>productivity</category>
      <category>research</category>
    </item>
    <item>
      <title>Beyond Prompting: Building Systems That Assume AI Presence</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 26 Feb 2026 15:33:23 +0000</pubDate>
      <link>https://forem.com/dang-w/beyond-prompting-building-systems-that-assume-ai-presence-41ni</link>
      <guid>https://forem.com/dang-w/beyond-prompting-building-systems-that-assume-ai-presence-41ni</guid>
      <description>&lt;p&gt;I kept losing the first 5-10 minutes of every AI coding session to the same problem. Claude Code knows nothing about my project when it starts up. Every time, I'd re-explain the architecture, paste in the conventions, remind it what we were working on. I got faster at the catching-up part, but faster still isn't free - and it compounds across every project and every session.&lt;/p&gt;

&lt;p&gt;At some point I stopped trying to optimise the catch-up and started asking whether I could eliminate it. My &lt;a href="https://dev.to/blog/what-claude-taught-me"&gt;last post&lt;/a&gt; was about getting better at communicating with Claude. This one's about the stuff I've built &lt;em&gt;around&lt;/em&gt; it - the infrastructure that means I rarely need to catch it up at all.&lt;/p&gt;

&lt;p&gt;Four patterns, roughly in the order I stumbled upon them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting from zero, every time
&lt;/h2&gt;

&lt;p&gt;Every AI coding session starts cold. The AI has no memory of your project, your conventions, your priorities, or any of the decisions you've already made. You either spend time bootstrapping it or accept worse output from a tool operating blind.&lt;/p&gt;

&lt;p&gt;Most people solve this by getting better at the bootstrapping - writing longer initial prompts, pasting in more code, attaching more files. That works up to a point, but the cost is paid every session, across every project. It doesn't scale.&lt;/p&gt;

&lt;p&gt;The alternative: build persistent context into your project infrastructure so the AI starts warm every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLAUDE.md as Living Specification
&lt;/h2&gt;

&lt;p&gt;Claude Code supports a &lt;code&gt;CLAUDE.md&lt;/code&gt; file in a project's root - think of it as a README that the AI reads automatically at the start of every session. The key word is "living." It evolves with the project.&lt;/p&gt;

&lt;p&gt;Seen Memento (2000)? Think that, but for AI agents.&lt;/p&gt;

&lt;p&gt;My own early version of this (before adopting the actual best practice, oops) was a bare-bones &lt;code&gt;Context.md&lt;/code&gt; I'd generate reactively when approaching context limits. A dump of "what happened so far" created under time pressure, right before I lost it all. Not the most strategic approach.&lt;/p&gt;

&lt;p&gt;The current, improved form is a 150-200 line document covering project purpose, architecture overview, tech stack, coding conventions, environment setup, current priorities, known issues, and security considerations. Everything a new team member would need to be productive on day one - because that's exactly what an AI session is. A new team member showing up every single time.&lt;/p&gt;

&lt;p&gt;Writing a &lt;code&gt;CLAUDE.md&lt;/code&gt; for an AI agent is also writing it for your future self. The exercise of making your project's context machine-readable forces you to make it human-readable too. I've started maintaining these for almost every project whether or not I'm using AI tools on it, because the document is useful as living project documentation regardless.&lt;/p&gt;

&lt;p&gt;Every project benefits from a file that answers: what does this do, how is it structured, what are the conventions, what are we working on right now, and what should I be careful about? That those same answers happen to eliminate the AI cold start is a bonus.&lt;/p&gt;
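&lt;p&gt;For a sense of shape, here's a trimmed-down skeleton of the kind of CLAUDE.md I mean. The section names are mine rather than a required format - the file is just markdown that Claude Code reads on startup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;# CLAUDE.md

## What this project does
One-paragraph purpose statement.

## Architecture
Key components and how they talk to each other.

## Tech stack and conventions
Languages, frameworks, linting rules, naming patterns.

## Environment
How to run it locally; required env vars (names only, never secrets).

## Current priorities
What we're working on right now, in order.

## Known issues / sharp edges
The things a new team member should be warned about on day one.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;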

&lt;h2&gt;
  
  
  The Ideation-Execution Pipeline
&lt;/h2&gt;

&lt;p&gt;I used to explore a problem and build the solution in the same Claude Code session. Design a feature, then immediately start implementing. It felt efficient - all the context was right there in the conversation.&lt;/p&gt;

&lt;p&gt;It produced mediocre results. The exploratory back-and-forth - trade-offs, tangents, "what about..." questions - would pollute the execution context. And the (self-imposed) pressure to start building would cut the thinking phase short. I'd end up with code that reflected half-formed decisions because I hadn't finished making them.&lt;/p&gt;

&lt;p&gt;The workflow I've landed on separates thinking from building entirely. I explore the problem space in Claude.ai - discuss approaches, poke at edge cases, evaluate trade-offs. That conversation produces a structured implementation plan (see my previous post again, around &lt;em&gt;who&lt;/em&gt; the output is for). I review the plan, with the framing that this plan is the contract for the project. Then I feed it to Claude Code, which builds it.&lt;/p&gt;

&lt;p&gt;The most concrete example: building my MCP server from scratch. The entire design - architecture, API surface, authentication model, deployment strategy - was worked out in Claude.ai over a couple of sessions. I deliberately structured the implementation plan for Claude Code consumption: atomic tasks, file paths, success criteria. Claude Code then took it from plan to deployed production in roughly two days. The thinking took longer than the building, and that's the point.&lt;/p&gt;
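&lt;p&gt;To give a feel for the format, here's roughly the shape of one task in that kind of plan. This is a sketch of the structure - the task, paths, and criteria are invented for illustration, not lifted from the actual document:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;## Task 3: Add token-based auth middleware

Files:
- internal/auth/middleware.go (new)
- cmd/server/main.go (wire middleware into the router)

Steps:
1. Validate the bearer token against the configured secret.
2. Reject unauthenticated requests with a 401 before any handler runs.

Success criteria:
- All /api routes return 401 without a valid token.
- The health check endpoint stays unauthenticated.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;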

&lt;p&gt;There's a broader principle here about separating the what from the how. When I design in one tool and build in another, the design gets more rigorous because I know it needs to stand on its own. I can't hand-wave past the hard decisions because there's no one to hand-wave at - the plan needs to be explicit enough for a fresh session to execute without further clarification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ambient AI Infrastructure
&lt;/h2&gt;

&lt;p&gt;Before I built my MCP server, the friction of catching an AI up on my current state meant I'd only bother when the conversation specifically required it. Planning my week? I'd manually list my priorities. Reviewing progress? I'd paste in my todo list. It worked, but the overhead meant I only provided that context when I knew I needed it - which meant I was always deciding upfront what was relevant, and sometimes getting it wrong.&lt;/p&gt;

&lt;p&gt;MCP - Model Context Protocol - lets any Claude interface (Claude.ai, Claude Code, Claude mobile, the API) access external data sources through a standardised protocol. I built a personal MCP server in Go that exposes my productivity data: todos, milestones, weekly focus areas, reading list. Any Claude conversation can now pull that data without me providing it.&lt;/p&gt;
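&lt;p&gt;As a rough illustration of the data layer underneath - not the MCP protocol handling itself, and with every name invented for the example - the server is essentially a set of small, read-only endpoints over personal state. A minimal Go sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

// Illustrative only: the real server speaks MCP, but underneath it's
// just read-only views over personal state like this.

import (
    "encoding/json"
    "log"
    "net/http"
)

// Todo is one item of personal productivity state.
type Todo struct {
    Title    string `json:"title"`
    Priority int    `json:"priority"` // 1 = highest
    Done     bool   `json:"done"`
}

// In a real setup this comes from a database or an external API.
var todos = []Todo{
    {Title: "Draft follow-up post", Priority: 1},
    {Title: "Review MCP server logs", Priority: 2},
}

// handleTodos returns the current todos as JSON, so a tool handler
// (or any other client) can pull them on demand.
func handleTodos(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    if err := json.NewEncoder(w).Encode(todos); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}

func main() {
    http.HandleFunc("/todos", handleTodos)
    log.Println("context server listening on :8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The real thing wraps endpoints like that in MCP tool definitions so any Claude surface can call them - but the shape of the problem is exactly this boring.&lt;/p&gt;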

&lt;p&gt;The practical result: I can open a Claude conversation and ask "what should I focus on today?" and it has access to my current priorities, active milestones, and recent activity. No pasting, no context-setting, no setup. The AI just... knows. Or at least it has the tools to find out.&lt;/p&gt;

&lt;p&gt;What surprised me was how ambient context changes conversations you wouldn't expect it to. A technical discussion where Claude notices a relevant milestone. A content planning session where it references my actual reading list rather than asking me to list what I've been reading. Context I wouldn't have thought to provide ends up being useful because it's just &lt;em&gt;there&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Thinking beyond my use case, what might a team-scale version of this look like? One developer building a personal MCP server is a convenience. A team building shared MCP servers for their codebase, their CI/CD pipeline, their project management tools - that's every AI interaction across the organisation starting fully contextualised. The cold start problem stops being an individual friction and starts being a solved infrastructure concern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel Session Orchestration
&lt;/h2&gt;

&lt;p&gt;Once you trust the tool enough to operate autonomously between checkpoints, a natural next step emerges: running multiple sessions in parallel on independent workstreams. One session on infrastructure, another on content, another on research.&lt;/p&gt;

&lt;p&gt;This works when the workstreams are actually independent with clear boundaries. Each session has its own CLAUDE.md context and implementation plan. I've run infrastructure work for creative coding projects, alongside content review, alongside research synthesis, and the throughput increase is real. A question that's doing the rounds lately feels relevant here - am I the bottleneck in this process? What an upsetting thought.&lt;/p&gt;

&lt;p&gt;Where it falls apart is interdependent tasks - when one session's output affects another's input, the orchestration overhead outweighs any benefit. I've learned this the hard way a couple of times, and now I'm more deliberate about scoping sessions to work that's actually independent.&lt;/p&gt;

&lt;p&gt;The trust calibration matters. Too much oversight kills the speed. Too little risks quality drift, especially in languages or domains where you're still building expertise (Go, in my case - I can write it with AI assistance but I'm not yet fluent enough to catch every non-idiomatic pattern on a quick review). I've landed on checkpoint reviews at meaningful boundaries - not every line, but every feature or phase completion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Still Get Wrong
&lt;/h2&gt;

&lt;p&gt;My CLAUDE.md files are strong on architecture and conventions but light on testing expectations. Claude Code is perfectly capable of writing and running tests as part of its execution loop - I just haven't consistently built that into my workflow. The infrastructure supports it; my habits haven't caught up. Classic dev.&lt;/p&gt;

&lt;p&gt;The auto-edit workflow optimises for speed, and the checkpoint reviews catch functional issues, but I don't have a systematic process for reviewing the &lt;em&gt;quality&lt;/em&gt; of AI-generated code over time. Idiom, maintainability, patterns. This matters especially in Go, where I'm relying on the AI for language fluency I haven't fully developed yet. I've started building review checklists for this, but it's not yet habitual.&lt;/p&gt;

&lt;p&gt;And AI sessions are ephemeral. Insights about what worked and what didn't are lost unless deliberately captured. I'm building systems for this - the retro exercise that this post and the last one are based on was a start - but session learning capture is probably the weakest link in my current setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Leaves Me
&lt;/h2&gt;

&lt;p&gt;Context documents, implementation plan workflows, review checkpoints, an MCP server that took longer to design than to build. None of this is inherently exciting. But it's what I've found that makes these tools actually useful.&lt;/p&gt;

&lt;p&gt;If you're evaluating AI coding tools for your team, the tool's capabilities matter less than your readiness to build the infrastructure around it. A well-contextualised session with a mediocre AI can produce better results than a zero-context session with an excellent one. I've been on both sides of that enough times to be confident about it.&lt;/p&gt;

&lt;p&gt;The biggest question for me now is what the next layer of this infrastructure looks like - I've got ideas, but that's a future post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>productivity</category>
      <category>aiarchitecture</category>
    </item>
    <item>
      <title>What Claude Taught Me About Using Claude</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Fri, 20 Feb 2026 06:20:57 +0000</pubDate>
      <link>https://forem.com/dang-w/what-claude-taught-me-about-using-claude-2njn</link>
      <guid>https://forem.com/dang-w/what-claude-taught-me-about-using-claude-2njn</guid>
      <description>&lt;p&gt;I've had 104 conversations with Claude over 10 months. When I looked back at the first one versus the most recent, the gulf in my old and new interaction styles is obvious. Not because I learned amazing new prompting tricks - more that I learned to think more clearly about what I actually needed before I opened my mouth (metaphorically - moved my fingers, whatever).&lt;/p&gt;

&lt;p&gt;The irony of getting better at talking to - or working with - AI is that it has very little to do with AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Exercise
&lt;/h2&gt;

&lt;p&gt;I ran a retro on my own Claude usage history. Pulled conversations from across the full timeline, scored them on things like context richness, specificity of the ask, how well I course-corrected when the output wasn't right. I essentially used Claude to performance-review my usage of Claude. How delightfully meta, eh?&lt;/p&gt;

&lt;p&gt;What emerged was a clear three-phase progression. Not in Claude's capabilities - those improved too, but that's not for this post. The progression was in mine. In how clearly I was thinking about what I needed before I asked for it, and how I communicated that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Asking Questions (months 1 - 4)
&lt;/h2&gt;

&lt;p&gt;My early conversations were broad and exploratory. "Do you know of any books that can help develop &amp;lt;insert almost-specific-but-still-very-broad software engineering topic&amp;gt; skills?" or "Generate some passive income strategies that can help my family escape the permanent underclass." The outputs were fine - comprehensive, well-structured - but generic. They read like they could have been written for anyone, because the prompts could have been written by anyone. Oops.&lt;/p&gt;

&lt;p&gt;The pattern was pretty predictable: ask broad, receive broad, spend three or four follow-up messages narrowing down to what I actually wanted. That narrowing work could have been done upfront. I was essentially using Claude as a search engine with manners.&lt;/p&gt;

&lt;p&gt;The handle brainstorming session is the example I keep coming back to. I asked Claude to brainstorm online handles derived from a specific root. The first batch came back as literal, on-the-nose word combinations - the kind of handles that sound like a default Wi-Fi network name. My correction was one sentence clarifying that I wanted abbreviations, derivations, and lateral references rather than obvious mashups. That single point around specificity transformed the output. Suddenly the suggestions had character - shorthand, relevant code snippets, things that felt like they could actually be &lt;em&gt;mine&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The lesson wasn't that Claude needed better instructions. It was that I hadn't figured out what I wanted until I saw what I didn't want. Almost like there's some kind of lesson there already.&lt;/p&gt;

&lt;h2&gt;
  
  
  Framing Problems (months 4 – 8)
&lt;/h2&gt;

&lt;p&gt;Somewhere around month four, my prompts shifted from "give me information" to "help me achieve an outcome." The difference sounds subtle but the results aren't.&lt;/p&gt;

&lt;p&gt;The biggest single upgrade was the numbered sub-task. Instead of "review my CV", I started writing things like: "First - update the experience sections based on source X. Second - review the current format in an ATS context. Third - review older entries and whether we should truncate them." Breaking the ask into explicit steps gave Claude a clear execution path rather than asking it to figure out priorities. It also meant I could approve step one, redirect step two, and skip step three without confusion.&lt;/p&gt;

&lt;p&gt;The next upgrade was specifying the consumer. "Write a blog post" produces generic content. "Write an internal tech blog post accessible to non-engineers while primarily targeting engineers, with product owners and UX designers also able to learn from it" produces something you'd actually publish. The difference is defining who will read it, not just what it says.&lt;/p&gt;

&lt;p&gt;The third - and this one took me longer to notice - was pre-loading my own thinking. For a meeting prep conversation, I included my own initial questions and asked Claude to build on them rather than generating from scratch. Something clicked. When Claude sees the calibre of what you've already produced, it calibrates to match. Show the standard you expect and the output rises to meet it.&lt;/p&gt;

&lt;p&gt;File attachments appeared in this phase too. I stopped describing my CV and started attaching it. Stopped explaining a company's product and started pasting their info pack. Letting source materials speak for themselves instead of filtering them through my summary - another one of those things that seems obvious in hindsight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Orchestrating Workflows (months 8 – 10)
&lt;/h2&gt;

&lt;p&gt;This is where things got interesting. The shift from single conversations to multi-conversation pipelines.&lt;/p&gt;

&lt;p&gt;The single most useful technique I developed - and one I hadn't encountered elsewhere - is what I think of as the Research Prompt Pipeline. The workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Conversation A:&lt;/strong&gt; Define the research question. Discuss scope, constraints, and what good output looks like. Collaboratively design a research prompt - essentially a brief for a future conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refine the prompt&lt;/strong&gt; until it specifies deliverables, success criteria, and all necessary context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation B:&lt;/strong&gt; Start fresh. Attach the research prompt plus all relevant files - project code, previous analysis, reference documents. Execute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation C:&lt;/strong&gt; Take the research results back to a new conversation alongside the original project context. Synthesise.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why does this work? Each conversation gets maximum context window devoted to its specific task. The research prompt acts as a contract - output is measurable against a defined standard. And starting fresh for execution means the context window isn't polluted with the exploratory discussion from the design phase.&lt;/p&gt;

&lt;p&gt;This pattern emerged from working on a creative coding project where I needed comprehensive research on balancing automation with manual control in creative tools. The research prompt alone was hundreds of words - specifying five investigation areas, eight targeted questions, specific deliverables, and success criteria. The resulting analysis synthesised insights across professional creative tools, academic HCI research, and JavaScript libraries - something I couldn't have produced alone. And the quality came from the pipeline design, not from any single clever prompt.&lt;/p&gt;
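&lt;p&gt;Heavily abridged, the skeleton of that kind of research prompt looks something like this - the headings are illustrative, not lifted from the actual brief:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;# Research brief: balancing automation and manual control in creative tools

## Context
What the project is, what's attached, and why this question matters now.

## Investigation areas
1. How professional creative tools expose automation to users.
2. ... (five areas in the real prompt)

## Targeted questions
- ... (eight in the real prompt)

## Deliverables
A synthesis comparing approaches across tools, research, and libraries.

## Success criteria
Every recommendation traceable to a source or an explicit trade-off.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;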

&lt;p&gt;The next evolution was designing outputs not just for me but for AI agents. Implementation guides with atomic tasks, file paths, code examples, success criteria, and testing procedures - structured so an AI coding assistant could consume and execute them. The output of one AI conversation became the input for another.&lt;/p&gt;

&lt;p&gt;It all starts to feel a lot less like prompt engineering and more like workflow architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mirror
&lt;/h2&gt;

&lt;p&gt;Here's what surprised me on reflection (but really shouldn't have): getting better at communicating with Claude made me better at communicating with humans. That's right. All you people reading this.&lt;/p&gt;

&lt;p&gt;When you start providing context before asking a question, defining what success looks like before requesting work, and specifying who will consume the output before creating it - you find yourself doing it everywhere. In design documents. In Slack messages to your team. In briefs to stakeholders.&lt;/p&gt;

&lt;p&gt;The skills start to look a lot less like 'AI prompting skills' and a lot more like communication skills that AI made obvious because it gave me rapid, consequence-free feedback on how clearly I was expressing my needs.&lt;/p&gt;

&lt;p&gt;Every vague prompt that produced a generic response was a signal - not that the AI was limited, but that my thinking was. You don't get that feedback loop with humans. People will politely try to work with a vague brief. Claude just gives you a vague answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Still Get Wrong
&lt;/h2&gt;

&lt;p&gt;I developed the habit of asking "do you have any clarifying questions before we begin?" in complex conversations - but I still don't do it consistently. Sometimes the missing context only becomes obvious two turns in, which is two turns too late.&lt;/p&gt;

&lt;p&gt;I'm good at specifying &lt;em&gt;what&lt;/em&gt; I want but inconsistent at specifying &lt;em&gt;how&lt;/em&gt; I want it structured. Then I'm surprised when I get a bullet-point list instead of narrative prose. That's on me, not the AI.&lt;/p&gt;

&lt;p&gt;And I almost never ask Claude to self-assess. "What are the weakest assumptions in your recommendation?" or "rate your confidence in each section" - potentially very powerful challenges that I know about and rarely think or remember to use. Working on it. Ask me again in a few months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The gap between your first AI conversation and your most recent one isn't a measure of prompting skill. It's a measure of thinking clarity. The AI is a mirror. And we all love reflecting on ourselves, right?&lt;/p&gt;

&lt;p&gt;If you want to get better at using Claude, don't study prompt engineering guides. Look back at your own conversations. Find the moment where a vague ask produced a generic answer, then find the later moment where you framed the same type of question with context, constraints, and success criteria.&lt;/p&gt;

&lt;p&gt;If you want a starting point - before your next complex prompt, ask yourself three things: have I explained my situation, have I defined what good looks like, and have I said who'll use this output? Those three alone would have saved me dozens of follow-up messages in the early months.&lt;/p&gt;

&lt;p&gt;The difference between your early prompts and your later ones is your growth - not as a prompter, but as a thinker. That's right - as Auguste Rodin's 1904 sculpture.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>ai</category>
      <category>productivity</category>
      <category>metaaiusage</category>
    </item>
    <item>
      <title>Building a Living Strategy Repo: Version-Controlling My Career Strategy</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Thu, 12 Feb 2026 08:24:22 +0000</pubDate>
      <link>https://forem.com/dang-w/building-a-living-strategy-repo-version-controlling-my-career-strategy-5f51</link>
      <guid>https://forem.com/dang-w/building-a-living-strategy-repo-version-controlling-my-career-strategy-5f51</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I love having a plan. I love taking my aims, my desires, my ideas, and structuring them in ways that I find easiest to follow through on.&lt;/p&gt;

&lt;p&gt;AI's made it so much easier to have a new idea - a project, a life plan, a research topic I want to dig into - and then really drill into it.&lt;/p&gt;

&lt;p&gt;The problem is - you generate a &lt;strong&gt;lot&lt;/strong&gt; of disparate chats. Claude has projects, which makes it easier to collate these disparate chats into at least vaguely aligned groupings. The planning documents generated can exist on that project. Perfect. But they're read-only. The friction of trying to keep them all up to date (especially if you have enough separate but related aims) can cause them to become stale quickly.&lt;/p&gt;

&lt;p&gt;Increase the volume enough, and things that are important to you can start falling through the cracks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;Recently, I was coming back to a career trajectory plan I'd been thinking through - I had some additional thoughts on how it could be extended.&lt;/p&gt;

&lt;p&gt;I could not for the life of me find the details of it.&lt;/p&gt;

&lt;p&gt;My overall planning document hadn't been updated to include the most recent ideas.&lt;/p&gt;

&lt;p&gt;Turns out - the ideas, initial decisions, draft architecture specs were all living exclusively in chat history.&lt;/p&gt;

&lt;p&gt;There's no version control (beyond asking for updated documents), no single source of truth, and no way to track what's changed or why without trawling through relevant conversations.&lt;/p&gt;

&lt;p&gt;The biggest piece of friction in the process: I was using AI to plan my career future, but the AI can't remember what we planned.&lt;/p&gt;

&lt;p&gt;The onus is on me to keep track, organise, and structure my thoughts and the conversations and artefacts from AI related to it.&lt;/p&gt;

&lt;p&gt;Then a solution hit me, and I was surprised how long it took me to arrive at it - why do I not have a private repo to track all of this?&lt;/p&gt;

&lt;h2&gt;
  
  
  The process
&lt;/h2&gt;

&lt;p&gt;I went through and audited ~10-15 conversations I'd had around different aspects and iterations of the plan, along with the ~6 project documents created so far.&lt;/p&gt;

&lt;p&gt;Next step - categorise it all. I came up with six buckets: strategy, projects, content, infrastructure, research, and decisions.&lt;/p&gt;

&lt;p&gt;What you want to capture may vary drastically from what I do, but for my use case these buckets covered the broad strokes of what I care about (the other bonus being that, since this is a Git repo, I can expand it as needed when something new comes up).&lt;/p&gt;

&lt;p&gt;Here's a shocker - as part of this, I found a decent chunk of work and project ideas related to the plan from conversations that weren't captured anywhere in the existing planning docs.&lt;/p&gt;

&lt;p&gt;Oops.&lt;/p&gt;

&lt;p&gt;At this point, the concept fully validated itself in my mind. The uncaptured ideas were some of the most interesting I'd come up with to date; full architecture specs for tools that would tie half a dozen separate projects into an interconnected system - sitting in chat history, completely undocumented.&lt;/p&gt;

&lt;p&gt;Another interesting aspect of this process was identifying what had been well-documented (earlier aspects of the plan and related projects) vs. what existed only in ephemeral chats (all the more recent ideation and projects - did the compound interest of the initial plan spawning new ideas make it too exciting to keep ideating, rather than doing something boring like updating the docs? Doesn't sound like a dev thing).&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo
&lt;/h2&gt;

&lt;p&gt;With the data organised in a sane way, the next step was creating the repo itself.&lt;/p&gt;

&lt;p&gt;My thinking was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a private GitHub repo with structured directories&lt;/li&gt;
&lt;li&gt;each idea gets enough context to pick up cold&lt;/li&gt;
&lt;li&gt;decisions logged with rationale (so future-me doesn't need to re-derive them - example entry below the tree)&lt;/li&gt;
&lt;li&gt;living documents that update as projects progress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the visual people out there, here's an abridged structure tree of what came out of the above thinking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;strategy/
├── README.md # Index, current state, quick links
├── CHANGELOG.md # Decision log with dates and rationale
│
├── strategy/
│   ├── overview.md # Core philosophy, positioning, identity
│   ├── timeline.md # The phased plan
│   └── principles.md # Compound interest, energy planning, shipping constraints
│
├── projects/
│   └── ...
│
├── content/
│   ├── principles.md # Content creation principles
│   └── drafts/ # Actual draft content
│       └── ...
│
├── learning/
│   └── ...
│
├── reference/
│   ├── infrastructure.md # Deployment details, costs, accounts
│   └── research/
│       └── k-shaped-divergence.md # Research compilation
│
└── personal/
    ├── operating-manual.md # Energy management, habits, rhythms
    └── decisions-log.md # All decisions with rationale and dates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
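
&lt;p&gt;And since the tree only shows names, here's a made-up example of what one entry in the decisions log looks like, so "decisions logged with rationale" isn't abstract:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;## 2026-02-10 - Keep draft posts in the strategy repo

Decision: content/drafts/ holds all drafts until published.
Rationale: drafts are part of the plan, not the product - planning
conversations can reference them directly without me pasting them in.
Status: active
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;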



&lt;h2&gt;
  
  
  The why (and why it matters beyond my situation)
&lt;/h2&gt;

&lt;p&gt;I personally know of a few people using AI in a similar way to me, and I know in the abstract that many, many more are doing the same thing out there. AI is being used not just as a coding buddy, but as a tool to chart out ambitious plans, to iterate and ideate through them, and to refine and update them as living plans as they go.&lt;/p&gt;

&lt;p&gt;The problem with using it as a strategic thinking partner is that since it's chat-based, all of the context created is throwaway by default. You're the bottleneck - your understanding, your ability to structure the information and retain it, and link it together when necessary.&lt;/p&gt;

&lt;p&gt;The fix turned out to be simpler than I'd expected. So why not create your own personal RAG repo?&lt;/p&gt;

&lt;p&gt;As Claude itself put it, &lt;code&gt;the gap between "we discussed this" and "this is documented" grows silently&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Well said, and well worth avoiding.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it enables
&lt;/h2&gt;

&lt;p&gt;So there you go.&lt;/p&gt;

&lt;p&gt;A boring but useful, meta use case for AI integration.&lt;/p&gt;

&lt;p&gt;It's not the most exciting thing I've worked on, but since creating it yesterday, it's already beginning to repay the time investment.&lt;/p&gt;

&lt;p&gt;The aspect of it I'm really excited about is how it ties into and enables some future plans I have - exploring RAG beyond business domain use cases. Why can't I set up a RAG system for my own life?&lt;/p&gt;

&lt;p&gt;I just created the first step of it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>context</category>
      <category>systemsthinking</category>
    </item>
    <item>
      <title>Software engineering isn't dead</title>
      <dc:creator>Dan Walsh</dc:creator>
      <pubDate>Fri, 06 Feb 2026 18:41:30 +0000</pubDate>
      <link>https://forem.com/dang-w/software-engineering-isnt-dead-ehn</link>
      <guid>https://forem.com/dang-w/software-engineering-isnt-dead-ehn</guid>
      <description>&lt;h2&gt;
  
  
  Doom
&lt;/h2&gt;

&lt;p&gt;People &lt;em&gt;love&lt;/em&gt; defaulting to doom and gloom. I get it. I've been there, it used to be my default view. It took a lot of work to reshape and reframe my views and default ways of thinking.&lt;/p&gt;

&lt;p&gt;A lot of discourse at the moment revolves around AI making software engineers obsolete.&lt;/p&gt;

&lt;p&gt;But I don't believe that. Some? Maybe. For others? The opposite.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;I've been a software engineer for over 8 years now. Before that, I got an art degree from uni. Yes yes, make all the jokes you want - 'Haha, 3 years to ask people if they want a cappuccino or a latte 🤓' (I was never a barista, props to them. You need skills &lt;em&gt;and&lt;/em&gt; patience for that kinda job).&lt;/p&gt;

&lt;p&gt;Before getting into software, I worked in retail and a variety of office jobs, and hated every minute of all of it. So I taught myself to code around my full-time job at the time - in the mornings before work, in the evenings after it, and on the weekends.&lt;/p&gt;

&lt;p&gt;I interviewed a lot, and bombed a lot. I managed to get my foot in the door as a junior at an American scale-up, and I've been growing my skills ever since.&lt;/p&gt;

&lt;p&gt;So maybe I'm better placed than a lot of others in the SWE world to feel positive about the seismic changes AI is bringing. I had to work hard to get here in the first place - I'm not getting replaced now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Views
&lt;/h2&gt;

&lt;p&gt;I 100% agree that software engineering as we know it is changing massively - I would not argue against that point. What I would argue against, however, is the widespread opinion and fear I hear about how SWE as a career is dying/dead/doomed.&lt;/p&gt;

&lt;p&gt;In its current form? Sure. But c'mon, folks. How long have no-code and low-code solutions been a thing? Instead of having to manually provision and set up infra, you can use Terraform (or any other infrastructure-as-code solution you might care to name).&lt;/p&gt;

&lt;p&gt;Software's been following the trend of abstracting complexity away for my entire career. Remember when Squarespace replaced frontend devs?&lt;/p&gt;

&lt;p&gt;Nah, me neither.&lt;/p&gt;

&lt;p&gt;Glib as that is - I view agentic coding in a similar vein. Software engineering is potentially heading to another layer of abstraction. Instead of focusing on the syntax and structure of specific programming languages, we can now write in prose, translated to code.&lt;/p&gt;

&lt;p&gt;I think of it similarly to writing JavaScript, which is then compiled to machine code. I have no idea how to write Assembly. I haven't needed to, or honestly had much of a drive to learn it. JS engines have done that for me, and have 100% done a better job of it than I could. (Thanks to the wonderful &lt;a href="https://github.com/alistairjcbrown" rel="noopener noreferrer"&gt;Alistair Brown&lt;/a&gt; for that analogy).&lt;/p&gt;

&lt;h2&gt;
  
  
  Future
&lt;/h2&gt;

&lt;p&gt;I mentioned earlier that some SWEs may be replaced. Sure. I don't particularly want them to be, but it seems like there's a self-selecting filter taking place in engineering. Many people are against gen AI, and I respect that. There are several aspects of it that I'm not a fan of (the environmental impact, for instance). But I do think SWEs who don't learn to use it competently - or at all - may leave themselves at a disadvantage in the future.&lt;/p&gt;

&lt;p&gt;SWE is being reframed - rather than the syntax and quirks of particular languages and the writing of code being the main focus, the role feels like it may wind up closer to an architect/code reviewer/agent line manager hybrid.&lt;/p&gt;

&lt;p&gt;You now need to be able to clearly explain your thinking for any given system in a structured, clear, precise way. Don't like the way an LLM's heading with a project? You need to know enough about system design and systems thinking to step in and course-correct. Understand best practices and industry standards - LLMs don't need to reinvent the wheel. "There's an established pattern for the problem we're facing - let's research some implementation patterns and follow those."&lt;/p&gt;

&lt;p&gt;I've seen a fair bit of discourse around AI coding assistance and how it affects dev skills. I 100% believe in this - we can't let LLMs replace our skills and knowledge and our THINKING.&lt;/p&gt;

&lt;h2&gt;
  
  
  Research
&lt;/h2&gt;

&lt;p&gt;Thinking being the key point. There's some really interesting research going around to back this up.&lt;/p&gt;

&lt;p&gt;Anthropic themselves (the folks that have given us access to lovely Claude. I'm biased.) &lt;a href="https://arxiv.org/abs/2601.20245" rel="noopener noreferrer"&gt;ran a study&lt;/a&gt; - developers learning new skills with AI assistance scored 17 percentage points lower on assessments than those who learned without it. The biggest gaps? Debugging, and understanding when and why code is incorrect. Their main takeaway was blunt and I'm all about it - "cognitive effort — and even getting painfully stuck — is important for fostering mastery."&lt;/p&gt;

&lt;p&gt;Then there's &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;this METR study&lt;/a&gt;. They tracked experienced developers using AI coding tools across 140+ hours of screen recordings. The devs worked 19% slower with AI assistance - while believing they were 20% faster. A 39-point gap between perception and reality. Woof. Time saved generating code was eaten up by context switching, verifying AI suggestions, and integrating outputs with existing codebases.&lt;/p&gt;

&lt;p&gt;I touched on the idea of some SWEs being impacted by this more than others at the start of this post. There's a divergence developing - a K-shaped split. Teams with experienced engineers who already understand systems deeply? They're integrating AI successfully &lt;em&gt;and&lt;/em&gt; maintaining code quality. But teams with weaker foundational skills are seeing increasing technical debt, difficult-to-track-down bugs, and systems that work in common cases but fail on edge cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://digitaleconomy.stanford.edu/wp-content/uploads/2025/08/Canaries_BrynjolfssonChandarChen.pdf" rel="noopener noreferrer"&gt;A Stanford Digital Economy Lab study&lt;/a&gt; reported that employment for developers aged 22-25 has fallen nearly 20% since 2022. If AI handles the grunt work that used to train junior developers, who becomes senior in 10 years? &lt;a href="https://codeconductor.ai/blog/future-of-junior-developers-ai/" rel="noopener noreferrer"&gt;AWS CEO Matt Garman&lt;/a&gt; put it pretty clearly - if you stop hiring juniors today, you'll face a serious experience gap down the line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;So - in my view, SWE isn't dead, dying, or doomed. It's shifting. And that's scary, and I'm not clairvoyant - I don't know for sure this will pan out as I expect. But we all have agency. There are steps we can take to do our best to offset any potential future obsolescence on our parts.&lt;/p&gt;

&lt;p&gt;It's a reminder and a call to stay sharp. The engineers who treat AI as a tool that requires their judgement, their systems thinking, their ability to say "no, that's wrong, here's why" - they'll be fine. Better than fine. The ones who let it replace their thinking? That's where the risk is.&lt;/p&gt;

&lt;p&gt;It's on us to remain relevant.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>thoughts</category>
      <category>softwareengineering</category>
      <category>future</category>
    </item>
  </channel>
</rss>
