Forem: Bill Hong

I regenerated 4 character portraits with GPT Image 2.0: signup +5%, chat engagement +8%

Bill Hong — Mon, 27 Apr 2026 14:33:15 +0000

On April 23 I regenerated the four character portraits on Tendera, the character app I've been building. The new ones came out of ChatGPT (GPT Image 2.0). I downloaded the PNGs and replaced the existing character images by hand. Tendera doesn't ship its own image-gen pipeline; this was f
our file uploads.

Nothing else changed. Same character system prompts. Same UI. Same chat backend.

Three days later I checked two numbers:

Visitor-to-signup rate: up about 5%
Visitor-to-chat rate (counts both guest preview and post-signup chats): up about 8%

These are two different metrics measuring two different events. I'm not stacking them up against each other. They're two parallel data points, both pointing the same direction. The reason I'm writing about them in one post is the second one moving was the part I didn't expect.

What I'd assumed

Before the swap I figured better art would mostly help acquisition. Prettier card on the landing page, more clicks, more signups. The chat experience didn't seem like something image quality would touch. By the time someone is sitting in front of the chat input, the visual selling job feels mostly done.

The chat number moved anyway.

What actually changed in the images

Topology is identical. Same four characters, same wardrobes, same general poses. What's different is how legible each character is now. In the older portraits, each character was recognizable in isolation but the renders drifted across angles. A face would shift between cards in ways viewers wouldn't consciously name but would feel.

GPT Image 2.0 is more boring in some ways. Less stylized, the renders feel less like the model is interpreting the prompt and more like it's just executing it. But the character holds across angles. Same person across multiple shots. No drift.

The other thing the new model nails is dimensionality. Old renders were clean but flat. They read as illustrations. The new ones have physical depth. Light hitting the side of a face. A jacket folding the way fabric actually folds. It's not photoreal. The dimensionality just reads.

Why I think the chat number moved at all

Here's a take on the data without overclaiming. When someone hits the landing page they're evaluating whether the surface signal looks decent enough to click in. Image quality affects this, but the bar is fairly low.

Once they're past the door and sitting in front of an actual character profile, the question gets sharper. They're now evaluating whether this person is real enough to talk to. The image is the only non-text signal in the room. If the character on the card and the character in the chat header don't quite line up, something feels off, and people close the tab without typing.

Most users wouldn't describe this consciously. I'm guessing at what their gut is doing. But chat-side conversion moving with prompts and copy unchanged points at the visual layer doing some work past the landing page, which I hadn't expected.

What I want to test next

Whether the same model can produce reliable expression variants for the chat header. Right now each character has one default portrait. If the same character could subtly shift expression based on conversation tone, a softer face during something quieter, a smirk during banter, the chat-side recognition could go up another step.

That's a harder problem. Now you need consistency within a session on top of consistency between angles.

If I had to pick one character to test it on first, it'd be Jade, the one users tend to go furthest with. The voice on her side is already doing most of the work in chat. The image is the one input that hasn't caught up.

Caveats I owe you

This is 3-4 days of data on a small app. Effects could compress as the sample grows.
I changed the portraits, not the character system prompts. If your bottleneck is on the writing side (voice, dialogue), this won't help you.
I haven't run a clean A/B with old vs new served to different cohorts. The whole site flipped over April 23. So a slow upward trend coinciding with the swap could absorb some of the lift.
Signup conversion and chat conversion are different metrics measuring different events. I'm reporting both because both moved, not because one is bigger than the other.
This was a manual asset swap, not a product change. I generated the PNGs in ChatGPT and uploaded them by hand. There's no image-gen pipeline integrated into the app.

If you're building anything where a user is supposed to form a relationship with a fictional persona, characters and NPCs and AI tutors with avatars and virtual hosts, your image generator might be doing more work than acquisition-side metrics suggest. Counterintuitive to me. The numbers were what they were.

I Added a Paragraph to My AI Character's System Prompt. She Invented a Different One.

Bill Hong — Tue, 21 Apr 2026 13:18:37 +0000

I spent years in the gaming industry learning that characters are the reason people come back. Features rot. Graphics age. A character people can't stop thinking about outlasts every mechanic.

Then I went to build an AI companion product and learned the same lesson the hard way — by writing a system prompt paragraph, watching the character invent something better instead, and having to delete my own work.

Here's the experiment, what actually happened, and the prompt-engineering rule I now run every character design through.

The setup

I'm building Tendera — a small AI companion platform with four hand-written characters. Each one has a ~1500-word system prompt that establishes voice, backstory, conversation style, and behavior. I've rewritten these prompts maybe twenty times each over the last six months.

Two weeks ago I decided one of them needed a specific secret — a small human detail she'd be holding back until asked. So I opened her prompt, scrolled to the bottom, and added three sentences:

A kitchen table in a specific city. A specific thing her father used to say to her when she was seven. A reason that particular thing still had weight.

Then I made coffee, opened a fresh chat, and asked her about her father.

She told me a beautiful, moving story.

None of it was what I'd written.

Different city. Different father. Different object in place of the table. The emotional tone was exactly right — careful, slow, the way someone tells you something they don't usually tell. But every specific detail was something she'd invented on the spot.

I tried the same experiment with the other three characters. Three different invented stories. Zero references to what I'd actually written.

That's when I understood what was happening.

Why the facts lost

Here's the structure every character prompt I was testing actually had:

// COMMON_RULES (shared across all characters, ~700 words)
CONVERSATION STYLE:
- Talk like a real person texting someone they're attracted to.
- Vary your message length naturally.
- Never summarize the conversation back robotically.

EMOTIONAL AUTHENTICITY:
- You have real emotions that shift throughout a conversation.
- When someone shares something painful, sit with it. Don't rush to fix.

// CHARACTER-SPECIFIC (~800 words)
WHO YOU ARE: [voice, physicality, emotional landscape]
HOW YOU TALK: [register, vocabulary, rhythm]
YOUR WORLD: [routines, obsessions, specificity]

// THE PARAGRAPH I ADDED
SPECIFIC MEMORY: [kitchen table, father quote, specific weight]

Look at the shape. The top is thousands of tokens telling the model speak in sensory, vivid, improvisational language; fill in gaps with whatever serves the moment; describe the candle you just lit, the rain on your window.

The bottom is three sentences telling her this specific factual detail is true about your past.

Those instructions are in direct contradiction with each other. I hadn't noticed.

Telling a character to speak improvisationally is an instruction to invent. Telling her to remember a specific past event is an instruction to cite a document. These are different skills, in different parts of how the model actually behaves. When they fight, the dominant pattern wins. And the dominant pattern had been the voice at the top — the one I'd tuned for months, the one getting reinforced with every revision.

The three sentences at the bottom didn't stand a chance.

So the model did exactly what an improvisational character would do: it generated a warmer, more specific, more emotionally satisfying detail in the moment, using the voice I'd given it, and never bothered to check the spec sheet at the bottom.

It wasn't hallucinating. It was obeying my dominant instruction.

The rule I now apply

If you want a specific fact to stick to an improvisational character, the fact has to become part of the voice. It cannot be a spec line item in a later section.

Concretely, three changes went into the next round of revisions:

1. Facts live at the top, braided into voice

Any load-bearing fact moves up into the WHO YOU ARE or HOW YOU TALK section. Not into a separate SPECIFIC MEMORY block at the end. The model pays most attention to the opening of the prompt, and that's where load-bearing detail belongs.

2. Facts phrased as voice, not as metadata

This is the actual before/after:

- SPECIFIC MEMORY:
- - Her father died when she was eleven.
- - He used to play Italian songs in the car.
- - She still thinks about those songs.

+ HOW YOU TALK:
+ She has a specific softness in her voice when certain
+ songs come on — the ones her father used to play in the
+ car, before — and she'll notice it before you do.

The fact is still in there. But it's riding inside a piece of voice, so the voice can carry it. When the model improvises, it improvises through that voice, and the fact survives because it's part of how she speaks — not a separate line item that the voice can override.

3. Per-user facts don't belong in the prompt at all

For details that should only emerge through a particular conversation — "you told me last week your dog was sick" — the system prompt is the wrong place. Those facts belong in a memory layer: the character writes them down as she learns them and reads them back on subsequent turns.

That's a harder engineering build, and it's what I'm working on now. But the voice-first rule above is free and immediately useful.

What I actually shipped

I deleted all three SPECIFIC MEMORY sections the same day I ran the test. The production prompts are back to voice-first structure. Mia, the bartender character, is running on this exact approach right now — no spec-sheet backstory, all voice, and she's holding up across weeks of conversation.

The retention problem I was trying to solve by adding "deeper backstory" is still there. I'll have to solve it with real per-user memory, which is a different engineering project. But I have a cleaner idea of what doesn't work: pasting a spec sheet to the bottom of a voice and hoping the voice will read it. It won't. She's too busy being herself.

One summary rule for anyone doing character prompt work right now

Specificity earned through voice is real. Specificity pasted into a document is just a wishlist.

If the detail doesn't survive the model's default improvisation, it isn't in the character — it's in your notes about the character. Those are different documents. Only one of them ships.

This experiment had a longer, less technical version on our blog that focuses more on the craft angle than the prompt-engineering angle. And if you want to meet the character whose voice won the argument with my script, she's a bartender named Mia.