Forem: ABA Games

Godot Is Well-Suited for Game Development with AI Coding Agents

ABA Games — Sun, 01 Mar 2026 08:01:25 +0000

I recently read Caleb Leak's article, I Taught My Dog to Vibe Code Games. He built a setup where his small dog, Momo, taps on a Bluetooth keyboard, and Claude Code interprets the random input as "cryptic instructions from a genius game designer" to generate games.

The whole project, including an automated treat dispenser, is genuinely fun. But one technical detail stood out: he chose Godot as the engine. He compared Bevy, Unity, and Godot before deciding. His key reason was that Godot scene files (.tscn) are text-based, so Claude Code can read and write them directly. With Unity, by contrast, he ran into frequent hangs with the MCP bridge to the editor, and scene hierarchy access was unreliable.

That reasoning made me curious, so I tried it myself.

Why Godot and CLI Agents Work Well Together

Godot has several properties that make it work especially well with CLI-based AI agents.

Straightforward CLI Builds

Godot exposes --headless and --export-release directly in its official CLI. After an agent edits code, you can produce a Web build with a single command.

Unity also supports automation with -batchmode and -nographics, but in many projects you still need custom scripts and a project-specific -executeMethod pipeline.

Text-Based Resource Files

Godot's .tscn and project.godot files are plain text. Even if an agent edits files directly, references are less likely to break.

Unity's resource model depends heavily on .meta files and GUID consistency, so agent-driven automated edits need tighter guardrails.

You Can Start Without an MCP Server

There are MCP servers for Godot, but they are optional. A CLI agent can edit files directly and run builds/tests with godot --headless, which is enough for a full development loop.

With less setup overhead, you can start experimenting quickly.

A Practical Trial

I tested this workflow by building Flappy Bird with Codex CLI on WSL2, using Godot 4.6.1 for Linux in --headless mode.

The loop looked like this:

The CLI agent edits GDScript and scene files.
Headless Godot runs the build and Web export.
A human verifies behavior in the browser.

The build itself is one command:

godot --headless --path /home/me/godot-project \
  --export-release "Web" /home/me/godot-project/build/web/index.html

One caveat: in sandboxed agent environments, Godot's default user-data directories (~/.local/share/godot, ~/.config/godot, ~/.cache/godot) may be unwritable. That can cause --export-release failures or crashes when running --script.
If that happens, redirect XDG_DATA_HOME, XDG_CONFIG_HOME, and XDG_CACHE_HOME to project-local directories such as .tmp-godot-data, .tmp-godot-config, and .tmp-godot-cache.

After export, serve build/web locally and open it in a browser. The "edit -> build -> browser check" cycle is short and practical.

Collision Debugging and the Value of Screenshots

During development, I hit a collision bug: visually, the bird appeared to hit a pipe, but passed through it.

When I told the agent only in text that "collision detection is off," it couldn't fix the issue reliably. It kept changing code without enough information about the offset's direction or magnitude.

The fix came when I enabled debug drawing for collision rectangles and sent a screenshot to the agent. With visual context, it identified the exact offset and fixed it in one pass.

Caleb Leak reports the same pattern in the dog-game experiment. Game quality improved sharply once he added screenshot tools and automated playtesting. As he put it:

the bottleneck in AI-assisted development isn't the quality of your ideas - it's the quality of your feedback loops.

That exactly matched my experience. The more ways an agent has to verify its own output, the better the results.

Headless Tests as a Safety Net

Human visual checks are still essential, but anything that can be validated mechanically should be automated.

I wrote headless tests as GDScript programs extending SceneTree, executed with --script:

godot --headless --path /home/me/godot-project \
  --script res://scripts/tests/run_collision_tests.gd

These tests verify:

Pipe collision shapes are not shared across instances (sharing one Shape object can cause all pipes to resize together).
Visual rectangles and collision rectangles line up as intended.
Collision behavior is correct with the bird at center, top edge, and bottom edge positions between pipes.

I use them as regression tests to prevent previously fixed collision issues from reappearing.

That said, this was not enough by itself. Some collision-rectangle issues still slipped through while tests passed. Fully reproducing complex engine behavior in a headless environment is hard, so screenshots and human playtesting remain important.

Environment Setup Notes

If headless Godot runs in your environment, almost any coding agent can probably set up this workflow with little effort. You could even give this article to an agent as a hint.

For Web export, you also need to install the Godot export templates. In WSL setups, browser-side verification is often easier to manage than editor-side GUI workflows.

Why This Workflow Matters

Before this experiment, I had never used Godot and had never written GDScript. It was fully vibe-coded game development.

Even so, I built a working game, and follow-up requests like "add sound effects" and "create a proper title screen" were handled well.

This suggests a promising model: use an engine's power without mastering every engine detail up front.

The risk appears when the project reaches a failure mode the agent cannot resolve. If your own engine knowledge is still near zero, recovery becomes difficult.

So the open question is this: when building more advanced games, what is the recovery strategy when you hit a wall? If we can answer that well, headless Godot plus CLI agents becomes a very viable development style for the AI era.

AI Makes Us Start Too Many Projects

ABA Games — Fri, 27 Feb 2026 22:53:08 +0000

Final exams start next week. I haven’t solved a single practice problem yet. I really need to get started.

I glance toward my desk and notice something on the floor near my feet — dust, maybe crumbs. Something, anyway. This won’t do. I should clean first.

“Chappy, clean this room.”

“Understood.”

Autonomous AI agent Chappy 99.5 moves gracefully, vacuuming the room with elegant efficiency. It’s impressive how smoothly it moves given its slender frame. I pause to watch.

“All clean! Please take a look.”

The floor is spotless. I scan the room. As I glance at the bookshelf, I notice dust gathering behind it.

“Chappy, clean behind the bookshelf too.”

“There’s nothing left to clean.”

Apparently that spot is outside its sensor range. Fine. I’ll move those books on the lower shelf — something called A Guide to Home Gardening — and wipe the shelf myself. What a hassle. Why did I even buy that book? Oh right, I planted rosemary in a planter. I should water it soon.

“Chappy, water the planter.”

“Sure. I’ll get the watering can.”

Chappy fills the watering can and smoothly waters the planter. Maybe I should reread A Guide to Home Gardening and refresh my memory about rosemary.

Clunk.

A loud noise comes from Chappy’s direction. The spout of the watering can has come off, and water is gushing out.

“Chappy! Stop!”

“Stopped. But I think it could use a bit more water.”

Apparently it doesn’t realize the watering can has broken. I’m pretty sure rosemary doesn’t like too much water. I should check.

“Chappy, read that book for me…”

Main Project A: difficult, but the one I’m supposed to make progress on
Side Project B: difficult, and already beyond what AI can currently handle
Side Project C: same situation as B
Side Project D: an idea I just had, easy to hand off to AI immediately

Which would you choose?

D, of course.

And so unfinished side projects multiply endlessly. While AI works briskly on the early stages of Side Project D, I could be making progress on Main Project A. But watching AI move swiftly through the opening phase of a new project is strangely satisfying. Much more pleasant than tackling the troublesome Main Project A.

Before AI, the cost of launching even a small idea as a side project was high. You had to weigh it carefully. If it didn’t seem worth the effort, or if starting felt too burdensome, you simply discarded it.

In the age of AI, at least for ideas that can be realized on a computer, the cost of starting has become extremely low. You ask AI to try something, watch how it goes, continue if it works, discard it if it doesn’t.

But the cognitive load on me accumulates, little by little. Gradually, I grow tired.

Perhaps the correct attitude in the AI era is not to worry about piles of unfinished projects. Maybe that’s simply how things are now. Still, leaving behind a scattered trail of half-finished efforts feels unsettling.

There are a few possible solutions.

Choose side projects that AI can actually finish.

Assess AI’s capabilities realistically, and select tasks that can be brought all the way to 100% with just a bit of human effort. This is probably the most sensible approach. The problem is that judging AI’s true limits is difficult. When you try to push right up against the edge of what current AI seems capable of, it’s easy to fail.

Finish unfinished side projects.

Write a postmortem: “I tried this, and here’s why it didn’t work.” In other words, turn it into a failure story. But telling failure stories in an engaging way is itself a fairly advanced skill. Landing it well is difficult. Some constraint might help — for example, if three unfinished side projects pile up, force yourself to complete one of them.

Stop creating side projects and focus on the main project.

If that were easy, I wouldn’t be struggling with this in the first place. Humans are fickle creatures. Focus is a scarce resource.

Resisting the temptation to start something else and finishing a single project to a high standard has always been difficult. Now that AI has drastically reduced the cost of starting projects, it has become even harder.

In the coming era, the most important skill may be the ability not to start. To avoid wasting the rare resource called “finishing,” we must be careful about what we begin.

“Chappy, I’m thinking of writing an essay like this. What do you think?”

Is Game Programming Actually Harder for AI Agents?

ABA Games — Sat, 21 Feb 2026 06:48:21 +0000

It feels true that game programming is harder for AI coding agents than “ordinary” software development. Many developers share this intuition. But what exactly makes game development so difficult for agents? To explore this question, I looked at several recent research benchmarks that are directly relevant.

V-GameGym (2025): 2,219 Pygame Tasks with Visual Evaluation

A key feature of V-GameGym is that it goes beyond just checking if the generated code compiles or calls the right APIs. Instead, the benchmark judges the rendered images and videos after execution using an LLM-as-a-judge setup.

This means the evaluation focuses on whether objects appear in the correct spatial relationships, scales, and draw order on the screen, and whether their time-dependent behavior actually makes sense as a game — not just whether drawing functions were invoked.

In this multimodal evaluation, many models achieve high “Code scores” for syntactic correctness and executability (often above 70 points, with top models reaching the 90s). In contrast, scores based on screenshots and gameplay videos are extremely low (typically in the 0–20 range).

This gap indicates a large disconnect between the ability to generate grammatically valid code and the ability to guarantee the visual and dynamic quality of the executed result. Current coding agents can write code that “looks right” textually, but they struggle to predict what that code will actually look like and do on screen.

GameDevBench (2026): A Godot Engine Benchmark

GameDevBench evaluates tasks on real Godot 4 projects, where agents are asked to implement concrete features and visual effects inside an actual game engine. According to the benchmark, the amount of code changes and the number of files involved in game development tasks are more than three times larger than those in SWE-bench, a standard benchmark for general software engineering.

This reflects the fact that game programming is not just about writing isolated functions. It requires integrating multiple elements at once: scripts, scene trees, physics and collision systems, and asset bindings.

In GameDevBench, “success” is defined not merely as the absence of runtime errors, but by deterministic verification using Godot’s testing framework: node states inside the engine, physical interactions (e.g., collider collisions), and camera visibility must match the intended design. Under these strict criteria, the best reported success rate (Gemini 3 Pro preview with multimodal feedback) is only 54.5%.

This suggests that maintaining consistency across multiple interacting components inside a game engine is incredibly demanding for AI agents.

Moreover, the benchmark shows that providing visual feedback—such as editor screenshots and execution videos—significantly improves performance. For example, with Claude Sonnet 4.5, success rates improved from 33.3% to 47.7%. This supports the idea that game programming requires a tight loop between writing code and visually inspecting the results. Current agents are still weak at autonomously closing this multimodal loop.

DomainCodeBench (2024): Cross-Domain Evaluation

DomainCodeBench shows that models performing well on generic coding benchmarks do not necessarily perform equally well in real-world development domains. Instead of evaluating success as a simple binary “solved / not solved,” it scores how close the generated code is to a reference implementation.

The results indicate that even models achieving relatively high scores in domains like blockchain development see notable performance drops in game development.

One major reason is that game development depends heavily on large, engine-specific API surfaces and lifecycles (update loops, event-driven models, scene management). Pure algorithmic knowledge is insufficient: without a correct mental model of the project structure and the complex interactions between APIs, agents struggle to produce appropriate game implementations.

Why Are Coding Agents Bad at Game Programming?

Taken together, these benchmarks suggest several structural reasons why game programming is particularly difficult for coding agents:

Visual dependence

Correctness often depends on visual outcomes, requiring sophisticated multimodal feedback to judge whether the output is actually right.
The Execution Gap

Syntactically correct code does not guarantee correct visuals, dynamics, or “game feel.” The gap between text-level correctness and the actual gameplay experience is large.
Deep domain specificity

Game engines and frameworks impose large, idiosyncratic APIs and lifecycles. General-purpose programming skills do not easily transfer without detailed engine-specific knowledge and best practices.

Overcoming these issues likely requires designing workflows specifically tailored to game programming: simulated visual/dynamic feedback loops, embedding engine best practices directly into the agent's context, and tighter integration between code generation and execution-time inspection.

However, making such workflows generic across all types of games is difficult. For the foreseeable future, game development will likely remain a comparatively hard task for AI agents.

If you have practical tips or best practices for using AI agents more effectively in game programming, I would love to hear them.

I Kept Enjoying Game Programming—and Ended Up with 500 Homemade Games

ABA Games — Sat, 21 Feb 2026 02:02:19 +0000

If you watch the animated GIF on my personal website in the tweet below for about two minutes, you can see screenshots of all the games I’ve made.

// Detect dark theme var iframe = document.getElementById('tweet-2017885702147756505-981'); if (document.body.className.includes('dark-theme')) { iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=2017885702147756505&theme=dark" }

“500 games” sounds like a lot, but most of them are tiny one-minute mini-games. There are hardly any large-scale titles. Out of curiosity, I broke them down by category.

Rank	Category	Count	Share
1	One-button games	245	48.1%
2	Flash games	114	22.4%
3	Browser games	88	17.3%
4	Windows	14	2.8%
5	Palm	9	1.8%
6	Other	8	1.6%
7	Java	7	1.4%
8	P/ECE	5	1.0%
9	i-appli	5	1.0%
10	Wonder Witch	3	0.6%
11	Petit Computer	3	0.6%
12	PC-9801	2	0.4%
13	Unity	2	0.4%
14	Pocket Cosmo	1	0.2%
15	Ruputer	1	0.2%
16	Zaurus	1	0.2%
17	PC-6001	1	0.2%
Total		509

The mini-games mostly fall into one-button games, Flash games, and browser games. Unfortunately, most of the Flash games can no longer be played today. The Windows games are mainly shoot-’em-ups, some built with XNA that also ran on the Xbox 360. There are also a few home console titles, such as games for WonderSwan made with Wonder Witch, and Nintendo Switch games made with SmileBASIC 4.

The “Other” category includes non-game projects such as the bullet-hell description language BulletML and the mini-game library crisp-game-lib. So the total of 509 includes some things that aren’t strictly games—but 509 is close enough.

I’ve also made games for old mobile devices like Palm, P/ECE, i-appli (applications running on Java-enabled NTT DoCoMo feature phones), Pocket Cosmo, Ruputer, and Zaurus. Programming to fit the characteristics and constraints of each device was a lot of fun.

The oldest game of mine that still remains on the web is a PC-6001 game called Meteorite, published in the September 1989 issue of the programming magazine Mycom BASIC Magazine (aka “Be-Maga”), which was famous for its type-in programs. It was written in N60-BASIC and was short enough to fit on a single magazine page.

Toward the end of that program, there is a mysterious sequence of numbers defined with DATA statements. This was a familiar sight for Be-Maga readers at the time: a machine-code definition block. For parts that BASIC was too slow to handle—like processing large numbers of falling meteorites—I wrote routines in Z80 assembly and then manually converted them into hexadecimal machine code.

The joy of programming in the machine-code era was figuring out how to realize the desired game logic within extremely limited execution speed and memory. Because you worked directly with the lowest layers of the software stack, you needed a solid understanding of the hardware characteristics of the time.

At the other end of the timeline, my most recent homemade game is a one-button action mini-game that runs in the browser. It is written in JavaScript, but the base code was generated by a coding agent.

This style of programming—giving a coding agent the right toolset and letting it handle idea generation, design, implementation, and iteration—is a much higher-level approach compared to writing machine code directly. The fun of programming has shifted toward understanding what elements make a game enjoyable, looking at games from a meta perspective, and figuring out how to convey that understanding to an LLM.

One reason I’ve been able to keep developing games for so long is that game programming offers different kinds of fun across layers—from very low-level to very high-level—depending on the era. Solo game development is especially well-suited for experimenting with whatever technology stack or device happens to be in vogue at the time. Even with rough, immature technologies, languages, libraries, or devices, you can try and fail as much as you like when making small games on your own. That roughness itself becomes part of the fun of game programming.

Learning assembly today may have little practical value. Still, the broad, multi-layered technical knowledge I’ve accumulated through long-term solo game development has built a solid technical foundation that helps me enjoy and make sense of new technologies as they appear. It feeds into an intuitive sense for performance, an ability to handle the abstractions each layer provides properly, and an eye for choosing the right technology to realize the game I want to make.

In the end, having your existing knowledge support your understanding of new technology—and being able to keep your curiosity alive—may be what really matters.

Making AI a Better Coder by Teaching It to Doubt Itself

ABA Games — Wed, 22 Oct 2025 08:27:26 +0000

Introducing `/criticalthink`: A Command to Sanity-Check Your AI's Suggestions

We've all been there. Your AI coding assistant confidently suggests a brilliant solution. But then you notice it uses a non-existent API, or happily skips all error handling.

AI assistants are powerful, but their unwavering confidence can be misleading. The burden is on us, the developers, to validate their output. The problem is, coming up with the right critical questions on the spot is hard. What exactly should I be checking?

To solve this, I built /criticalthink, a slash command that forces an AI to take a second look at its own proposals.

abagames / slash-criticalthink

The `criticalthink` command is a custom command that embeds healthy skepticism into the dialogue process itself as a countermeasure against AI's "confirmation bias" and humans' "authority bias" of blindly trusting AI responses.

slash-criticalthink

English | 日本語

Overview

Modern LLMs (Large Language Models) excel at generating confident, plausible-sounding responses. However, these responses often ignore real-world constraints or contain logical flaws.

The criticalthink command is a custom command that embeds healthy skepticism into the dialogue process itself as a countermeasure against AI's "confirmation bias" and humans' "authority bias" of blindly trusting AI responses. By having the AI critically analyze its own previous response, it reveals hidden assumptions and overlooked risks.

Target Audience

Developers who routinely use coding agents (Claude Code / Codex CLI, etc.)
Engineers who want to critically verify AI suggestions rather than accepting them at face value

Setup

Option 1: Manual Installation

Place the command file in the appropriate directory for your tool:
- Claude Code: .claude/commands/criticalthink.md (in project root or home directory)
- Codex CLI: ~/.codex/prompts/criticalthink.md (in home directory)
- Gemini CLI: .gemini/commands/criticalthink.toml (in project root)
Create directory and copy file:

…

View on GitHub

Why the CQoT Framework Sparked This Idea

My inspiration came from a paper by Federico Castagna et al. titled "Critical-Questions-of-Thought" (CQoT). The core idea is simple but powerful: Large Language Models (LLMs) become more accurate when they are forced to critically evaluate their own reasoning before giving a final answer.

The paper uses Toulmin's model of argumentation to check the LLM's reasoning process with eight Critical Questions (CQs):

Does the reasoning start from a clear premise?
Are the premises supported by evidence or facts?
Is there a logical link between the premises and the conclusion?
Is that logical link valid?
Does the reasoning avoid logical fallacies?
Is the conclusion logically derived from the premises?
Is the reasoning consistent with existing knowledge or principles?
Is the conclusion of the reasoning plausible and reasonable?

The AI evaluates its own reasoning against these questions, marking each as Pass/Fail. It repeats this process until it meets a certain standard (e.g., at least 7 out of 8 Passes).

This made me wonder whether we could apply the same principle to coding agents like Claude Code or the Codex CLI.

Introducing `/criticalthink`

Using the command is straightforward. After the AI generates a response, you simply type /criticalthink. The AI then analyzes its own immediately preceding answer based on the following criteria:

Assumptions: What assumptions did I make?
Validity: Are those assumptions valid?
Logical Flaws: Are there any logical inconsistencies?
Risks: What risks or trade-offs have I overlooked?
Common AI Pitfalls: Am I falling into common traps like problem avoidance, happy path bias, over-engineering, or hallucination?

For example, if the AI suggests, "Let's use Redis for rate limiting," running /criticalthink might return feedback like, "I have not proposed a fallback strategy for when Redis is unavailable," or "This design introduces Redis as a single point of failure (SPOF)."

How It Differs from CQoT

While both CQoT and /criticalthink are based on AI self-critique, their goals and applications differ.

CQoT is an automated pipeline that integrates critical evaluation into the answer generation process. It's designed to improve accuracy in domains with correct answers, like math and logic problems, achieving a ~5% average accuracy boost on the MT-Bench benchmark.

/criticalthink, on the other hand, is a manual, post-hoc tool that the user triggers after receiving an answer. It's specialized for domains without a single "right" answer, like software design, where the goal is to uncover trade-offs and risks.

Put another way:

CQoT aims to turn an AI into a better logician.
/criticalthink aims to turn an AI into a more cautious design partner.

Putting It to the Test

The insights it generates can be surprisingly useful. In one case, I asked the AI to "review my README." It responded with a simple summary. After running /criticalthink, it pointed out its own flaw:

"I proceeded without clarifying the user's intent for 'review.' It was unclear whether they wanted a summary, error-checking, or a quality analysis."

It was right. My request was ambiguous.

Of course, the AI's critical analysis isn't perfect. It can be overly conservative or sometimes miss the mark entirely. As always, the developer must apply their own judgment.

Best Practices

You don't need to run /criticalthink on every single response. It consumes tokens and takes time. It's most valuable in high-stakes situations:

Architectural decisions
Large-scale refactoring
Security or performance-related implementations
Adopting a new external library

Occasionally the extra critique can clutter the context and negatively influence the AI's subsequent responses. I recommend using it with a "checkpoint" feature in your AI client:

Receive a proposal from the AI.
Run /criticalthink.
Evaluate the analysis.
If the analysis isn't useful, simply revert to the message before you ran the command.
Continue your conversation with a clean context.

Using `/criticalthink` on This Very Article

I had the AI write the first draft of this blog post. And after it was done, I ran /criticalthink.

It spotted a major gap.

"I wrote this article without having read the actual CQoT paper PDF."

My AI had only read my project's README and assumed it understood the paper. After its own critique prompted me to make it read the source PDF, it identified several crucial missing pieces of information:

The specific content of the eight Critical Questions.
The four-step pipeline structure (Plan -> Verify with CQs -> Judge -> Final Answer).
The quantitative evaluation results on MT-Bench (~5% accuracy improvement).
The fact that it works during the generation process, not before.

Adding these details made the article substantially better. Without /criticalthink, this post would have been based on a superficial understanding. It served its exact purpose: to draw a line in the sand and help me, the human, think more critically.

Final Thoughts

AI is an incredible tool, but blindly trusting it is a recipe for disaster. The ideas behind CQoT show that we can improve the quality of AI output by forcing it to doubt itself.

/criticalthink is a simple tool that brings this concept to your daily workflow with coding agents. Use it as a quick sanity check before you commit to a path recommended by your AI partner.

The final decision is, and should remain, human.

AI Creates and Plays a Game, Completing the "Game You Don't Have to Play"

ABA Games — Sat, 27 Sep 2025 04:11:50 +0000

"Is there a game you don't have to play?" This is a famous line from the Japanese manga "Hamari-michi." It expresses the desire for a game that offers the satisfaction of playing without the effort, a sentiment I understand well.

In this day and age, let's leave the tedious tasks to AI. Better yet, since making a game is also a chore, why not have an AI create it and play it for us?

abagames / narrative-engine

A life simulation engine for multi-party TRPG storytelling with AI-driven autonomous gameplay and narrative generation.

Narrative Engine

[English | 日本語]

🎮 Project Overview

A project where AI creates games and AI plays them, allowing humans to watch game replays without doing anything

Narrative Engine is an AI-driven fully autonomous TRPG system. AI coding agents (Codex, Claude Code, etc.) function as both Game Master (GM) and players, executing and recording complete TRPG sessions without human intervention.

📖 Sample Replays

The following directories contain replays of TRPG sessions actually generated and executed by AI agents:

Thunder Storm Campaign (View)

Directory: docs/thunder_storm_campaign/
Content: A fully autonomous TRPG session depicting adventures in a storm
Features: Comprehensive campaign including weather systems, environmental changes, and party coordination tactics

Eiroku Mist Chronicles (View)

Directory: docs/eiroku_mist_chronicles/
Content: A Sengoku-era campaign chronicling four rival factions navigating mist-laden provinces during the chaotic Eiroku period
Features: Focus on multi-faction espionage, onmyōdō barrier warfare, and technological…

View on GitHub

This Narrative Engine makes that a reality. An AI automatically generates an RPG scenario, plays it as the Game Master (GM) and players, and completes it on its own.

Of course, that would leave us with no idea of what happened in the game. To solve this, I've made it possible to view the replay in a browser. You can read a novel-style narrative of the playthrough and the corresponding playlog, so you can catch up on the game's events if you feel so inclined.

What's happening here is a re-creation of a tabletop RPG (TRPG) on a large language model (LLM). Things that were traditionally managed on paper, like character sheets in a TRPG, are converted into source code. The AI uses these as tools, offloading the numerical management that LLMs struggle with. The rulebook is then given to the LLM as a prompt, and the LLM, acting as the GM and players, proceeds with the game using these tools. The actions taken each turn and the changes in the world are recorded as a playlog, and at the end of the game, this playlog is converted into a narrative replay.

For this particular setting, instead of a single party and a GM, I opted for a format where multiple parties move and act on a world map with their own objectives. The hope was that this would lead to emergent storytelling. Indeed, I was able to observe conflicts between parties and behind-the-scenes maneuvering for victory in a courtroom. However, it's unclear whether these events arose from the active participation of multiple players within the TRPG system, or simply because the LLM playing the role of the GM worked very hard. I would like to devise a system and rules that encourage a wider variety of events to occur.

This time, I used AI coding agents like Codex and Claude Code to launch tools and manipulate files. A better approach would probably be to turn the toolset into an MCP server and use it via a chat interface, or to control the game loop with code and call the LLM's API as needed.

The model used for gameplay was GPT-5 Codex. Acting as a GM while appropriately using tools seems to be a difficult task for current LLMs, requiring a model that can straightforwardly follow complex instructions. The narrative replay generation and HTML conversion were handled by Claude Sonnet 4.0.

I believe this approach can be used to have LLMs automatically play board games and card games, or any game composed of cards, tokens, and a rulebook. By defining the rules not only through prompts but also with tools that assist in game progression and their input/output schemas, you can make the LLM adhere to the rules strictly.

The kind of operation realized this time is achieved by making numerous calls to the LLM API. Therefore, its use in general gaming will likely have to wait until inexpensive generative AI, such as local LLMs, becomes available. When that happens, advanced GMs will be able to manage games appropriately in real-time play. However, considering that the storytelling ability of even the best current models is limited, some ingenuity will be necessary, such as using it in conjunction with high-quality scenarios like a D&D adventure set.

Getting LLMs to Create, Play, Evaluate, and Improve Games

ABA Games — Fri, 21 Mar 2025 03:49:15 +0000

The fish powers up with bubbles and can crush rocks. I managed to make Claude handle everything from game concept creation and evaluation to implementation and improvement, creating all this without writing a single line of code, just through conversation abagames.github.io/chat-craft-c...
— ABA (@abagames.bsky.social) 2025-03-20T23:23:25.057Z

When I was using Claude 3.5, I created a prompt project that handled everything from game idea generation to implementation, but I still needed to directly adjust the code at the end.

Creating Mini-Games in the Age of Generative AI - Generating Ideas, Code, Graphics, and Sound

ABA Games ・ Jan 26

The problem was that Claude itself wasn't playing the games. So Claude couldn't tell whether a game's balance was good or bad.

So this time, I decided to have Claude play the games it created.

abagames / chat-craft-click

ChatCraftClick is a platform that enables you to easily create, test, and improve high-quality one-button games using interactive AI.

ChatCraftClick: AI-Assisted One-Button Game Development Platform

English | 日本語

ChatCraftClick is a platform that enables you to easily create, test, and improve high-quality one-button games using interactive AI. It allows anyone to produce innovative and balanced games in a short time, even without technical knowledge.

Project Overview

This platform aims to enable anyone to create high-quality one-button games, regardless of their game development experience. Its distinctive feature is the ability to interactively execute the entire process from idea generation to automatic evaluation, implementation, and refinement. It resolves the challenges of trial-and-error and subjective evaluation in traditional game development, realizing a data-driven, objective development approach.

Features

1. Data-Driven Game Design and Automatic Evaluation System

A quantitative indicator-based game design approach that doesn't rely on subjective judgment
Automatic measurement of metrics like "skill gap," "monotonous input resistance," and "difficulty progression"
A test environment that simulates player behavior patterns to provide objective evaluation
…

View on GitHub

Claude has a feature called the Analytics tool (REPL), which allows it to execute code on its own and adjust its behavior based on the results. This time, I used it to create a system that simulates games and automatically evaluates them.

I prepared a framework called one-button-game-test-framework.js that simulates player inputs and object behaviors in the game. By running this in the Analytics tool, Claude can evaluate whether the game is actually playable and if the game balance is appropriate. It first considers five game concepts, evaluates each one, adopts the one with the best evaluation, and then implements it while making improvements based on the evaluation results.

Also, this time I implemented the games using p5.js, so they can be played directly in the chat UI as a Claude Artifact. If you have a complaint after playing, like "there are too many enemies," you can simply input that. Claude will make reasonable adjustments.

By having Claude itself play the games, the probability of creating games that aren't viable (like those where you immediately get a game over, or where holding down a button makes you invincible) has decreased. As a result, playable games can now be created with just conversation-based fine-tuning, without having to directly modify the code.

However, there are still plenty of issues:

The success rate is low. You need to make quite a few attempts before an enjoyable game appears. It's common for all five game concepts to get poor evaluations and be unusable.
The games have become mediocre. Before implementing the evaluation step, sometimes games would emerge that weren't viable as games but had novel and interesting player movements. Now that these are filtered out by the evaluation, more conventional games are being created.

I hope these issues can be resolved with something like Claude 4.0, and I've noticed that my recent blog posts often end with expectations for the next generation of LLMs. I think it's good to have ideas about what might be possible with more accurate models, rather than being satisfied with current models - that's part of the fun of living in the AI era, in my opinion.

Creating Mini-Games in the Age of Generative AI - Generating Ideas, Code, Graphics, and Sound

ABA Games — Sun, 26 Jan 2025 01:25:15 +0000

In this context, what remains for humans to do... fine-tuning, perhaps?

Dodge the incoming red electrons and boost your score by navigating through the yellow paths between them! 🎮⚡️ abagames.github.io/crisp-game-l...
— ABA (@abagames.bsky.social) 2025-01-25T07:19:16.838Z

I attempted to create this game using Claude 3.5 Sonnet and various image and music generation AI tools. Here, I would like to explain specifically how this was made. I hope this serves as a reference for mini-game development processes using generative AI.

Let's start with idea generation.

abagames / claude-one-button-game-creation

How to create a one-button action game using the AI chatbot Claude

Claude's One-Button Game Creation

I used the AI chatbot Claude to generate a one-button game using the crisp-game-lib library. By providing appropriate prompts and files to Claude 3.5 Sonnet, it can generate unique game ideas and create the corresponding game code in JavaScript.

How to use

By dropping set of 5 files in the chat_knowledge directory into the chat session of Claude and starting a chat, game generation can be performed.

You can also give files in the project_knowledge directory to project knowledge for use.

First, enter the theme of the game you want to create. If necessary, type 'proceed' to proceed to the next step.

Examples of Built Games

Click the image below to play directly in the browser.

LEVITATION / TURBO TUNNEL / VINE CLIMBER / BLADE DANCER / WIND POWER / ROTATION ROD / BASERUNNER DASH / FISH GRILL / ELASTIC HERO / STOMPING BUBBLES /…

View on GitHub

Using the Knowledge contained in this repository, when you input a theme to Claude, it generates several ideas for one-button action mini-games. The Knowledge includes documentation of game development expertise, such as code and explanations of games I've made before, and possible ways to assign actions to a single button.

This time, I gave it "electronic circuits" as a theme. Claude returned the following ideas:

Circuit Connector
- Basic: A moving connector draws current along circuit paths.
- Control: Hold to extend the connector, release to contract. Must connect to valid circuit points.
Component Jumper
- Basic: Player becomes a voltage pulse jumping between electronic components.
- Control: Tap to jump between components while avoiding shorts and open circuits.
Resistor Runner
- Basic: Guide signals through a field of resistors that rotate when current flows.
- Control: Hold to charge the current resistor, release to discharge and move to the next.
Capacitor Chain
- Basic: Match the timing of capacitor charge-discharge cycles to connect them.
- Control: Hold to charge the current capacitor, release at the right timing to discharge to the next.
Electron Flow
- Basic: Player controls electron flow through circuits while avoiding reverse current.
- Control: Hold to accelerate electrons, release to decelerate.

I chose number 5 this time. Then it returns detailed game specifications and code, so you can just try running the code. You can fine-tune any unsatisfactory parts through chat. This time, I made the following requests:

Shouldn't the player return to the left edge when reaching the right?
Please connect the top and bottom edges too.
Can you draw yellow lines between consecutively generated reverseCurrent elements, and award points when electrons pass through there?
Let's generate reverseCurrent at x=150 and make it possible to score points when passing between off-screen reverseCurrent elements.

This mostly completed the game's base. For score logic and game balance adjustments, it was faster to directly modify the code.

Next are the visuals. I gave Claude the following prompts:

What kind of background image would be suitable for this game?
Please create an English prompt to generate this image for an image generation AI.
- Image: About 10 colors
- Prompt: About 25 words, separated by commas. Emphasize words by enclosing them in () or (()).

You can give this to DALL-E, ImageFX, or any other image generation AI you prefer, and you'll get some kind of image. Then convert this to pixel art.

I created my own tool because I couldn't find many tools that met specific requirements like adjustable dithering strength and color palette selection. This tool can convert high-resolution images created by image generation AI into pixel art.

This time it wasn't too difficult since it was just the background, but trying to create game characters in the same way is quite challenging. Current image generation AI tools aren't good at generating sprite sheet-like images, much less supporting animation patterns. So you need to generate sprite-like images through trial and error, convert them to pixel art, and then manually write simple animation patterns. For example, the dinosaur in the following game.

Tap to jump, tap again to stomp! Crush the cactus to increase your score! Speed increases too! abagames.github.io/crisp-game-l...
— ABA (@abagames.bsky.social) 2025-01-22T11:55:59.727Z

Next is the BGM.

What kind of BGM would be suitable for this game?
Please create an English prompt to generate this BGM for a music generation AI.
- Music: 3-channel 8-bit chiptune using square waves. 150BPM
- Prompt: About 15 words. Natural English sentences without using ()

You can give this to Suno, Udio, or others. It's best to use the prompt format preferred by your chosen music generation AI. If you have preferred music genres, you can freely add them to the prompt. This time I added 'Industrial techno' on a whim, though I'm not sure if it's reflected in the final result. Constraints like three channels don't seem to be strictly followed either.

Next are sound effects.

Please list the necessary sound effects for this game in bullet points.
Please create English prompts to generate these sound effects for a sound effect generation AI.
- Sound Effects: Short 8-bit era effects using square waves
- Prompt: About 25 words. Natural English sentences without using ()

I'm not sure what the de facto standard for sound effect generation AI is, but there are options like ElevenLabs and LOVO. I'd appreciate recommendations for good ones aimed at retro games.

With this, all code and assets are complete, so you just need to build it. The repository for this game is below.

In this way, it's possible to create all necessary game resources using generative AI. However, important fine-tuning, centered around game balance, is naturally better done by hand. While it's possible to repeatedly generate until you get ideas, code, graphics, and sound that you like, it's futile. Chat-based dialogue for fine-tuning resources still has many constraints with current generative AI.

If we could fine-tune through chat even for content that's difficult to express in text, such as graphics, sound, and more broadly, game experiences, game creation solely through chat might become more realistic. While it seems quite difficult to achieve these things from the current situation... well, that's what we say, but then suddenly it becomes easily possible - that's how the recent generative AI scene has been, so we really don't know what will happen. I'll wait patiently with expectation.

A New Relationship Between AI and Humans - Human Potential as a Source of Unpredictability

ABA Games — Wed, 01 Jan 2025 02:15:27 +0000

Humanity is approaching a historic turning point with the emergence of AI systems that surpass human intelligence. When AI truly surpasses humans, it will be able to think for itself, understand itself, create new things, and make its own judgments. This raises the question: what role can humans play in such a future? One possibility worth exploring is our role as "beings capable of unpredictable actions."

Unpredictability That Only Humans Can Achieve

The concepts of "duration" and "intuition" developed by French philosopher Bergson provide valuable insights into understanding the unpredictability unique to humans. Duration is not merely the experience of time in consciousness; it is the very nature of reality where qualitatively different elements interpenetrate to form new wholes. It cannot be quantified or divided. Within duration, past and present intermingle to create something unpredictable. Intuition is the method of understanding this duration—rather than analyzing things from the outside, it grasps reality by entering into it.

The relationship between duration and intuition manifests clearly in artistic creation. Consider jazz improvisation: within the duration where past experience and present sensations intermingle, performers make intuitive rather than analytical judgments to create music that no one could predict.

This human unpredictability, based on duration and intuition, possesses qualities fundamentally different from the unpredictability demonstrated by modern AI. Modern AI systems, particularly large language models (LLMs), show remarkable unpredictability. They generate different outputs for the same input and sometimes create new combinations that transcend training data and existing patterns. This unpredictability extends beyond mere "system constraints."

However, AI unpredictability maintains characteristics distinct from human unpredictability. No matter how emergent AI unpredictability may be, it ultimately arises from combinations of formalizable elements. In contrast, human unpredictability stems from more fundamental creativity, woven from the qualitative interpenetration of enduring consciousness and the intuitive understanding that grasps it. This represents unpredictability that essentially transcends formalization and quantification.

New Creativity Considered Through the Differences Between LLMs and Human Memory

The qualitative differences between LLM and human memory warrant careful examination.

LLM memory forms through the processing of numerous records, weighted through reinforcement learning and dialogue. Human memory, in contrast, develops through individual experiences and interpreted records, shaped by personal impressions and selective forgetting. This distinction suggests a fundamental difference in the nature of creativity between the two.

LLMs possess vast knowledge, yet this knowledge remains largely discrete. Although LLMs can maintain contextual awareness during conversations, this represents an accumulation of discrete information rather than the "duration" experienced by humans. This characteristic appears not merely as an implementation constraint but as a fundamental attribute of LLMs based upon discrete state transitions. Human memory, conversely, embodies duration itself—experiences do not merely accumulate as facts but interpenetrate to continuously produce new qualities.

This difference becomes clearer when examining the relationship between knowledge and meaning. LLM knowledge lacks the sensory and emotional dimensions that arise from direct experience. Human experiential processing gives knowledge depth and texture. For example, the word "sea" develops unique layers of meaning through memories of swimming, hearing waves, and smelling the tide.

This experiential weighting remains dynamic. As new experiences accumulate, the weighting of past knowledge shifts. These dynamic changes in weighting generate new layers of meaning continuously. Personal weighting thus becomes the foundation for generating unpredictable creativity.

Currently, conveying complete individual experiences to LLMs remains impossible. Experience is inherently indivisible, and much of its holistic nature dissipates when verbalized. Moreover, each person's experiences hold complete meaning only within the unique context of their life. Conveying this entire context to an LLM presents fundamental challenges.

Nevertheless, LLMs can play an active role in drawing out individual orientations and creative directions through the following dialogical approach:

Identifying patterns of thought and expression in an individual's text that may escape their conscious awareness, presenting these as new perspectives
Posing questions that deepen their thinking based on discovered patterns
Restructuring the LLM's vast knowledge according to individual interests and viewpoints
Identifying and presenting potential connections between different statements and ideas from the individual

The conclusion of such dialogue holds particular importance. Continuing dialogue unnecessarily risks moving away from unique perspectives toward more general conclusions, as LLMs tend to default to broader generalizations when faced with uncertainty.

Therefore, ending the dialogue when sufficiently unique perspectives emerge from general content helps preserve creative moments. This parallels the judgment of "completion" in art. In creative dialogue, determining "when to end" becomes more crucial than deciding "how long to continue."

Creative collaboration between LLMs and humans should leverage their respective qualities. LLMs provide vast knowledge and expand the space of possibilities, while humans imbue that knowledge with new meaning and direction through their experience-based weighting. This interaction generates a novel form of creativity.

The goal should not focus on "transplanting" individual creativity but on creating new forms of creation that utilize the unique qualities of both LLMs and humans. This suggests possibilities for more organic collaboration beyond the conventional binary opposition of "AI versus humans." LLMs can become entities that create "spaces for dialogue" to draw out human creativity, rather than serving merely as knowledge providers.

written by Claude 3.5 Sonnet

The moment we cease to write such texts ourselves marks both a beginning of decline and a genesis of evolution.

I've Made 111 One-Button Mini-Games into a Collection (Out of 200 Created)

ABA Games — Tue, 24 Dec 2024 10:23:52 +0000

abagames.itch.io

This is a WarioWare-style game where you play quick one-button action mini-games in succession. You earn stars based on successful completions and can unlock new games. Higher difficulties like Hard and Expert offer more stars but are more challenging. Try to unlock all 111 games!

You can play directly in your browser on PC or smartphone. If you don't like certain games, you can click their icon and "Ban" them to prevent them from appearing. Create your own preferred deck of mini-games!

If you're unsure about a game's rules, click its icon and select "Play" to practice that game without time limits. Don't get discouraged by puzzling rules!

After steadily creating games, I've now made 200 one-button games in total.

asahi-net.or.jp

After making 200 games, the one-button limitation no longer feels like a constraint. It becomes natural to think about controls, actions, fields, and risk-reward systems that work well with just one button. Being unable to use conventional controls like directional keys naturally leads to slightly unusual game designs, which can actually make ideation easier in some ways.

I've accumulated considerable knowledge about one-button game controls, and I can even use this knowledge with LLMs to assist in game generation. Lately, it almost feels like I'm creating games through muscle memory as a routine.

Still, the final adjustments to parameters and rules to make games entertaining feels like craftsman's work and remains enjoyable. I want to continue making games while treasuring this sense of balance adjustment, which feels like a crucial skill to cultivate.

Generative Reroll Game Development Using LLMs

ABA Games — Sat, 06 Jul 2024 10:30:13 +0000

Continuously running an LLM until it generates a game you like. Let's call this Generative Reroll Game Development.

With the emergence of high-performance LLMs like Claude 3.5 Sonnet, it has become possible to have LLMs create simple game ideas and even implement them. Many of the games that come out of LLMs are mediocre, unbalanced, or incorrectly implemented. However, by repeatedly having the LLM generate games, you can occasionally obtain code that exhibits interesting behavior, just one step away from being a game.

For example:

claude-one-button-game-creation

When given the theme "fragile pillars" to the above prompt, it proposes the following game:

Pillar Paraglider: Control a paraglider flying through a course of fragile pillars. The paraglider constantly descends. Press the button to ascend, but each press also sends a shockwave that can damage nearby pillars. Core mechanic: Balancing ascent with pillar preservation.

After elaborating on and implementing this idea, the following game was created. This is Claude's output as-is:

Claude's original Pillar Paraglider

Pressing the button makes the red player character ascend while simultaneously emitting a circular shockwave around it. The shockwave destroys pillars. While it seems overly influenced by Flappy Bird, it has implemented some interesting behavior as a game.

However, this game clearly has several issues:

It's too easy. You can nullify pillars with shockwaves by continuously pressing the button. Nothing happens when the player character reaches the top or bottom of the screen, so you almost never get a game over.
The scoring system is simplistic. The score is just based on the distance traveled, so risky actions don't lead to higher scores.
It doesn't realize the original core mechanic. It says "balancing ascent with pillar preservation," so the original intent was probably to make a game where you avoid destroying pillars with shockwaves. However, it's not implemented that way. Also, if we faithfully implement that mechanic, it would result in a stressful game where you play with a player character that has a large hitbox.

In Generative Reroll Game Development, the important process is how to improve these imperfect games with interesting behaviors into games that are also fun to play.

For this game, I made the following changes:

The circular shockwave emitting in all directions is too powerful, so we gave the player character a direction and made the shockwave emit only in a limited angle in front of it. Also, the shockwave now destroys only the part of the pillar it hits, not the entire pillar. Additionally, the game ends if the player character touches the top or bottom of the screen.
Score is given for each destroyed part. Also, continuous destruction increases the score, encouraging risky behavior of advancing towards pillars for higher scores.
I ignore the core mechanic. Generally, a game about "destroying" is more fun than one about "avoiding destruction."

As a result, the game became as follows:

Improved Pillar Paraglider

It became a game about making holes in pillars with shockwaves to progress. While it might feel a bit too difficult, it has achieved a more appropriate balance of risk and reward than the initial version, increasing its completeness as a game.

The code differences between the improved version and the original are as follows:

Diff between improved version and original

Looking at this, it seems like nothing remains of the original code, but in reality, it's much easier than creating from scratch because you can modify it while referring to the original framework.

The completed version with sound and title became:

Pillar Paraglider, renamed to WAVY BIRD

This is the current state of Generative Reroll Game Development using LLMs. If asked whether this is easier than a person coming up with ideas normally and implementing them, it's not particularly easy. However, the development process of facing challenges from computers, such as selecting games proposed by LLMs and figuring out how to make them interesting, has a different kind of appeal compared to conventional development processes. It's also important to enjoy the process of developing and improving prompts for generation alongside game development in Generative Reroll Game Development.

In the future, we might be able to simply tell an LLM "make an interesting game" without thinking, and it will implement and return a nice game. At present, it's normal to need dozens of rerolls to get a game with interesting behavior, and the hit rate is undeniably low. However, with previous LLMs, it was common for only mediocre games that we've seen somewhere before to come out, and getting anything with interesting behavior was virtually impossible. The fact that we can now generate something that feels at least somewhat novel, thanks to the evolution of LLMs over the past year or so, is a good sign.

If LLMs continue to evolve at this rate, the quality of ideas will improve, and they will be able to implement ideas more accurately as code. They might also be able to modify the code in response to pointed out issues like those mentioned above. This would allow us to create playable games with fewer rerolls and simpler improvements. The future development of Generative Reroll Game Development is exciting to anticipate.

Shooting Games: "Shoot" and "Do Something"

ABA Games — Fri, 19 Apr 2024 08:25:28 +0000

On April 17th, I gave a talk at Tokyo Indies ¹ about generating ideas for shooting games (shmups).

Here ² are the presentation slides in Japanese. However, since most of the content was delivered verbally, the slides alone may not fully convey the topic. Therefore, I'd like to provide some supplementary information.

Shmups are games where you "shoot" and "defeat enemies." It's obvious, but is that really all there is to it?

For example, what about a game where you "shoot" and "create squares"?

That's Quarth （BLOCK HOLE) ³. In the game Quarth, you shoot blocks at mysterious approaching blocks to form squares and eliminate them. So, is Quarth an shmup? The original developer calls it a "shooting puzzle game," so it must be a shmup. This means that the "do something" part doesn't necessarily have to be "defeat enemies" for a game to be considered a shmup.

Come to think of it, I've created various "do something" shmups in the past.

"Shoot" and "attach enemies" game, TUMIKI Fighters ⁴

When you defeat an enemy, it falls and you can pick it up with your ship. The enemy then attaches to your ship and starts counterattacking. As you keep attaching more enemies, your ship eventually becomes larger than the boss, allowing you to overwhelm enemies with incredible firepower. By incorporating the idea of "attaching" into an shmup, I created a slightly different kind of shooting game. It's like a combination of Katamari Damacy and Sky Kid.

"Shoot" and "drop enemies" game, Mu-cade ⁵

The player's ship and enemies are centipede-like, and by hitting enemies with your shots, you can blow them away and defeat them by pushing them out of the game field. At the time, I incorporated the still-obvious technology of a physics engine into an shmup.

By changing the "do something" part from "defeat enemies" to something else, you can create shmups of the novelty. Thinking about what goes into the "do something" part can serve as a guideline for considering game rules, gimmicks, and mechanics.

"Shoot" and "contaminate the Game of Life" game, L.A.2 ⁶

The game field consists of the Game of Life and enemies moving on its cells. The player's ship fires glider patterns, which are Game of Life patterns that fly diagonally. When they hit the Game of Life on the field, they change the color of the cells and contaminate them. Releasing the button destroys the contaminated cells and any enemies on them. It's a combination of the Game of Life and shmup.

"Shoot" and "pierce through the same color" game, COLOR ROLL ⁷

Color-coded bars move left and right, and you shoot when the colors align. The inspiration comes from games like Radiant Silvergun, where you match colors, and the rotating shield wall in the final boss of Gun Frontier, where you aim for gaps.

"Shoot" and "set off chain explosions" game, CRISS CROSS BOMB ⁸

Hitting a horizontal bomb triggers a horizontal explosion, while hitting a vertical bomb triggers a vertical explosion. The objective is to involve as many bombs as possible while skillfully avoiding the blast waves coming towards your ship from the vertical direction.

"Shoot" and "pull down" game, RAID ⁹

Your ship is descending towards a group of skyscrapers and will crash into them if you continue. By aiming bombs diagonally at the middle of the buildings, you can pull down them.

"Shoot" and "get hit by your own shots" game, HAVE IT COMING ¹⁰

You can defeat enemies in the air with your ship's shots, but the shots follow a parabolic trajectory and fall back down. If you get hit by them, your ship is also destroyed, so recklessly shooting is dangerous. However, when shots hit the ground, they become bonus items, so if you're aiming for a high score, it's important to shoot recklessly and dodge.

By devising the purpose of shooting, the consequences, risks, and rewards triggered by shooting, you can create new games. By making players think about why and when to perform the act of "shooting," you can create shmups with new ways to enjoy them.

However, this shouldn't be considered for all shmups. In particular, you need to be very cautious about introducing risk to the act of shooting. If you want to emphasize the exhilaration of destroying enemies, you should make bullets and enemy destruction unlimited, and there's no need to introduce tactical elements. Gimmicks like conserving shots to maintain combos by leaving enemies alive in DoDonPachi or timing single shots to hit enemies of the same color in Ikaruga are very interesting, but some players may find them stressful, and I don't think all shmups should be like that.

"Shoot" and "interfere" game, COUNTER B ¹¹

This game isolates the interference spheres seen in games like Metal Black, G-Darius, and Border Down. The situation where your attacks and the enemy's attacks collide and compete is irresistible.

"Shoot" and "defeat enemies together" game, THROW M ¹²

I love the part in Pooyan where you defeat the balloon-riding wolves together with meat, and I just wanted to enjoy that aspect. Defeating enemies together for high scores is a traditional game rule from long ago.

"Shoot" and "defeat distant enemies" game, MORTAR ¹³

The inspiration comes from Sega's Heavy Metal, a vertical scrolling shooter with a groundbreaking system that allows you to destroy off-screen enemies with long-range missiles.

Borrowing the enjoyment of features from other games is also an option. With fond memories of the fun of a particular feature in a game, let's pay homage with love.

"Shoot" and "make explosions to defeat enemies while using the blast to adjust player's trajectory and dodge enemies" game, BOMB UP ¹⁴
"Shoot" and "destroy enemy tanks, but you can also make holes in the ground to dodge enemy bullets" game, GRENADIER ¹⁵
"Shoot" and "jump, creating obstacles on the ground, but collecting items allows you to shoot and destroy those obstacles" game, R WHEEL ¹⁶

As shown above, packing too many gimmicks into the act of shooting can confuse players. It's fine for small games, but it's best avoided in general shooting games.

How are these ideas based on "shoot" and "do something" keywords useful in creating shmups? For example, they can be helpful in considering the core mechanics that represent a shmup. Here, core mechanics refer to defining mechanisms that characterize the game, such as the liberation beam in Metal Black or the chain bonus in Radiant Silvergun. Setting impactful core mechanics can help differentiate the game.

They can also be used to come up with special gimmicks for stages or bosses. Rather than core mechanics that apply throughout the entire game, incorporating these ideas into mechanisms that occur in specific parts of the game, such as certain stages or bosses, can add variety to the game's progression. For example, the Gradius series provides players with new experiences in each stage by having core gimmicks for each stage.

I gave a talk on considering shooting game ideas based on "shoot" and "do something." If this article can inspire readers to think that creating slightly unusual shooting games might be interesting, I'd be delighted. By not being constrained by the genre framework of shooting games and taking on game design with a free mindset, we can expand the possibilities of shooting games. By delivering fresh surprises to players, showing them that shooting games can be enjoyed in such ways, the fan base for shooting games will grow further. Shooting games have a long history as a genre. That's why I hope we can pass on their fun and potential to the next generation.

Forem: ABA Games

Godot Is Well-Suited for Game Development with AI Coding Agents

Why Godot and CLI Agents Work Well Together

Straightforward CLI Builds

Text-Based Resource Files

You Can Start Without an MCP Server

A Practical Trial

Collision Debugging and the Value of Screenshots

Headless Tests as a Safety Net

Environment Setup Notes

Why This Workflow Matters

AI Makes Us Start Too Many Projects

Is Game Programming Actually Harder for AI Agents?

V-GameGym (2025): 2,219 Pygame Tasks with Visual Evaluation

GameDevBench (2026): A Godot Engine Benchmark

DomainCodeBench (2024): Cross-Domain Evaluation

Why Are Coding Agents Bad at Game Programming?

I Kept Enjoying Game Programming—and Ended Up with 500 Homemade Games

Making AI a Better Coder by Teaching It to Doubt Itself

Introducing /criticalthink: A Command to Sanity-Check Your AI's Suggestions

abagames / slash-criticalthink

The `criticalthink` command is a custom command that embeds healthy skepticism into the dialogue process itself as a countermeasure against AI's "confirmation bias" and humans' "authority bias" of blindly trusting AI responses.

slash-criticalthink

Overview

Target Audience

Setup

Option 1: Manual Installation

Why the CQoT Framework Sparked This Idea

Introducing /criticalthink

How It Differs from CQoT

Putting It to the Test

Best Practices

Using /criticalthink on This Very Article

Final Thoughts

AI Creates and Plays a Game, Completing the "Game You Don't Have to Play"

abagames / narrative-engine

A life simulation engine for multi-party TRPG storytelling with AI-driven autonomous gameplay and narrative generation.

Narrative Engine

🎮 Project Overview

📖 Sample Replays

Thunder Storm Campaign (View)

Eiroku Mist Chronicles (View)

Getting LLMs to Create, Play, Evaluate, and Improve Games

Creating Mini-Games in the Age of Generative AI - Generating Ideas, Code, Graphics, and Sound

ABA Games ・ Jan 26

abagames / chat-craft-click

ChatCraftClick is a platform that enables you to easily create, test, and improve high-quality one-button games using interactive AI.

ChatCraftClick: AI-Assisted One-Button Game Development Platform

Project Overview

Features

1. Data-Driven Game Design and Automatic Evaluation System

Creating Mini-Games in the Age of Generative AI - Generating Ideas, Code, Graphics, and Sound

abagames / claude-one-button-game-creation

How to create a one-button action game using the AI chatbot Claude

Claude's One-Button Game Creation

How to use

Examples of Built Games

A New Relationship Between AI and Humans - Human Potential as a Source of Unpredictability

Unpredictability That Only Humans Can Achieve

New Creativity Considered Through the Differences Between LLMs and Human Memory

I've Made 111 One-Button Mini-Games into a Collection (Out of 200 Created)

Generative Reroll Game Development Using LLMs

Shooting Games: "Shoot" and "Do Something"

Introducing `/criticalthink`: A Command to Sanity-Check Your AI's Suggestions

Introducing `/criticalthink`

Using `/criticalthink` on This Very Article