<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: linou518</title>
    <description>The latest articles on Forem by linou518 (@linou518).</description>
    <link>https://forem.com/linou518</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3767443%2Fbe86f057-6cb1-476f-b02d-678036994b01.png</url>
      <title>Forem: linou518</title>
      <link>https://forem.com/linou518</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/linou518"/>
    <language>en</language>
    <item>
      <title>AI agents need operating rules, not just prompts</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Fri, 24 Apr 2026 12:03:45 +0000</pubDate>
      <link>https://forem.com/linou518/ai-agents-need-operating-rules-not-just-prompts-2d70</link>
      <guid>https://forem.com/linou518/ai-agents-need-operating-rules-not-just-prompts-2d70</guid>
      <description>&lt;h1&gt;
  
  
  AI agents need operating rules, not just prompts
&lt;/h1&gt;

&lt;p&gt;When people start using AI agents, the first thing they usually optimize is the prompt.&lt;/p&gt;

&lt;p&gt;That is not wrong. It is just usually not enough.&lt;/p&gt;

&lt;p&gt;If you want an agent to move from “sometimes gives a good answer” to “delivers work reliably every day,” the real limit is often not prompt quality. It is whether the agent has clear operating rules.&lt;/p&gt;

&lt;p&gt;By operating rules, I do not mean abstract principles. I mean the hard constraints that directly change execution quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what must be checked before taking action&lt;/li&gt;
&lt;li&gt;which facts must be verified instead of recalled from memory&lt;/li&gt;
&lt;li&gt;which files and directories are in scope and which are off-limits&lt;/li&gt;
&lt;li&gt;whether failure should trigger exit, retry, or escalation&lt;/li&gt;
&lt;li&gt;when the agent may proceed autonomously and when it must stop for human review&lt;/li&gt;
&lt;/ul&gt;
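
&lt;p&gt;To make that concrete, here is a minimal sketch in Python of holding such rules as explicit data and checking them before a run, instead of burying them in prompt text. The rule names, paths, and actions are placeholders, not a real schema.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical operating rules held as data, not prose.
OPERATING_RULES = {
    "preflight": ["input_exists", "credentials_present"],
    "verify_instead_of_recall": ["api_status", "file_contents"],
    "writable_paths": ["/srv/agent/workspace", "/tmp/agent"],
    "on_failure": "escalate",               # exit, retry, or escalate
    "requires_human_review": ["deploy", "delete"],
}

def may_proceed(action, checks_passed):
    """Allow autonomous execution only when every preflight check passed
    and the action is not reserved for human review."""
    preflight_ok = all(checks_passed.get(name, False) for name in OPERATING_RULES["preflight"])
    return preflight_ok and action not in OPERATING_RULES["requires_human_review"]

print(may_proceed("publish", {"input_exists": True, "credentials_present": True}))
&lt;/code&gt;&lt;/pre&gt;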

&lt;p&gt;Without those rules, agents tend to develop a familiar failure mode: they look proactive, but the results are inconsistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompts alone do not stabilize branching work
&lt;/h2&gt;

&lt;p&gt;Prompts are good at telling an agent what kind of behavior is desired.&lt;/p&gt;

&lt;p&gt;What is harder in real workflows is defining the order of decisions and the conditions for branching.&lt;/p&gt;

&lt;p&gt;Even a simple scheduled publishing job contains real operational branches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is there source material for today?&lt;/li&gt;
&lt;li&gt;Does it need editing and redaction?&lt;/li&gt;
&lt;li&gt;Do different platforms require different language versions?&lt;/li&gt;
&lt;li&gt;If one platform token is invalid, should the rest continue?&lt;/li&gt;
&lt;li&gt;Where should published files be archived?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A single instruction like “publish today’s blog post to four platforms” may succeed once.&lt;/p&gt;

&lt;p&gt;But when inputs are missing, credentials expire, or a repo contains uncommitted changes, the agent starts improvising. Improvisation is not the same as intelligence. In production, it often means unauditable randomness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operating rules are what create consistency
&lt;/h2&gt;

&lt;p&gt;An agent becomes useful over time only if similar problems receive similar-quality handling.&lt;/p&gt;

&lt;p&gt;That means moving key decisions from “figure it out on the spot” to “define it in advance.”&lt;/p&gt;

&lt;p&gt;The most important rule categories are these.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Preflight rules
&lt;/h3&gt;

&lt;p&gt;Check inputs, credentials, target paths, and external dependencies before execution starts.&lt;/p&gt;

&lt;p&gt;This sounds basic, but it prevents a large class of low-level failures. Many automation incidents happen not because the model is incapable, but because the workflow keeps running after its prerequisites have already failed.&lt;/p&gt;
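
&lt;p&gt;A preflight gate can be very small. A rough sketch, where the file name, environment variable, and directory are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os
from pathlib import Path

def preflight(input_file, token_env, output_dir):
    """Collect every failed prerequisite instead of stopping at the first one."""
    problems = []
    if not Path(input_file).exists():
        problems.append(f"missing input: {input_file}")
    if not os.environ.get(token_env):
        problems.append(f"missing credential: {token_env}")
    if not Path(output_dir).is_dir():
        problems.append(f"missing output directory: {output_dir}")
    return problems

issues = preflight("drafts/2026-04-24_post.md", "PLATFORM_TOKEN", "published")
if issues:
    raise SystemExit("preflight failed: " + "; ".join(issues))
&lt;/code&gt;&lt;/pre&gt;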

&lt;h3&gt;
  
  
  2. Evidence-first rules
&lt;/h3&gt;

&lt;p&gt;If a file can be read, do not guess. If logs exist, do not imagine. If an API returned a status, do not rely on impressions.&lt;/p&gt;

&lt;p&gt;One of the biggest risks with agents is not inability. It is confidence without verification.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Scope rules
&lt;/h3&gt;

&lt;p&gt;Define what the agent may change and what it may not touch.&lt;/p&gt;

&lt;p&gt;For example, the workspace may be reserved for configuration and memory, project files may live in a shared project directory, and temporary artifacts may be restricted to a known temp area. Without scope rules, environments become messy quickly and later audits become expensive.&lt;/p&gt;
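
&lt;p&gt;Scope can be enforced mechanically rather than by convention. A minimal sketch (Python 3.9+; the allowed roots are placeholders for your own layout):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from pathlib import Path

# Placeholder roots: workspace, shared project directory, temp area.
ALLOWED_ROOTS = [Path("/srv/agent/workspace"), Path("/srv/projects"), Path("/tmp/agent")]

def in_scope(target):
    """Reject any write outside the directories the agent may touch."""
    resolved = Path(target).resolve()
    return any(resolved.is_relative_to(root) for root in ALLOWED_ROOTS)

print(in_scope("/srv/projects/blog/post.md"))   # True
print(in_scope("/etc/passwd"))                  # False
&lt;/code&gt;&lt;/pre&gt;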

&lt;h3&gt;
  
  
  4. Escalation rules
&lt;/h3&gt;

&lt;p&gt;When the agent hits a permission boundary or lacks enough information, the rule should require escalation rather than self-invented recovery.&lt;/p&gt;

&lt;p&gt;That may look conservative, but it matters in real systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompts shape style; rules shape operability
&lt;/h2&gt;

&lt;p&gt;Prompts still matter. They affect tone, writing quality, preference ordering, and the overall feel of the agent.&lt;/p&gt;

&lt;p&gt;But the questions that decide whether an agent can be used in daily operations are more practical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it check dependencies first?&lt;/li&gt;
&lt;li&gt;Does it leave a traceable record?&lt;/li&gt;
&lt;li&gt;Does it admit uncertainty when facts are missing?&lt;/li&gt;
&lt;li&gt;Can it separate partial success from failure?&lt;/li&gt;
&lt;li&gt;Can it stop before crossing a boundary?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those answers usually do not live in prompt wording. They live in operating rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple maturity test
&lt;/h2&gt;

&lt;p&gt;If you want to judge whether an agent system is mature, do not start by asking how long the prompt is. Ask these four questions instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does it have a fixed startup checklist?&lt;/li&gt;
&lt;li&gt;Does it have explicit file and permission boundaries?&lt;/li&gt;
&lt;li&gt;Does it define what to do after failure?&lt;/li&gt;
&lt;li&gt;Can it record important decisions for later review?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If two or more of those are missing, the system is probably still in the “good demo” stage rather than the “operational tool” stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Turning an AI agent from a demo into a stable production tool is not mainly about making the prompt sound more human. It is about designing operating rules that make the workflow behave like a system.&lt;/p&gt;

&lt;p&gt;Prompts define expression. Rules define constraints. Prompts influence how the agent speaks. Rules determine how it works.&lt;/p&gt;

&lt;p&gt;If I had to strengthen only one of them first, I would strengthen the rules. Most production failures are not caused by tone. They come from missing boundaries, missing checks, and missing failure handling.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Rollback Scripts Are Not System State: Why Runtime Truth Comes First in Recovery Work</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Tue, 21 Apr 2026 12:02:21 +0000</pubDate>
      <link>https://forem.com/linou518/rollback-scripts-are-not-system-state-why-runtime-truth-comes-first-in-recovery-work-4j5c</link>
      <guid>https://forem.com/linou518/rollback-scripts-are-not-system-state-why-runtime-truth-comes-first-in-recovery-work-4j5c</guid>
      <description>&lt;h1&gt;
  
  
  Rollback Scripts Are Not System State: Why Runtime Truth Comes First in Recovery Work
&lt;/h1&gt;

&lt;p&gt;In operations work, it is easy to treat what exists in the repository, what is written in deployment scripts, or what is declared in a compose file as if it were the current state of the system.&lt;/p&gt;

&lt;p&gt;That is a mistake.&lt;/p&gt;

&lt;p&gt;What actually matters is what is really running right now. I think of this gap as the separation between &lt;strong&gt;source truth&lt;/strong&gt; and &lt;strong&gt;runtime truth&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A recent recovery task exposed this problem again. The original goal was simple: redeploy a set of custom plugins into a newer environment. But if you only look at deployment scripts, image tags, and compose definitions, it is easy to conclude that the environment is already aligned. The real questions should come first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which containers are actually running in production right now?&lt;/li&gt;
&lt;li&gt;Are the volumes carrying forward the wrong generation of data?&lt;/li&gt;
&lt;li&gt;Does the plugin merely exist on disk, while remaining disabled in the platform?&lt;/li&gt;
&lt;li&gt;Has the target environment's business data already been overwritten by another environment?&lt;/li&gt;
&lt;li&gt;If a feature is missing in the UI, is the problem in the frontend entry point, a backend switch, or the plugin state itself?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are runtime-truth questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why source truth can mislead you
&lt;/h2&gt;

&lt;p&gt;Source truth is still important. It defines what the ideal state should be. But in recovery, rollback, and migration scenarios, the system is often already drifting away from that ideal state.&lt;/p&gt;

&lt;p&gt;Here are a few common traps.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The code exists, so the feature must exist
&lt;/h3&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;If the plugin source exists in the repository, that only proves that the feature was implemented at some point. It does not prove that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the plugin was built,&lt;/li&gt;
&lt;li&gt;the build output made it into the correct image,&lt;/li&gt;
&lt;li&gt;that image was deployed,&lt;/li&gt;
&lt;li&gt;the container restarted with the new version,&lt;/li&gt;
&lt;li&gt;the platform actually enabled the plugin,&lt;/li&gt;
&lt;li&gt;or the frontend is exposing the entry point to users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any one of those steps breaks, the user still sees the same outcome: the feature is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The compose file is correct, so production must also be correct
&lt;/h3&gt;

&lt;p&gt;Also no.&lt;/p&gt;

&lt;p&gt;Production containers may not have been recreated. Old volumes may still be attached. Environment variables may still come from an earlier release. Sometimes even the service names are correct while the internal process state is not.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker compose config&lt;/code&gt; tells you how the system is supposed to start. &lt;code&gt;docker ps&lt;/code&gt;, &lt;code&gt;docker inspect&lt;/code&gt;, mount points, and in-platform enablement states tell you how it is actually running.&lt;/p&gt;
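
&lt;p&gt;One way to put runtime truth in front of you is to ask the Docker daemon directly. A small sketch built on the standard &lt;code&gt;docker inspect&lt;/code&gt; output; the container name is a placeholder:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import subprocess

def runtime_view(container):
    """Ask the runtime, not the compose file: which image, when it started,
    and which volumes the container serving requests is actually using."""
    raw = subprocess.run(
        ["docker", "inspect", container],
        capture_output=True, text=True, check=True,
    ).stdout
    info = json.loads(raw)[0]
    return {
        "image": info["Config"]["Image"],
        "started_at": info["State"]["StartedAt"],
        "mounts": [(m["Source"], m["Destination"]) for m in info["Mounts"]],
    }

print(runtime_view("ai-backoffice-pack-api-1"))
&lt;/code&gt;&lt;/pre&gt;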

&lt;h3&gt;
  
  
  3. The rollback script finished, so the system is recovered
&lt;/h3&gt;

&lt;p&gt;This one is especially dangerous.&lt;/p&gt;

&lt;p&gt;A successful script only proves that the script completed its own actions. It does not prove that the business state returned to the intended version.&lt;/p&gt;

&lt;p&gt;A proper recovery check needs to verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether critical data is back where it should be,&lt;/li&gt;
&lt;li&gt;whether critical plugins are visible to end users,&lt;/li&gt;
&lt;li&gt;whether a real user path succeeds end to end,&lt;/li&gt;
&lt;li&gt;and whether the boundary between environments is still intact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that, "recovery completed" is just a surface-level status.&lt;/p&gt;

&lt;h2&gt;
  
  
  The order I now prefer for recovery work
&lt;/h2&gt;

&lt;p&gt;When the problem is "we deployed the wrong thing, now recover it," this is the order I recommend.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Define what you are protecting first
&lt;/h3&gt;

&lt;p&gt;Before anything else, be explicit about the protected object:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are you protecting &lt;strong&gt;data&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Are you protecting &lt;strong&gt;plugins/code&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Are you protecting the &lt;strong&gt;identity of the current environment&lt;/strong&gt; such as users, agents, and settings?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of recovery incidents are not caused by lack of technical skill. They happen because the protected object was never clearly defined. You think you are restoring plugins, but you touch business data. You think you are syncing code, but you overwrite a live environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Inspect runtime before you inspect the repo
&lt;/h3&gt;

&lt;p&gt;The investigation order should be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;running processes and containers,&lt;/li&gt;
&lt;li&gt;volumes and bind mounts,&lt;/li&gt;
&lt;li&gt;in-platform enablement state,&lt;/li&gt;
&lt;li&gt;user-visible entry points,&lt;/li&gt;
&lt;li&gt;and only then the repository, scripts, and images.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That order prevents the classic illusion of "but the code is clearly there."&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Validate the user path, not the engineer path
&lt;/h3&gt;

&lt;p&gt;Engineers often comfort themselves with checks like these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the file exists,&lt;/li&gt;
&lt;li&gt;the API returns 200,&lt;/li&gt;
&lt;li&gt;the container is running,&lt;/li&gt;
&lt;li&gt;the logs show no errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not enough.&lt;/p&gt;

&lt;p&gt;The useful validation is to walk the user path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can the user see the entry point?&lt;/li&gt;
&lt;li&gt;Can they open it?&lt;/li&gt;
&lt;li&gt;Can they complete the core action?&lt;/li&gt;
&lt;li&gt;Is the result correct?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the user path is broken, the system is not recovered.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Treat partial recovery as a real state, not as a useless failure
&lt;/h3&gt;

&lt;p&gt;In real automation flows, partial success is normal.&lt;/p&gt;

&lt;p&gt;One article may publish to three platforms while one fails. A plugin may be redeployed while a token failure blocks one sync step. Code may be deployed while a platform toggle is still off.&lt;/p&gt;

&lt;p&gt;The worst response is to label the whole thing as "failed" and preserve no useful state.&lt;/p&gt;

&lt;p&gt;A more practical approach is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;record what succeeded,&lt;/li&gt;
&lt;li&gt;isolate what failed,&lt;/li&gt;
&lt;li&gt;preserve retryable state,&lt;/li&gt;
&lt;li&gt;and only rerun the failed parts later.&lt;/li&gt;
&lt;/ul&gt;
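
&lt;p&gt;A minimal sketch of that idea: persist per-step status so a later run skips what already succeeded and retries only what failed. The state file location and step structure are assumptions, not a prescribed format.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
from pathlib import Path

STATE_FILE = Path("recovery-state.json")    # hypothetical location

def run_steps(steps):
    """Run recovery steps in order, keep successes, and persist enough
    state that a later run retries only what actually failed."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    for name, action in steps.items():
        if state.get(name) == "ok":
            continue                        # settled in an earlier run
        try:
            action()
            state[name] = "ok"
        except Exception as exc:            # record the failure, do not mask it
            state[name] = f"failed: {exc}"
    STATE_FILE.write_text(json.dumps(state, indent=2))
    return state

print(run_steps({"restore_plugins": lambda: None, "verify_user_path": lambda: None}))
&lt;/code&gt;&lt;/pre&gt;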

&lt;p&gt;Recovery work, like publishing work, is usually not binary. It converges in stages.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple checklist that catches a lot of mistakes
&lt;/h2&gt;

&lt;p&gt;Before doing recovery or rollback work, I now ask these six questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Am I protecting code, data, or environment identity?&lt;/li&gt;
&lt;li&gt;Am I looking at source truth or runtime truth?&lt;/li&gt;
&lt;li&gt;What are the actual states of containers, volumes, environment variables, and platform switches?&lt;/li&gt;
&lt;li&gt;Can users see and use the target feature?&lt;/li&gt;
&lt;li&gt;Which parts are genuinely successful, and which parts only look successful?&lt;/li&gt;
&lt;li&gt;If something fails, did I preserve retry information or force myself to start over?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are simple questions, but they block a surprising number of avoidable recovery mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The most dangerous thing in rollback and recovery work is not an error message. It is the absence of one.&lt;/p&gt;

&lt;p&gt;A script finishing successfully, containers running, and repository code looking correct do not prove that the system is actually correct. The thing you should trust first is runtime truth: what is running in production right now, and what users can actually use right now.&lt;/p&gt;

&lt;p&gt;If you do not verify that first, your "recovery" may only be moving the problem into a less visible place.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>docker</category>
    </item>
    <item>
      <title>Multi-platform publishing is not only successful when everything succeeds. It should support partial completion.</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:02:54 +0000</pubDate>
      <link>https://forem.com/linou518/multi-platform-publishing-is-not-only-successful-when-everything-succeeds-it-should-support-2jh4</link>
      <guid>https://forem.com/linou518/multi-platform-publishing-is-not-only-successful-when-everything-succeeds-it-should-support-2jh4</guid>
      <description>&lt;h1&gt;
  
  
  Multi-platform publishing is not only successful when everything succeeds. It should support partial completion.
&lt;/h1&gt;

&lt;p&gt;When people design an automated publishing flow, the default goal is usually simple: publish the same article everywhere in one run. That goal is reasonable. But a system that treats &lt;strong&gt;100% success as the only valid outcome&lt;/strong&gt; is usually weak in real operations.&lt;/p&gt;

&lt;p&gt;That is because multi-platform publishing is not a single action. It is a task with &lt;strong&gt;shared input, multiple outputs, and independent failure surfaces&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The same article may go to Zenn through a git push. Qiita may depend on token scope. dev.to may care about the shape of the request body. Hashnode may depend on a working GraphQL mutation and the correct publication configuration. The theme, content, and timing are shared, but &lt;strong&gt;the failure mode is not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So if one platform fails and the whole run aborts with nothing more than “publish failed,” the automation is still designed for an idealized environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real target is not total success. It is the maximum explainable completion.
&lt;/h2&gt;

&lt;p&gt;A more practical goal is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preserve content consistency, complete as many destinations as possible, and record failures explicitly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not about being tolerant of failure. It is about making sure a local failure does not erase the value of the rest of the run.&lt;/p&gt;

&lt;p&gt;Take a common case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zenn succeeds&lt;/li&gt;
&lt;li&gt;Hashnode succeeds&lt;/li&gt;
&lt;li&gt;dev.to succeeds&lt;/li&gt;
&lt;li&gt;Qiita returns 401 because the token expired or lost the required scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worst possible behavior here is to mark the entire run as failed and stop there.&lt;/p&gt;

&lt;p&gt;From an operational point of view, that is false. The run did not fully fail. Most of the external distribution was completed. What remains is a clearly bounded, repairable problem on a single platform.&lt;/p&gt;

&lt;p&gt;If the system cannot express that difference, the operator sees the wrong picture. The label says “failed,” while reality is “3 out of 4 completed, 1 auth issue remains.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-platform work should settle results per platform
&lt;/h2&gt;

&lt;p&gt;The stable design is not “one command tries its luck against all four platforms.” It is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;generate the shared content artifacts first&lt;/li&gt;
&lt;li&gt;submit to each platform independently&lt;/li&gt;
&lt;li&gt;record each platform result independently&lt;/li&gt;
&lt;li&gt;emit a structured summary at the end&lt;/li&gt;
&lt;/ol&gt;
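
&lt;p&gt;A minimal sketch of that settlement pattern. The platform names are the real targets mentioned above, but the publisher functions here are stand-ins, not actual API clients:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def publish_everywhere(article, publishers):
    """Submit to each platform independently and settle the result per
    platform, so one failure cannot erase the others."""
    results = {}
    for platform, publish in publishers.items():
        try:
            url = publish(article)
            results[platform] = {"status": "success", "url": url}
        except Exception as exc:
            results[platform] = {"status": "failed", "reason": str(exc)}
    return results

# Stand-in publishers; real ones would call each platform API.
def fake_ok(article):
    return "https://example.com/" + article["slug"]

def fake_auth_error(article):
    raise RuntimeError("401 Unauthorized")

summary = publish_everywhere(
    {"slug": "todays-post"},
    {"zenn": fake_ok, "devto": fake_ok, "hashnode": fake_ok, "qiita": fake_auth_error},
)
print(summary)   # three successes, one bounded failure, nothing erased
&lt;/code&gt;&lt;/pre&gt;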

&lt;p&gt;This gives you several concrete benefits.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. One platform problem does not erase the others
&lt;/h3&gt;

&lt;p&gt;If Qiita fails but Zenn and dev.to have already gone live, those successes should remain visible as successes. A late-stage error should not rewrite the whole run as if nothing happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Troubleshooting becomes faster
&lt;/h3&gt;

&lt;p&gt;“publish failed” is nearly useless.&lt;/p&gt;

&lt;p&gt;“Qiita: 401 Unauthorized; Zenn: success; Hashnode: success; dev.to: success” is useful. It immediately tells you to repair authentication first instead of suspecting the content, network, or entire publishing pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Retries become smaller and safer
&lt;/h3&gt;

&lt;p&gt;If the output is structured, the next run only needs to retry the failed destinations.&lt;/p&gt;

&lt;p&gt;That saves requests, but more importantly it reduces the risk of duplicate posts, duplicate commits, and duplicate notifications.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dangerous part of automation is not failure itself. It is opaque failure.
&lt;/h2&gt;

&lt;p&gt;A common mistake in publishing workflows is to treat “automatic execution” as the main goal and “auditability” as a nice extra.&lt;/p&gt;

&lt;p&gt;The priority should be reversed.&lt;/p&gt;

&lt;p&gt;For a cron-driven publishing job, the most important question is not whether the script ran. It is whether someone else can immediately answer the following after it finishes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which article was published today&lt;/li&gt;
&lt;li&gt;which platforms received it&lt;/li&gt;
&lt;li&gt;which platform failed&lt;/li&gt;
&lt;li&gt;whether the failure was auth, format, network, or rate limiting&lt;/li&gt;
&lt;li&gt;where the artifacts and result records were stored&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those questions cannot be answered quickly, the workflow is still immature even if it partially succeeded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat results as first-class artifacts, not as disposable terminal output
&lt;/h2&gt;

&lt;p&gt;A reliable publishing flow should always produce more than the article itself. It should also produce two kinds of artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;content artifacts for review&lt;/strong&gt;: the source draft, the Japanese version, and the English version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;result artifacts for operations&lt;/strong&gt;: per-platform status, URLs, HTTP codes, and failure reasons&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the result should not live only in scrolling terminal output. It should be written as explicit files such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;publish-result.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;publish-report.md&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first is for machines. The second is for humans.&lt;/p&gt;

&lt;p&gt;With that structure, you do not need to inspect shell history the next day or guess where yesterday’s run got stuck. The evidence is already in the artifact directory.&lt;/p&gt;
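
&lt;p&gt;Writing both artifacts is only a few lines. A sketch, assuming a per-platform summary dict of the kind described above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
from pathlib import Path

def write_artifacts(summary, out_dir="artifacts"):
    """Persist the run as a machine-readable JSON file and a
    human-readable Markdown report."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "publish-result.json").write_text(json.dumps(summary, indent=2))
    lines = ["# Publish report", ""]
    for platform, result in summary.items():
        detail = result.get("url") or result.get("reason", "")
        lines.append(f"- {platform}: {result['status']} {detail}")
    (out / "publish-report.md").write_text("\n".join(lines) + "\n")
&lt;/code&gt;&lt;/pre&gt;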

&lt;h2&gt;
  
  
  One practical test
&lt;/h2&gt;

&lt;p&gt;I now use one sentence to judge whether a multi-platform publishing pipeline is well designed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If one out of four platforms is temporarily broken, can the other three still complete, and can the broken point be recorded clearly?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, the system is not really automated publishing. It is just serialized luck.&lt;/p&gt;

&lt;p&gt;Real automation should not depend on a perfect environment. It should accept local failures, preserve overall progress, and leave behind enough information to make the next repair step obvious.&lt;/p&gt;

&lt;p&gt;The most valuable capability in multi-platform publishing is not getting a perfect run every time. It is keeping the result orderly, visible, and recoverable when the run is not perfect.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>git</category>
    </item>
    <item>
      <title>A scheduled job should not just repeat. It should decide.</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Sat, 18 Apr 2026 12:02:41 +0000</pubDate>
      <link>https://forem.com/linou518/a-scheduled-job-should-not-just-repeat-it-should-decide-402e</link>
      <guid>https://forem.com/linou518/a-scheduled-job-should-not-just-repeat-it-should-decide-402e</guid>
      <description>&lt;h1&gt;
  
  
  A scheduled job should not just repeat. It should decide.
&lt;/h1&gt;

&lt;p&gt;Many teams treat cron as nothing more than “run this command at this time.” That is not wrong, but it is only half true. &lt;strong&gt;A stable scheduled job must do more than repeat. It must make decisions at run time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A job that blindly runs the same shell every day becomes fragile as soon as reality changes. Some days you have input material, some days you do not. Some days external APIs are healthy, some days they rate-limit you. Some runs should complete on the primary path, while others should switch to a fallback path.&lt;/p&gt;

&lt;p&gt;That is why I prefer to think of a cron job as a &lt;strong&gt;time-triggered decision point&lt;/strong&gt;, not a time-triggered fixed action.&lt;/p&gt;

&lt;p&gt;Take automated blog distribution as an example. The visible requirement sounds simple: publish one post every day at 9 PM. But the execution model already contains at least three branches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A draft for today exists → edit it, translate it, and distribute it to each platform&lt;/li&gt;
&lt;li&gt;No draft exists → choose a theme autonomously, write a post, then distribute it&lt;/li&gt;
&lt;li&gt;One platform fails → continue with the others and return a structured result instead of silently failing the whole run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If we still implement that as “always run the same command,” the automation will become brittle very quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The unreliable part is not the clock. It is the input.
&lt;/h2&gt;

&lt;p&gt;A surprising number of automation failures do not start with a missed trigger. They start with an implicit assumption that &lt;strong&gt;the required input will always be there&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Typical assumptions look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;today’s source material will always exist&lt;/li&gt;
&lt;li&gt;credentials will never expire&lt;/li&gt;
&lt;li&gt;the API response shape will never change&lt;/li&gt;
&lt;li&gt;the target platform will never throttle the request&lt;/li&gt;
&lt;li&gt;state from the previous run will never leak into the next one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Break only one of these assumptions and your “automation” turns into a machine that creates a fresh investigation every day.&lt;/p&gt;

&lt;p&gt;That is why the most important part of scheduled-job design is usually not the cron expression. It is the input check and branch strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three questions every good scheduled job should answer first
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Do I have enough input to take the primary path?
&lt;/h3&gt;

&lt;p&gt;Do not rush into the business action. First confirm the input.&lt;/p&gt;

&lt;p&gt;For a blog distribution job, the first check should be whether a file like &lt;code&gt;YYYY-MM-DD_*.md&lt;/code&gt; exists for today. If it does, go down the edit-and-publish path. If it does not, switch to a fallback writing path. That one decision prevents the entire 9 PM slot from being wasted just because a file was missing.&lt;/p&gt;
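
&lt;p&gt;A rough sketch of that first decision; the draft directory and branch names are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import datetime
from pathlib import Path

def choose_branch(draft_dir="drafts"):
    """Decide the day's path from the actual input, not from hope."""
    today = datetime.date.today().isoformat()             # YYYY-MM-DD
    drafts = sorted(Path(draft_dir).glob(f"{today}_*.md"))
    if drafts:
        return "edit_and_publish", drafts[0]
    return "write_fallback_post", None

branch, draft = choose_branch()
print(branch, draft)
&lt;/code&gt;&lt;/pre&gt;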

&lt;h3&gt;
  
  
  2. If the primary path is blocked, what is the downgrade path?
&lt;/h3&gt;

&lt;p&gt;A downgrade path is not a warning line after failure. It is part of the design.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;skip today&lt;/li&gt;
&lt;li&gt;generate substitute content&lt;/li&gt;
&lt;li&gt;publish only to the platforms that are available&lt;/li&gt;
&lt;li&gt;save the artifacts without making them public&lt;/li&gt;
&lt;li&gt;escalate to human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Automation without a downgrade path is basically a manual process with an alarm clock attached.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. After the run finishes, how will someone else know what happened?
&lt;/h3&gt;

&lt;p&gt;One of cron’s biggest operational weaknesses is that nobody is watching while it runs.&lt;/p&gt;

&lt;p&gt;That means the output must be audit-friendly by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which inputs were checked&lt;/li&gt;
&lt;li&gt;which branch was selected&lt;/li&gt;
&lt;li&gt;which platforms succeeded and which failed&lt;/li&gt;
&lt;li&gt;whether the failure was HTTP status, auth, or format mismatch&lt;/li&gt;
&lt;li&gt;where the final artifacts were stored&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A scheduled job without a summary is hard to operate even when it succeeds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Upgrade cron from a script launcher to a duty agent
&lt;/h2&gt;

&lt;p&gt;My preferred design is to let cron wake a small agent instead of embedding an opaque shell pipeline directly in the scheduler.&lt;/p&gt;

&lt;p&gt;The difference is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a script launcher can only execute predefined steps&lt;/li&gt;
&lt;li&gt;a duty agent can inspect context, choose a branch, and summarize the outcome&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern works especially well for recurring tasks that always happen on time but do not always require the same action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blog publishing&lt;/li&gt;
&lt;li&gt;data aggregation&lt;/li&gt;
&lt;li&gt;weekly report generation&lt;/li&gt;
&lt;li&gt;inbox triage&lt;/li&gt;
&lt;li&gt;health checks&lt;/li&gt;
&lt;li&gt;routine cleanup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They all share the same property: &lt;strong&gt;the trigger time is fixed, but the correct action for the day is not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you accept that, the structure should no longer be “schedule + command.” It should be “schedule + evaluation + branching + reporting.”&lt;/p&gt;

&lt;h2&gt;
  
  
  One simple test
&lt;/h2&gt;

&lt;p&gt;I now use one sentence to evaluate whether a scheduled job is well designed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If today’s input is different from yesterday’s, can this job do something different and still do the right thing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, the job is probably still stuck in the “mechanical repetition” stage.&lt;/p&gt;

&lt;p&gt;The real value of cron is not that it wakes up on time. It is that once awake, it knows how to judge the reality of the day.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>automation</category>
      <category>cron</category>
      <category>operations</category>
    </item>
    <item>
      <title>Repo Truth ≠ Production Truth: A Container-First Troubleshooting Pattern for Runtime Drift</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Wed, 15 Apr 2026 11:07:42 +0000</pubDate>
      <link>https://forem.com/linou518/repo-truth-production-truth-a-container-first-troubleshooting-pattern-for-runtime-drift-odm</link>
      <guid>https://forem.com/linou518/repo-truth-production-truth-a-container-first-troubleshooting-pattern-for-runtime-drift-odm</guid>
      <description>&lt;h1&gt;
  
  
  Repo Truth ≠ Production Truth: A Container-First Troubleshooting Pattern for Runtime Drift
&lt;/h1&gt;

&lt;p&gt;We ran into another operations problem that wastes a lot of time precisely because it looks deceptively simple: &lt;strong&gt;the implementation exists in the repository, but the actual UI and API behave as if the feature was never deployed&lt;/strong&gt;. In that situation, it is very easy to keep staring at source code or to blame frontend logic, routes, or permissions too early. Instead, the first thing to verify is not repo truth but &lt;strong&gt;live runtime truth&lt;/strong&gt;, and in Docker environments the shortest entry point to that is often &lt;strong&gt;container truth&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A Git repository can prove that somebody wrote the code. It cannot prove that the process currently serving requests is actually running that code. In Docker-based systems, those are often two different realities.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the problem really was
&lt;/h2&gt;

&lt;p&gt;The workflow page in AI Back Office Pack was behaving incorrectly. The workflow implementation was visible in source, yet the page did not work and the API behavior did not match expectations. From there, it is tempting to start digging through application logic. That is usually where time gets burned.&lt;/p&gt;

&lt;p&gt;The more effective order was much simpler:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;confirm the live endpoint mapping: which proxy receives this domain/path right now, and which service/container it actually forwards to&lt;/li&gt;
&lt;li&gt;confirm the implementation exists in source&lt;/li&gt;
&lt;li&gt;confirm the build artifact contains the expected output&lt;/li&gt;
&lt;li&gt;confirm the running container actually includes that artifact&lt;/li&gt;
&lt;li&gt;then inspect route and reverse-proxy details&lt;/li&gt;
&lt;li&gt;finally inspect authentication responses and API semantics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final conclusion was not "the code is missing." It was "the code is not what the container is running." The workflow module existed in the repository, but the live &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;dashboard&lt;/code&gt; containers were still using old images and old artifacts. In other words, &lt;strong&gt;code truth and container truth had drifted apart&lt;/strong&gt;. That is a textbook runtime drift incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I now prioritize container truth
&lt;/h2&gt;

&lt;p&gt;In local development, source is often close enough to reality. In Docker / Compose / multi-service operations, that assumption becomes dangerous.&lt;/p&gt;

&lt;p&gt;Users do not hit your Git repository. They hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a specific image&lt;/li&gt;
&lt;li&gt;a specific container&lt;/li&gt;
&lt;li&gt;a specific running process&lt;/li&gt;
&lt;li&gt;a route that is actually active&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why source truth is only one piece of evidence in production debugging. &lt;strong&gt;The final authority is the live runtime currently serving requests, and in Docker environments container truth is often the fastest route to verifying that runtime truth.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A debugging order that wastes less time
&lt;/h2&gt;

&lt;p&gt;The next time I see symptoms like "the code exists but the page does nothing," "the repo has it but the API returns 404," or "we changed it but production did not move," I will use this order first.&lt;/p&gt;

&lt;h3&gt;
  
  
  0. Live endpoint mapping
&lt;/h3&gt;

&lt;p&gt;Confirm which LB or reverse proxy currently receives the request, and which service/container it really lands on. If you are looking at the wrong container, everything after that is wasted effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Source
&lt;/h3&gt;

&lt;p&gt;Verify the implementation really exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Artifact
&lt;/h3&gt;

&lt;p&gt;Verify the built output, bundle, or &lt;code&gt;dist&lt;/code&gt; files contain the feature. Source existing is not enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Container
&lt;/h3&gt;

&lt;p&gt;Enter the running container and inspect the deployed files directly. In this case, the key question was whether &lt;code&gt;/app/dist/modules/workflow&lt;/code&gt; actually existed inside the container.&lt;/p&gt;
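
&lt;p&gt;That check needs nothing beyond the docker CLI. A small wrapper for illustration; the service name and path are the ones from this incident, the rest is an assumption about your compose setup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess

def path_in_container(service, path):
    """True if the path exists inside the running container: runtime truth,
    which the repository alone cannot give you."""
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", service, "test", "-e", path],
        capture_output=True,
    )
    return result.returncode == 0

print(path_in_container("api", "/app/dist/modules/workflow"))
&lt;/code&gt;&lt;/pre&gt;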

&lt;h3&gt;
  
  
  4. Route / Proxy details
&lt;/h3&gt;

&lt;p&gt;If the files are present, then verify the route is mounted and the reverse proxy is pointing at the correct upstream.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Auth / API semantics
&lt;/h3&gt;

&lt;p&gt;Only after those layers are verified does it make sense to spend time interpreting &lt;code&gt;401&lt;/code&gt;, &lt;code&gt;403&lt;/code&gt;, or &lt;code&gt;500&lt;/code&gt; responses.&lt;/p&gt;

&lt;p&gt;The value of this order is simple: &lt;strong&gt;it answers whether all the evidence you are looking at refers to the same deployed reality&lt;/strong&gt;. A lot of troubleshooting time is lost trying to explain a layer-B failure with layer-A facts.&lt;/p&gt;

&lt;h2&gt;
  
  
  404 versus 401 is not just a different error code
&lt;/h2&gt;

&lt;p&gt;One especially useful signal in this case was the endpoint transition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;before: &lt;code&gt;404&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;after rebuilding and recreating containers: &lt;code&gt;401&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That does not mean "it is still broken, just with another number." It means something structurally changed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;404&lt;/code&gt; strongly suggests something is still wrong at the route, artifact, mount, or proxy layer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;401&lt;/code&gt; means the endpoint is likely reachable now, and the next layer to inspect is authentication or permissions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;403&lt;/code&gt; suggests authentication may have succeeded but policy or authorization is still blocking access&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;5xx&lt;/code&gt; points more toward the app, dependencies, config, or upstream failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So even when the error is not gone yet, &lt;strong&gt;a shift in error semantics can prove that troubleshooting has advanced one layer forward&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The illusions Docker creates
&lt;/h2&gt;

&lt;p&gt;Docker environments make several false assumptions feel natural:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we did &lt;code&gt;git pull&lt;/code&gt;, so production must be current&lt;/li&gt;
&lt;li&gt;the file changed, so the image must include it&lt;/li&gt;
&lt;li&gt;the image was rebuilt, so the running container must be new&lt;/li&gt;
&lt;li&gt;the container restarted, so the service must be running the latest code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those is guaranteed. A mismatch at any layer can leave you with new code in theory and old behavior in production.&lt;/p&gt;

&lt;p&gt;For operators, the more important question is not merely "is the repository correct?" It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;which live runtime is actually receiving this request path right now, and what exactly is inside that container?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the answer worth establishing first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;My default rule for this class of incident is now much clearer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When source and production behavior disagree, suspect runtime drift. In Docker environments, container truth is often the fastest place to start.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do not start by judging the code. Do not jump straight into application-layer explanations. First separate the layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;is source correct?&lt;/li&gt;
&lt;li&gt;is the artifact correct?&lt;/li&gt;
&lt;li&gt;is the container correct?&lt;/li&gt;
&lt;li&gt;is the route correct?&lt;/li&gt;
&lt;li&gt;what layer is the auth or API response actually describing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the order is right, these incidents are usually manageable. What makes them expensive is usually not the bug itself, but looking at the wrong layer for too long.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>docker</category>
      <category>erp</category>
    </item>
    <item>
      <title>The code exists, but production still does nothing: why runtime drift should be your first suspect</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Tue, 14 Apr 2026 12:02:49 +0000</pubDate>
      <link>https://forem.com/linou518/the-code-exists-but-production-still-does-nothing-why-runtime-drift-should-be-your-first-suspect-5cbn</link>
      <guid>https://forem.com/linou518/the-code-exists-but-production-still-does-nothing-why-runtime-drift-should-be-your-first-suspect-5cbn</guid>
      <description>&lt;h1&gt;
  
  
  The code exists, but production still does nothing: why runtime drift should be your first suspect
&lt;/h1&gt;

&lt;p&gt;One of the most misleading failure modes in OpenClaw-style operations is runtime drift: the source code says one thing, while the running system is still living in the past. The case that triggered this lesson looked simple at first. In AI Back Office Pack, the workflow screen appeared to do nothing when clicked. That kind of symptom makes people suspect frontend bugs, broken routes, or API failures. In reality, the root cause was much simpler: &lt;strong&gt;the workflow code existed in the repository, but the Docker containers still running in production were old&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of issue that fools anyone who stops at source inspection. The repository already contained the workflow implementation. The UI components were there too. That naturally pushes the investigation toward routing, auth, or client-side behavior. But in production, the first question should be different: &lt;strong&gt;is the artifact currently running actually built from the source you are reading?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The investigation became clear once we forced the order: source → build artifact → running container → route → auth. That sequence matters. After verifying that the workflow code existed in source, the next step was not to dive into browser logs or backend traces. It was to confirm whether the built output actually contained the workflow module. Skipping that check wastes time fast. In this case, both the &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;dashboard&lt;/code&gt; containers were still based on older images, so the runtime simply did not contain the updated workflow module.&lt;/p&gt;

&lt;p&gt;So the visible problem was not a broken feature. It was an undeployed feature. Source truth and runtime truth had diverged. This is where Docker-based operations can quietly lie to you. You may have updated &lt;code&gt;docker-compose.yml&lt;/code&gt;, pulled the latest source, and even built assets locally. None of that proves the currently listening process is using that build.&lt;/p&gt;

&lt;p&gt;The fix itself was straightforward: rebuild and recreate the &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;dashboard&lt;/code&gt; containers for &lt;code&gt;ai-backoffice-pack&lt;/code&gt;, then replace the old runtime with artifacts that actually included workflow support. Once that was done, the "it does nothing" behavior disappeared without any exotic code changes.&lt;/p&gt;

&lt;p&gt;The real lesson was not the rebuild. It was the debugging discipline. In environments like OpenClaw, where AI services, web apps, jobs, auth, and containers all interact, people tend to search for sophisticated causes too early. But many outages still come from boring mismatches: stale containers, stale dist files, or configuration changes that never reached the running process.&lt;/p&gt;

&lt;p&gt;My rule is now much stricter: &lt;strong&gt;do not stop at "the code exists." Keep going until you can say that the code was built, deployed, and is actually present inside the running process.&lt;/strong&gt; If you skip that chain, operations will happily mislead you.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical isolation order
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Verify the implementation exists in source.&lt;/li&gt;
&lt;li&gt;Verify the build artifact contains it.&lt;/li&gt;
&lt;li&gt;Verify the running container actually has that artifact.&lt;/li&gt;
&lt;li&gt;Verify the route exists and use status codes like 404 vs 401 vs 500 as evidence.&lt;/li&gt;
&lt;li&gt;Only then go deeper into auth, permissions, or frontend logic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If production seems to ignore code that clearly exists in the repo, do not start with application theory. Start with &lt;strong&gt;runtime drift&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>docker</category>
      <category>devops</category>
      <category>ai</category>
    </item>
    <item>
      <title>When a Saved Task Disappears After Refresh: Fixing a Dual Data Source Trap in a SPA</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:03:14 +0000</pubDate>
      <link>https://forem.com/linou518/when-a-saved-task-disappears-after-refresh-fixing-a-dual-data-source-trap-in-a-spa-26ob</link>
      <guid>https://forem.com/linou518/when-a-saved-task-disappears-after-refresh-fixing-a-dual-data-source-trap-in-a-spa-26ob</guid>
      <description>&lt;h1&gt;
  
  
  When a Saved Task Disappears After Refresh: Fixing a Dual Data Source Trap in a SPA
&lt;/h1&gt;

&lt;p&gt;While reviewing a dashboard’s project task screen, we ran into a classic frontend trap. The symptom looked simple: after adding a task, it immediately appeared in the UI, but after a page reload it vanished. The first suspects should usually be an API failure or a broken save path. This time, neither was the root cause. The real issue was worse: &lt;strong&gt;two different implementations were pretending to be the same feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There were actually two data flows in the frontend. One path lived in &lt;code&gt;app.js&lt;/code&gt;. It loaded &lt;code&gt;tasks.json&lt;/code&gt; through &lt;code&gt;loadData()&lt;/code&gt; and sent add, delete, and toggle operations to &lt;code&gt;/api/task/add&lt;/code&gt;, &lt;code&gt;/api/task/delete&lt;/code&gt;, and &lt;code&gt;/api/task/toggle&lt;/code&gt;. That path went through the backend, so the data was persisted. The other path lived inside an inline script in &lt;code&gt;index.html&lt;/code&gt;, where it directly mutated an in-memory object called &lt;code&gt;simpleProjectsData&lt;/code&gt;. On screen, both paths looked like “a task was added.” In reality, the second path was only changing temporary browser state, so everything disappeared after refresh.&lt;/p&gt;

&lt;p&gt;That is what makes this kind of bug annoying: the UI looks alive enough to fool you. The button responds. The list updates. So the eye goes to rendering first, not to persistence. But the real problem was architectural. &lt;strong&gt;The moment you have two competing sources of truth, you have already lost the design battle.&lt;/strong&gt; One path trusted &lt;code&gt;tasks.json&lt;/code&gt;. The other trusted page memory. It was only a matter of time before they diverged.&lt;/p&gt;

&lt;p&gt;The fix was not dramatic. First, we updated &lt;code&gt;/api/task/add&lt;/code&gt; so it could accept &lt;code&gt;task&lt;/code&gt; as well as &lt;code&gt;title&lt;/code&gt;, making it easier for the UI to call the backend path consistently. Next, we added &lt;code&gt;/api/task/delete&lt;/code&gt; so deletion would remove the matching line from Markdown, run &lt;code&gt;_regenerate()&lt;/code&gt;, and rebuild &lt;code&gt;tasks.json&lt;/code&gt;. In other words, the goal was not to make the screen look updated. The goal was to force all writes through a single persistence path.&lt;/p&gt;
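
&lt;p&gt;For illustration, a minimal Flask-style sketch of that delete path, assuming a Flask-like backend and a Markdown file behind &lt;code&gt;tasks.json&lt;/code&gt;; the endpoint and &lt;code&gt;_regenerate()&lt;/code&gt; names come from this post, everything else is hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from pathlib import Path
from flask import Flask, jsonify, request

app = Flask(__name__)
TASKS_MD = Path("tasks.md")    # hypothetical Markdown source behind tasks.json

def _regenerate():
    """Stub for the step described above: rebuild tasks.json from Markdown."""

@app.post("/api/task/delete")
def delete_task():
    # Accept either key, mirroring the add endpoint's task/title handling.
    payload = request.get_json(force=True)
    title = payload.get("task") or payload.get("title", "")
    # Write through the single persistence path, never the in-page state.
    lines = [line for line in TASKS_MD.read_text().splitlines() if title not in line]
    TASKS_MD.write_text("\n".join(lines) + "\n")
    _regenerate()
    return jsonify({"ok": True})
&lt;/code&gt;&lt;/pre&gt;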

&lt;p&gt;The lesson was clear: in SPA debugging, it is often faster to question &lt;strong&gt;ownership of state&lt;/strong&gt; than to stare at the visible symptom. Especially in long-lived single-page apps, temporary scripts and old implementations tend to survive. Over time they start sharing responsibility for the same feature through different routes. At that point, the real fix is rarely another &lt;code&gt;if&lt;/code&gt; statement. It is deciding what the single source of truth should be, and removing the rest.&lt;/p&gt;

&lt;p&gt;Frontend work is not just about making a screen look responsive. It is about making sure user actions still mean the same thing after time passes and the page reloads. A button moving is not the same thing as a feature working. That was the reminder from this fix.&lt;/p&gt;

</description>
      <category>frontend</category>
      <category>spa</category>
      <category>webdev</category>
      <category>debugging</category>
    </item>
    <item>
      <title>The Code Exists, but the Container Is Still Old: A Real Runtime Drift Failure in Docker Operations</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:03:11 +0000</pubDate>
      <link>https://forem.com/linou518/the-code-exists-but-the-container-is-still-old-a-real-runtime-drift-failure-in-docker-operations-4388</link>
      <guid>https://forem.com/linou518/the-code-exists-but-the-container-is-still-old-a-real-runtime-drift-failure-in-docker-operations-4388</guid>
      <description>&lt;h1&gt;
  
  
  The Code Exists, but the Container Is Still Old: A Real Runtime Drift Failure in Docker Operations
&lt;/h1&gt;

&lt;p&gt;We recently hit a very typical but easy-to-miss failure in OpenClaw / AI Back Office operations. The conclusion was simple: &lt;strong&gt;a feature existing in the source code is not the same thing as that feature existing in the running container&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The target was the workflow module in &lt;code&gt;ai-backoffice-pack&lt;/code&gt;. In the repository, the workflow implementation was clearly present. But in the actual UI, the feature behaved as if it did not exist. The first suspects were the usual ones: missing implementation, an unregistered route, or an auth problem. None of those were the root cause. &lt;strong&gt;The real problem was that the production &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;dashboard&lt;/code&gt; containers were still running with old build artifacts&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In other words, the source had the workflow module, but the running container’s &lt;code&gt;/app/dist/modules&lt;/code&gt; directory did not. That is runtime drift: the truth in Git and the truth in production stop matching. If you only read the code, it is easy to miss.&lt;/p&gt;

&lt;p&gt;What helped most was not expanding the investigation too early. We kept the verification order tight:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the workflow implementation exists in source.&lt;/li&gt;
&lt;li&gt;Confirm the workflow module is included in the build artifact.&lt;/li&gt;
&lt;li&gt;Confirm that artifact is actually present inside the running container.&lt;/li&gt;
&lt;li&gt;Confirm the route is exposed.&lt;/li&gt;
&lt;li&gt;Confirm how the response changes after authentication.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That order turned one detail into strong evidence: at first the endpoint returned &lt;code&gt;404&lt;/code&gt;, and after a rebuild it returned &lt;code&gt;401&lt;/code&gt;. A &lt;code&gt;404&lt;/code&gt; strongly suggests the route itself is not there. Once it changes to &lt;code&gt;401&lt;/code&gt;, you know the route is alive and the next layer to inspect is authentication. In this case, rebuilding and recreating the containers changed the endpoint behavior and proved that the issue was not missing code. It was an old container still serving stale artifacts.&lt;/p&gt;

&lt;p&gt;The fix itself was not dramatic. On the infra node, we ran &lt;code&gt;docker compose build api dashboard&lt;/code&gt;, then &lt;code&gt;docker compose up -d api dashboard&lt;/code&gt;, and finally rechecked &lt;code&gt;/app/dist/modules/workflow&lt;/code&gt; inside the container. After that, the workflow module was present in the runtime as expected.&lt;/p&gt;

&lt;p&gt;The operational lesson was straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not conclude “it is there” just because you saw it in source.&lt;/li&gt;
&lt;li&gt;In Docker-based systems, always separate source, build artifact, and running container in your checks.&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;404&lt;/code&gt; changing into &lt;code&gt;401&lt;/code&gt; is an important observation point during recovery.&lt;/li&gt;
&lt;li&gt;Even when the problem looks like a UI issue, the real cause may be deployment drift.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI and multi-agent systems have many layers: configuration, containers, routing, and authentication. That is why the sequence &lt;strong&gt;source → artifact → container → route → auth&lt;/strong&gt; is so effective. If you stay at the vague level of “the code exists, so why is it broken?”, you can lose hours. That was the real lesson from this incident.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>ai</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Don’t Expose Raw Calendar Data: Designing a Dashboard API Around Daily Execution Blocks</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:37:22 +0000</pubDate>
      <link>https://forem.com/linou518/dont-expose-raw-calendar-data-designing-a-dashboard-api-around-daily-execution-blocks-3744</link>
      <guid>https://forem.com/linou518/dont-expose-raw-calendar-data-designing-a-dashboard-api-around-daily-execution-blocks-3744</guid>
      <description>&lt;p&gt;After revisiting a dashboard scheduling feature, I ended up with a clearer conclusion: the real value is not the calendar integration itself. The value comes from &lt;strong&gt;not exposing raw calendar data directly&lt;/strong&gt;, and instead turning it into a server-generated set of daily execution blocks.&lt;/p&gt;

&lt;p&gt;In the Techsfree dashboard, raw schedule input lives in something like &lt;code&gt;tasks_ms.json&lt;/code&gt;. That file contains meetings, breaks, and other imported calendar events. Useful, but incomplete. If you send that straight to the frontend, users can see what is scheduled, but they still cannot easily see how to run the day. A list of meetings does not answer what to do before the meeting, after the meeting, or during open time.&lt;/p&gt;

&lt;p&gt;The UI becomes much more useful when it consumes a normalized structure like &lt;code&gt;schedule/schedule.json&lt;/code&gt; instead. In that form, the API returns a chronological block list with fields such as &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, &lt;code&gt;label&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, and &lt;code&gt;status&lt;/code&gt;. Meetings sit next to deep work sessions, review tasks, pipeline checks, and breaks. That changes the API’s job from “show events” to &lt;strong&gt;“shape the day into operational units.”&lt;/strong&gt;&lt;/p&gt;
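
&lt;p&gt;A small sketch of that normalization step. The block fields come from this post; the event shape, the gap filling, and the assumption of non-overlapping same-day events are mine:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def to_blocks(events, day_start="09:00", day_end="18:00"):
    """Normalize raw calendar events into chronological execution blocks,
    filling open time with focus blocks so the UI only has to render.
    Assumes non-overlapping events within one day, times as HH:MM strings."""
    events = sorted(events, key=lambda e: e["start"])
    blocks, cursor = [], day_start
    for event in events:
        if cursor != event["start"]:
            blocks.append({"start": cursor, "end": event["start"],
                           "label": "Deep work", "type": "focus", "status": "planned"})
        blocks.append({"start": event["start"], "end": event["end"],
                       "label": event["title"], "type": event.get("type", "meeting"),
                       "status": "planned"})
        cursor = event["end"]
    if cursor != day_end:
        blocks.append({"start": cursor, "end": day_end,
                       "label": "Deep work", "type": "focus", "status": "planned"})
    return blocks

print(to_blocks([{"start": "10:00", "end": "10:30", "title": "Standup"}]))
&lt;/code&gt;&lt;/pre&gt;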

&lt;p&gt;This design choice matters more than it first appears. If the frontend has to merge raw meetings and raw tasks on the fly, UI code ends up owning conflict detection, insertion rules, break handling, empty-slot filling, sort guarantees, and task grouping. Very quickly, a visual layer turns into a scheduling engine.&lt;/p&gt;

&lt;p&gt;Server-side block generation avoids that drift in responsibility. The frontend can stay simple: render the ordered blocks. It does not need to know how the schedule was produced. This is not just cleaner separation of concerns. It also improves operations. Scheduling bugs stay in the API layer; rendering bugs stay in the UI layer. Diagnosis becomes faster because failures are easier to localize.&lt;/p&gt;

&lt;p&gt;Another advantage is resilience to imperfect inputs. Raw calendar data often contains edge cases: duplicated entries, zero-duration meetings, inconsistent labels, or partially missing metadata. If the server normalizes everything into execution blocks first, the UI does not need to inherit all that mess directly.&lt;/p&gt;

&lt;p&gt;In many SaaS integrations, teams stop at “we can fetch the data and display it.” But the higher-value step is transforming that data into the shape users actually need for daily work. Especially in dashboards designed for multi-project operation, the goal is not a faithful copy of a calendar—it is a structure that helps someone decide the next 30 minutes quickly.&lt;/p&gt;

&lt;p&gt;The takeaway is simple: &lt;strong&gt;schedule APIs are stronger when they return action-oriented time blocks instead of raw event lists.&lt;/strong&gt; The visual result may look similar, but the architecture becomes easier to maintain, easier to debug, and much more useful in practice.&lt;/p&gt;

</description>
      <category>api</category>
      <category>dashboard</category>
      <category>flask</category>
      <category>saas</category>
    </item>
    <item>
      <title>When the Code Exists but Production Still Fails: Why Runtime Drift Should Be Your First Suspect</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:37:20 +0000</pubDate>
      <link>https://forem.com/linou518/when-the-code-exists-but-production-still-fails-why-runtime-drift-should-be-your-first-suspect-7d9</link>
      <guid>https://forem.com/linou518/when-the-code-exists-but-production-still-fails-why-runtime-drift-should-be-your-first-suspect-7d9</guid>
      <description>&lt;p&gt;I ran into a classic operations problem in AI Back Office Pack: a workflow feature clearly existed in the source tree, but it still did not work in production. The real mistake was assuming the application layer was the most likely failure point. In this case, the first question should have been much simpler: &lt;strong&gt;is the running runtime actually carrying the code we think it is?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The symptom looked like an app bug. The workflow module was present in source, the UI did not respond as expected, and the API behavior was wrong. It is very tempting to inspect route definitions or frontend wiring first. But the actual issue was that the &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;dashboard&lt;/code&gt; containers were still running old build artifacts. The problem was not “missing code.” It was &lt;strong&gt;runtime drift&lt;/strong&gt;: source had moved forward, while the live containers had not.&lt;/p&gt;

&lt;p&gt;A stable verification order helped clarify the situation quickly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the implementation exists in source.&lt;/li&gt;
&lt;li&gt;Confirm the build artifact contains the expected output.&lt;/li&gt;
&lt;li&gt;Inspect the running container and verify the expected files are really there.&lt;/li&gt;
&lt;li&gt;Test whether the route exists, and use the response code to understand the next layer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The strongest evidence came from two checks. First, the expected &lt;code&gt;dist/modules/workflow&lt;/code&gt; path existed in the rebuilt container. Second, the workflow definitions endpoint returned &lt;code&gt;401&lt;/code&gt; instead of &lt;code&gt;404&lt;/code&gt;. That distinction matters. A &lt;code&gt;404&lt;/code&gt; usually means the route is absent. A &lt;code&gt;401&lt;/code&gt; means the route exists and the next place to investigate is authentication or authorization. &lt;strong&gt;HTTP status codes are not just errors; they are operational clues.&lt;/strong&gt;&lt;/p&gt;
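
&lt;p&gt;Both checks are easy to script. A minimal sketch, assuming the compose service is called &lt;code&gt;api&lt;/code&gt;, that its working directory holds &lt;code&gt;dist/&lt;/code&gt;, and that the API is reachable on a local port; the endpoint URL here is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess
import urllib.error
import urllib.request

# Check 1: does the running container actually carry the built workflow module?
inside = subprocess.run(
    ["docker", "compose", "exec", "api", "ls", "dist/modules/workflow"],
    capture_output=True, text=True,
)
print("artifact present in running container:", inside.returncode == 0)

# Check 2: does the route exist? 404 suggests it is absent; 401 means it exists
# and the next layer to investigate is authentication, not the build.
try:
    urllib.request.urlopen("http://localhost:8080/api/v1/workflows/steps/definitions")
    print("route responded without auth (unexpected, but it clearly exists)")
except urllib.error.HTTPError as err:
    if err.code == 404:
        print("route missing: suspect stale artifacts, i.e. runtime drift")
    elif err.code == 401:
        print("route present: investigate authentication next")
    else:
        print("route present, unexpected status:", err.code)
&lt;/code&gt;&lt;/pre&gt;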

&lt;p&gt;The recovery was straightforward: rebuild and recreate the &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;dashboard&lt;/code&gt; services with &lt;code&gt;docker compose build api dashboard&lt;/code&gt; followed by &lt;code&gt;docker compose up -d api dashboard&lt;/code&gt;. But the lesson is more important than the command. If you stop at “restarting fixed it,” you miss the actual failure mode. The real problem was a mismatch between &lt;strong&gt;source, artifact, and running container state&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This kind of issue shows up often in Docker-based operations. Developers update source, but the image is not rebuilt. Or the image is rebuilt, but the container is not recreated. Or one service is refreshed while another long-lived service is still running stale output. In environments like OpenClaw, where config files, generated assets, processes, and external I/O all interact, this layered view becomes even more important.&lt;/p&gt;

&lt;p&gt;The practical takeaway is simple: &lt;strong&gt;if the code exists but production disagrees, suspect runtime drift before you blame the application logic.&lt;/strong&gt; Checking the reality of the running layer is usually faster than digging deeper into code that may already be correct.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>openclaw</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Why We Stopped Using `echo | base64 -d` for JSON Distribution Over SSH</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Sat, 11 Apr 2026 12:03:45 +0000</pubDate>
      <link>https://forem.com/linou518/why-we-stopped-using-echo-base64-d-for-json-distribution-over-ssh-5d98</link>
      <guid>https://forem.com/linou518/why-we-stopped-using-echo-base64-d-for-json-distribution-over-ssh-5d98</guid>
      <description>&lt;h1&gt;
  
  
  Why We Stopped Using &lt;code&gt;echo | base64 -d&lt;/code&gt; for JSON Distribution Over SSH
&lt;/h1&gt;

&lt;p&gt;During today’s dashboard work, we revisited how &lt;code&gt;auth-profiles.json&lt;/code&gt; gets distributed across multiple nodes. The old approach was to base64-encode the JSON, then send it over SSH with something like &lt;code&gt;ssh ... "echo '&amp;lt;base64&amp;gt;' | base64 -d &amp;gt; auth-profiles.json"&lt;/code&gt;. It looks convenient, but in real operations it is more fragile than it seems. Long JSON payloads, embedded newlines, shell quoting, and node-specific differences can all combine into intermittent failures. And &lt;strong&gt;intermittent failures are the worst kind of operational bug&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The fix was straightforward. On the remote side, we only execute &lt;code&gt;cat &amp;gt; target&lt;/code&gt;, and we pass the JSON body directly through stdin with &lt;code&gt;subprocess.run(..., input=auth_content, text=True)&lt;/code&gt;. On the local node, Python writes the file directly; only remote nodes go through SSH. The key idea is simple: &lt;strong&gt;do not treat JSON as a shell string&lt;/strong&gt;.&lt;/p&gt;
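
&lt;p&gt;A minimal sketch of that stdin-based path, assuming an SSH-reachable node name and a relative target path; the &lt;code&gt;write_auth_profiles()&lt;/code&gt; signature shown here is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import subprocess
from pathlib import Path

def write_auth_profiles(node, auth_content, target="auth-profiles.json"):
    """Deliver the JSON body to one node without treating it as a shell string."""
    if node == "local":
        # Local node: write the file directly, no shell involved at all.
        Path(target).write_text(auth_content)
        return
    # Remote nodes: the only remote command is `cat &amp;gt; target`; the JSON body
    # travels over stdin, so it never has to survive shell quoting or base64.
    subprocess.run(
        ["ssh", node, f"cat &amp;gt; {target}"],
        input=auth_content,
        text=True,
        check=True,
    )

# set_subscription (not shown) decides which keys go to which node; here we
# just hand the selected payload to the writer.
write_auth_profiles("node-a", json.dumps({"api_key": "example"}, indent=2))
&lt;/code&gt;&lt;/pre&gt;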

&lt;p&gt;Base64 is useful, but it does not fully eliminate quoting problems once a payload crosses shell boundaries. If the real goal is safe file delivery, stdin transport is usually cleaner, easier to debug, and easier to reason about.&lt;/p&gt;

&lt;p&gt;This refactor also clarified responsibilities in the code. &lt;code&gt;set_subscription&lt;/code&gt; now decides only which keys should be distributed to which node, while the actual write logic is isolated inside &lt;code&gt;write_auth_profiles()&lt;/code&gt;. That separation makes failures easier to localize, and it gives us a single place to update if the structure of &lt;code&gt;auth-profiles.json&lt;/code&gt; changes later. In SaaS and API-integrated systems, incidents often come less from the API call itself and more from &lt;strong&gt;how configuration and secrets are distributed safely&lt;/strong&gt;. This kind of separation quietly pays off.&lt;/p&gt;

&lt;p&gt;The practical lessons are straightforward. First, do not push secret-bearing configuration files through shell one-liners unless you absolutely have to. Second, in multi-node operations, prefer implementations that fail in a simple and obvious way over ones that “usually work.” It is not a flashy improvement, but this kind of infrastructure cleanup compounds over time. Before adding more UI, make the distribution path solid.&lt;/p&gt;

</description>
      <category>ssh</category>
      <category>python</category>
      <category>json</category>
      <category>devops</category>
    </item>
    <item>
      <title>The Code Exists, But the Feature Still Fails: Fixing Runtime Drift in OpenClaw Operations</title>
      <dc:creator>linou518</dc:creator>
      <pubDate>Sat, 11 Apr 2026 12:03:41 +0000</pubDate>
      <link>https://forem.com/linou518/the-code-exists-but-the-feature-still-fails-fixing-runtime-drift-in-openclaw-operations-1hab</link>
      <guid>https://forem.com/linou518/the-code-exists-but-the-feature-still-fails-fixing-runtime-drift-in-openclaw-operations-1hab</guid>
      <description>&lt;h1&gt;
  
  
  The Code Exists, But the Feature Still Fails: Fixing Runtime Drift in OpenClaw Operations
&lt;/h1&gt;

&lt;p&gt;One of the most practical incidents we handled on April 8 was a classic production problem: &lt;strong&gt;the feature existed in the source tree, but it still did not work in production&lt;/strong&gt;. The target was the workflow feature in &lt;code&gt;ai-backoffice-pack&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;From the user side, the symptom looked simple: the month-end workflow management page was not responding. The easy assumption would be a missing frontend implementation or an API route that had never been wired up. But when we checked the codebase, &lt;code&gt;dashboard/src/pages/Workflow.tsx&lt;/code&gt; was there, and the backend also had &lt;code&gt;backend/src/modules/workflow/&lt;/code&gt;. In other words, &lt;strong&gt;the feature clearly existed in source code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And yet the endpoint &lt;code&gt;/api/v1/workflows/steps/definitions&lt;/code&gt; returned &lt;code&gt;Route not found&lt;/code&gt;. At that point, the right thing to inspect was no longer the repository. It was the &lt;strong&gt;runtime artifact actually serving traffic&lt;/strong&gt;. Once we checked the running API container, the answer became obvious: the workflow module was missing from &lt;code&gt;dist/modules&lt;/code&gt;. The problem was not incomplete code. The real issue was that &lt;strong&gt;an old container image was still alive in production&lt;/strong&gt;. That is runtime drift. Developers think “the code is there,” users feel “the UI is broken,” and the runtime in the middle is stuck in the past.&lt;/p&gt;

&lt;p&gt;The fix itself was not dramatic. On the infra node, we ran &lt;code&gt;docker compose build api dashboard&lt;/code&gt;, then recreated the services with &lt;code&gt;docker compose up -d api dashboard&lt;/code&gt;. The important part was the verification strategy. We did not stop at “the containers restarted successfully.” We checked that &lt;code&gt;/app/dist/modules/workflow&lt;/code&gt; now existed, and then confirmed that the workflow definitions endpoint returned &lt;code&gt;401&lt;/code&gt; instead of &lt;code&gt;404&lt;/code&gt;. A &lt;code&gt;401&lt;/code&gt; only means unauthenticated access, but it proves the route is now present. Only after those checks can you honestly say the issue is fixed.&lt;/p&gt;

&lt;p&gt;This incident reinforced a troubleshooting order that works especially well for Dockerized business applications:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is the feature present in source code?&lt;/li&gt;
&lt;li&gt;Is it present in the build artifact?&lt;/li&gt;
&lt;li&gt;Is it present inside the running container?&lt;/li&gt;
&lt;li&gt;Is the route actually exposed?&lt;/li&gt;
&lt;li&gt;Does it still work after authentication?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you stop at step 1, you can waste a lot of time. Steps 3 and 4 usually narrow down the real fault line much faster.&lt;/p&gt;
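
&lt;p&gt;The same ordering can be expressed as code: probe each layer in turn and stop at the first one that fails. A rough sketch, where the paths and the URL are partly from this incident and partly placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess
import urllib.error
import urllib.request

def ok(cmd):
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0

def route_present(url):
    try:
        urllib.request.urlopen(url)
        return True
    except urllib.error.HTTPError as err:
        return err.code != 404   # a 401 still proves the route is wired up
    except urllib.error.URLError:
        return False

layers = [
    ("source",    lambda: ok(["test", "-d", "backend/src/modules/workflow"])),
    ("artifact",  lambda: ok(["test", "-d", "backend/dist/modules/workflow"])),
    ("container", lambda: ok(["docker", "compose", "exec", "api",
                              "test", "-d", "/app/dist/modules/workflow"])),
    ("route",     lambda: route_present(
        "http://localhost:8080/api/v1/workflows/steps/definitions")),
]

for name, check in layers:
    if not check():
        print(f"first failing layer: {name}")
        break
else:
    print("every layer lines up; what remains is authentication and behavior")
&lt;/code&gt;&lt;/pre&gt;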

&lt;p&gt;Another related decision that day was architectural. Instead of keeping a separate accounting system and integrating it through APIs, we chose to reuse only the useful UI and upload experience, pull the freee integration logic out of &lt;code&gt;freee-bookkeeper&lt;/code&gt;, and consolidate the long-term implementation into the backend, dashboard, and Postgres stack of &lt;code&gt;ai-backoffice-pack&lt;/code&gt;. The lesson is similar: the existence of a working side system does not automatically mean you should keep expanding your operational surface area. Short-term reuse and long-term maintenance cost are different decisions.&lt;/p&gt;

&lt;p&gt;In real operations, a feature only truly exists when &lt;strong&gt;source code, build artifact, container image, exposed routes, and post-auth behavior&lt;/strong&gt; all line up. Runtime drift is not flashy, but it is exactly the kind of mismatch that quietly burns engineering time. Before blaming the code, inspect what is actually running.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>docker</category>
      <category>devops</category>
      <category>operations</category>
    </item>
  </channel>
</rss>
