<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Matthew Gladding</title>
    <description>The latest articles on Forem by Matthew Gladding (@glad_labs).</description>
    <link>https://forem.com/glad_labs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860296%2Fe75c4ed2-993e-403f-a24b-dd72bc83c85d.png</url>
      <title>Forem: Matthew Gladding</title>
      <link>https://forem.com/glad_labs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/glad_labs"/>
    <language>en</language>
    <item>
      <title>AI SaaS Solo Founder Success Stories (2026): Startup Journeys of Solo Developers Who Built Million-Dollar AI SaaS</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Fri, 01 May 2026 09:53:37 +0000</pubDate>
      <link>https://forem.com/glad_labs/ai-saas-solo-founder-success-stories-2026-startup-journeys-of-solo-developers-who-built-jca</link>
      <guid>https://forem.com/glad_labs/ai-saas-solo-founder-success-stories-2026-startup-journeys-of-solo-developers-who-built-jca</guid>
      <description>&lt;h1&gt;
  
  
  AI SaaS Solo Founder Success Stories (2026): Startup Journeys of Solo Developers Who Built Million-Dollar AI SaaS
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  How AI orchestration allows a single individual to act as a full-stack engineering and operations team.&lt;/li&gt;
&lt;li&gt;  The specific technology stack choices (FastAPI, Docker, Local LLMs) that define the modern solo founder's architecture.&lt;/li&gt;
&lt;li&gt;  The economic advantages of moving AI inference to the edge rather than relying on centralized cloud APIs.&lt;/li&gt;
&lt;li&gt;  Strategies for bootstrapping infrastructure and monetizing a product before hiring your first employee.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In 2026, a single developer can build a million-dollar SaaS. This shift isn't just about new tools; it's a fundamental rewrite of the rules of software entrepreneurship. Conventional wisdom held that a viable SaaS needed a frontend expert, a backend specialist, a database administrator, and at least one more person to handle marketing and sales. That division of labor created a high barrier to entry, effectively reserving the startup world for well-funded organizations.&lt;/p&gt;

&lt;p&gt;However, the landscape has shifted dramatically in 2026. The democratization of AI capabilities has rewritten the rules of the game. A new breed of "Solo AI Founders" is emerging--developers who leverage advanced AI orchestration tools to build, deploy, and scale million-dollar businesses from a single laptop. These individuals are not simply using AI as a feature; they are building AI-native operating systems that automate the very processes that previously required a department of people.&lt;/p&gt;

&lt;p&gt;This phenomenon is not a theoretical exercise or a fleeting trend. It represents a fundamental restructuring of how software is built and distributed. By examining the journeys of successful solo developers in 2026, a clear blueprint emerges. This blueprint relies on three pillars: a hyper-efficient tech stack, the strategic use of local compute, and a ruthless focus on product-led growth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why One Developer Can Now Compete with Teams
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fbffd08f1d4ce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fbffd08f1d4ce.png" alt="a photo of a single developer working at a desk with multiple monitors, surrounded by tech gadgets and AI-related..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The primary driver behind the solo founder revolution is the rise of AI orchestration. In previous years, a solo developer might struggle to maintain code quality or handle the complexity of a modern web application. Today, AI agents can handle code review, debugging, and even architectural suggestions in real-time. This capability transforms a solo developer into a "superteam," capable of performing the work of a small engineering department.&lt;/p&gt;

&lt;p&gt;This shift is often referred to as the "AI-Native Operating System." Just as the move from mainframes to personal computers put computing power in the hands of individuals, this transition puts software development capability there as well. According to industry observers, it lets solo founders focus entirely on product-market fit rather than getting bogged down in implementation minutiae.&lt;/p&gt;

&lt;p&gt;The economic argument is equally compelling. The cost of hiring a junior developer in many tech hubs is astronomical. By leveraging AI tools, a solo founder can achieve a level of output that would have cost thousands of dollars per month in human labor just a few years ago. This allows for higher margins and the ability to reinvest capital directly into infrastructure and growth, rather than payroll.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Single-Laptop Stack: Why Docker and Local LLMs Matter
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F1fb5362cb73a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F1fb5362cb73a.png" alt="a detailed image of a laptop screen displaying Docker containers and code snippets for local LLM deployment..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture of a successful AI SaaS product in 2026 looks distinctly different from its 2020 counterpart. While the "cloud-native" approach of the past relied on serverless functions and third-party APIs, the current trend leans heavily toward containerization and local inference.&lt;/p&gt;

&lt;p&gt;At the core of this stack is &lt;strong&gt;FastAPI&lt;/strong&gt;. Unlike traditional frameworks, FastAPI offers built-in asynchronous support and automatic interactive documentation, which significantly speeds up the development cycle for solo developers. When paired with &lt;strong&gt;Uvicorn&lt;/strong&gt; as an ASGI server, it provides the performance necessary to handle high concurrency without the overhead of a heavyweight framework.&lt;/p&gt;
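
&lt;p&gt;As a rough illustration of how little boilerplate this stack demands, here is a minimal sketch of a FastAPI service; the endpoint name and response shape are placeholders, not part of any particular product:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# main.py - minimal FastAPI service; run with: uvicorn main:app --reload
# Interactive docs are generated automatically at /docs.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Solo SaaS API")

class SummarizeRequest(BaseModel):
    text: str

@app.post("/summarize")
async def summarize(req: SummarizeRequest):
    # Placeholder logic; a real service would hand req.text to the model layer.
    return {"summary": req.text[:200], "word_count": len(req.text.split())}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;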

&lt;p&gt;However, the true differentiator is how these tools interact with the model layer. The most successful solo founders are moving away from relying solely on centralized cloud APIs (like OpenAI or Anthropic) for every request. Instead, they are adopting a hybrid approach. By running &lt;strong&gt;Local LLMs&lt;/strong&gt; via tools like &lt;strong&gt;Ollama&lt;/strong&gt; or &lt;strong&gt;vLLM&lt;/strong&gt; inside Docker containers, developers can process data on-premise or within their own VPS environments.&lt;/p&gt;

&lt;p&gt;This strategy offers two critical advantages: cost and privacy. While cloud APIs charge per token, local inference has a fixed hardware cost. Once the infrastructure is set up, the marginal cost of serving a user is negligible. Furthermore, keeping sensitive data on local infrastructure mitigates the risk of data leakage, a growing concern for enterprise customers in 2026.&lt;/p&gt;

&lt;p&gt;To manage this complex environment, many solo founders are adopting a &lt;a href="https://www.gladlabs.io/blog/the-solo-developers-command-center-why-you-need-a--38c935a7" rel="noopener noreferrer"&gt;"Command Center" approach&lt;/a&gt;. Using &lt;strong&gt;Grafana&lt;/strong&gt; dashboards, they can monitor the health of their local models, track token usage, and view system metrics in real-time. This visibility is crucial for maintaining service levels without a dedicated DevOps team. The ability to visualize system performance is what separates a hobby project from a scalable business.&lt;/p&gt;
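
&lt;p&gt;One low-effort way to feed such a dashboard is to expose Prometheus-format metrics from the API process and let Grafana chart whatever Prometheus scrapes. A minimal sketch, assuming a FastAPI service like the one above (the metric name is illustrative, not a standard):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# metrics.py - expose token-usage counters that Prometheus can scrape
from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # Prometheus scrape target

TOKENS_GENERATED = Counter(
    "llm_tokens_generated_total", "Tokens produced by the local model"
)

@app.post("/generate")
async def generate(payload: dict):
    completion = "placeholder completion"          # call the local model here
    TOKENS_GENERATED.inc(len(completion.split()))  # rough token proxy
    return {"completion": completion}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;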

&lt;h3&gt;
  
  
  The Art of the AI-First Monetization
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F32d1e8e1fd14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F32d1e8e1fd14.png" alt="an abstract visualization of a digital marketplace with flowing currency and AI elements converging, representing..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building the technology is only half the battle; finding customers is the other. In the pre-AI era, a solo founder often had to wear every hat, leading to burnout and a lack of focus. Today, AI is being used to solve the marketing problem, allowing the founder to focus on product excellence.&lt;/p&gt;

&lt;p&gt;The most successful solo founders treat content not as a marketing tactic, but as a product feature. This mirrors the strategy detailed in the &lt;a href="https://www.gladlabs.io/blog/from-zero-to-hero-the-solo-founders-blueprint-for--911a32d2" rel="noopener noreferrer"&gt;Solo Founder's Blueprint for a Revenue-Generating Blog&lt;/a&gt;. By creating technical deep-dives, tutorials, and case studies, they attract users who are looking for specific solutions. In the AI space, this often means writing about the specific nuances of model fine-tuning, prompt engineering, or infrastructure setup.&lt;/p&gt;

&lt;p&gt;This content strategy serves a dual purpose. First, it establishes authority in a crowded market. Second, it creates an SEO flywheel that drives organic traffic. When a potential customer searches for "how to optimize a Python script for LLM inference," they are likely to find the blog post written by the founder, leading them directly to the SaaS product.&lt;/p&gt;

&lt;p&gt;This approach is often combined with a freemium model. By offering a limited version of the AI model for free, the founder can demonstrate value immediately. The "paywall" is often placed not on the AI capability itself, but on the output quality, the context window size, or the speed of processing. This lowers the barrier to entry while ensuring high conversion rates for users who need more than the free tier can offer.&lt;/p&gt;

&lt;h3&gt;
  
  
  From MVP to Market Leader (The Infrastructure Play)
&lt;/h3&gt;

&lt;p&gt;As the user base grows, the infrastructure must scale without the need for a dedicated operations team. This requires a robust backend that can handle spikes in traffic and complex data relationships. The database choice becomes critical here.&lt;/p&gt;

&lt;p&gt;While NoSQL databases like MongoDB are popular for rapid prototyping, the relational stability of &lt;strong&gt;PostgreSQL&lt;/strong&gt; remains the backbone of many high-revenue AI SaaS applications. It handles complex queries, transactions, and data integrity with ease. To ensure low latency, solo founders often implement a caching layer using &lt;strong&gt;Redis&lt;/strong&gt;. By caching common prompts and model responses, they can serve frequent requests instantly without incurring the overhead of a model inference.&lt;/p&gt;
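
&lt;p&gt;A sketch of that caching pattern, assuming a local Redis instance and a hash of the prompt as the cache key (function and key names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# cache.py - cache model responses in Redis, keyed by a hash of the prompt
import hashlib
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def cached_completion(prompt, generate_fn, ttl_seconds=3600):
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode()              # serve the cached answer instantly
    response = generate_fn(prompt)       # pay for inference only on a miss
    r.setex(key, ttl_seconds, response)  # let stale answers expire
    return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;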

&lt;p&gt;Security is another area where solo founders must be meticulous. Without a dedicated security team, the risk of a breach is higher. However, the principles of &lt;a href="https://www.gladlabs.io/blog/zero-trust-for-solo-developers-why-you-dont-need-a-ba641fc0" rel="noopener noreferrer"&gt;Zero Trust security&lt;/a&gt; provide a framework that works well for small teams. This approach assumes no user or system is trustworthy by default. By enforcing strict identity verification and least-privilege access, a solo founder can protect their infrastructure effectively.&lt;/p&gt;

&lt;p&gt;Furthermore, the ability to scale horizontally is vital. The architecture must be stateless, allowing the application to be deployed across multiple containers or servers. This redundancy ensures that if one server goes down, the service remains available. It is this architectural resilience that allows a solo founder to handle traffic spikes that would have previously crashed a monolithic application, all while maintaining a single-person operation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;p&gt;The success of solo founders in 2026 is not an accident; it is the result of strategic technology choices and a shift in mindset. By embracing AI orchestration, leveraging containerization, and focusing on product-led growth, one individual can compete with established enterprises.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Adopt a Containerized Architecture:&lt;/strong&gt; Use Docker and Compose to ensure your environment is reproducible and portable, regardless of where your servers are located.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Leverage Local Inference:&lt;/strong&gt; Explore running models locally to reduce long-term operational costs and improve data privacy.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Automate the Marketing:&lt;/strong&gt; Use AI to generate content and optimize your outreach, freeing up your time to focus on product development.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Invest in Visibility:&lt;/strong&gt; Use tools like Grafana to monitor your system, ensuring you can catch and resolve issues before they impact your customers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Prioritize Security:&lt;/strong&gt; Implement Zero Trust principles early on to protect your infrastructure and your customers' data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The era of the "lone wolf" developer is over; the era of the "AI-native" entrepreneur has begun.&lt;/p&gt;

</description>
      <category>solo</category>
      <category>founder</category>
      <category>local</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Time Travel in a Text Box: Running a 13B Language Model Trained Only on Pre-1931 Text</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:20:14 +0000</pubDate>
      <link>https://forem.com/glad_labs/time-travel-in-a-text-box-running-a-13b-language-model-trained-only-on-pre-1931-text-3k99</link>
      <guid>https://forem.com/glad_labs/time-travel-in-a-text-box-running-a-13b-language-model-trained-only-on-pre-1931-text-3k99</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  What "vintage" language models are and why training-data cutoffs change a model's voice&lt;/li&gt;
&lt;li&gt;  The actual VRAM requirements for running a 13B model locally (and how to fit it on consumer GPUs)&lt;/li&gt;
&lt;li&gt;  How a model trained on pre-1931 text differs from a model trained on the modern web&lt;/li&gt;
&lt;li&gt;  Concrete use cases for historical AI in writing, linguistic research, and dataset curation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why a Pre-1931 Language Model Is Useful
&lt;/h3&gt;

&lt;p&gt;Modern AI models are hungry for the latest information, scraping the web and ingesting real-time news. A counter-trend has emerged in the developer community that challenges the assumption that "more data" is always "better data."&lt;/p&gt;

&lt;p&gt;Enter Talkie — a 13B parameter language model from the &lt;a href="https://github.com/talkie-lm/talkie" rel="noopener noreferrer"&gt;talkie-lm&lt;/a&gt; project, trained exclusively on text published before 1931. Where modern Large Language Models (LLMs) hallucinate current events or default to internet-formatted prose, Talkie produces output filtered through the vocabulary, syntax, and worldview of the early 20th century.&lt;/p&gt;

&lt;p&gt;The project is Apache 2.0 licensed and was built by Alec Radford, Nick Levine, and David Duvenaud. The repo currently lists three model variants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;talkie-1930-13b-base&lt;/strong&gt; — base model, pre-1931 corpus only&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;talkie-1930-13b-it&lt;/strong&gt; — instruction-tuned variant; the instruction-following dataset itself is built from pre-1931 reference works (etiquette manuals, letter-writing manuals, encyclopedias, and poetry collections)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;talkie-web-13b-base&lt;/strong&gt; — same architecture trained on FineWeb (modern web data) as a control for comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That third variant is the most interesting research artifact. It lets you A/B-test the effect of training-data era while holding architecture and parameter count constant.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes a Model "Vintage"
&lt;/h3&gt;

&lt;p&gt;A vintage model is one trained on data strictly before a specific cutoff date. Talkie's cutoff is pre-1931 — every token in the training corpus comes from books, periodicals, and documents published before that point.&lt;/p&gt;

&lt;p&gt;Ask Talkie about Python and the response will lean toward the snake. Ask it about cloud computing and you'll get something closer to weather. The model has no concept of computers, the internet, climate change, or any geopolitical event after 1930.&lt;/p&gt;

&lt;p&gt;The architecture is the same transformer-based GPT lineage modern LLMs descend from — Alec Radford's involvement is consistent with that. What changes is the training corpus. Where contemporary models are tuned on massive, mixed-era datasets to maximize general utility, Talkie is tuned to simulate a specific historical era at the cost of any post-1931 knowledge.&lt;/p&gt;

&lt;p&gt;That memory hole is the feature, not a bug. It's a more deliberate, principled version of the trade-off every fine-tuned model makes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware: What It Actually Takes to Run 13B
&lt;/h3&gt;

&lt;p&gt;A 13B parameter model is significantly larger than the 7B–8B models common in casual local AI experimentation (Llama 3 8B, Mistral 7B). Memory requirements depend on the precision you load it at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;fp16 (full precision):&lt;/strong&gt; ~26 GB VRAM. Needs an RTX 3090 / 4090 / 5090, an A6000, or two GPUs with model parallelism.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;int8 quantization:&lt;/strong&gt; ~13 GB VRAM. Fits on a 16 GB card (RTX 4060 Ti 16 GB, RTX 4080).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;q4 quantization:&lt;/strong&gt; ~7-8 GB VRAM. Fits on a 12 GB card (RTX 3060 12 GB, RTX 4070).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don't have a GPU, llama.cpp can run a q4-quantized 13B model on CPU and system RAM, though token throughput drops from GPU-class speeds (tens of tokens per second or more) to single digits. Acceptable for batch analysis, painful for interactive use.&lt;/p&gt;

&lt;p&gt;The talkie-lm package handles model download from HuggingFace, multi-turn chat, streaming, and an interactive CLI. For developers who already have a local LLM stack, the workflow mirrors what you'd do with any other 13B model: pull the weights, point your inference engine at them, query. If you've used &lt;a href="https://www.gladlabs.io/posts/the-engine-room-why-ollama-vllm-and-llamacpp-serve-10eb2fdc" rel="noopener noreferrer"&gt;Ollama or llama.cpp&lt;/a&gt; for modern models, the muscle memory transfers directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Actually Changes vs. a Modern Model
&lt;/h3&gt;

&lt;p&gt;The technical setup is mostly the same. What's different is the output.&lt;/p&gt;

&lt;p&gt;A model trained on pre-1931 English will lean toward the vocabulary, sentence rhythm, and rhetorical patterns of that era. The training corpus included formal written prose — books, periodicals, reference works — without any internet-formatted content, modern instructional templates, or the "AI voice" that emerges from years of post-training instruction tuning on modern datasets.&lt;/p&gt;

&lt;p&gt;That voice difference is exactly what the project optimizes for. The fact that talkie-web-13b-base exists as a control variant — same architecture, modern web corpus — means you can run identical prompts against both and observe the era-shift in isolation. That's the rare research artifact: an A/B test of training-data era with everything else held constant.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hidden Cost of Modern Training Data
&lt;/h3&gt;

&lt;p&gt;Why would a developer choose a model that cannot write a Python script or browse the web? Because of the trade-off modern LLMs make implicitly: training on the open internet means inheriting its biases, slang, formatting reflexes, and the linguistic homogenization that years of post-training instruction tuning produces.&lt;/p&gt;

&lt;p&gt;Talkie sidesteps that by restricting its diet to pre-1931 corpora. The instruction-tuned variant goes further — its instruction-following data is built from etiquette and letter-writing manuals of the same era, so even the model's tendency to "follow instructions" carries period-specific assumptions about what helpful, polite communication looks like.&lt;/p&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Historical fiction writers&lt;/strong&gt; generating dialogue that doesn't accidentally smuggle in modern phrasing&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Linguists and researchers&lt;/strong&gt; studying period-specific syntax, vocabulary, and rhetorical patterns&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Game and tabletop designers&lt;/strong&gt; building period-accurate NPC dialogue without hand-rewriting modern AI output&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dataset curators&lt;/strong&gt; running paired pre/post-1931 comparisons on identical prompts via the web-control variant&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Educators&lt;/strong&gt; demonstrating how training data shapes model behavior in a way that's immediately audible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Your Vintage Model Toolkit
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone the talkie-lm repo&lt;/span&gt;
git clone https://github.com/talkie-lm/talkie.git
&lt;span class="nb"&gt;cd &lt;/span&gt;talkie

&lt;span class="c"&gt;# 2. Install dependencies (in a venv)&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# 3. Pull the model weights from HuggingFace&lt;/span&gt;
huggingface-cli download talkie-lm/talkie-1930-13b-it &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ./model

&lt;span class="c"&gt;# 4. Run a quick generation&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; talkie.generate &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model-path&lt;/span&gt; ./model &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Describe the wonders of the modern automobile."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Swap in whatever prompt suits the historical prose you want generated. For lower-VRAM systems, load with int8 or q4 quantization via the standard &lt;code&gt;bitsandbytes&lt;/code&gt; flags. For a same-prompt A/B against modern training data, switch the model name to &lt;code&gt;talkie-lm/talkie-web-13b-base&lt;/code&gt; — same architecture, modern web corpus, useful for showing the era-effect in isolation.&lt;/p&gt;
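
&lt;p&gt;If the weights load through the standard Hugging Face stack (an assumption; check the repo's README for the supported loading path), a 4-bit load looks roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# quantized_load.py - sketch of a 4-bit load via transformers + bitsandbytes
# Assumes the talkie weights are published in a transformers-compatible format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "talkie-lm/talkie-1930-13b-it"
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

inputs = tokenizer(
    "Describe the wonders of the modern automobile.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;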

&lt;p&gt;A vintage model on your own hardware is about as close as a developer gets to time travel in a text box. The trip is short. The view is interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/talkie-lm/talkie" rel="noopener noreferrer"&gt;https://github.com/talkie-lm/talkie&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>model</category>
      <category>modern</category>
      <category>talkie</category>
      <category>models</category>
    </item>
    <item>
      <title>The $100 Billion Race: How Google's Edge AI Play Changes Everything</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Tue, 28 Apr 2026 09:40:18 +0000</pubDate>
      <link>https://forem.com/glad_labs/the-100-billion-race-how-googles-edge-ai-play-changes-everything-4nk2</link>
      <guid>https://forem.com/glad_labs/the-100-billion-race-how-googles-edge-ai-play-changes-everything-4nk2</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The shifting landscape of the cloud market and why Google is pivoting its strategy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How edge computing addresses the latency and cost barriers preventing widespread AI adoption.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The architectural challenges of moving Large Language Models (LLMs) from the server room to the edge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Practical strategies for developers to leverage hybrid cloud and local inference models.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  The $100 Billion Race: Why the Infrastructure Wars Have Changed
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fa3ecab0d8c05.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fa3ecab0d8c05.png" alt="Close-up of a high-tech data center with rows of servers and network equipment, emphasizing the scale and complexity..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The global cloud infrastructure market has long been defined by a simple, unspoken battle: who can store the most data and process the most transactions? For years, Amazon Web Services (AWS) has held the undisputed throne in this arena, while Microsoft Azure has steadily chipped away at the lead. Google Cloud, despite possessing arguably the most advanced artificial intelligence research division, has found itself in a precarious position. It is no longer enough to offer "just" compute power or storage; the conversation has shifted from &lt;em&gt;capacity&lt;/em&gt; to &lt;em&gt;intelligence&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In this new era, the battleground isn't just about server farms; it's about where intelligence lives. Amazon.com Inc's cloud unit has been racing to get the latest version of its artificial intelligence (AI) chips to market, while Microsoft has integrated AI deeply into its operating system and enterprise suite. For Google, the path to relevance is not simply building bigger servers, but changing where those servers are located. The company is betting heavily on the &lt;strong&gt;edge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This pivot represents a fundamental shift in cloud economics. By moving AI inference closer to the user--whether that is a smartphone, an IoT sensor, or a local server--Google aims to reduce the massive bandwidth costs associated with sending terabytes of data to centralized data centers and waiting for a response. This strategy is not just a technical curiosity; it is a survival tactic in a trillion-dollar race to capture the enterprise market.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Latency Trap: Why "Just Send It to the Cloud" Is Dead
&lt;/h3&gt;

&lt;p&gt;The traditional model of cloud computing relies on a central hub. A user sends a request, the server processes it, and the result is sent back. This works for static websites and standard database queries. However, when you introduce Large Language Models (LLMs) or complex computer vision tasks, this model breaks down.&lt;/p&gt;

&lt;p&gt;The problem is &lt;strong&gt;latency&lt;/strong&gt;. Every round trip a request takes across the internet adds delay. For a chatbot, this might be a slight pause. For a self-driving car or a manufacturing robot, this delay can be catastrophic. Furthermore, there is the issue of &lt;strong&gt;cost&lt;/strong&gt;. Every token generated by a model costs money. If a company sends raw video frames to the cloud for analysis, the bandwidth and compute costs can skyrocket overnight.&lt;/p&gt;

&lt;p&gt;Google's strategy acknowledges these physical realities. By leveraging edge computing, the goal is to push the intelligence to the edge of the network. This means running models on devices or local servers rather than relying solely on Google Cloud Platform (GCP).&lt;/p&gt;

&lt;p&gt;This approach requires a sophisticated architectural shift. Developers are moving away from monolithic cloud APIs and toward hybrid architectures. In a hybrid setup, a local model handles simple, immediate tasks, while complex reasoning is offloaded to the cloud. This creates a seamless experience for the end-user without the prohibitive costs of a purely cloud-based solution.&lt;/p&gt;

&lt;p&gt;For instance, a developer might use a local inference engine like &lt;code&gt;Ollama&lt;/code&gt; to handle basic text completion on a user's device, reserving cloud calls for heavier reasoning over non-sensitive data or for workloads that compliance rules allow off-device. This hybrid imperative is becoming the standard for any organization serious about deploying AI at scale.&lt;/p&gt;
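
&lt;p&gt;A hedged sketch of that routing decision: the local path uses Ollama's documented /api/generate endpoint, while the sensitivity check and the cloud call are placeholders rather than a real policy engine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# router.py - route prompts between a local Ollama model and a hosted API
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def is_sensitive(prompt):
    return "confidential" in prompt.lower()  # stand-in for a real classifier

def complete(prompt):
    if is_sensitive(prompt):
        # Keep sensitive or latency-critical work on-device.
        resp = requests.post(
            OLLAMA_URL,
            json={"model": "llama3.1", "prompt": prompt, "stream": False},
            timeout=120,
        )
        return resp.json()["response"]
    # Heavier, non-sensitive reasoning goes to a hosted model (stubbed out here).
    return call_cloud_api(prompt)

def call_cloud_api(prompt):
    raise NotImplementedError("wire up your hosted provider of choice")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;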

&lt;h3&gt;
  
  
  The Hidden Cost of Going Edge: What the Math Doesn't Tell You
&lt;/h3&gt;

&lt;p&gt;While the promise of edge computing is alluring, the reality is more complex than the marketing slogans suggest. Moving intelligence to the edge introduces new layers of complexity that can easily derail a project if not managed correctly.&lt;/p&gt;

&lt;p&gt;One of the primary challenges is &lt;strong&gt;model management&lt;/strong&gt;. Unlike a centralized cloud environment where Google manages the versioning and scaling of models, an edge environment is distributed. You might have thousands of devices running slightly different versions of a model. Ensuring they are all up to date, compatible, and secure is a logistical nightmare.&lt;/p&gt;

&lt;p&gt;Furthermore, there is the "Cold Start" problem. When an edge device wakes up from sleep or boots up for the first time, the local model might not be loaded into memory. Loading a massive model like Llama-3 or a fine-tuned version of Mistral can take seconds or even minutes. If the application tries to handle a request during this boot sequence, it will fail unless there is a robust fallback mechanism to the cloud.&lt;/p&gt;

&lt;p&gt;Many organizations have found that the cost of managing these edge fleets often outweighs the savings from reduced cloud API calls, at least in the early stages. This is where the distinction between "just putting a model on a device" and "engineering a production-grade edge system" becomes critical.&lt;/p&gt;

&lt;p&gt;Developers often make the mistake of treating edge deployment as a simple copy-paste operation. They take a Python script, wrap it in a Docker container, and ship it to a server. But edge environments are resource-constrained. They have limited RAM and CPU cycles. A model that runs perfectly on a high-end GPU in a data center might crash or cause a system freeze on a low-power embedded processor.&lt;/p&gt;

&lt;p&gt;This is why the industry is seeing a surge in interest in &lt;strong&gt;quantization&lt;/strong&gt; and &lt;strong&gt;model optimization&lt;/strong&gt;. Techniques that reduce the precision of a model (from 32-bit floats to 4-bit integers) allow smaller models to run faster and use less memory. Google's Tensor Processing Units (TPUs) are famous for their efficiency in the cloud, but their principles of efficient matrix multiplication are being applied to optimize models for edge deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Google Cloud is Winning the Developer Experience
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F7c0995a5eadf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F7c0995a5eadf.png" alt="A developer working at a desk with multiple monitors displaying cloud interfaces, emphasizing ease and efficiency in..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To catch up to Amazon and Microsoft, Google cannot just offer better chips; it must offer a better developer experience. The modern developer is less interested in configuring virtual machines and more interested in deploying applications that are fast, secure, and scalable.&lt;/p&gt;

&lt;p&gt;Google has been aggressively integrating its AI tooling with the broader developer ecosystem. By providing robust SDKs and pre-built containers, they are lowering the barrier to entry for edge AI.&lt;/p&gt;

&lt;p&gt;Consider the rise of the &lt;strong&gt;FastAPI&lt;/strong&gt; framework. It has become a favorite for building high-performance APIs in Python. In the context of edge computing, FastAPI is crucial because it allows developers to quickly spin up local inference servers that can serve requests via HTTP. This standardization allows different components of an AI system--whether they are running on a local Raspberry Pi or a cloud instance--to communicate seamlessly.&lt;/p&gt;

&lt;p&gt;Furthermore, the integration of tools like &lt;code&gt;pgvector&lt;/code&gt; with local databases is enabling sophisticated Retrieval-Augmented Generation (RAG) at the edge. This means that a local application can query its own local vector database for context before making a decision, without ever touching the public internet. This offers a level of privacy and speed that centralized cloud APIs simply cannot match.&lt;/p&gt;

&lt;p&gt;Google's push is also evident in its documentation and support for serverless edge computing platforms. For example, the Cloudflare Workers Documentation provides a blueprint for how edge computing should function: code deployed to the edge, executed without infrastructure management, and scaled automatically. Google is aligning its GCP offerings to mimic and improve upon this model, offering "Serverless GPUs" and "Edge Functions" that allow developers to run Python workloads in close proximity to the end-user.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Competitive Landscape: Why This Play Matters Now
&lt;/h3&gt;

&lt;p&gt;The timing of Google's push into edge AI is strategic. The "Trillion-Dollar Race" for the cloud is heating up. Amazon is leveraging its massive logistics network to push AI into supply chain management. Microsoft is leveraging its dominance in the enterprise to integrate Copilot into every layer of the Windows and Office 365 stack.&lt;/p&gt;

&lt;p&gt;If Google wants to compete, it needs a narrative. Its narrative has always been about "first principles" thinking--doing things the right way from the ground up. The edge AI play fits this narrative perfectly. It challenges the status quo of the centralized cloud.&lt;/p&gt;

&lt;p&gt;According to recent industry analyses, the gap between AWS and the competition has narrowed, but it remains significant. However, the &lt;em&gt;type&lt;/em&gt; of cloud usage is changing. Enterprise customers are no longer just buying storage; they are buying &lt;em&gt;intelligence&lt;/em&gt;. They want to know how to use AI to automate their workflows without the security risks of sending proprietary data to a public API.&lt;/p&gt;

&lt;p&gt;This is where Google's edge strategy shines. By enabling AI to run on-premise or on-device, Google addresses the "Black Box" problem. Companies can see exactly what data is being processed and where. This is a massive selling point for industries like healthcare and finance, where data privacy is paramount.&lt;/p&gt;

&lt;p&gt;The competition is also heating up in the hardware space. Amazon is developing custom AI accelerators to undercut Google's pricing. However, hardware is only half the equation. The software stack--the APIs, the developer tools, and the deployment pipelines--is what determines the winner. Google is currently winning on the software side by providing a unified platform that simplifies the transition from cloud to edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hybrid Imperative: The Future of AI Architecture
&lt;/h3&gt;

&lt;p&gt;Ultimately, the future of cloud computing is not "Cloud vs. Edge," but "Cloud and Edge." The most successful companies will be those that master the hybrid model. This requires a flexible architecture that can handle requests based on context, cost, and security requirements.&lt;/p&gt;

&lt;p&gt;For a developer, this means thinking about their application in three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Edge Layer:&lt;/strong&gt; Handles immediate, low-latency tasks and sensitive data locally.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Gateway Layer:&lt;/strong&gt; Routes requests, manages authentication, and offloads heavy computation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Cloud Layer:&lt;/strong&gt; Stores the long-term state, handles complex analytics, and manages global consistency.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Google's investment in this architecture is evident in its continuous updates to Google Kubernetes Engine (GKE) and Anthos, which are designed to manage hybrid workloads across on-premise and cloud infrastructure.&lt;/p&gt;

&lt;p&gt;The shift to edge AI is not just a Google initiative; it is a market-wide movement. But Google's specific focus on AI differentiation gives it a unique advantage. While other cloud providers are playing catch-up in AI capabilities, Google is using its AI expertise to redefine the infrastructure layer itself.&lt;/p&gt;

&lt;p&gt;By empowering developers to run powerful models on the edge, Google is giving its customers a reason to choose GCP over AWS or Azure. It is solving the specific pain points of latency and cost that are currently slowing down the AI revolution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion: What Developers Should Do Now
&lt;/h3&gt;

&lt;p&gt;The race to the edge is on, and Google is making a significant play to win. For developers and architects, this means the tools and patterns they learned a few years ago are rapidly evolving.&lt;/p&gt;

&lt;p&gt;The key takeaway is that the "one size fits all" cloud model is dead. The future belongs to architectures that are distributed, intelligent, and resilient.&lt;/p&gt;

&lt;p&gt;If you are building an AI application today, do not assume you need to send every request to the cloud. Evaluate your use case. Is the data sensitive? Is the latency critical? If the answer is yes, start exploring edge solutions.&lt;/p&gt;

&lt;p&gt;Utilize frameworks like &lt;strong&gt;FastAPI&lt;/strong&gt; to build local inference servers. Use &lt;strong&gt;Docker&lt;/strong&gt; to containerize your models for portability. And keep an eye on the hybrid strategies that allow you to leverage the best of both worlds.&lt;/p&gt;

&lt;p&gt;The cloud giants are fighting for your business, and the definition of "cloud" is changing. By understanding the edge, developers can build applications that are not just faster, but also smarter and more secure.&lt;/p&gt;




&lt;h3&gt;
  
  
  External Resources for Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The State of Cloud Computing:&lt;/strong&gt; Forbes: The Trillion-Dollar Race - &lt;em&gt;Analysis of the AWS vs. Azure vs. Google landscape.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hardware Innovation:&lt;/strong&gt; The Edge Singapore: Amazon's AI Chips &lt;em&gt;Insight into Amazon's counter-strategy.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless Edge:&lt;/strong&gt; Cloudflare Workers Documentation &lt;em&gt;Technical reference for edge computing architecture.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developer Tools:&lt;/strong&gt; Anthropic Documentation &lt;em&gt;Reference for integrating AI models into applications.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local Inference:&lt;/strong&gt; Ollama Documentation &lt;em&gt;Guide for running LLMs locally.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>edge</category>
      <category>cloud</category>
      <category>google</category>
      <category>model</category>
    </item>
    <item>
      <title>The Offline Revolution: Why Local LLMs Are the Backbone of 2026 Development</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:17:08 +0000</pubDate>
      <link>https://forem.com/glad_labs/the-offline-revolution-why-local-llms-are-the-backbone-of-2026-development-44h</link>
      <guid>https://forem.com/glad_labs/the-offline-revolution-why-local-llms-are-the-backbone-of-2026-development-44h</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The architectural shift from cloud-dependent APIs to sovereign local inference environments.&lt;/li&gt;
&lt;li&gt;  How tools like Ollama and LM Studio abstract the complexity of quantization and model management.&lt;/li&gt;
&lt;li&gt;  The specific models (Llama 3.1, Mistral, Qwen) that offer the best balance of performance and hardware accessibility.&lt;/li&gt;
&lt;li&gt;  Practical strategies for integrating local LLMs into Retrieval-Augmented Generation (RAG) pipelines using PostgreSQL and vector databases.&lt;/li&gt;
&lt;li&gt;  The hardware considerations that determine whether a model runs locally or requires cloud offloading.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Silent Shift: Why the Cloud API Isn't Enough Anymore
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F75c3d6a361b4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F75c3d6a361b4.png" alt="A minimalist workspace with a closed laptop lid, symbolizing a shift from cloud dependency to local processing power." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the past few years, the narrative surrounding Artificial Intelligence has been dominated by access. The conversation centered on API keys, token limits, and the convenience of calling an endpoint from a script. However, the developer landscape in 2026 has shifted fundamentally. The focus has moved from &lt;em&gt;access&lt;/em&gt; to &lt;em&gt;control&lt;/em&gt; and &lt;em&gt;privacy&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The limitations of cloud-based inference are becoming increasingly apparent. Network round trips introduce unpredictable latency, which is unacceptable for real-time applications. Furthermore, data privacy concerns have reached a fever pitch. Sending proprietary code, sensitive user data, or internal documentation to a third-party API creates a vector for data leakage and compliance violations.&lt;/p&gt;

&lt;p&gt;This reality has birthed the "Local LLM" movement. It is no longer just a niche interest for privacy advocates; it is a pragmatic engineering decision for startups and enterprises alike. As discussed in recent analyses of the &lt;a href="https://www.gladlabs.io/posts/the-solo-founders-tech-stack-in-2026-why-one-size--7e8cb4cb" rel="noopener noreferrer"&gt;Solo Founder Tech Stacks in 2026&lt;/a&gt;, the ability to run inference on-premise or locally is rewriting the cost and architectural models for software development.&lt;/p&gt;

&lt;p&gt;The core benefit is sovereignty. When a model runs on a local machine or a private server, prompts and responses never leave infrastructure the developer controls. This eliminates the "black box" problem, allowing developers to inspect intermediate tokens, debug generation paths, and ensure that proprietary logic remains within their own infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Swiss Army Knife" of Inference: Why Ollama Stole the Show
&lt;/h3&gt;

&lt;p&gt;While there are several powerful contenders in the local inference space, one tool has emerged as the de facto standard for developers: Ollama.&lt;/p&gt;

&lt;p&gt;Ollama simplifies the notoriously complex process of downloading, quantizing, and running Large Language Models. It abstracts away the intricate details of GGUF file formats and loading configurations, allowing a developer to execute a model with a single command. This ease of use is critical for adoption.&lt;/p&gt;

&lt;p&gt;The power of Ollama lies in its API compatibility. It exposes a local HTTP server (typically on port 11434) that mimics the OpenAI chat completion API. This means that existing applications built to call OpenAI can often be switched to use a local Ollama instance with minimal code changes.&lt;/p&gt;

&lt;p&gt;For example, a developer can spin up a local Llama 3.1 model with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run llama3.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the model is loaded, the interaction happens locally, and the speed is tangible: time-to-first-token is no longer padded with hundreds of milliseconds of network latency, only with local processing time. For developers building chat interfaces or coding assistants, this responsiveness is a game-changer.&lt;/p&gt;
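
&lt;p&gt;And because Ollama also exposes an OpenAI-compatible endpoint under /v1, pointing an existing client at a local model can be a one-line change. A minimal sketch (the api_key value is a dummy that Ollama ignores):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Point the standard OpenAI client at a local Ollama instance.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize the case for local inference."}],
)
print(reply.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;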

&lt;p&gt;Beyond simple chat, Ollama serves as the perfect entry point for more complex architectures. It acts as a local gateway, allowing developers to experiment with RAG (Retrieval-Augmented Generation) pipelines without immediately needing to build a custom Python service. It bridges the gap between "I want to try local AI" and "I need to build a production-grade application."&lt;/p&gt;

&lt;h3&gt;
  
  
  The Graphical Interface for Power Users: Moving Beyond the Terminal
&lt;/h3&gt;

&lt;p&gt;For many, the terminal is the natural habitat. However, the user experience of local inference can be daunting. Downloading weights, selecting quantization levels, and managing GPU memory requires technical literacy.&lt;/p&gt;

&lt;p&gt;This is where tools like LM Studio and LocalAI shine. These applications provide a graphical user interface (GUI) that democratizes access to local LLMs.&lt;/p&gt;

&lt;p&gt;LM Studio, for instance, allows users to browse the &lt;a href="https://huggingface.co/models" rel="noopener noreferrer"&gt;Hugging Face Model Hub&lt;/a&gt; directly within the application. It handles the download and conversion automatically. The interface allows users to toggle between different quantization levels (4-bit, 5-bit, etc.) and see real-time performance metrics based on their specific hardware configuration.&lt;/p&gt;

&lt;p&gt;LocalAI takes a different approach. It focuses on "OpenAI-compatible" API endpoints. This means that if you have a complex application stack that uses frameworks like &lt;code&gt;FastAPI&lt;/code&gt; or &lt;code&gt;LangChain&lt;/code&gt;, you can often swap out the cloud API URL for a LocalAI URL and continue working with zero friction.&lt;/p&gt;

&lt;p&gt;This interoperability is vital. It prevents "vendor lock-in" even within the local ecosystem. A developer can build a prototype using LM Studio for exploration and then deploy a production-grade backend using LocalAI or a custom Python service wrapped around &lt;code&gt;llama.cpp&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Engine Under the Hood: The 7B and 8B Sweet Spot
&lt;/h3&gt;

&lt;p&gt;Not all models are created equal, and the disparity between a 7-billion parameter model and a 70-billion parameter model is massive when running locally.&lt;/p&gt;

&lt;p&gt;In 2026, the "sweet spot" for local inference has settled around the 7B and 8B parameter range. These models strike an optimal balance between reasoning capability, context window size, and hardware requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 3.1 (8B and 70B):&lt;/strong&gt;&lt;br&gt;
Meta's Llama 3.1 has become the benchmark for open-weight models. The 8B variant is capable of handling complex coding tasks, reasoning, and multi-language translation. It fits comfortably on consumer hardware with 8GB or 16GB of VRAM. The 70B variant, while requiring significant GPU resources, is increasingly accessible on consumer workstations, though it typically needs aggressive quantization or partial CPU offloading, which costs speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mistral Nemo:&lt;/strong&gt;&lt;br&gt;
Mistral AI has continued to push the envelope with its Nemo series. The 12B and 7B variants are highly efficient, often outperforming larger models on specific benchmarks. Their architecture is optimized for instruction following and code generation, making them a favorite among developers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 2 (7B and 14B):&lt;/strong&gt;&lt;br&gt;
Qwen (Alibaba Cloud) has established a strong foothold in the Chinese and international markets. The Qwen 2 models are notable for their strong performance in mathematics and coding, often rivaling or surpassing Llama 3.1 in specific benchmarks. Their instruction tuning is particularly robust, resulting in high-quality responses with less hallucination.&lt;/p&gt;

&lt;p&gt;The choice of model often depends on the use case. For a coding assistant, Llama 3.1 or Mistral Nemo is often preferred due to their strong code completion capabilities. For general knowledge and creative writing, Qwen or Llama 3.1 offer excellent versatility.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG: Giving Local Models a Memory
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fee9b7bae92d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fee9b7bae92d1.png" alt="A close-up shot of a memory chip or hard drive with data flowing into it, symbolizing the enhancement of local..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A local LLM is only as smart as its context. While a local model might have been trained on vast amounts of data up to its cutoff date, it lacks real-time information. This is where Retrieval-Augmented Generation (RAG) becomes essential.&lt;/p&gt;

&lt;p&gt;RAG allows a local model to query a local vector database (like PostgreSQL with &lt;code&gt;pgvector&lt;/code&gt; extension) to retrieve relevant documents and inject them into the context window before generating a response. This process bridges the gap between static training data and dynamic, private data.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.gladlabs.io/posts/from-data-silos-to-smart-answers-building-a-local-rag-pipeline-with-ollama-and-pgvector" rel="noopener noreferrer"&gt;Local RAG Pipeline&lt;/a&gt; post details the practical implementation of this architecture. It involves two main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Ingestion:&lt;/strong&gt; Splitting documents into chunks and embedding them using a model like &lt;code&gt;nomic-embed-text&lt;/code&gt; or &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Retrieval:&lt;/strong&gt; When a user asks a question, the system searches the vector database for relevant chunks and sends them to the local LLM (e.g., Llama 3.1) along with the user's query.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This setup ensures that the local model is answering based on the user's private data, not just its pre-training. It is a powerful combination for building internal knowledge bases, legal document analysis, or personalized coding assistants.&lt;/p&gt;
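
&lt;p&gt;A condensed sketch of the retrieval half of that pipeline. The table, column, and model names are assumptions for illustration; the embedding call uses Ollama's /api/embeddings endpoint and the similarity search uses pgvector's nearest-neighbour operator:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# retrieve.py - top-k chunk retrieval from Postgres/pgvector, answered by Ollama
import psycopg
import requests

OLLAMA = "http://localhost:11434"

def embed(text):
    resp = requests.post(OLLAMA + "/api/embeddings",
                         json={"model": "nomic-embed-text", "prompt": text})
    return resp.json()["embedding"]

def retrieve(question, k=4):
    vec = "[" + ",".join(str(x) for x in embed(question)) + "]"
    with psycopg.connect("dbname=rag") as conn:
        rows = conn.execute(
            "SELECT content FROM chunks ORDER BY embedding &amp;lt;-&amp;gt; %s::vector LIMIT %s",
            (vec, k),
        ).fetchall()
    return [row[0] for row in rows]

def answer(question):
    context = "\n\n".join(retrieve(question))
    prompt = "Answer using only this context:\n" + context + "\n\nQuestion: " + question
    resp = requests.post(OLLAMA + "/api/generate",
                         json={"model": "llama3.1", "prompt": prompt, "stream": False})
    return resp.json()["response"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;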

&lt;h3&gt;
  
  
  The Hardware Reality: CPU vs. GPU
&lt;/h3&gt;

&lt;p&gt;One of the most common questions in the local LLM community is: "Can I run this on my CPU?"&lt;/p&gt;

&lt;p&gt;The answer is nuanced. Historically, CPU inference was painfully slow. However, advancements in quantization and CPU architecture have changed the landscape.&lt;/p&gt;

&lt;p&gt;Tools like &lt;code&gt;llama.cpp&lt;/code&gt; (which powers many local inference backends) use techniques like quantization (reducing the precision of the model weights from 16-bit to 4-bit) to make models run efficiently on standard CPUs. An 8B model quantized to 4-bit can run on a modern CPU with reasonable throughput, though it will be slower than a GPU-accelerated version.&lt;/p&gt;
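
&lt;p&gt;Through the llama-cpp-python bindings, loading a 4-bit GGUF on CPU takes only a few lines; the file name below is a placeholder for whichever quantized weights you have downloaded:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# cpu_inference.py - run a 4-bit GGUF model on CPU via llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,    # context window
    n_threads=8,   # tune to your CPU core count
)

out = llm("Summarize the case for local inference in three sentences.", max_tokens=200)
print(out["choices"][0]["text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;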

&lt;p&gt;For a seamless developer experience, a dedicated GPU is highly recommended. NVIDIA GPUs with sufficient VRAM (8GB or more) allow for near-instantaneous generation. However, even for users without a discrete GPU or running on integrated graphics, performance is now "good enough" for many use cases, such as reading summaries, drafting emails, or analyzing small datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your Next Step Toward Sovereign AI
&lt;/h3&gt;

&lt;p&gt;The era of relying solely on cloud APIs is ending. The tools and models available in 2026 are robust, accessible, and performant enough to handle complex development tasks.&lt;/p&gt;

&lt;p&gt;The transition to local LLMs is no longer a question of "if," but "when." It represents a move toward more resilient, private, and cost-effective software architectures. Whether you are a solo developer building a personal assistant or an enterprise architect designing a secure data pipeline, the local LLM stack offers the flexibility and control needed to build the next generation of applications.&lt;/p&gt;

&lt;p&gt;The first step is simple: download Ollama, pull the &lt;code&gt;llama3.1&lt;/code&gt; model, and start chatting. The revolution is already in your hands.&lt;/p&gt;




&lt;h3&gt;
  
  
  Suggested External URLs for Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://huggingface.co/spaces/open-llm-leaderboard" rel="noopener noreferrer"&gt;Hugging Face - Open LLM Leaderboard&lt;/a&gt;:&lt;/strong&gt; The definitive source for comparing model performance across various benchmarks.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Ollama Documentation:&lt;/strong&gt; The official guide to running, managing, and deploying local models.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://huggingface.co/docs/transformers/main/en/gguf" rel="noopener noreferrer"&gt;Hugging Face - GGUF Format&lt;/a&gt;:&lt;/strong&gt; Technical documentation explaining the file format used for quantized models.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;LangChain - Local LLM Integration:&lt;/strong&gt; A guide on integrating local models into the popular LangChain framework for complex applications.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;llama.cpp GitHub&lt;/a&gt;:&lt;/strong&gt; The open-source C++ project that powers much of the local inference ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/models" rel="noopener noreferrer"&gt;https://huggingface.co/models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/spaces/open-llm-leaderboard" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/open-llm-leaderboard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/transformers/main/en/gguf" rel="noopener noreferrer"&gt;https://huggingface.co/docs/transformers/main/en/gguf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;https://github.com/ggerganov/llama.cpp&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>local</category>
      <category>model</category>
      <category>models</category>
      <category>llama</category>
    </item>
    <item>
      <title>The AI-First Freelancer: Building a Profitable Tech Stack in 2026</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Tue, 28 Apr 2026 01:17:01 +0000</pubDate>
      <link>https://forem.com/glad_labs/the-ai-first-freelancer-building-a-profitable-tech-stack-in-2026-3gei</link>
      <guid>https://forem.com/glad_labs/the-ai-first-freelancer-building-a-profitable-tech-stack-in-2026-3gei</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;p&gt;By the end of this guide, you will understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Why relying solely on cloud-based AI APIs is becoming a financial and privacy risk for independent contractors.&lt;/li&gt;
&lt;li&gt;  How to set up a local LLM environment using Docker and Ollama to handle sensitive client data securely.&lt;/li&gt;
&lt;li&gt;  The architectural patterns for building AI-powered agents using FastAPI and Python.&lt;/li&gt;
&lt;li&gt;  How to monitor your local infrastructure to ensure performance and prevent hardware fatigue.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Most Freelancers Still Rely on Cloud APIs
&lt;/h2&gt;

&lt;p&gt;For years, the standard workflow for a freelancer involved pasting a prompt into a web browser and hoping for the best. Whether it was generating boilerplate code, drafting marketing copy, or analyzing data, the "cloud API" model reigned supreme. In 2026, this approach is increasingly viewed as a liability rather than a convenience.&lt;/p&gt;

&lt;p&gt;The primary issue is latency. When a freelancer needs a complex code refactoring or a nuanced analysis of a 50-page technical document, waiting for a network request to return a result kills momentum. More critically, there is the matter of privacy. When uploading client proprietary data to a third-party service, freelancers are walking a fine line between efficiency and breach of contract.&lt;/p&gt;

&lt;p&gt;According to industry analyses of the current freelance economy, the most successful independent contractors are moving away from public APIs. They are adopting a hybrid approach where public models handle generic tasks, and local models handle sensitive, high-value work. This shift is driven by the maturity of consumer-grade hardware and the ease of containerization. By running models locally, a freelancer retains full ownership of the data, ensuring compliance with data protection regulations that are becoming stricter by the quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Power of Local LLMs (Ollama + Docker)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fac9d92391cbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fac9d92391cbf.png" alt="A close-up of a laptop screen showing the Ollama and Docker interfaces with code snippets, surrounded by a clean..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The barrier to entry for running Large Language Models (LLMs) locally has evaporated. In the early days, this required a degree in systems administration and a bank loan for an H100 GPU. Today, a standard consumer workstation can run powerful models efficiently, especially when orchestrated correctly.&lt;/p&gt;

&lt;p&gt;The industry standard for local deployment has coalesced around a specific stack: &lt;code&gt;Docker&lt;/code&gt; for environment isolation and &lt;code&gt;Ollama&lt;/code&gt; as the runtime engine. This combination allows a freelancer to spin up a model server in seconds without polluting the host system's Python environment.&lt;/p&gt;

&lt;p&gt;Consider the scenario where a freelancer needs to process a client's proprietary database schema. Using a public API would require extracting the schema and sending it over the wire. Using a local model, the freelancer can mount the database directory as a volume within a Docker container and query the model against the raw files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: Running a local Llama 3 model via Ollama&lt;/span&gt;
docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  ollama/ollama:latest

&lt;span class="c"&gt;# Running a specific model query locally&lt;/span&gt;
curl http://localhost:11434/api/generate &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "llama3",
  "prompt": "Explain this SQL query: SELECT * FROM users WHERE active = true;"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This capability transforms the freelancer from a "prompt engineer" into a "systems architect." They are no longer just asking a question; they are deploying a compute resource that answers based on their specific context. This is a fundamental shift in how technical work is approached, moving from "asking" to "executing."&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Autocomplete: Coding as a Collaborative Process
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fe25874a55c2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fe25874a55c2c.png" alt="Two freelancers collaborating at a shared desk with multiple monitors displaying code and chat interfaces." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The narrative that AI tools are merely "autocomplete on steroids" is no longer accurate. In 2026, AI coding assistants have evolved into deep collaborators. Tools like Cursor and GitHub Copilot have integrated deeply into the Integrated Development Environment (IDE), allowing them to read the entire repository context, not just the current file.&lt;/p&gt;

&lt;p&gt;For the freelancer, this means the AI can now suggest refactoring strategies that span multiple files, identify architectural debt, and even generate tests based on the project's existing conventions.&lt;/p&gt;

&lt;p&gt;However, the power of these tools is maximized when combined with local execution. When a freelancer works on a project that cannot be committed to a public repository (perhaps due to NDAs), they can use local models to generate code snippets and explanations that are verified offline. This is where the concept of "The Amplifier Effect" comes into play. As discussed in &lt;a href="https://www.gladlabs.io/posts/ai-doesnt-fix-weak-engineering-it-just-speeds-it-u-0dd0a0ab" rel="noopener noreferrer"&gt;The Amplifier Effect: Why AI Multiplies Bad Engineering as Fast as Good&lt;/a&gt;, relying on AI without a strong foundation leads to rapid failure. Therefore, the freelancer must use these tools to accelerate &lt;em&gt;good&lt;/em&gt; engineering practices, ensuring that the code suggestions are reviewed for security and logic before integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own AI Agents with FastAPI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F6650d44b25b2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2F6650d44b25b2.png" alt="A blueprint-style visualization showing interconnected nodes representing different components of an AI agent built..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next evolution in freelance efficiency is the "AI Agent." An agent is not just a chatbot; it is a program that can take an action. In the context of a freelancer, this could mean an agent that reads a Jira ticket, writes the code, creates a pull request, and sends an email to the client.&lt;/p&gt;

&lt;p&gt;To build these agents, freelancers are increasingly turning to Python and the &lt;code&gt;FastAPI&lt;/code&gt; framework. FastAPI provides the necessary asynchronous capabilities to handle high-throughput interactions with local models without blocking the main application thread.&lt;/p&gt;

&lt;p&gt;The architecture typically involves a controller (FastAPI) that receives a task, a reasoning engine (the LLM), and a toolset (Python libraries for file manipulation, database queries, or API calls).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Conceptual example of an agent controller using FastAPI
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ollama_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'http://localhost:11434'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# point at the local Ollama server&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/execute-task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_agent_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# The LLM determines which tools to use
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llama3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;

    &lt;span class="c1"&gt;# The LLM's response contains the JSON payload for the tool
&lt;/span&gt;    &lt;span class="c1"&gt;# In a real scenario, you would parse the response and execute the specific tool
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
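&lt;p&gt;The comment in the snippet above glosses over the "parse and execute" step. Below is a hedged sketch of what that dispatcher could look like; the JSON reply format, tool names, and helper functions are illustrative assumptions rather than part of any framework.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical tool dispatcher: maps a JSON reply from the model onto local functions
import json


def summarize_file(path):
    # Placeholder tool: return the first 500 characters of a file
    with open(path, "r", encoding="utf-8") as f:
        return f.read()[:500]


def count_lines(path):
    # Placeholder tool: count the lines in a file
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


TOOLS = {"summarize_file": summarize_file, "count_lines": count_lines}


def dispatch(llm_reply):
    """Expects the model to answer with JSON such as {"tool": "count_lines", "args": {"path": "app.py"}}."""
    payload = json.loads(llm_reply)
    tool = TOOLS[payload["tool"]]          # KeyError here means the model invented a tool
    return tool(**payload.get("args", {}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;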



&lt;p&gt;This approach allows freelancers to automate repetitive administrative tasks. For instance, an agent can be trained to look at project invoices, categorize them, and update a local spreadsheet. By offloading these mundane tasks to an agent running on local hardware, the freelancer preserves their cognitive energy for high-value creative and technical problem-solving.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solo Developer's Command Center
&lt;/h2&gt;

&lt;p&gt;Managing a freelance business in the AI era requires a level of observability that was previously reserved for large enterprises. When you are running local models, you are managing GPU temperature, memory usage, and disk I/O.&lt;/p&gt;

&lt;p&gt;Many solo founders have found that a local Grafana dashboard is essential for maintaining system health. Grafana allows a freelancer to visualize the performance of their local infrastructure in real-time. By connecting to the Prometheus node exporter running on their workstation, they can track metrics such as VRAM utilization, CPU temperature, and model inference latency.&lt;/p&gt;

&lt;p&gt;This is not just about preventing crashes; it is about understanding capacity. As &lt;a href="https://www.gladlabs.io/posts/the-solo-developers-command-center-why-you-need-a--38c935a7" rel="noopener noreferrer"&gt;The Solo Developer's Command Center&lt;/a&gt; suggests, visibility into your personal tech stack allows you to optimize for energy efficiency and performance. If a freelancer notices that a specific model is causing thermal throttling during long coding sessions, they can switch to a quantized version of the model that trades a small amount of accuracy for massive gains in stability and speed.&lt;/p&gt;

&lt;p&gt;Furthermore, this observability extends to the AI agents themselves. By logging the inputs and outputs of agent tasks, a freelancer can analyze the quality of the AI's work over time, identifying patterns where the model struggles with specific types of data or logic.&lt;/p&gt;
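&lt;p&gt;As a concrete illustration of this kind of instrumentation, the sketch below wraps local inference calls in a latency histogram using the &lt;code&gt;prometheus_client&lt;/code&gt; library, which Prometheus can scrape and Grafana can chart. The metric name, port, and model are assumptions chosen for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Expose local LLM inference latency as a Prometheus metric (illustrative sketch)
import time

from ollama import Client
from prometheus_client import Histogram, start_http_server

INFERENCE_LATENCY = Histogram(
    "local_llm_inference_seconds", "Wall-clock latency of local model calls"
)
client = Client(host="http://localhost:11434")


def timed_chat(prompt):
    start = time.perf_counter()
    response = client.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    INFERENCE_LATENCY.observe(time.perf_counter() - start)
    return response["message"]["content"]


if __name__ == "__main__":
    start_http_server(9105)  # /metrics endpoint for Prometheus to scrape
    print(timed_chat("Summarize yesterday's task log in two sentences."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;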

&lt;h2&gt;
  
  
  From Struggling to Mastering: The Implementation Path
&lt;/h2&gt;

&lt;p&gt;Transitioning to an AI-first workflow is not instantaneous. It requires a deliberate shift in how one structures their development environment and business operations. The journey involves moving from passive consumption of AI outputs to active orchestration of AI resources.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Containerization:&lt;/strong&gt; Adopt Docker immediately. It is the only way to ensure that your AI environment is reproducible across different machines.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Local Integration:&lt;/strong&gt; Stop using the web interface for sensitive tasks. Install &lt;code&gt;Ollama&lt;/code&gt; and connect your IDE or backend scripts directly to the local endpoint.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Tooling:&lt;/strong&gt; Build a small library of Python scripts that wrap your local models for specific tasks (e.g., "summarize_pdf.py", "generate_readme.py"); a sketch of one such wrapper follows this list.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Monitoring:&lt;/strong&gt; Implement a basic monitoring stack (Prometheus + Grafana) to keep an eye on your hardware.&lt;/li&gt;
&lt;/ol&gt;
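&lt;p&gt;To make step 3 concrete, here is a hedged sketch of one such wrapper: a command-line script that summarizes a local text file against the local endpoint. The file handling and prompt are illustrative; a real "summarize_pdf.py" would add PDF extraction and chunking.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# summarize_text.py -- wrap a local model for one specific, repeatable task
import argparse

import ollama


def summarize(path):
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": f"Summarize this in five bullet points:\n\n{text}"}],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Summarize a text file with a local model")
    parser.add_argument("path", help="Path to the text file")
    args = parser.parse_args()
    print(summarize(args.path))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;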

&lt;p&gt;By mastering these technical skills, a freelancer in 2026 is no longer just selling code or design; they are selling a sophisticated, automated, and efficient service delivery system. This is the definition of a profitable tech stack.&lt;/p&gt;




&lt;h3&gt;
  
  
  External URLs for Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Ollama Documentation - Official guide for running local models.&lt;/li&gt;
&lt;li&gt;  FastAPI Documentation - For building AI-powered backends.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://hub.docker.com/r/ollama/ollama" rel="noopener noreferrer"&gt;Docker Hub - Ollama&lt;/a&gt; - Container images for deployment.&lt;/li&gt;
&lt;li&gt;  Grafana Cloud - For monitoring local infrastructure.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;Python Agent Frameworks&lt;/a&gt; - Frameworks for building multi-agent systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hub.docker.com/r/ollama/ollama" rel="noopener noreferrer"&gt;https://hub.docker.com/r/ollama/ollama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;https://github.com/microsoft/autogen&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>local</category>
      <category>freelancer</category>
      <category>ollama</category>
      <category>model</category>
    </item>
    <item>
      <title>The Steam Engine of the 21st Century: Why Custom Water Cooling Might Be Your Best Investment</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Mon, 27 Apr 2026 00:48:26 +0000</pubDate>
      <link>https://forem.com/glad_labs/the-steam-engine-of-the-21st-century-why-custom-water-cooling-might-be-your-best-investment-42p9</link>
      <guid>https://forem.com/glad_labs/the-steam-engine-of-the-21st-century-why-custom-water-cooling-might-be-your-best-investment-42p9</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The thermodynamic reality of modern AI workloads and why "standard" cooling is often a bottleneck.&lt;/li&gt;
&lt;li&gt;  How thermal management impacts the total cost of ownership for both enterprise and solo developers.&lt;/li&gt;
&lt;li&gt;  The specific scenarios where custom water cooling offers a return on investment that standard air cooling cannot match.&lt;/li&gt;
&lt;li&gt;  How to distinguish between a hobbyist rig and a production-grade "AI Factory" environment.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The modern data center looks less like a dusty server room and more like a bio-hazard zone. It isn't the radiation that people fear, but the sheer volume of heat generated by rows of GPUs processing billions of parameters per second. As the industry shifts toward "AI Factories"--facilities specifically architected to generate intelligence rather than just process data--the cooling problem has moved from a background technicality to the primary operational constraint.&lt;/p&gt;

&lt;p&gt;For the individual developer running local models or the enterprise architect scaling inference, the question is no longer &lt;em&gt;if&lt;/em&gt; you need to cool your hardware, but &lt;em&gt;how&lt;/em&gt;. While enterprise solutions like rear door heat exchangers are becoming the norm for massive scale, many are turning to custom water cooling systems. But is this a smart engineering move, or just a vanity project for the thermal enthusiast?&lt;/p&gt;

&lt;p&gt;To understand the answer, we have to look past the aesthetics of RGB lighting and look at the thermodynamics of artificial intelligence.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Invisible Tax: Why AI Runs Hotter Than You Think
&lt;/h2&gt;

&lt;p&gt;If you are building an AI model, you are essentially building a massive mathematical engine. The heat generated by these engines isn't just a nuisance; it is a direct tax on your computational efficiency. When a GPU overheats, it doesn't fail outright; it throttles, cutting clock speeds to protect itself. Thermal throttling is the silent killer of inference speed and training stability.&lt;/p&gt;

&lt;p&gt;Recent industry analysis suggests that AI compute is driving an urgent need for specialized thermal solutions. As organizations attempt to solve the heat problem, they are finding that traditional air cooling is no longer sufficient for the power densities required by modern accelerators.&lt;/p&gt;

&lt;p&gt;Consider the approach taken by major hardware providers. Companies like Dell have introduced solutions like the PowerCool eRDHx (enclosed rear door heat exchanger). This technology helps eliminate the need for chilled water loops in the data center itself by moving the heat exchange process directly to the rack door. The logic is simple: if you can move the heat out of the room efficiently, you can reduce the massive energy costs associated with climate control.&lt;/p&gt;

&lt;p&gt;However, this technology is expensive and complex to deploy at scale. For those building their own infrastructure, the question remains: can a custom water cooling loop achieve similar results for a fraction of the cost?&lt;/p&gt;

&lt;p&gt;The physics are on your side. Water has a specific heat capacity roughly four times that of air. This means it can absorb significantly more thermal energy without a spike in temperature. In the context of an AI workload, where a GPU might sustain 100% load for hours on end, that ability to absorb heat is the difference between a stable 90 TFLOPS and a system that throttles down to 60 TFLOPS to save itself from melting.&lt;/p&gt;
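&lt;p&gt;A quick back-of-envelope calculation shows why that matters. Using textbook specific heat values (roughly 4.18 kJ/(kg·K) for water versus about 1.0 kJ/(kg·K) for air) and ignoring flow rates and density, one kilogram of water warms far less than one kilogram of air while absorbing the same GPU heat load:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-envelope: temperature rise of 1 kg of coolant absorbing a 450 W GPU's heat for 60 s
SPECIFIC_HEAT = {"water": 4186.0, "air": 1005.0}  # J/(kg*K), standard textbook values

heat_joules = 450 * 60  # 450 W sustained for one minute = 27,000 J
mass_kg = 1.0

for coolant, c in SPECIFIC_HEAT.items():
    delta_t = heat_joules / (mass_kg * c)  # delta T = Q / (m * c)
    print(f"{coolant}: +{delta_t:.1f} K")

# water: +6.5 K   vs   air: +26.9 K -- and air is roughly 800x less dense,
# so the per-volume gap in real systems is far larger.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;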




&lt;h2&gt;
  
  
  The $5,000/Month Question: Cooling vs. Replacement
&lt;/h2&gt;

&lt;p&gt;When evaluating the economics of cooling, it helps to look at the total cost of ownership (TCO). The industry benchmarks for building and deploying AI chatbots suggest that costs can run high. Estimates place the monthly subscription cost for high-end AI capabilities at over $5,000 per month, with build costs potentially reaching into the hundreds of thousands for advanced deployments.&lt;/p&gt;

&lt;p&gt;In this environment, protecting your hardware is a financial imperative. A GPU that fails after six months due to thermal stress is a catastrophic loss. A standard air cooler might handle the heat for a few hours of gaming, but it is rarely designed for the 24/7 sustained load of an AI inference server or a continuous training job.&lt;/p&gt;

&lt;p&gt;Custom water cooling offers a different kind of economic argument. While the initial setup cost for a high-end loop (pumps, reservoirs, tubing, fittings, and specialized blocks) can be significant, the operational payoff is in longevity and stability.&lt;/p&gt;

&lt;p&gt;Many organizations have found that implementing robust thermal management extends the hardware lifecycle. By keeping components within a narrow, optimal temperature band, you prevent the thermal fatigue that leads to component failure. Furthermore, consistent temperatures mean consistent performance. In an inference scenario, where you might be serving thousands of requests per second, a 5% performance drop due to heat is a direct revenue loss.&lt;/p&gt;

&lt;p&gt;This is where the concept of the "Invisible Tax" comes into play. Just as Docker introduced an infrastructure tax that developers have to manage, thermal management introduces a tax on power consumption. A well-cooled system doesn't just avoid throttling; it runs more efficiently. As silicon heats up, leakage current rises and a larger share of each watt is lost as waste heat rather than useful computation. By keeping components cool, you get more "work" out of every watt of electricity.&lt;/p&gt;




&lt;h2&gt;
  
  
  From Benchmarks to Basement Labs: The Solo Developer's Dilemma
&lt;/h2&gt;

&lt;p&gt;The narrative shifts when we move from the enterprise data center to the basement lab. For the solo developer or the small team, the economics of enterprise cooling solutions like the rear door heat exchanger are completely out of reach. Yet, the need for power is the same.&lt;/p&gt;

&lt;p&gt;This has led to a surge in the "DIY" AI infrastructure movement. The Solo Developer's Secret Weapon is often not a cloud subscription, but a powerful local machine. But a single high-end GPU (like an RTX 4090 or 5090 class card) in a standard PC chassis is a thermal nightmare.&lt;/p&gt;

&lt;p&gt;Here is where custom water cooling stops being a luxury and becomes a necessity for serious work. Air coolers for these cards are often massive, bulky, and noisy. A custom loop allows for a more compact thermal solution that can be integrated into the case design, reducing noise pollution--a critical factor when running a model 24/7 in a living space.&lt;/p&gt;

&lt;p&gt;However, the decision isn't binary. You must weigh the complexity of the loop against the workload intensity. For a developer running a local RAG (Retrieval-Augmented Generation) pipeline occasionally, the risk of a leak might outweigh the benefits. But for the developer running a continuous training job or a high-volume inference API, the stability offered by a closed-loop system is invaluable.&lt;/p&gt;

&lt;p&gt;It is worth noting the complexity involved. Unlike a standard air cooler that you install and forget, a custom loop requires ongoing maintenance. You are dealing with pumps, fluids, and potential leak points. This adds a layer of complexity to your infrastructure stack, akin to managing a database connection pool or a background job queue. If you are already struggling with the basics of Docker or deployment pipelines, a custom cooling system might be a bridge too far.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hidden ROI: Beyond Just Keeping Things Cool
&lt;/h2&gt;

&lt;p&gt;There are intangible benefits to custom water cooling that are harder to quantify but no less real. The first is noise. In an AI factory, the sound of cooling fans is a background hum. For the individual, a data center's worth of cooling noise is unbearable. Custom loops allow for high flow rates with low RPM pumps, drastically reducing acoustic output.&lt;/p&gt;

&lt;p&gt;The second is the "Thermal Envelope." When you water cool a system, you effectively remove the thermal bottleneck. This allows you to push the GPU to its maximum potential without fear of thermal throttling. In the context of fine-tuning models, this means you can run longer training epochs without performance degrading mid-run. This aligns with the "Fine-Tuning Trap" discussed in technical analysis: the math of fine-tuning is complex enough without adding thermal instability to the mix.&lt;/p&gt;

&lt;p&gt;Furthermore, custom loops offer aesthetic and monitoring benefits. Many loop builders integrate temperature sensors directly into their tubing, allowing for real-time visualization of thermal performance. This data is crucial for optimization. You can see exactly how your thermal paste performs under load or how your radiator capacity handles a specific workload.&lt;/p&gt;

&lt;p&gt;Ultimately, the return on investment for custom water cooling is realized when you view your hardware not as a consumable, but as an asset. In an era where AI compute costs are high, ensuring that your asset performs at peak efficiency for as long as possible is a sound business strategy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Your Next Step: Is the Loop Worth the Leak?
&lt;/h2&gt;

&lt;p&gt;Deciding to implement custom water cooling for AI workloads is a decision that sits at the intersection of engineering capability and financial prudence. It is not a one-size-fits-all solution.&lt;/p&gt;

&lt;p&gt;If you are building a massive "AI Factory" in a commercial setting, the industry standard is moving toward integrated rack solutions like the eRDHx. These systems are designed for scale, redundancy, and professional maintenance.&lt;/p&gt;

&lt;p&gt;However, for the individual builder or the small-scale operator, custom water cooling offers a pathway to a stable, high-performance environment without the enterprise price tag. It requires a commitment to learning, but the payoff is a system that runs cooler, quieter, and more reliably.&lt;/p&gt;

&lt;p&gt;Before you dive into the world of pumps and fittings, audit your thermal situation. Check your power draw. Look at your ambient temperatures. If you find that your hardware is constantly fighting the heat, a custom loop might not just be a cool upgrade--it might be the upgrade that keeps your AI project alive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommended External Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Dell Technologies Blog: PowerCool eRDHx&lt;/strong&gt; - An overview of enterprise-grade rear door heat exchangers and their role in AI data center cooling.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Moor Insights &amp;amp; Strategy: AI Compute and Thermal Solutions&lt;/strong&gt; - Analysis on how AI compute demands are reshaping thermal management strategies.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;MIT Technology Review: The AI Factory&lt;/strong&gt; - Context on the infrastructure shift toward facilities dedicated to AI generation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Quickchat AI: Chatbot Cost Analysis&lt;/strong&gt; - Data regarding the high operational costs of AI, highlighting the need for cost-effective infrastructure.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cooling</category>
      <category>thermal</category>
      <category>custom</category>
      <category>heat</category>
    </item>
    <item>
      <title>Breaking the Memory Wall: How to Give Any Open-Source Agent Claude-Level Recall</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Sun, 26 Apr 2026 10:29:44 +0000</pubDate>
      <link>https://forem.com/glad_labs/breaking-the-memory-wall-how-to-give-any-open-source-agent-claude-level-recall-45aj</link>
      <guid>https://forem.com/glad_labs/breaking-the-memory-wall-how-to-give-any-open-source-agent-claude-level-recall-45aj</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The architectural difference between ephemeral context windows and persistent memory layers.&lt;/li&gt;
&lt;li&gt;  How to decouple your AI agent's memory from the underlying model provider to avoid vendor lock-in.&lt;/li&gt;
&lt;li&gt;  The role of vector databases and embeddings in maintaining long-term context for autonomous agents.&lt;/li&gt;
&lt;li&gt;  Practical implementation strategies for integrating a universal memory layer into existing LangChain or Anthropic Agent SDK workflows.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Why Most AI Agents Forget Everything After the First Turn
&lt;/h3&gt;

&lt;p&gt;The allure of autonomous AI agents lies in their ability to perform complex, multi-step tasks with minimal human intervention. A developer might build a coding assistant that can refactor code, write tests, and push to a repository. However, the moment the session ends, or the context window fills, that capability often evaporates.&lt;/p&gt;

&lt;p&gt;In the current landscape of artificial intelligence, the distinction between a chatbot and an agent is frequently blurred by a fundamental limitation: memory. Most open-source implementations of agents, built on frameworks like LangChain, treat the conversation as a stateless transaction. The agent processes the input, generates a response, and discards the state. This is why a powerful coding agent might forget a specific coding preference or a user's architectural constraints after just a few interactions.&lt;/p&gt;

&lt;p&gt;This is where the gap between proprietary giants and open-source ecosystems becomes most apparent. Major platforms like Claude and ChatGPT offer "memory" features--context that persists across sessions--but these are proprietary black boxes. When a developer builds a custom agent, they are essentially building a system without a memory, leading to a frustrating user experience where the agent has to relearn its role every time it is invoked.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hidden Truth About Proprietary Context
&lt;/h3&gt;

&lt;p&gt;While closed-source models provide excellent short-term recall, they create a "vendor lock-in" problem that is increasingly difficult for technical teams to ignore. When an organization builds a workflow around Anthropic's memory features, they are implicitly committing to Anthropic's infrastructure. Switching models or providers later requires rebuilding the entire memory layer from scratch.&lt;/p&gt;

&lt;p&gt;The recent discourse in the developer community highlights a critical insight: &lt;strong&gt;Platform memory is locked to one model and one company.&lt;/strong&gt; This means that the memory is not just a storage layer; it is a dependency on the specific API ecosystem of the provider. For an enterprise building a robust agentic workflow, this dependency is a liability. It limits the ability to swap in smaller, cheaper, or more specialized models without losing the accumulated knowledge of the agent.&lt;/p&gt;

&lt;p&gt;Open-source solutions address this by treating memory as a universal middleware layer. By abstracting memory storage away from the model provider, developers can swap out the underlying Large Language Model (LLM) without losing the agent's history or learned preferences. This approach treats memory not as a feature of the chat interface, but as a persistent data store that underpins the intelligence of the agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Mem0 Bridges the Divide Between Models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq0lzwpwm0b4y3zdxnr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq0lzwpwm0b4y3zdxnr6.png" alt="Layered architecture diagram with the model provider on top, a memory abstraction layer in the middle, and a vector store underneath." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The open-source project &lt;a href="https://github.com/mem0ai/mem0" rel="noopener noreferrer"&gt;mem0&lt;/a&gt; serves as a prime example of how to bridge this divide. It acts as a universal memory layer for AI Agents, designed to be model-agnostic. The architecture treats memory as a distinct layer in the application stack, similar to how a database sits between the application logic and the file system.&lt;/p&gt;

&lt;p&gt;At its core, Mem0 functions by embedding information into a vector database and retrieving it when relevant context is needed. When an agent interacts with a user, it doesn't just query the LLM; it queries the memory layer to retrieve relevant facts, preferences, and historical context. This retrieved context is then appended to the prompt sent to the model, effectively giving the agent the ability to recall information from days or weeks prior.&lt;/p&gt;

&lt;p&gt;This architecture is particularly powerful for complex workflows. Consider a research assistant that needs to summarize a technical document, generate a report, and then answer follow-up questions based on that report. Without a memory layer, the assistant must re-read the entire document every time a question is asked. With a memory layer, the assistant can store the summary and key findings, retrieving only the specific details needed for the current query. This drastically reduces the computational cost and improves the relevance of the answers.&lt;/p&gt;
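&lt;p&gt;In practice, the integration surface is small. The hedged sketch below follows the usage pattern in the mem0 README: store a durable fact once, then retrieve it later regardless of which model sits underneath. Exact method names and return shapes may differ between releases, so treat it as a sketch rather than a reference.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a model-agnostic memory layer, based on mem0's documented usage
from mem0 import Memory

memory = Memory()

# Persist a fact about this user, independent of any single chat session or model
memory.add("Prefers FastAPI over Flask and deploys everything with Docker", user_id="dev_001")

# Later -- possibly after swapping the underlying LLM -- retrieve relevant context
hits = memory.search("Which web framework should the assistant scaffold?", user_id="dev_001")

# Inspect the retrieved memories (the exact result shape varies by version)
print(hits)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;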

&lt;h3&gt;
  
  
  From Struggling with Agents to Mastering Long-Term Context
&lt;/h3&gt;

&lt;p&gt;Implementing a memory layer transforms an agent from a simple text predictor into a persistent conversational partner. This shift is essential for building applications that require deep, contextual understanding over time. To achieve this, the memory layer must be robust enough to handle updates, deletions, and retrieval of specific data points.&lt;/p&gt;

&lt;p&gt;The mechanism relies on the principles of vector similarity search. As the agent interacts with the user, it stores new information (user preferences, past actions, specific data points) as vectors. When a query comes in, the system retrieves the most semantically similar vectors to provide context. This allows the agent to understand not just &lt;em&gt;what&lt;/em&gt; was said, but &lt;em&gt;how&lt;/em&gt; it relates to previous interactions.&lt;/p&gt;

&lt;p&gt;For developers looking to implement this, the key is to view memory as a database problem. This means leveraging established storage solutions rather than relying on ephemeral state. By integrating with tools like PostgreSQL and vector extensions like &lt;code&gt;pgvector&lt;/code&gt;, developers can build a memory system that is scalable, queryable, and persistent.&lt;/p&gt;

&lt;p&gt;This approach aligns with the broader architectural shift in AI engineering, where the focus is moving from model-centric to application-centric design. A memory layer is the critical infrastructure that ensures the application's intelligence survives beyond the current session.&lt;/p&gt;
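&lt;p&gt;At the storage layer, this is ordinary SQL. The sketch below, using &lt;code&gt;psycopg2&lt;/code&gt; against a Postgres instance with the pgvector extension, shows a similarity lookup; the table name, embedding dimension, and connection string are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative pgvector similarity lookup for an agent memory table
import psycopg2

conn = psycopg2.connect("dbname=agent_memory user=agent")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS memories ("
    "id bigserial PRIMARY KEY, content text, embedding vector(768));"
)
conn.commit()

# In a real system the query embedding comes from your embedding model; here it is a placeholder
query_embedding = "[" + ",".join(["0"] * 768) + "]"

# Retrieve the five stored memories closest to the query (L2 distance via the &lt;-&gt; operator)
cur.execute(
    "SELECT content FROM memories ORDER BY embedding &lt;-&gt; %s::vector LIMIT 5;",
    (query_embedding,),
)
for (content,) in cur.fetchall():
    print(content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;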

&lt;h3&gt;
  
  
  The Architecture of Persistent Intelligence
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14pko1flpdwvxyxssf8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14pko1flpdwvxyxssf8v.png" alt="Closed-loop diagram of input → retrieval → augmentation → execution → storage feeding back into retrieval." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To understand how this works in practice, one must look at the integration points between the agent framework and the memory layer. When using a framework like LangChain, the memory layer acts as an input and output handler.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Input:&lt;/strong&gt; The agent receives a user query.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Retrieval:&lt;/strong&gt; The memory layer queries the vector database for relevant context.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Augmentation:&lt;/strong&gt; The retrieved context is injected into the agent's prompt.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Execution:&lt;/strong&gt; The LLM processes the augmented prompt and generates a response.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Storage:&lt;/strong&gt; The agent's output (or specific facts extracted from it) is stored back into the memory layer for future use.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This loop is continuous. The agent doesn't just respond; it learns. It updates its memory with the user's preferences, the results of its actions, and the nuances of the conversation. Over time, this creates a highly personalized agent that requires less prompting and provides more accurate results.&lt;/p&gt;

&lt;p&gt;The ability to self-improve is a key differentiator. Without a memory layer, the agent is static. With a memory layer, the agent evolves. It remembers the user's preferred coding style, the technical stack of the project, and the specific constraints that were discussed in previous meetings. This level of sophistication is what separates a chatbot from a true autonomous agent.&lt;/p&gt;
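&lt;p&gt;Wired together, the five stages above collapse into a short control loop. The sketch below is one hedged way to express it, assuming a mem0-style &lt;code&gt;search&lt;/code&gt;/&lt;code&gt;add&lt;/code&gt; interface and a local Ollama model; every name here is illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Retrieval -&gt; augmentation -&gt; execution -&gt; storage, as a single agent turn
import ollama
from mem0 import Memory

memory = Memory()


def run_agent_turn(user_id, query):
    # Retrieval: pull the most relevant stored context for this user
    context = memory.search(query, user_id=user_id)

    # Augmentation: inject the retrieved context into the prompt
    prompt = f"Known context: {context}\n\nUser request: {query}"

    # Execution: the model answers with long-term context available
    reply = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": prompt}])
    answer = reply["message"]["content"]

    # Storage: persist the exchange so future turns can recall it
    memory.add(f"User asked: {query} | Agent answered: {answer}", user_id=user_id)
    return answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;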

&lt;h3&gt;
  
  
  Productionizing Memory: Data Privacy and Scalability
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34vdbk9f5okkzy4103gh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34vdbk9f5okkzy4103gh.png" alt="Self-hosted data vault with encryption shielding, suggesting on-premise control of memory storage." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While the technical implementation is straightforward, deploying a memory layer in production introduces specific challenges related to data privacy and scalability. Because the memory layer stores user data, it becomes a target for security considerations.&lt;/p&gt;

&lt;p&gt;Open-source solutions offer the advantage of self-hosting. By deploying the memory layer on-premise or in a private cloud, organizations can maintain strict control over their data. This is crucial for industries like healthcare, finance, and legal services, where data sovereignty is paramount. The ability to audit and control the memory layer ensures compliance with regulations like GDPR or HIPAA.&lt;/p&gt;

&lt;p&gt;Scalability is another consideration. As the volume of interactions grows, the vector database must be able to handle increasing query loads. This often requires optimizing indexing strategies and ensuring sufficient hardware resources. However, because the memory layer is decoupled from the model, scaling the memory storage does not necessarily require scaling the model inference capacity, offering a flexible approach to infrastructure management.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the "Universal" Approach Matters
&lt;/h3&gt;

&lt;p&gt;The push for a universal memory layer is driven by the need for interoperability in the AI ecosystem. As the number of available models grows, the ability to switch between them without losing context becomes a critical competitive advantage. A developer should be able to swap a model for a faster, cheaper, or more specialized one without rewriting the application logic.&lt;/p&gt;

&lt;p&gt;This flexibility extends to the tools and integrations used in the workflow. By using a universal layer, developers can integrate with a wide range of tools, databases, and APIs. The memory layer becomes the central nervous system of the agent, connecting disparate systems and maintaining a unified view of the context.&lt;/p&gt;

&lt;p&gt;In conclusion, the move towards open-source memory layers represents a maturation of the AI agent space. It moves beyond the hype of "generative AI" to focus on the practical engineering challenges of building persistent, intelligent systems. By adopting a universal memory layer, developers can unlock the full potential of open-source models, creating agents that are not only powerful but also adaptable, private, and long-lived.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Memory is a Database Problem:&lt;/strong&gt; Treat agent memory as a persistent data store (vector database) rather than a temporary variable.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Decoupling is Key:&lt;/strong&gt; Abstract memory from the model provider to avoid vendor lock-in and enable model swapping.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Augmentation:&lt;/strong&gt; Use memory retrieval to augment prompts, giving the agent access to long-term history and preferences.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Self-Improvement:&lt;/strong&gt; Implementing a memory layer allows agents to learn and adapt over time, reducing the need for constant re-prompting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Your Next Step Toward Persistent Intelligence
&lt;/h2&gt;

&lt;p&gt;To begin implementing this architecture, start by selecting a memory layer that aligns with your stack. The &lt;a href="https://github.com/mem0ai/mem0" rel="noopener noreferrer"&gt;mem0&lt;/a&gt; project offers a robust starting point for integrating memory into LangChain or Anthropic Agent SDK workflows. Experiment with storing user preferences and historical data to see how it transforms the agent's behavior in your specific use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  External Resources for Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;LangChain Documentation:&lt;/strong&gt; LangChain Agents - Comprehensive guide on building agent chains.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Anthropic Documentation:&lt;/strong&gt; Anthropic Agent SDK - Official documentation for building agents with Claude.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PostgreSQL Vector Extension:&lt;/strong&gt; &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector Documentation&lt;/a&gt; - Technical details on vector similarity search in PostgreSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/mem0ai/mem0" rel="noopener noreferrer"&gt;https://github.com/mem0ai/mem0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;https://github.com/pgvector/pgvector&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>memory</category>
      <category>agents</category>
      <category>layer</category>
      <category>context</category>
    </item>
    <item>
      <title>The $5,000/Month Blueprint: How Indie Hackers Hit Acquisition Speed</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Sun, 26 Apr 2026 06:48:41 +0000</pubDate>
      <link>https://forem.com/glad_labs/the-5000month-blueprint-how-indie-hackers-hit-acquisition-speed-3bbi</link>
      <guid>https://forem.com/glad_labs/the-5000month-blueprint-how-indie-hackers-hit-acquisition-speed-3bbi</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The specific revenue threshold ($5,000/month) that signals acquisition readiness for solo founders.&lt;/li&gt;
&lt;li&gt;  How community engagement drives valuation more than raw traffic numbers.&lt;/li&gt;
&lt;li&gt;  The architectural characteristics of a product that appeals to enterprise acquirers.&lt;/li&gt;
&lt;li&gt;  A realistic timeline for moving from concept to exit without external funding.&lt;/li&gt;
&lt;li&gt;  How to validate your idea before writing a single line of production code.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  The $5,000/Month Milestone
&lt;/h3&gt;

&lt;p&gt;For many aspiring entrepreneurs, the dream of building a profitable online business often feels like a distant, abstract goal. It is easy to get lost in the noise of "get rich quick" schemes or the allure of massive Series A funding rounds. However, the story of the platform known as Indie Hackers offers a concrete, data-backed path to success. In a relatively short period, this platform achieved a revenue run rate of $5,000 per month and was acquired by Stripe just 10 months later.&lt;/p&gt;

&lt;p&gt;This isn't just a success story; it is a case study in the power of the "bootstrap" model. The $5,000/month mark is often cited in indie hacking circles as a critical psychological and financial threshold. It represents a level of revenue that is high enough to sustain the founder but low enough to remain manageable. It proves that a product has product-market fit without requiring the overhead of a large engineering team.&lt;/p&gt;

&lt;p&gt;When analyzing this trajectory, it becomes clear that revenue is not the only metric that matters. The speed at which revenue is generated is equally important. Reaching this milestone in 10 months demonstrates a level of execution velocity that is rare in the startup world. It suggests that the founders did not waste time building features that nobody wanted. Instead, they focused on delivering immediate value to a specific, passionate audience.&lt;/p&gt;

&lt;p&gt;This achievement validates the core philosophy of the indie hacker movement: that a solo founder can compete with well-funded startups by focusing on niche markets and building products with high margins. The Indie Hackers platform itself became a testament to this philosophy, serving as a resource for others attempting to replicate this success. By documenting their journey, they provided a roadmap that others could follow.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Stripe Bought It
&lt;/h3&gt;

&lt;p&gt;The acquisition by Stripe is a significant event that warrants a closer look. Stripe is a company known for its engineering prowess and its ability to acquire startups that fit its ecosystem. The fact that they acquired Indie Hackers suggests that the platform offered more than just a revenue stream; it offered strategic value.&lt;/p&gt;

&lt;p&gt;From a business perspective, Stripe is deeply embedded in the payments infrastructure. The Indie Hackers community is populated by individuals who are interested in building online businesses, often relying on digital products and services. These are the exact users that Stripe wants to serve. By acquiring Indie Hackers, Stripe gained direct access to a community of potential customers who are already inclined to use payment solutions.&lt;/p&gt;

&lt;p&gt;Furthermore, the acquisition indicates that Stripe values community and content. The platform had grown to 170k sessions in just three months, suggesting a highly engaged audience. In the digital age, an engaged audience is a valuable asset. It creates a network effect that is difficult to replicate. Stripe likely recognized that owning this community would allow them to influence the next generation of online entrepreneurs.&lt;/p&gt;

&lt;p&gt;This move also highlights a trend in the tech industry: the strategic importance of niche communities. Large tech companies are increasingly looking to acquire not just products, but platforms where users congregate. Indie Hackers provided a space where users could learn, share, and eventually buy payment solutions. It was a perfect fit for Stripe's growth strategy.&lt;/p&gt;




&lt;h3&gt;
  
  
  The "Indie Hacker" Tech Stack
&lt;/h3&gt;

&lt;p&gt;While the specific technical implementation of the Indie Hackers platform may not be public, we can infer the characteristics of a stack that achieves this level of success and acquisition appeal. A successful indie product typically requires a balance of performance, maintainability, and speed of deployment.&lt;/p&gt;

&lt;p&gt;On the backend, a developer might leverage a modern, asynchronous framework to handle high concurrency. For instance, a framework like FastAPI allows for rapid development and efficient handling of web requests. This is crucial for a community-driven site where user engagement is high and page loads need to be instant. The ability to serve content quickly is a technical requirement for retaining users in a competitive market.&lt;/p&gt;

&lt;p&gt;Data persistence is another critical component. A platform like Indie Hackers relies heavily on structured data--user profiles, forum posts, revenue reports, and analytics. A relational database such as PostgreSQL is the industry standard for this type of application. It offers robust transactional integrity and powerful querying capabilities, which are essential for a dynamic community site.&lt;/p&gt;

&lt;p&gt;Containerization, such as Docker, is also a common practice in the indie hacker world. It allows for easy deployment across different environments, from a local development machine to a production cloud server. This portability is a key factor in making a product attractive to an acquirer. If a product can be easily moved and scaled, it represents a lower risk for a potential buyer. A clean, containerized architecture signals to an acquirer that the code is maintainable and professional.&lt;/p&gt;




&lt;h3&gt;
  
  
  From Idea to Acquirer
&lt;/h3&gt;

&lt;p&gt;The transition from a simple idea to a multi-million dollar acquisition is rarely linear. The timeline of Indie Hackers provides a glimpse into this process. According to reports, the journey began with a concept and a demo. This is a crucial distinction. The founders did not spend months building a "minimum viable product" in secret. Instead, they validated the idea publicly, sharing a demo and gathering feedback from the community immediately.&lt;/p&gt;

&lt;p&gt;This approach minimizes the risk of building something that nobody wants. By engaging with potential users early, the founders were able to refine their product based on real demand. The acquisition by Stripe happened within 10 months, a timeline that is incredibly fast for a successful exit. It suggests that the product was built with a clear exit strategy in mind from the very beginning.&lt;/p&gt;

&lt;p&gt;Indie Hackers founder Courtland Allen reportedly started with an idea, demoed it publicly, and then partnered with an acquirer. This "partnering" model is an alternative to the traditional fundraising route. Instead of chasing venture capital, the founder focused on building a product that was valuable enough to be acquired. This approach allows the founder to retain equity and control while still achieving a financial exit.&lt;/p&gt;

&lt;p&gt;This path requires a different mindset than the standard startup path. It requires a focus on building a product that is "buyable" rather than just "fundable." A buyable product is one that has a clear value proposition, a loyal user base, and a scalable architecture. It is a product that solves a specific problem for a specific audience so well that a larger company would want to own it.&lt;/p&gt;




&lt;h3&gt;
  
  
  Applying the Blueprint Today
&lt;/h3&gt;

&lt;p&gt;For the modern technical founder, the Indie Hackers acquisition offers a blueprint for success. It demonstrates that it is possible to build a profitable business without external funding. However, the landscape has changed slightly since then. The market is more saturated, and competition is fiercer.&lt;/p&gt;

&lt;p&gt;To replicate this success today, a founder must focus on two key areas: validation and differentiation.&lt;/p&gt;

&lt;p&gt;First, validation must be rigorous. Before writing a single line of production code, a founder should study the existing market and its gaps. They should engage with potential customers, run landing page tests, and gather email addresses. The goal is to prove that people are willing to pay for the solution before building the full product.&lt;/p&gt;

&lt;p&gt;Second, differentiation is essential. In a world of SaaS platforms, it is difficult to stand out. The Indie Hackers platform differentiated itself by focusing on transparency and community. It was a place where founders could openly discuss their revenue and strategies. This transparency created a unique value proposition that attracted a loyal following.&lt;/p&gt;

&lt;p&gt;Founders should also consider the "70B Threshold" mentioned in industry discussions regarding AI capabilities. While this might seem unrelated, it highlights the importance of staying at the cutting edge of technology. A modern indie hacker might leverage AI to automate customer support or content creation, thereby reducing overhead and increasing efficiency. This allows them to compete with larger teams while remaining a solo operation.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Community Engine
&lt;/h3&gt;

&lt;p&gt;At the heart of the Indie Hackers success is the community. A product that relies solely on features will eventually plateau. A product that relies on a community will grow exponentially. The Indie Hackers platform facilitated a network of like-minded individuals who supported each other's growth.&lt;/p&gt;

&lt;p&gt;This community aspect is often overlooked in technical analysis. However, for an acquirer like Stripe, a strong community is a powerful moat. It creates stickiness. Users don't just use the product; they belong to a group. They contribute content, help others, and stay engaged for the long term.&lt;/p&gt;

&lt;p&gt;Building a community requires deliberate effort. It requires creating spaces for discussion, encouraging user-generated content, and listening to feedback. It means treating users as partners rather than just customers. The Indie Hackers platform succeeded because it put its users at the center of its strategy.&lt;/p&gt;

&lt;p&gt;For a technical founder, this means building tools that facilitate community interaction. This could be a forum, a chat system, or a social network. The technical implementation is important, but the engagement strategy is what drives growth. It is the difference between a website and a movement.&lt;/p&gt;




&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;p&gt;The acquisition of Indie Hackers by Stripe serves as a powerful reminder of the potential of the indie hacker model. It shows that a well-executed idea, built by a solo founder, can achieve significant financial success and be acquired by a top-tier company.&lt;/p&gt;

&lt;p&gt;The key lessons are clear:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Revenue is Validation:&lt;/strong&gt; Hitting $5,000/month proves that you have a viable business.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Speed Matters:&lt;/strong&gt; Reaching this milestone in 10 months demonstrates high execution velocity.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Community is Value:&lt;/strong&gt; A loyal audience is a strategic asset for any acquirer.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Tech Stack Matters:&lt;/strong&gt; A clean, portable, and performant architecture makes a product easier to acquire.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Exit is Possible:&lt;/strong&gt; You can build a successful business and still achieve an exit without venture capital.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By focusing on these principles, aspiring entrepreneurs can navigate the complex world of online business with confidence. They can build products that are not only profitable but also valuable enough to be acquired. The Indie Hackers blueprint is a testament to what is possible when you combine technical skill with business acumen and a relentless focus on user value.&lt;/p&gt;




&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;If you are inspired by this story, the next step is to validate your own idea. Don't just build for yourself. Build for a community. Use the resources available on platforms like Indie Hackers to learn from others who have walked this path. Remember, the goal is not just to build a product, but to build a business that has value.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Analyze the Market:&lt;/strong&gt; Look for gaps in the current ecosystem where a community-driven solution could thrive.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Build a Prototype:&lt;/strong&gt; Create a simple demo to test your assumptions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Engage Early:&lt;/strong&gt; Start talking to potential users before you have a finished product.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Plan for the Exit:&lt;/strong&gt; Keep your architecture clean and your documentation up to date. This will make your product more attractive to potential buyers in the future.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The path to acquisition is open to those who are willing to do the work. It requires discipline, focus, and a willingness to learn. But as the Indie Hackers story proves, the rewards can be substantial.&lt;/p&gt;

&lt;h3&gt;
  
  
  External Resources for Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Business Insider: Stripe Acquires Indie Hackers - Provides context on the acquisition details and the founder's role.&lt;/li&gt;
&lt;li&gt;  Bobby Voicu: The Story of Indie Hackers - A detailed timeline of the platform's growth and acquisition.&lt;/li&gt;
&lt;li&gt;  Medium: Indie Hackers Growth Story - Insights into their growth strategies and user engagement.&lt;/li&gt;
&lt;li&gt;  Indie Hackers Official Site - The original community and resource for aspiring founders.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>indie</category>
      <category>hackers</category>
      <category>product</category>
      <category>community</category>
    </item>
    <item>
      <title>The 70B Threshold: How the RTX 5090 Rewrites the Home Lab Equation</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Fri, 24 Apr 2026 21:54:16 +0000</pubDate>
      <link>https://forem.com/glad_labs/the-70b-threshold-how-the-rtx-5090-rewrites-the-home-lab-equation-55hk</link>
      <guid>https://forem.com/glad_labs/the-70b-threshold-how-the-rtx-5090-rewrites-the-home-lab-equation-55hk</guid>
      <description>&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Quality Gap:&lt;/strong&gt; Why moving from 8B parameter models to 70B parameter models fundamentally changes the capabilities of local AI, and why the "sweet spot" has finally arrived.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Bandwidth Dynamics:&lt;/strong&gt; How the architectural leap of the RTX 5090 shifts the bottleneck from raw compute to memory subsystems, allowing for sustained high-throughput inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software Architecture:&lt;/strong&gt; The specific role of inference engines like vLLM and PagedAttention in managing the massive memory requirements of 70B models on consumer hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost and Privacy Calculus:&lt;/strong&gt; A comparative analysis of running inference locally versus relying on cloud APIs, focusing on long-term operational costs and data sovereignty.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure Integration:&lt;/strong&gt; Practical methods for deploying high-performance local models using Docker, FastAPI, and PostgreSQL for production-grade local applications.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  The Invisible Wall Between Good and Great
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fba0a15a92fb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fba0a15a92fb3.png" alt="A high-resolution image of a GPU card with visible heat dissipating components, showcasing the power required to..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For years, the landscape of local Large Language Model (LLM) inference has been defined by a compromise. The industry standard for high-quality reasoning and complex instruction following has settled around the 70 billion parameter class. Models like Llama 3.1 70B, Mistral Large, and Qwen 72B represent a significant leap in cognitive capabilities compared to their 7B or 8B counterparts.&lt;/p&gt;

&lt;p&gt;However, for the home lab enthusiast and the solo developer, running these models has historically been a difficult equation. The memory requirements for a 70B model in 16-bit precision (FP16) exceed 140GB of VRAM. Even with 4-bit quantization, which brings this down to roughly 40GB, the gap between consumer hardware and the necessary resources has been a chasm.&lt;/p&gt;
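&lt;p&gt;A quick back-of-envelope check makes the gap concrete. The following is a minimal sketch, not a benchmark: it counts only the weight footprint and ignores the KV cache, activations, and runtime overhead.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def weight_vram_gb(params_billion, bits_per_param):
    # Weight-only footprint; KV cache, activations, and runtime overhead come on top.
    return params_billion * 1e9 * (bits_per_param / 8) / 1e9

print(weight_vram_gb(70, 16))  # ~140 GB in FP16
print(weight_vram_gb(70, 4))   # ~35 GB at 4-bit, before quantization overhead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;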

&lt;p&gt;Until now, the "calculus" favored cloud APIs. Renting an H100 GPU for a few hours or paying per token from OpenAI or Anthropic was often the only practical path to accessing this quality tier. But recent developments in hardware architecture and the release of the RTX 5090 class of cards are rewriting that equation entirely. The shift is not just about raw speed; it is about accessibility. The barrier to entry for sovereign, on-premise intelligence has just collapsed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hidden Cost of Running 70B Locally
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fa34f32f3f40f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fa34f32f3f40f.png" alt="An abstract diagram illustrating various costs (electricity, cooling, space) with interconnected nodes and energy..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before diving into the hardware specs, it is crucial to understand &lt;em&gt;why&lt;/em&gt; the 70B threshold matters. In the world of LLMs, parameters correlate strongly with reasoning depth, coding accuracy, and factual retention. A 7B model is often sufficient for summarization, simple chat, and basic code completion. A 70B model, however, is required for complex codebases, multi-step reasoning, and nuanced understanding of domain-specific data.&lt;/p&gt;

&lt;p&gt;The primary barrier to running these models locally is memory bandwidth. Inference is not just about the raw power of the tensor cores; it is about how fast the data can move from the GPU memory (VRAM) to the compute units. Older consumer cards, even top-tier generations, relied on GDDR6X memory interfaces. While fast, these interfaces eventually become saturated when processing the massive context windows and KV (Key-Value) caches required by 70B models.&lt;/p&gt;

&lt;p&gt;According to the complete guide to running LLMs locally, the hardware evaluation process must prioritize memory bandwidth over raw FLOPS for inference workloads. The RTX 5090 addresses this with a memory architecture designed to sustain high throughput under continuous inference, easing the bandwidth bottleneck that previously forced developers to choose between low quality and high latency.&lt;/p&gt;

&lt;p&gt;This changes the calculus from a "can we run this?" question to a "how fast can we run this?" question. With the new architecture, the 70B model is no longer a theoretical curiosity that crashes a system after two prompts; it becomes a viable production backend for a personal application.&lt;/p&gt;
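&lt;p&gt;If decoding is memory-bound, a simplified rule of thumb puts an upper bound on single-stream generation speed: each new token must stream the full weight set across the memory bus once. The bandwidth and model-size figures below are illustrative assumptions, not measured results.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def decode_tokens_per_sec(bandwidth_gb_per_s, model_size_gb):
    # Memory-bound ceiling for batch size 1: one full pass over the weights per generated token.
    return bandwidth_gb_per_s / model_size_gb

# Hypothetical figures for illustration: a ~1,800 GB/s card and a ~40 GB 4-bit 70B model.
print(decode_tokens_per_sec(1800, 40))  # ~45 tokens/sec upper bound
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;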

&lt;h3&gt;
  
  
  PagedAttention and the KV Cache Revolution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fe7ee32c19e8e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpub-1432fdefa18e47ad98f213a8a2bf14d5.r2.dev%2Fimages%2Finline%2Fe7ee32c19e8e.png" alt="A technical blueprint-style visualization showing the data flow through PagedAttention mechanisms, with arrows..." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The technical mechanism that enables this shift is found in the software stack, specifically in the inference engines that manage the GPU memory. The most prominent example is vLLM, an open-source project that has become the industry standard for high-throughput LLM serving.&lt;/p&gt;

&lt;p&gt;vLLM introduces a technique called PagedAttention. In traditional inference engines, memory allocation is rigid. When a model generates text, it needs to store the "Key-Value" cache for every token it has ever processed. For a 70B model with a long context window, this cache can easily exceed the available VRAM, causing the system to crash or forcing the model to be truncated.&lt;/p&gt;

&lt;p&gt;PagedAttention borrows the idea of virtual memory from operating systems: the KV cache is stored in fixed-size blocks that do not need to be contiguous, so memory is allocated on demand rather than reserved up front. This allows a single GPU to serve multiple requests concurrently without running out of memory. The significance of the RTX 5090 in this context cannot be overstated. While PagedAttention is efficient, it is bound by the speed at which the GPU can fetch the data.&lt;/p&gt;

&lt;p&gt;With the increased memory bandwidth and capacity of the RTX 5090 class hardware, PagedAttention transitions from a memory-saving trick to a performance accelerator. It allows for significantly larger context windows without the overhead of offloading to system RAM, which is dramatically slower than VRAM. This means a developer can run a 70B model with a 32k or 128k context window locally, effectively matching the capabilities of enterprise-grade cloud instances without the egress fees.&lt;/p&gt;
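&lt;p&gt;To see why the KV cache dominates at long contexts, here is a rough estimate based on the publicly documented Llama 3.1 70B shape (80 layers, 8 grouped-query KV heads, head dimension 128) with an FP16 cache. Treat the outputs as approximations.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def kv_cache_gb(context_tokens, layers=80, kv_heads=8, head_dim=128, bytes_per_value=2):
    # Both keys and values are cached for every layer and KV head at each token position.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 1e9

print(kv_cache_gb(32_768))   # ~10.7 GB for a 32k context
print(kv_cache_gb(131_072))  # ~43 GB for a 128k context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;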

&lt;h3&gt;
  
  
  From API Dependence to Sovereign Infrastructure
&lt;/h3&gt;

&lt;p&gt;The decision to run models locally is rarely just a technical one; it is a strategic one. The rise of AI startups and the explosion of data generation have created a new class of valuable intellectual property. When a developer relies on cloud APIs for their core intelligence, they are outsourcing the "brain" of their application to a third party.&lt;/p&gt;

&lt;p&gt;Recent market movements underscore this risk. For instance, the significant funding rounds for specialized AI tools like OpenEvidence highlight the value of proprietary data. If your application relies on a cloud API, you are limited by the provider's terms of service, rate limits, and potential future pricing hikes.&lt;/p&gt;

&lt;p&gt;Running a 70B model locally provides a path to "Sovereign Infrastructure." By deploying the model on a home lab or a dedicated local server, the data and the intelligence remain under the developer's control. The RTX 5090 makes this economically viable. The cost of electricity for a high-end GPU is negligible compared to the cost of API tokens for a high-volume application.&lt;/p&gt;

&lt;p&gt;Furthermore, this shifts the maintenance burden. Cloud APIs have uptime guarantees and automatic scaling. A local model requires manual management, but it offers zero dependency risk. For applications dealing with sensitive data--medical records, proprietary codebases, or financial analysis--the ability to run a model locally is not a luxury; it is a compliance requirement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecting the Local Inference Pipeline
&lt;/h3&gt;

&lt;p&gt;Implementing a 70B model locally requires a shift in how we think about application architecture. We are no longer just calling an HTTP endpoint; we are managing a persistent GPU resource. The standard stack involves a few key components: the GPU itself, an inference engine (like vLLM or Ollama), and a standard web framework for serving the API.&lt;/p&gt;

&lt;p&gt;A practical implementation might look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Inference Engine (vLLM):&lt;/strong&gt; vLLM runs the model on the GPU and exposes an OpenAI-compatible HTTP server. This is crucial because it allows developers to use the same client libraries (like &lt;code&gt;openai&lt;/code&gt; in Python) that they use for cloud APIs, reducing code friction.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Application Layer (FastAPI):&lt;/strong&gt; FastAPI is the standard for building high-performance Python web services. It can serve as the "glue" layer, handling authentication, user requests, and passing them to the local vLLM instance.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Data Layer (PostgreSQL + pgvector):&lt;/strong&gt; Even with a powerful local model, retrieval-augmented generation (RAG) remains a valuable technique. By using PostgreSQL with the &lt;code&gt;pgvector&lt;/code&gt; extension, developers can store their data locally and query it to feed context into the 70B model (a minimal query sketch follows this list).&lt;/li&gt;
&lt;/ol&gt;
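&lt;p&gt;The data-layer query from step 3 can stay small. The sketch below assumes a hypothetical &lt;code&gt;documents&lt;/code&gt; table with a text column and a &lt;code&gt;pgvector&lt;/code&gt; embedding column, and uses &lt;code&gt;psycopg2&lt;/code&gt;; the connection string, table layout, and embedding source are placeholders rather than a prescribed schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import psycopg2

# Hypothetical connection string and table: documents(content text, embedding vector(...)).
conn = psycopg2.connect("dbname=app user=app host=localhost")

def top_k_chunks(query_embedding, k=5):
    # pgvector's "&amp;lt;-&amp;gt;" operator is L2 distance; the smallest distances are the closest chunks.
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    sql = """
        SELECT content
        FROM documents
        ORDER BY embedding &amp;lt;-&amp;gt; %s::vector
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (vector_literal, k))
        return [row[0] for row in cur.fetchall()]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;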

&lt;p&gt;Here is a conceptual example of how a Docker Compose file might look to orchestrate this, ensuring the GPU is properly passed through to the inference container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vllm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local_llm&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./models:/models&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;VLLM_WORKER_MULTIPROC_METHOD=spawn&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;--model /models/Llama-3.1-70B-Instruct&lt;/span&gt;
      &lt;span class="s"&gt;--tensor-parallel-size 1&lt;/span&gt;
      &lt;span class="s"&gt;--gpu-memory-utilization 0.9&lt;/span&gt;
      &lt;span class="s"&gt;--host 0.0.0.0&lt;/span&gt;
      &lt;span class="s"&gt;--port 8000&lt;/span&gt;

  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./api&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app_server&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;VLLM_API_URL=&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this setup, the RTX 5090 is fully utilized by the vLLM container. The &lt;code&gt;--gpu-memory-utilization&lt;/code&gt; flag tells vLLM what fraction of the card's VRAM it may claim (here 90%), leaving a small margin for the driver while maximizing the space available for the KV cache and larger batch sizes. The FastAPI container then sits in front of it, ready to serve requests to the end user.&lt;/p&gt;
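&lt;p&gt;For the application layer itself, a minimal FastAPI sketch might look like the following. It assumes the compose services above, the official &lt;code&gt;openai&lt;/code&gt; Python client pointed at the local vLLM endpoint, and a served model name matching the path passed to &lt;code&gt;--model&lt;/code&gt;; the route and defaults are placeholders, not a prescribed API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()

# The api_key is required by the client library but ignored by a local vLLM server.
client = OpenAI(
    base_url=os.environ.get("VLLM_API_URL", "http://vllm:8000/v1"),
    api_key="not-needed",
)

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt):
    # vLLM exposes an OpenAI-compatible chat completions endpoint.
    resp = client.chat.completions.create(
        model="/models/Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": prompt.text}],
        max_tokens=512,
    )
    return {"reply": resp.choices[0].message.content}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the client speaks the OpenAI protocol, swapping between the local server and a cloud API is a configuration change rather than a rewrite.&lt;/p&gt;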

&lt;h3&gt;
  
  
  The Future is Local
&lt;/h3&gt;

&lt;p&gt;The arrival of the RTX 5090 represents a pivotal moment in the democratization of AI. It moves the "70B" model from the realm of cloud computing to the realm of consumer hardware. This does not mean that cloud APIs will disappear; they will still be essential for massive, distributed tasks. However, for the vast majority of applications--from personal coding assistants to internal business tools--the local model is now a viable, high-performance alternative.&lt;/p&gt;

&lt;p&gt;The research surrounding the next generation of models, such as the upcoming Llama 4.1, suggests that the models will only get smarter and larger. This creates a feedback loop: better models demand better hardware, and better hardware enables better models. By adopting the RTX 5090 and the vLLM ecosystem now, developers are positioning themselves to be at the forefront of this evolution.&lt;/p&gt;

&lt;p&gt;The calculus has shifted. Privacy no longer has to be traded away for the convenience of a cloud subscription. The latency of local inference is now competitive with the network latency of a cloud API call. And the quality of the 70B model is simply unmatched by anything smaller. The home lab is no longer a hobbyist playground; it is becoming the standard for intelligent application development.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways &amp;amp; Next Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Evaluate Your Requirements:&lt;/strong&gt; If your application requires complex reasoning or coding capabilities beyond simple summarization, the 70B model is the target. Do not settle for 8B if you need high fidelity.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Invest in Memory Bandwidth:&lt;/strong&gt; When building your local infrastructure, prioritize the GPU's memory bandwidth and capacity over raw clock speeds. The RTX 5090 class hardware is specifically designed for this workload.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Adopt vLLM:&lt;/strong&gt; For production-grade local serving, use vLLM. Its PagedAttention architecture is essential for managing the memory overhead of 70B models.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Containerize Your Stack:&lt;/strong&gt; Use Docker and Docker Compose to manage your inference engines. This ensures reproducibility and makes it easier to manage dependencies like CUDA drivers and model weights.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Integrate RAG:&lt;/strong&gt; To get the most out of a 70B model, combine it with a local vector database. Use PostgreSQL with &lt;code&gt;pgvector&lt;/code&gt; to create a private, searchable knowledge base that the model can query in real-time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Suggested External Reading &amp;amp; Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The Complete Guide to Running LLMs Locally (Hardware evaluation and software setup)&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct" rel="noopener noreferrer"&gt;Llama 3.1 70B Technical Report&lt;/a&gt; (Understanding the model architecture)&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM GitHub Repository&lt;/a&gt; (The open-source inference engine)&lt;/li&gt;
&lt;li&gt;  FastAPI Documentation (Building the application layer)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct" rel="noopener noreferrer"&gt;https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;https://github.com/vllm-project/vllm&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>model</category>
      <category>memory</category>
      <category>models</category>
      <category>vllm</category>
    </item>
    <item>
      <title>Why Technical Startups Fail: Building in a Vacuum</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Thu, 23 Apr 2026 05:35:19 +0000</pubDate>
      <link>https://forem.com/glad_labs/why-technical-startups-fail-building-in-a-vacuum-l2g</link>
      <guid>https://forem.com/glad_labs/why-technical-startups-fail-building-in-a-vacuum-l2g</guid>
      <description>&lt;p&gt;There is a specific, lonely moment that every technical founder eventually faces. It is the moment the code is clean, the architecture is scalable, and the beta version is ready to launch. You look at your screen, proud of the elegant solution you've built, and you expect the world to beat a path to your door. Instead, the silence is deafening.&lt;/p&gt;

&lt;p&gt;You send out a few emails to your network. You post a LinkedIn update about the new feature. You wait. And you wait.&lt;/p&gt;

&lt;p&gt;This scenario plays out in thousands of garage offices and co-working spaces every single day. The disconnect between a brilliant technical solution and a lack of customers is rarely a failure of the product itself. More often than not, it is a failure of communication. In the world of modern business, technical prowess is no longer enough. If you cannot articulate the value of your work to a human being, your product is effectively invisible.&lt;/p&gt;

&lt;p&gt;This is the harsh reality of the content marketing landscape for technical founders. It is a battlefield where the tools of the trade--algorithms, syntax, and architecture--are pitted against the softer skills of persuasion, empathy, and storytelling. Most technical founders fail not because they lack intelligence, but because they approach content marketing with the wrong mindset. They treat it as an afterthought, a chore, or a translation exercise rather than a strategic asset.&lt;/p&gt;

&lt;p&gt;Understanding why this happens is the first step toward fixing it. It requires looking past the lines of code and examining the psychological barriers that prevent technical leaders from connecting with their audience. It is a journey from being a builder of things to becoming a builder of a brand, and the transition is where the real work begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Engineer's Dilemma: Why You're Talking to Yourself
&lt;/h3&gt;

&lt;p&gt;The root of the problem often lies deep in the founder's background. Technical founders are trained to solve problems. They are trained to optimize, to debug, and to find the most efficient path from Point A to Point B. This mode of thinking is analytical, linear, and highly precise. However, content marketing is rarely linear; it is contextual, emotional, and conversational.&lt;/p&gt;

&lt;p&gt;When a technical founder sits down to write a blog post or a social media update, they often fall into the trap of talking to themselves. They write for their peers, for other engineers, or for the imaginary technical review board. They assume that if the reader understands the complexity of the solution, they will automatically understand the value.&lt;/p&gt;

&lt;p&gt;This is a dangerous assumption. The average business user does not care about the specific API endpoint or the algorithmic complexity of your search function. They care about how their life is easier, faster, or more profitable because of what you built. The language of value is not binary; it is human.&lt;/p&gt;

&lt;p&gt;Consider the difference between a technical manual and a marketing page. A manual tells you &lt;em&gt;how&lt;/em&gt; to do something, assuming you already know &lt;em&gt;why&lt;/em&gt; you want to do it. Marketing tells you &lt;em&gt;why&lt;/em&gt; you should do it, and then shows you &lt;em&gt;how&lt;/em&gt;. Technical founders often struggle to make this switch. They view content as a manual for their product, a way to explain how it works, rather than a pitch for its benefits.&lt;/p&gt;

&lt;p&gt;This creates a profound disconnect. You are speaking a language of logic and precision, while your potential customers are looking for a solution to a problem they are feeling emotionally. Until you can translate that complex logic into simple, relatable benefits, you will continue to build in a vacuum. You are the only one who understands the code, and that is a lonely place to be when you are trying to build a business.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Perfectionism Trap: When Good Enough Becomes the Enemy
&lt;/h3&gt;

&lt;p&gt;If the first hurdle is a lack of audience alignment, the second is often paralysis. Technical founders are often perfectionists by nature. They strive for 100% accuracy. They want their documentation to be flawless. They want their code to be bug-free. They apply this same standard to their content.&lt;/p&gt;

&lt;p&gt;However, content marketing is not a research paper. It is a conversation. And conversations, by their very nature, are messy and imperfect. They evolve. They are corrected. They are refined in real-time.&lt;/p&gt;

&lt;p&gt;The "Perfectionism Trap" is the belief that you cannot publish anything until it is absolutely perfect. This mindset is the enemy of growth. In the fast-paced world of digital media, speed is often more important than perfection. By waiting for the "perfect" post, you are often waiting until the market has moved on.&lt;/p&gt;

&lt;p&gt;Furthermore, technical perfectionism often leads to jargon. There is a comfort in using technical terms. It establishes authority. It shows that you are an expert. But it also creates a wall. If a reader has to Google a term just to understand your sentence, you have lost them. The goal of content marketing is to lower the barrier to entry, not to raise it.&lt;/p&gt;

&lt;p&gt;Many organizations have found that their best-performing content is often the simplest. It is the post that explains a complex concept using an analogy that anyone can understand. It is the video that skips the technical deep dive and focuses entirely on the customer's pain point.&lt;/p&gt;

&lt;p&gt;To overcome this, technical founders must learn to let go of the need for total control. They must accept that their first draft will be flawed. They must learn to write for the reader, not for their own ego. The goal is to start the conversation, not to write the final word on the subject. Once you publish, you can iterate, improve, and refine based on real feedback. But you cannot iterate on a file that never leaves your hard drive.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Strategy Gap: Why "Just Posting" Doesn't Work
&lt;/h3&gt;

&lt;p&gt;Closely related to perfectionism is the lack of a coherent strategy. Many technical founders view content marketing as a sporadic activity--a few posts here, a tweet there, and a newsletter update whenever inspiration strikes. They treat it as a hobby rather than a business function.&lt;/p&gt;

&lt;p&gt;This is the "Strategy Gap." Without a plan, content marketing becomes a random walk through the internet, hoping to stumble upon a customer. It is inefficient and unsustainable.&lt;/p&gt;

&lt;p&gt;A true content strategy involves understanding your audience deeply. Who are they? What are their pain points? What questions are they asking? Where do they hang out online? Once you have this intelligence, you can create a content calendar that addresses these specific needs over time.&lt;/p&gt;

&lt;p&gt;It is not enough to simply broadcast that you have launched a new feature. That is "broadcasting," not "marketing." Real marketing involves educating, entertaining, and engaging. It involves solving a problem for the reader before they even realize they have it.&lt;/p&gt;

&lt;p&gt;For a technical founder, this might mean creating a series of "how-to" guides that solve a specific technical problem that your software addresses. It might mean producing case studies that demonstrate how other companies have used your tools to save money or increase efficiency. It means creating content that is valuable in itself, regardless of whether the reader ever buys your product.&lt;/p&gt;

&lt;p&gt;The Strategy Gap is also visible in the lack of consistency. Technical founders often burn out because they try to do it all at once. They decide to start a blog, write a weekly newsletter, post on LinkedIn three times a day, and start a podcast. The result is usually a hasty, low-quality effort across all channels.&lt;/p&gt;

&lt;p&gt;A better approach is to pick one or two channels where your audience actually hangs out and commit to them. Focus on quality and consistency over quantity. Build a library of assets that you can repurpose and update over time. This is not a sprint; it is a marathon.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Blueprint for Conversion: Moving from Code to Conversation
&lt;/h3&gt;

&lt;p&gt;So, how do you fix this? How do you move from a struggling technical founder to a content-savvy leader? The transformation begins with a mindset shift. You must stop thinking like a developer and start thinking like a publisher.&lt;/p&gt;

&lt;p&gt;The first step is to adopt the "Writer's Mindset." This means approaching your writing with empathy. Before you write a single word, ask yourself: "Who is this for?" and "What is their problem?" Write as if you are having a one-on-one conversation with a single person in a coffee shop. Use clear, simple language. Avoid jargon unless you can explain it in plain English.&lt;/p&gt;

&lt;p&gt;The second step is to treat content like a product. Just as you would test your software for bugs, you should test your content. Look at your analytics. Which posts are getting the most engagement? Which ones are driving traffic to your website? Use this data to inform your future content strategy. If a technical deep dive post isn't getting shares, maybe it's too dry. If a "behind the scenes" post is going viral, maybe that is your niche.&lt;/p&gt;

&lt;p&gt;Third, you must integrate content creation into your development cycle. Do not wait until the product is finished to start talking about it. Start writing about the problems you are solving while you are still in the design phase. This not only builds anticipation but also helps you clarify your own thinking. Writing about your vision forces you to articulate it clearly, which is essential for your own understanding.&lt;/p&gt;

&lt;p&gt;Finally, you need to stop trying to be perfect and start trying to be helpful. The most successful technical brands are those that provide genuine value to their community. They answer questions. They share knowledge. They admit when they don't know something. This builds trust. And in business, trust is the currency that buys customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your Next Move: Stop Building, Start Talking
&lt;/h3&gt;

&lt;p&gt;The technical founder who understands this truth will have a significant advantage. They will not just build a product; they will build a community. They will not just write code; they will write copy that sells. They will realize that the best product in the world is useless if no one knows it exists.&lt;/p&gt;

&lt;p&gt;The journey from isolation to connection is challenging. It requires learning new skills and stepping out of your comfort zone. It requires admitting that you don't have all the answers and that your audience might know things you don't. But the rewards are immense. You build a brand that resonates. You create a loyal following that advocates for your product. You turn your technical expertise into a powerful marketing asset.&lt;/p&gt;

&lt;p&gt;So, the next time you sit down to write, put down the technical documentation. Pick up the pen. Or open the laptop. Write for the human being on the other side of the screen. Explain the value. Tell the story. And most importantly, listen to the response.&lt;/p&gt;

&lt;p&gt;Your customers are waiting for you to stop building in a vacuum and start talking to them. Are you ready to have the conversation?&lt;/p&gt;




&lt;h3&gt;
  
  
  External Resources for Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;HubSpot: The Beginner's Guide to Content Marketing&lt;/strong&gt; - A comprehensive overview of what content marketing is and why it matters for businesses of all sizes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Neil Patel: How to Write a Blog Post That Converts&lt;/strong&gt; - Practical advice on structuring your content to engage readers and drive action.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Harvard Business Review: The Role of Storytelling in Business&lt;/strong&gt; - Insights into how narrative can be used to build brand identity and connect with audiences.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Moz: The Beginner's Guide to SEO&lt;/strong&gt; - Understanding how content fits into the broader digital marketing ecosystem and search engine visibility.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>technicalmarketingwriteoftenpr</category>
    </item>
    <item>
      <title>How Small Businesses Are Winning with Automated Workflows</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Wed, 22 Apr 2026 13:27:11 +0000</pubDate>
      <link>https://forem.com/glad_labs/how-small-businesses-are-winning-with-automated-workflows-17m5</link>
      <guid>https://forem.com/glad_labs/how-small-businesses-are-winning-with-automated-workflows-17m5</guid>
      <description>&lt;p&gt;For decades, the concept of software development was strictly reserved for large enterprises with massive IT departments. The process was often shrouded in mystery, involving complex manual deployments, nightly builds, and a level of technical overhead that seemed out of reach for a lean startup or a growing local business. However, a quiet transformation has been taking place in the tech world, and it is democratizing the tools of the trade.&lt;/p&gt;

&lt;p&gt;We are witnessing a shift where small business teams are rapidly adopting CI/CD pipelines. This isn't just a buzzword or a passing trend; it is a fundamental change in how software is built, tested, and delivered. For a small business, the adoption of these automated workflows is no longer a "nice-to-have" luxury--it is becoming a necessity for survival and growth. By implementing Continuous Integration and Continuous Delivery (CI/CD), small teams are leveling the playing field, allowing them to compete with industry giants by moving faster, breaking less, and scaling smarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Old Way of Doing Things is No Longer Enough
&lt;/h2&gt;

&lt;p&gt;To understand why the adoption of CI/CD pipelines is so critical right now, we first have to look at the alternative. For many years, the standard operating procedure for software updates involved a manual, often chaotic process. A developer would work on a feature, save the code, and then--often at the very last minute--hand the project off to a separate team member to deploy it to a staging environment.&lt;/p&gt;

&lt;p&gt;This approach, while common in the early stages of a company, creates a fragile environment where errors are inevitable. The "It Works on My Machine" syndrome is a cliché for a reason; it highlights the disconnect between the developer's local environment and the production environment. Without a standardized process, what looks good in isolation can break when exposed to the rest of the system.&lt;/p&gt;

&lt;p&gt;Furthermore, the manual nature of these updates introduces a significant human bottleneck. Deployments often had to be scheduled for off-peak hours to avoid disrupting users, meaning that critical fixes and new features were delayed for days or even weeks. This delay is a luxury that modern markets simply cannot afford. In an era where user expectations are set by consumer apps that update daily, a business that takes weeks to push a simple fix is already falling behind.&lt;/p&gt;

&lt;p&gt;By adopting CI/CD pipelines, small businesses are abandoning this risky and slow methodology. They are recognizing that the old ways of doing things are not just inefficient; they are actively holding the business back from reaching its full potential.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Human Cost of Manual Deployments
&lt;/h3&gt;

&lt;p&gt;Beyond the technical glitches, there is a significant psychological and operational cost to manual deployments. It creates a high-pressure environment where deployment day becomes a source of anxiety for the entire team. The fear of "breaking production" looms large, often leading to a culture of hesitation and risk aversion.&lt;/p&gt;

&lt;p&gt;When a team relies on manual processes, every deployment requires a specific sequence of steps that must be memorized and executed perfectly. If a developer forgets a step or encounters an error they haven't seen before, the process grinds to a halt. This downtime is expensive; every minute the application is down or being debugged is a minute where revenue is lost and customer trust is eroded.&lt;/p&gt;

&lt;p&gt;Small business teams are realizing that this level of stress is unsustainable. By automating the deployment process through CI/CD, they remove the human element from the equation during critical execution. The pipeline takes over, ensuring that the code is deployed exactly as it was tested, without deviation, error, or hesitation. This shift in culture--from fearful to confident--is one of the most underrated benefits of adopting these pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Chaos to Clarity: How to Ship Features in Half the Time
&lt;/h2&gt;

&lt;p&gt;The primary allure of CI/CD pipelines for small businesses is speed. However, this speed is not achieved by rushing; it is achieved by streamlining. The core philosophy of Continuous Integration is simple yet powerful: developers frequently merge their code changes into a central repository. Automated builds and tests then verify each change.&lt;/p&gt;

&lt;p&gt;This means that problems are caught early--often while the developer is still looking at the code, rather than a week later when it is already in production. In the narrative of software development, this is the difference between a minor inconvenience and a full-blown crisis.&lt;/p&gt;

&lt;p&gt;When a small business implements this workflow, the entire development lifecycle becomes transparent. There is no more guessing game about what went wrong during a deployment. The pipeline acts as a digital witness, logging every step of the process and providing immediate feedback. If a test fails, the pipeline stops immediately, alerting the team to the issue before it can propagate further.&lt;/p&gt;

&lt;p&gt;This "fail fast" mentality allows teams to iterate rapidly. A small business can now push updates multiple times a day if necessary, gathering user feedback in real-time and fixing issues on the fly. This agility allows them to respond to market trends with unprecedented speed. The days of a two-week release cycle are fading for those who have embraced automation, giving small businesses the agility of a startup and the stability of an enterprise.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Power of the Assembly Line
&lt;/h3&gt;

&lt;p&gt;Think of a modern software development team not as a group of individual craftsmen working in isolation, but as an assembly line. In the early days of manufacturing, moving from hand-crafting to assembly line production revolutionized industry. CI/CD pipelines apply that same logic to software.&lt;/p&gt;

&lt;p&gt;In this analogy, the pipeline is the conveyor belt. Code moves from one stage to the next--building, testing, and packaging--automatically. Each stage adds value and checks for quality. Because the process is automated, it doesn't get tired, and it doesn't get distracted. It can run 24/7, allowing the business to deploy at the most convenient time without needing to keep developers awake at night to hit a deadline.&lt;/p&gt;

&lt;p&gt;For a small business with limited resources, this efficiency is a game-changer. It allows a team of three to do the work of a team of ten, simply by leveraging automation to eliminate repetitive, manual tasks. The focus shifts from the drudgery of deployment scripts to the creative work of building features that solve customer problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unseen Safety Net That Saves Money
&lt;/h2&gt;

&lt;p&gt;While speed is a major benefit, the most profound impact of CI/CD pipelines is often the improvement in quality. In the software world, "quality" usually translates to stability and reliability. Small businesses often operate on razor-thin margins, and a system crash can be catastrophic.&lt;/p&gt;

&lt;p&gt;Automated pipelines introduce a rigorous testing phase that is impossible to replicate manually. Before any code ever reaches a user, it must pass a battery of automated tests. These tests can cover everything from unit tests (checking individual functions) to integration tests (ensuring different parts of the system work together) and even user interface tests.&lt;/p&gt;

&lt;p&gt;This safety net catches bugs that human testers might miss, or simply wouldn't have the time to test thoroughly. By preventing bugs from reaching production, the business saves money on support tickets, lost revenue, and emergency fixes. It is far cheaper to fix a bug in a test environment than to apologize to customers for a service outage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Technical Debt is the Enemy of Growth
&lt;/h3&gt;

&lt;p&gt;Every software project accumulates "technical debt"--the implied cost of additional rework caused by choosing an easy solution now instead of using a better approach that would take longer. Without a structured process like CI/CD, technical debt tends to snowball. As the codebase becomes more complex and the manual process becomes more fragile, it becomes harder and harder to make changes.&lt;/p&gt;

&lt;p&gt;Adopting CI/CD pipelines forces a discipline on the development process. It requires that code is written in a way that is modular and testable. It creates a feedback loop where the system constantly challenges the developers to maintain code quality. By treating quality assurance as an automated, integrated part of the process rather than an afterthought, small businesses can keep their technical debt manageable.&lt;/p&gt;

&lt;p&gt;This allows the business to scale without hitting a wall of complexity. As the team grows and the product evolves, the automated pipeline ensures that the foundation remains solid. It is the difference between building a house on sand and building it on concrete. The investment in CI/CD pays for itself many times over by protecting the business from the crippling costs of technical decay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Empowering Small Teams to Act Like Giants
&lt;/h2&gt;

&lt;p&gt;Perhaps the most inspiring aspect of the CI/CD revolution is how it empowers small teams to compete with industry giants. Historically, the massive infrastructure and tooling required to implement complex deployment workflows were only available to companies with deep pockets and dedicated DevOps teams.&lt;/p&gt;

&lt;p&gt;Today, the landscape has changed. Cloud computing and open-source technologies have democratized access to these tools. Platforms like GitHub Actions, GitLab CI, and Jenkins offer powerful pipeline capabilities that can be set up in a matter of hours, not months. The barrier to entry has been lowered significantly.&lt;/p&gt;

&lt;p&gt;This means that a solo developer or a small team of five can now deploy to production with the same reliability and sophistication as a team of fifty. The "Force Multiplier" effect is real. Automation allows a small team to achieve a level of output and stability that was previously impossible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Democratizing Enterprise-Level Quality
&lt;/h3&gt;

&lt;p&gt;It is no longer necessary to hire a dedicated DevOps engineer just to get started with CI/CD. Many of these tools are user-friendly and integrate directly with the version control systems that developers already use. This means that the team can focus on what they do best--writing great code and solving customer problems--while the pipeline handles the heavy lifting of infrastructure and delivery.&lt;/p&gt;

&lt;p&gt;Small businesses are finding that they can offer enterprise-grade reliability and speed to their customers without the enterprise-grade overhead. They can deliver updates frequently, ensuring their software is always fresh and secure. They can scale their infrastructure automatically as they grow, paying only for what they use. This agility is a superpower in the modern economy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Next Step Toward Automation
&lt;/h2&gt;

&lt;p&gt;The transition to CI/CD pipelines is not a one-time project but an ongoing journey of improvement. It requires a shift in mindset, from viewing deployment as a manual task to viewing it as a critical part of the development lifecycle. However, the rewards are substantial: faster time-to-market, higher quality software, and a more resilient business.&lt;/p&gt;

&lt;p&gt;For small businesses looking to adopt this approach, the first step is often the hardest: breaking the habit of manual deployments. Start small. Identify one process that is currently manual and time-consuming, such as testing a new feature. Can this be automated? Can a script be written to run the tests automatically?&lt;/p&gt;

&lt;p&gt;As you experiment with automation, you will begin to see the benefits firsthand. You will deploy with confidence, knowing that the pipeline has your back. You will ship features faster, delighting your customers with fresh updates. And you will sleep better at night, knowing that your software is stable and reliable.&lt;/p&gt;

&lt;p&gt;The tools are available. The knowledge is accessible. The only question left is: how long will you wait to join the revolution?&lt;/p&gt;

&lt;h3&gt;
  
  
  Ready to Begin?
&lt;/h3&gt;

&lt;p&gt;If you are ready to move beyond the chaos of manual deployments and embrace the power of automation, the time to act is now. Don't let technical debt hold your business back. Start exploring the tools available for CI/CD today and take the first step toward a more efficient and scalable future.&lt;/p&gt;




</description>
      <category>smallbusinessprocessmanualteam</category>
    </item>
    <item>
      <title>Validate a SaaS Idea in 48 Hours Without Writing Code</title>
      <dc:creator>Matthew Gladding</dc:creator>
      <pubDate>Wed, 22 Apr 2026 09:27:11 +0000</pubDate>
      <link>https://forem.com/glad_labs/validate-a-saas-idea-in-48-hours-without-writing-code-m56</link>
      <guid>https://forem.com/glad_labs/validate-a-saas-idea-in-48-hours-without-writing-code-m56</guid>
      <description>&lt;p&gt;The siren song of the startup world is powerful. It whispers that if you just build the perfect solution, the customers will come rushing in, wallets open, ready to pay for the magic you've created. But for the aspiring entrepreneur, this dream is often a trap. It is a trap that leads to months of sleepless nights, thousands of dollars in sunk costs, and the crushing realization that nobody actually wanted what you built.&lt;/p&gt;

&lt;p&gt;We have all seen it happen. A brilliant person spends three months coding a complex application, polishing every pixel, perfecting every algorithm, only to launch to crickets. They fall into the "build first, ask later" fallacy. But what if you could flip the script? What if you could prove your hypothesis before you spend a single dollar on development?&lt;/p&gt;

&lt;p&gt;It is entirely possible to validate a SaaS idea in a weekend without writing a single line of code. It requires a shift in mindset--from being a builder to being a detective. It requires moving from "I have a solution" to "I have a problem worth solving." This guide will walk you through a narrative of how to execute this high-velocity validation process, turning the daunting prospect of startup validation into a manageable, even enjoyable, weekend project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Most Founders Crash and Burn Before Launch
&lt;/h2&gt;

&lt;p&gt;Before we dive into the mechanics of the weekend, we must address the elephant in the room: the failure rate of new software ventures. It is a well-documented fact that a significant percentage of startups fail, and a primary culprit is the lack of product-market fit. This occurs when a company builds a product that no one actually wants or is willing to pay for.&lt;/p&gt;

&lt;p&gt;The psychological mechanism at play here is often the "sunk cost fallacy." Once you have spent hours staring at a code editor, you become emotionally attached to your creation. You begin to view your time as a currency that must be "earned" by releasing the product. You convince yourself that if you just add one more feature or fix one more bug, the market will suddenly open up.&lt;/p&gt;

&lt;p&gt;However, the market does not care about your effort; it only cares about the value you provide. Validation is the antidote to this emotional investment. By validating in a weekend, you treat your idea as a hypothesis rather than a religion. You are not betting your life savings on a hunch; you are running a quick experiment to see if the hunch has legs.&lt;/p&gt;

&lt;p&gt;The goal of the weekend is not to build the product. The goal is to gather enough data to answer one question: Is this idea worth building a real product for? If the answer is no, you save yourself six months of work and a significant financial loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Friday Night: Where the Magic Actually Happens
&lt;/h2&gt;

&lt;p&gt;Validation begins before you open your laptop. It begins with a pen and paper--or a clean digital document. The most common mistake aspiring founders make is defining their idea too early. They sit down and write, "I want to build an AI-powered project management tool." This is not a business; it is just a collection of buzzwords.&lt;/p&gt;

&lt;p&gt;On Friday night, your job is to strip the idea down to its core essence. You need to move away from features and focus entirely on the problem. This is often called "Problem Definition."&lt;/p&gt;

&lt;p&gt;Start by writing down the specific pain point you are trying to solve. Be visceral. Don't say, "People struggle with time management." Say, "Small business owners lose 10 hours a week manually tracking client hours and invoicing."&lt;/p&gt;

&lt;p&gt;Next, identify the "Who." Who is this person? You need to be as specific as possible. Don't just say "freelancers." Say "freelance graphic designers who charge by the hour and use QuickBooks."&lt;/p&gt;

&lt;p&gt;Finally, articulate the current solution. What are they doing right now? Are they using spreadsheets? Are they using a generic tool like Trello? Are they doing it manually in Excel? Understanding the friction of the current state is crucial.&lt;/p&gt;

&lt;p&gt;At this stage, you are not selling anything. You are simply defining the landscape. This clarity is the foundation upon which the entire weekend rests. If you cannot clearly articulate the problem and the target audience in one or two sentences, you aren't ready to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saturday Morning: Building the "No-Code" Trojan Horse
&lt;/h2&gt;

&lt;p&gt;By Saturday morning, the adrenaline is likely kicking in. It is time to create the visual proof of your concept. This is the stage where the "no-code" tools come into play. You do not need to hire a developer or learn complex coding languages to build a high-fidelity landing page.&lt;/p&gt;

&lt;p&gt;The objective here is to build a "Trojan Horse." You want to create a website that looks professional, polished, and ready for launch. It should look like a legitimate SaaS product, not a hobby project.&lt;/p&gt;

&lt;p&gt;You can use drag-and-drop website builders like Carrd, Framer, or Webflow to create a stunning single-page website in a few hours. The page should include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;A Clear Value Proposition:&lt;/strong&gt; A headline that explains exactly what you do and who it is for.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Social Proof (Fake or Real):&lt;/strong&gt; Testimonials, user logos, or a "Join the Waitlist" counter. If you have no users yet, you can use "Lorem Ipsum" text to simulate a review, or you can use placeholders like "[Insert Quote Here]". The goal is to make the page feel populated.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The "Fake Door" Technique:&lt;/strong&gt; This is a powerful psychological trick. If you are pre-launching a paid product, set the price on the page. If you are offering a free trial, show a sign-up form. Do not ask for their credit card immediately unless you are running ads, but make the "Buy" or "Get Started" button very visible.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Features:&lt;/strong&gt; A bulleted list of what the software will do. Again, keep these high-level. Don't list "Feature A, Feature B, and Feature C." List "Automated invoicing, Real-time tracking, and Weekly reporting."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0dcnswb9lxmcz2dqgzw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0dcnswb9lxmcz2dqgzw.jpeg" alt="How to Validate a SaaS Idea in a Weekend Without Writing Code illustration" width="800" height="534"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Photo by Carla Canepa on Pexels&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Imagine a clean, modern landing page with a hero section featuring a catchy headline, a mock-up of the software interface, and a prominent "Join the Waitlist" button. The design is minimalist, using a blue and white color scheme.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The beauty of the no-code approach is that it allows you to iterate rapidly. If you realize your value proposition is confusing, you can change the text on the landing page in minutes. You are testing the message, not the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Saturday Afternoon: The Art of the "Soft" Launch
&lt;/h2&gt;

&lt;p&gt;With a beautiful landing page live, the real work begins. This is the outreach phase. You need to get eyes on your page. However, this is not a "hard launch" where you blast a link to everyone you know and hope for the best. This is a "soft launch," a targeted conversation.&lt;/p&gt;

&lt;p&gt;You need to talk to the people you identified on Friday night. If your target is freelance graphic designers, you shouldn't be posting on Facebook. You should be on design forums, LinkedIn groups for creatives, or Twitter (X) communities where designers hang out.&lt;/p&gt;

&lt;p&gt;Your goal is to get feedback, not to sell. Do not ask, "Do you want to buy my software?" Ask, "I'm building a tool to help designers track their time. I'm not ready to launch yet, but I'd love your feedback on the concept."&lt;/p&gt;

&lt;p&gt;When you engage with people, watch their reaction closely. Do they nod and say, "That sounds useful"? Or do they sigh and say, "I wish someone would just build that"? The difference between a "nod" and a "sigh" is the difference between a feature request and a buying signal.&lt;/p&gt;

&lt;p&gt;If you are using the "Fake Door" technique, you can also run a simple ad campaign. Even with a small budget (e.g., $10-$20), you can drive traffic to your landing page. Look at the click-through rate and the number of emails collected. Are people clicking? Are they giving you their email address?&lt;/p&gt;

&lt;p&gt;This is where the data starts to tell a story. If 100 people visit the page and 10 people give you their email address, that is a conversion rate of 10%. If 1,000 people visit and only 1 person gives their email, that is a conversion rate of 0.1%. The volume matters, but the conversion rate is the true north of your validation.&lt;/p&gt;
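
&lt;p&gt;It helps to compute these numbers the same way every time, so that Sunday night's decision rests on arithmetic rather than mood. The sketch below is a toy illustration with made-up figures (the $20 budget, 100 visits, and 10 signups are only examples); plug in your own counts.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Toy validation metrics with made-up numbers -- replace with your own counts.
const adSpendDollars = 20;   // the small weekend ad budget
const visitors = 100;        // landing-page visits
const emailSignups = 10;     // waitlist emails collected

const conversionRate = visitors === 0 ? 0 : emailSignups / visitors;
const costPerSignup = emailSignups === 0 ? Infinity : adSpendDollars / emailSignups;

console.log("Conversion rate: " + (conversionRate * 100).toFixed(1) + "%");  // 10.0%
console.log("Cost per signup: $" + costPerSignup.toFixed(2));                // $2.00
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As the example suggests, a double-digit email conversion rate on cold traffic is a strong signal, while a fraction of a percent usually means the message, the audience, or the idea itself needs rework before you write any code.&lt;/p&gt;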

&lt;p&gt;&lt;em&gt;A split screen showing a person holding a smartphone with a screenshot of a landing page, looking engaged, while another person sits at a laptop looking skeptical. The caption reads: "The difference between a Nod and a Sigh."&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sunday Night: Decoding the Signals
&lt;/h2&gt;

&lt;p&gt;Sunday evening is crunch time. You have spent the last 48 hours defining the problem, building a landing page, and talking to potential users. Now, you need to interpret the data.&lt;/p&gt;

&lt;p&gt;It is crucial to distinguish between "interest" and "intent." Interest is emotional. People love to talk about their problems. They will tell you how much they hate their current spreadsheets. They will say, "I would pay money for this to go away." This is easy to get. It doesn't mean they will pay.&lt;/p&gt;

&lt;p&gt;Intent is rational. Intent is the credit card test. Did someone actually give you their email address in exchange for early access? Did someone ask, "How much will it cost?" or try to pre-order on the spot? If people are asking for a price, you have a green light. If they are only asking for a demo or a "let me know when it's ready" update, you are still in the "interest" phase.&lt;/p&gt;

&lt;p&gt;There is a third category: The Pivot. Sometimes the feedback will reveal that you are solving the wrong problem. You might find that your target audience doesn't actually have the budget for a SaaS solution, or that they would rather keep solving the problem manually because the pain isn't severe enough to pay to remove. This is not a failure. It is a success: you have validated that this specific idea is not a business, saving you months of future work.&lt;/p&gt;

&lt;p&gt;If the data is positive--if you have a high conversion rate, people are asking for a price, and you have a queue of eager users--you have successfully validated your SaaS idea. You have proven that there is a market for it.&lt;/p&gt;

&lt;p&gt;If the data is negative--few clicks, low conversion, people are indifferent--you have also succeeded. You have validated that this idea is not viable. You can now go back to the drawing board and try a different angle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Next Step: Stop Dreaming, Start Testing
&lt;/h2&gt;

&lt;p&gt;The weekend is over, but the journey has just begun. If you validated your idea and found genuine buying signals, your next step is to build the real thing. But now you are building with a roadmap: you know who your users are, what they are willing to pay, and which pain points matter most to them.&lt;/p&gt;

&lt;p&gt;If your test turned up little or no interest, do not be discouraged. Use the insights you gained to refine your hypothesis. Maybe the problem is real, but the solution needs to be different. Maybe the timing is wrong.&lt;/p&gt;

&lt;p&gt;The most important lesson of this weekend is that you do not need to be a developer to test a business idea. You need to be a researcher. You need to be a conversationalist. You need to be willing to be wrong.&lt;/p&gt;

&lt;p&gt;The next time you have a "million-dollar idea," do not rush to the code editor. Pause. Take the weekend. Validate. It could save you a fortune. It could be the difference between building something nobody wants and building something that changes the world.&lt;/p&gt;




</description>
      <category>idea</category>
      <category>weekend</category>
      <category>people</category>
      <category>need</category>
      <category>build</category>
    </item>
  </channel>
</rss>
