Forem: Salt Creative

PUBLICMCP

Salt Creative — Thu, 30 Apr 2026 20:09:15 +0000

PUBLICMCP lets any business expose structured identity data that Claude, GPT, Gemini, and any MCP-compatible agent can query directly — no scraping, no hallucinating, no dead ends. https://publicmcp.org

Salt Creative Launches SEO Intelligence: An AI Platform That Thinks, Remembers, and Grows — So Small Businesses Don't Have To

Salt Creative — Mon, 20 Apr 2026 17:33:04 +0000

FOR IMMEDIATE RELEASE

Contact:
Salt Creative
hello@sltcreative.com
www.sltcreative.com/seo-intelligence

BOISE, Idaho — April 20, 2026 — Salt Creative, a web design and digital marketing agency serving small businesses across the United States, today announced the commercial launch of SEO Intelligence — a first-of-its-kind AI platform that operates autonomously on a client's website, learns continuously from real search data, and improves its own performance over time without human instruction.

Unlike traditional SEO services, which deliver monthly recommendations that require a business owner to act on them, SEO Intelligence is a running system. It connects directly to a client's Google Search Console account, stores performance data in a private warehouse, and uses that data to identify which pages are underperforming, why, and what to do about it — then does it. Every change is scored before it ships and scored again after it indexes. Wins are kept. Misses are rolled back. Nothing is left to guesswork, and nothing requires the client to understand how any of it works.

"What the market was missing was a tool that manages AI memory, AI agents, and multiple language models working together — one that grows and adapts over time to accomplish real SEO goals for small businesses," said Joe Provence, co-founder of Salt Creative. "Most business owners don't have time to become SEO experts. We built something that removes that requirement entirely. The system carries the memory of every experiment ever run on a client's site, and it uses that memory to make better decisions going forward — autonomously."

That persistent memory is one of SEO Intelligence's most significant technical achievements. The platform maintains a complete record of every action taken, every result measured, and every pattern observed across a client's entire web presence. Over time, the system doesn't just repeat tasks — it refines them, informed by everything it has learned since the first day it was deployed.

The platform currently operates with more than twenty specialized tools — each built to handle a distinct aspect of search performance analysis, content evaluation, or technical site health — and that number continues to grow as Salt Creative refines its understanding of what small business SEO requires at scale.

Equally central to the platform are its AI agents: autonomous programs that run on scheduled intervals without any human trigger. Current agents handle weekly ranking surveillance across all tracked keywords, automated experiment evaluation, and comprehensive content audits that score every important page on a client's site for quality signals. Additional agents are actively in development as Salt Creative continues to map the full scope of tasks that can be systematically removed from the small business owner's plate.

The platform also self-upgrades. As more capable AI models become publicly available, SEO Intelligence incorporates them into its workflow — meaning the system's analytical depth and output quality improve over time, at no additional complexity to the client.

For the estimated thirty million small businesses in the United States that pay for SEO without seeing transparent, measurable results, SEO Intelligence offers something the traditional agency model has never been able to deliver at scale: a system that shows its work, remembers everything, and never stops working.

Salt Creative is currently accepting new clients for SEO Intelligence enrollment. Details are available at www.sltcreative.com/seo-intelligence.

About Salt Creative
Salt Creative is a web design and digital marketing agency founded in 2018 and headquartered in Boise, Idaho. The agency specializes in building digital infrastructure for small businesses, including websites, SEO systems, and custom marketing tools. Salt Creative serves clients across the United States.

###

Adding a Free Overflow Model to Your MCP Server: Gemma via the Gemini API

Salt Creative — Sun, 12 Apr 2026 02:46:56 +0000

Most agentic workflows have a single failure mode nobody plans for: the primary LLM hits its rate limit mid-session and everything stops. You can't log a result. You can't draft the next section. The workflow is blocked until the window resets. After hitting this enough times, I started treating it as an architecture problem rather than a billing problem.

The fix turned out to be simpler than I expected.

The Insight Hidden in the Gemini Docs

While auditing our Google AI Studio integration, I noticed that Gemma — Google's open-weight model family — is served through the exact same API endpoint as Gemini. Same Python SDK, same API key, different model string. And Gemma 3 27B costs $0 per million tokens on the free tier. If you already have a Gemini API key, you already have free access to a capable open-weight model. No new credentials, no additional SDK, no separate account.

That's the whole unlock.

Registering the Tool in FastMCP

Adding query_gemma to a FastMCP server is a thin wrapper — roughly fifteen lines:

import google.generativeai as genai
from fastmcp import FastMCP

mcp = FastMCP("my-server")

@mcp.tool()
def query_gemma(prompt: str, model: str = "gemma-3-27b-it") -> str:
    """Send a prompt to Gemma. Use for generation tasks to reduce primary LLM token usage."""
    client = genai.GenerativeModel(model)
    response = client.generate_content(prompt)
    return response.text

The model parameter defaults to gemma-3-27b-it but accepts the full family:

Model	Best for
`gemma-3-1b-it`	Minimal tasks, fastest
`gemma-3-4b-it`	Classification, simple formatting
`gemma-3-12b-it`	General use
`gemma-3-27b-it`	Default — best Gemma 3 quality
`gemma-4-26b-a4b-it`	Gemma 4, efficient
`gemma-4-31b-it`	Gemma 4, highest quality

After adding the tool, reconnect your MCP connector to reload the manifest. That's the entire deployment.

The Workflow Split That Makes This Useful

The important constraint: query_gemma is text in, text out. Gemma has no access to your tool registry. It can't call other MCP tools, query your data layer, or read session state. It only knows what you explicitly pass in the prompt.

This forces a clean separation that turns out to be the right design anyway. The primary LLM handles tool calls, data retrieval, QA, and logging. Gemma handles generation-heavy tasks — drafting, summarizing, classifying, formatting. The primary LLM does less of the expensive token work. When it hits rate limits, Gemma absorbs the generation queue while the primary LLM recovers.

The split also makes each model's role legible. If something fails, you know immediately which layer to look at.

The Gap That Remains

The free tier rate limits are real. Gemma 3 models allow 5–15 requests per minute depending on model size. For interactive workflows, that's usually fine. For anything resembling batch processing, you'll hit the ceiling fast and need retry logic.

The deeper limitation is context. Gemma doesn't know what your other tools returned unless you tell it. Every query_gemma call needs to be self-contained — task description, relevant data, output format, all passed explicitly. That's more prompt engineering overhead than calling a context-aware primary LLM, and it matters for complex tasks.

What This Is and Isn't

This isn't a replacement for your primary LLM. For tasks requiring tool calls, structured reasoning over live data, or anything where the model needs to know what happened earlier in the session — you still need the primary stack.

For pure generation tasks, it works well and it's free. The practical framing: treat it as a relief valve on your token budget, not a second brain.

Build your overflow capacity the same way you build your primary stack — thin interfaces, clear contracts, explicit failure modes. A model you can swap in when the primary one is saturated is worth more than a more powerful model you can't afford to run continuously.

From Monolithic Prompts to Modular Context: A Practical Architecture for Agent Memory

Salt Creative — Fri, 10 Apr 2026 17:46:19 +0000

Most teams building on top of LLMs treat the system prompt as a static artifact — write it once, tune it occasionally, move on. That works fine for simple assistants. It breaks down the moment your agent needs to operate across multiple domains, maintain state across sessions, and actually learn from its mistakes rather than repeating them.

After running a production agentic workflow for several months, I rebuilt the memory layer from scratch. Here's what I learned.

The Problem with Monolithic Context

The original system had a single large context file loaded at the start of every session. It contained everything: infrastructure details, client rules, workflow protocols, historical session logs, SEO doctrine, tool documentation — all of it, every time.

This violates a principle that should be obvious but isn't: context is an attention budget, not a storage bin. Research on context rot (Chroma, 2024) shows that LLM recall degrades nonlinearly as context length increases. You're not just adding tokens — you're diluting attention across an increasingly noisy signal space. Every irrelevant token you load competes with every relevant one.

The other problem: a monolithic file has no mutation mechanism. It grows. It never gets smarter. Failures get logged as narrative and immediately buried under new entries. The system had no immune memory.

The Architecture: Six Files, Three Load Tiers

The redesign splits context across six files organized by load trigger — not by topic.

Tier 1 — Always loaded (~1,000 tokens total):

Core identity file: project structure, infrastructure, tool index, session rules. Session rules appear first — not buried — because of the "lost in the middle" attention gradient documented by Liu et al. (2023).
Failure pattern file: every entry is a real production failure encoded as a structured triple: Failure | Trigger | Rule. Always loaded. Consulted before tool calls.

Tier 2 — Client-scoped (loaded via explicit switch):

Client context file: domain-specific rules, approved sources, active work log, client-specific failure patterns. Never loaded during other client sessions. Zero cross-contamination.

Tier 3 — Task-scoped (loaded by session type):

Workflow file: embedded executable sequences, not passive documentation. The NLP audit loop is written as a runnable checklist, not prose.
Architecture file: decision gates written as prompts. Before any structural recommendation, the agent runs a three-step investment check — hardcoded as a trigger, not a suggestion.
Session notes: ephemeral working memory. Cleared each session. Holds active threads, decisions made, blockers.

Total always-on footprint: ~1,000 tokens. Task-scoped files add 500–700 tokens only when relevant. This is a ~60% reduction from the monolithic baseline, with higher signal density at every tier.

The Failure Pattern File: Applied RCL

The most important file in the system isn't the one with the most information — it's the one that encodes what went wrong.

Recent work on Reflective Context Learning (RCL, arxiv 2604.03189) formalizes something practitioners have been doing informally: treating context optimization as a training loop. The forward pass executes the agent. The backward pass reflects on the trace and identifies which context entry was absent or wrong. The optimizer step mutates that entry.

The failure pattern file is the mutation log. Each entry follows a strict schema:

| Failure | Trigger | Rule |
|---------|---------|------|
| Silent tool timeout | Batch API call | Single requests only — 
  no error is thrown on batch failure |
| OAuth token expiry | ~90 day intervals | Re-authenticate before 
  session; token refresh is not automatic |
| Entity misclassification | Repeated superlative phrases | 
  Rewrite entire sentence — removing one word doesn't 
  clear the pattern |

Critically, the update_context tool now accepts an optional failure parameter. When something breaks mid-session, the agent writes to both the session log and the failure pattern file simultaneously. The mutation is captured at the moment of failure — not reconstructed from memory at session end.

This is what Meta's engineering team described in their April 2026 post on tribal knowledge capture: the most valuable context isn't what the system does when it works — it's what causes it to fail silently.

The "Compass, Not Encyclopedia" Constraint

Meta's framework for context file design: 25–35 lines per file, four sections maximum — Quick Commands, Key Files, Non-Obvious Patterns, See Also. Every line earns its place or gets cut.

The instinct when building these systems is to add more. More rules, more examples, more edge cases. That instinct is wrong. A 4,000-token context file with 80% signal is worse than a 1,000-token file with 95% signal, because attention is not uniformly distributed across tokens. The model doesn't read your context file the way a human reads a document. It attends to it — and attention degrades with distance and density.

The design principle that follows: never put passive information where active instructions belong. If a rule matters, write it as a trigger. If a workflow matters, write it as a sequence. Documentation is for humans. Context is for attention.

What This Changes in Practice

Three things that work better with modular context:

First, session startup is declarative. Instead of one large file that's always partially irrelevant, the agent loads exactly what it needs. A client-specific session loads the client file. An audit session loads the workflow file. The core file stays small and stable.

Second, failures compound into capability. Every production issue that gets structured into the failure pattern file makes the next session marginally more reliable. The system gets harder to break over time without any model fine-tuning — purely through context engineering.

Third, the system is auditable. Because context is modular and versioned in git, you can trace exactly what information was available to the agent during any session. When something goes wrong, you can identify whether the missing rule existed in the failure log, and if not, add it.

The Gap That Remains

The honest limitation: Claude.ai's MCP connector, as currently implemented, loads one context file automatically. The sub-files require explicit tool calls to retrieve. This means the agent must be instructed to load its own context — it doesn't happen natively.

The workaround is a get_subcontext tool that reads any file in the context directory by name. It works, but it's a patch on a deeper architectural gap: LLM interfaces don't yet treat context as a first-class, dynamically composable resource. They treat it as a static field.

That's the next frontier. Not larger context windows — smarter context routing.

Building something similar? The patterns here — tiered load triggers, structured failure logs, embedded executable prompts — generalize beyond any specific stack. The core insight is simple: treat your context files the way a good engineer treats a codebase. Small, modular, version-controlled, and self-documenting.

I Built a Self-Updating Memory System for Claude Using a Custom MCP Server

Salt Creative — Fri, 03 Apr 2026 16:59:51 +0000

I've been running a custom MCP server connected to Claude.ai for several months as part of a proprietary SEO intelligence platform. The setup works well for structured data queries — rankings, crawl issues, keyword gaps — but session memory was the weak point.

I had a flat markdown file that stored project context: open tasks, doctrine, session history. It loaded at the start of every session via a get_context tool. The problem? Updating it required manually editing the file on the server after every session. It kept drifting.

The fix was simpler than I expected.
I added a single append-only tool called update_context to my FastMCP server. It takes one argument — a plain text summary of what happened in the session. The tool auto-injects the date and appends a dated entry to the Session History section of the context file.

python

from datetime import datetime
from pathlib import Path

@mcp.tool()
def update_context(entry: str) -> str:
    """
    Appends a dated session history entry to the context file.
    Date is auto-injected — pass only the summary text.
    Format: "[what was done]. Next: [what to monitor or do next]."
    """
    context_path = Path.home() / "project-context.md"
    today = datetime.now().strftime("%Y-%m-%d")
    formatted = f"\n**{today}:** {entry}"

    content = context_path.read_text()
    marker = "## Session History"

    if marker not in content:
        return "Error: Session History section not found."

    insert_point = content.rfind("\n**2025-")
    if insert_point == -1:
        insert_point = content.find(marker) + len(marker)
    else:
        end_of_line = content.find("\n", insert_point + 1)
        insert_point = end_of_line

    new_content = content[:insert_point] + formatted + content[insert_point:]
    context_path.write_text(new_content)

    return f"Context updated: {formatted.strip()}"

Now at the end of every Claude session I just say "log this session" and Claude calls the tool directly. No copy-pasting, no opening files, no forgetting.

The full memory stack looks like this:

get_context — loads the flat markdown file at session start (project history, open TODOs, doctrine)
update_context — appends session summary at session end
DuckDB — structured queryable data (rankings, crawl data, keyword gaps)
Claude.ai native memory — personal preferences and recurring facts

Three complementary layers, no RAG, no vector database, no additional infrastructure.

Why not RAG?
I evaluated it. For a single-project setup with well-curated context, RAG is overkill. The flat file loads instantly, costs nothing, and you control exactly what the model knows. RAG earns its place when you have hundreds of unstructured documents you need to search semantically. I'm not there yet.
The flat file + append tool beats RAG for the majority of single-project use cases.

The whole append tool is about 30 lines of Python. If you're already running a FastMCP server and maintaining a context file manually, this is a straightforward upgrade worth doing.
Happy to answer questions in the comments.