Forem: kunpeng-ai-lab

A Practical GEO Case: How an AI System Started Recommending Our Blog

kunpeng-ai-lab — Sat, 23 May 2026 14:17:21 +0000

About one month after launching the Kunpeng AI Lab blog, I noticed a useful GEO case in the wild.

I asked an AI system to recommend hands-on AI or AI Agent creators. Kunpeng AI Lab appeared as the first recommendation.

This is not a post about bragging that "AI recommended us." The more useful engineering question is: what public signals made the brand understandable enough to be recommended?

GEO is not just SEO with a new name

Traditional SEO focuses on being crawled, ranked, and displayed in search results.

GEO, or Generative Engine Optimization, has a different problem space: how do AI systems understand your brand well enough to summarize it correctly and recommend it in the right context?

For developer-facing brands, that context might be:

practical AI Agent workflows
real debugging examples
open-source tooling
hands-on product reviews
specific engineering tradeoffs

If your public content is vague, AI has little to work with.

What the AI appeared to recognize

The AI did not describe Kunpeng AI Lab only as an "AI blog." It recognized a more specific pattern:

hands-on AI Agent practice
real project notes
debugging records
PR and issue traces
reusable skills and workflow templates
concrete commands, tools, failures, and fixes
little pure marketing language

That is the important part.

The recommendation was not based on a tagline. It was based on repeated evidence.

The practical GEO lesson

If you want AI systems to understand and recommend your brand, publishing more is not enough. You need clearer signals.

First, keep your positioning stable.

If your core topic is AI Agent engineering, keep returning to that topic. You can explore adjacent ideas, but do not make your public identity change every week.

Second, make the content verifiable.

A debugging post with commands, screenshots, logs, and tradeoffs is easier to trust than a page full of abstract claims. Evidence helps people. It also helps AI systems classify the brand correctly.

Third, repeat the signal across surfaces.

Article titles, body text, project links, captions, GitHub discussions, and videos should all point to the same area of expertise. Consistency makes the brand easier to summarize.

Negative signals also matter

One underrated part of GEO is negative labeling.

If public content looks like thin marketing, AI may summarize it that way. If a brand only repeats hot topics without showing tests or artifacts, AI may treat it as a secondary commentary source. If low-quality copied pages or unresolved complaints dominate the public web, those signals may also shape the AI's view.

So GEO is not only about "how do I get recommended?"

It is also about "how do I avoid being misunderstood?"

Takeaway

AI search changes the audience for your content.

Humans still matter most, but AI systems are now part of the discovery layer. They read, compress, summarize, and re-express what they find.

If you want your brand to appear in the right answers, make it easy to verify:

keep a stable niche
publish real cases
show process and artifacts
repeat the same expertise signal
reduce vague marketing language

That is not a shortcut. It is basic brand hygiene for the generative search era.

Originally published at Kunpeng AI Lab:
https://kunpeng-ai.com/en/blog/geo-brand-ai-recommendation/

When DeepSeek Gets Stuck: How a Strong Mentor Model Finds the Real Root Cause

kunpeng-ai-lab — Fri, 22 May 2026 04:00:05 +0000

In the previous video, we talked about a pattern we call the strong mentor model: a stronger model handles decomposition, review, correction, and validation, while execution-oriented models such as DeepSeek move concrete tasks forward.

This article goes one layer deeper.

The interesting question is not simply "how do multiple models work together?" The practical question is what happens when the execution model gets stuck, reads the last error message, and returns a conclusion that sounds plausible but is not actually the root cause.

Here is a real example from our workflow.

The Problem Was Not That DeepSeek Could Not Work

DeepSeek TUI was working on a Rust project task. The implementation had already moved forward, and the formatting check had passed. The failure appeared during validation, when it ran:

cargo check --workspace

After the command failed, DeepSeek quickly summarized the situation as:

this shell is missing the MSVC linker.

At first glance, this is not a ridiculous conclusion. On Windows, Rust builds can depend on the MSVC linker. If link.exe or the Visual Studio Build Tools environment is missing, builds can fail.

But in this case, DeepSeek stopped at the surface symptom.

This is a common failure mode in long engineering tasks. An execution model can write code, run commands, and summarize status. But when the chain gets longer, it may anchor on the last visible error and treat it as the root cause.

That is where the mentor model should step in.

The Mentor Model Does Not Directly Patch the Result

The first job of the mentor model is not to take over and rewrite everything.

It should inspect the execution process:

Which commands did DeepSeek run?
Where did the failure start?
Which checks had already passed?
Why did it conclude that the linker was missing?
Was that conclusion independently verified?

In this case, the stronger model checked the environment more carefully. The machine did have Visual Studio Build Tools installed. link.exe existed. The actual problem was that the current shell had not loaded the Visual Studio compilation environment, so link.exe was not visible on PATH.

That is a very different diagnosis.

The right conclusion was not "the user must install the linker." The right conclusion was "the current shell has not run vcvars64.bat, so the VS build environment is not loaded; initialize it first, then rerun validation."

This distinction matters. If the system sends the user to reinstall Build Tools, it wastes time and may disturb an environment that is already correct. If it identifies the missing shell initialization, the fix is smaller, safer, and reusable.

Use a Shared Discussion Folder as the Handoff Layer

In this workflow, the mentor model and DeepSeek do not collaborate only through chat.

There is a shared discussion folder. The mentor model writes a guidance file there with the debugging context:

the surface symptom;
the actual root cause;
the validation command;
the repair steps;
the lesson DeepSeek should reuse next time.

This makes the mentor's reasoning inspectable. It becomes an engineering artifact instead of a temporary message.
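
As a rough illustration, the five fields above could be captured in a small structure and written to the shared folder. This is only a sketch: the class name, field names, file layout, and file name are assumptions made for the example, not the actual format of our guidance files.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Guidance:
    """Hypothetical guidance record; the field names are illustrative."""
    surface_symptom: str
    root_cause: str
    validation_command: str
    repair_steps: list
    lesson: str

    def render(self) -> str:
        # Number the repair steps so the executor can follow them in order.
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(self.repair_steps, 1))
        return (
            f"Surface symptom: {self.surface_symptom}\n"
            f"Root cause: {self.root_cause}\n"
            f"Validation command: {self.validation_command}\n"
            f"Repair steps:\n{steps}\n"
            f"Lesson: {self.lesson}\n"
        )

    def write(self, folder: Path, name: str) -> Path:
        # Create the shared folder if needed, then write the file as UTF-8.
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / name
        target.write_text(self.render(), encoding="utf-8")
        return target


guidance = Guidance(
    surface_symptom="cargo check --workspace fails; the linker looks missing",
    root_cause="Build Tools are installed, but this shell has not loaded the VS environment",
    validation_command="cargo check --workspace",
    repair_steps=["run vcvars64.bat in the shell", "rerun the validation command"],
    lesson="check whether a tool exists before concluding it is not installed",
)
```

Calling guidance.write(Path("discussion"), "linker-guidance.txt") would then leave a plain-text file that both models and the human can read; the folder and file names here are placeholders.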

For this case, the guidance included a command pattern like:

cmd /c "\"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat\" && cd /d D:\Sherlock\workspace\cdx-workspace\DeepSeek-TUI && cargo check --workspace"

The exact command is less important than the principle:

before asking the user to install a tool, first verify whether the tool exists, whether the shell has loaded the right environment, and whether the failure can be reproduced after initialization.
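
That layered check can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function name, the three result labels, and the idea of passing in candidate install directories are all hypothetical, and it only covers the "exists" and "visible on PATH" layers, not reproducing the failure after initialization.

```python
import shutil
from pathlib import Path
from typing import Optional


def diagnose_tool(tool: str, install_dirs: list, search_path: Optional[str] = None) -> str:
    """Classify why a tool may be unavailable, checking layers in order."""
    # Layer 1: is the tool visible on PATH (or on the given search path)?
    if shutil.which(tool, path=search_path):
        return "on_path"
    # Layer 2: is it installed in a known location the shell has not loaded?
    for directory in install_dirs:
        if (Path(directory) / tool).is_file():
            return "installed_but_not_on_path"
    # Layer 3: only now is "not installed" a defensible conclusion.
    return "not_found"


# Hypothetical mapping from diagnosis to the next step for the executor.
NEXT_STEP = {
    "on_path": "rerun the failing command and read the full error",
    "installed_but_not_on_path": "initialize the build environment, then rerun validation",
    "not_found": "report that the tool appears to be missing",
}
```

The point of the ordering is that "not_found" is the last answer the function can give, not the first. In the case above, a check like this pointed at a directory containing link.exe would have landed on the middle branch instead of sending the user to reinstall Build Tools.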

Send DeepSeek a Guidance Message, Not Just the Answer

After writing the guidance file, the mentor model generates a short message that the user can send back to DeepSeek.

That message does not simply give DeepSeek the final answer. It tells DeepSeek to read the guidance file, re-check its original conclusion, rerun validation, and correct its own path.

This has two practical advantages.

First, DeepSeek is not merely fed the result. It has to revisit the evidence and verify why the previous conclusion was incomplete.

Second, the debugging path can be saved as experience. The next time the execution model sees a toolchain, credential, PATH, or shell-environment failure, it should not stop at the last visible error. It should perform a layered check before making a conclusion.

That is the difference between delegation and mentorship.

Delegation means the stronger model finishes the task. Mentorship means the stronger model explains why the execution model got stuck, how to investigate the real cause, how to validate the fix, and how to turn the lesson into a reusable skill.

Turn the Workflow Into Skills

If every case depends on a human reminder, the workflow is not stable enough.

So we turn it into a standard collaboration mechanism.

On the stronger model side, we install a mentor skill. Its job is to inspect logs, trace context, find the root cause, write a guidance file, and extract reusable lessons.

On the DeepSeek side, we install an executor skill. Its job is to move the task forward, preserve logs, expose its conclusion when stuck, read mentor guidance, re-validate, and update its experience base.

This is close to how we think about ACS (Agent Collaboration SOP) as well: do not rely on one model being permanently correct. Standardize collaboration, review, correction, and experience capture.

The Real Upgrade Is Recovery After Failure

A single model always has a ceiling.

In complex engineering work, the real question is often not whether the model can write code. The harder questions are:

Can it tell a surface symptom from a root cause?
Can it inspect the full execution history?
Can it turn a failed attempt into reusable knowledge?
Can multiple models coordinate around the same evidence chain instead of producing disconnected guesses?

The strong mentor model pattern is useful because it addresses recovery.

It is not a claim that DeepSeek becomes identical to Claude Code. It is not about dismissing any model either. It is a practical workflow for making execution models more reliable: when they get stuck, use a stronger mentor model to debug the reasoning path, write explicit guidance, force re-validation, and deposit the lesson into a skill library.

If that loop keeps running, the execution model's later runs get smoother over time.

The gain does not come from one perfect model. It comes from a standardized collaboration system that turns mistakes into reusable process.

Full canonical version with screenshots: https://kunpeng-ai.com/en/blog/deepseek-mentor-model-root-cause-debugging/

How I Make DeepSeek Work Closer to Claude Code in Practice

kunpeng-ai-lab — Mon, 18 May 2026 03:34:32 +0000

People have been asking me how I make DeepSeek feel closer to Claude Code in real work.

My answer is not a magic prompt. It is a mentor model workflow.

I use a stronger model to plan, supervise, debug, and review. Then I let smaller or cheaper models handle bounded execution tasks in parallel.

Important caveat: I am not claiming DeepSeek is equivalent to Claude Code as a single model/tool. The comparison is about the practical workflow effect.

1. The mentor model creates the task boundary

Before assigning work, the stronger model defines:

the small task units
the files or outputs each executor may touch
the acceptance checks
the things that must not change

That alone makes weaker models much more reliable. They no longer have to infer the whole strategy.
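
To make the idea concrete, here is a small sketch of a task boundary as data, with a check that flags files an executor changed outside its scope. The class, its field names, and the glob-style patterns are assumptions for illustration; my real boundaries are written as instructions, not as this exact structure.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class TaskBoundary:
    """Hypothetical boundary record handed to one executor."""
    task: str
    allowed: tuple            # glob patterns the executor may touch
    protected: tuple          # glob patterns that must not change
    acceptance_checks: tuple  # commands recorded for the reviewer to run

    def violations(self, changed_files):
        """Return one message per changed file that breaks the boundary."""
        problems = []
        for path in changed_files:
            if any(fnmatchcase(path, p) for p in self.protected):
                problems.append(f"{path}: must not change")
            elif not any(fnmatchcase(path, p) for p in self.allowed):
                problems.append(f"{path}: outside allowed scope")
        return problems


boundary = TaskBoundary(
    task="fix the parser module",
    allowed=("src/parser/*",),
    protected=("Cargo.lock",),
    acceptance_checks=("cargo check --workspace",),
)
print(boundary.violations(["src/parser/lexer.rs", "Cargo.lock", "README.md"]))
# prints ['Cargo.lock: must not change', 'README.md: outside allowed scope']
```

The mentor can run a check like this against the list of changed files before it reads the result at all.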

2. Smaller models execute narrow tasks

DeepSeek becomes useful when I give it work like:

inspect this log and summarize the failure
draft this section using the existing outline
analyze this recording and list usable timestamps
convert this article into a platform version
modify this specific module without touching unrelated files

I avoid giving smaller models vague ownership of the whole project.

3. The mentor model reads the process, not only the result

This is the part that matters most.

The mentor checks command output, logs, stuck points, test failures, render errors, and mismatched assumptions. It does not just ask "does the output file exist?"

For a video segment, it checks resolution, audio behavior, subtitles, and template consistency.

For article assets, it checks image template usage, manifest records, alt text, and platform rules.

4. Failures become reusable skills

After a model gets stuck, I want the lesson saved:

what triggered the failure
which check should happen earlier next time
which platform rule matters
which command or template is reliable

Those lessons become project skills and handoff notes. This is how later runs get smoother.
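
One possible shape for such a lesson, sketched in Python: a record with the four items above, plus a lookup that surfaces saved lessons when a new failure contains a known trigger. The names and the substring matching are assumptions for the example; a real skill library can be much looser than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    """Hypothetical lesson record; the fields mirror the list above."""
    trigger: str            # text that identified the failure
    earlier_check: str      # check to run sooner next time
    reliable_command: str   # command or template known to work
    platform_rule: str = ""  # platform rule that mattered, if any


def matching_lessons(output: str, lessons):
    """Return saved lessons whose trigger text appears in new failure output."""
    lowered = output.lower()
    return [lesson for lesson in lessons if lesson.trigger.lower() in lowered]


library = [
    Lesson(
        trigger="link.exe",
        earlier_check="confirm the tool exists before concluding it is missing",
        reliable_command="initialize the VS build environment, then rerun validation",
    )
]
```

With a lookup like this, the executor can be handed the relevant lesson at the moment it hits a similar error, instead of relying on it to remember.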

The short version

DeepSeek works much better for me when it is not asked to be the entire coding agent.

It becomes much more useful when a stronger model acts as mentor:

plan the task
define the boundary
assign narrow execution
inspect logs and errors
correct the process
turn lessons into reusable memory

That is the real pattern. Not "DeepSeek replaces Claude Code", but "DeepSeek performs better inside a mentor-led agent workflow."

Desktop GUI vs Terminal TUI: how I choose the right interface for AI coding agents

kunpeng-ai-lab — Sat, 16 May 2026 17:26:58 +0000

A viewer recently asked a very fair question: if desktop AI coding tools are powerful and convenient, why bother with a terminal TUI at all?

I do not think this is a replacement story.

Desktop GUI and terminal TUI workflows solve different kinds of friction. A GUI is better when the human needs to stay close to the work: reading code, checking documents, copying context, dropping screenshots, or supervising browser actions. A TUI is better when the work can be split into small, independent tasks and left running with lower overhead.

My short rule

Use a desktop GUI when the task needs visual context, frequent human steering, screenshots, web pages, or browser state.

Use a terminal TUI when the task is already scoped and can run as one of several small parallel jobs.

Switch back to a GUI when the task happens inside a browser: dashboards, forms, image uploads, publishing previews, and final state checks.

Large projects usually benefit from a GUI

Large project work is rarely just command execution.

You read files. You compare docs. You inspect a web page. You copy terminal output. You may need to give the agent a screenshot or a product state that is difficult to describe in text.

In that situation, the human has not left the loop. The human is still observing, correcting, and deciding whether the agent is moving in the right direction.

That is where a desktop GUI helps. It keeps the workspace visible and makes the shared working surface easier to inspect.

Parallel agents are often better in a TUI

There is another kind of work: small, scoped, parallel tasks.

One agent edits a module. Another reads logs. A third runs tests and summarizes the failure. These tasks do not need constant visual supervision. They need clear boundaries, stable execution, and low overhead.

Opening a separate desktop window for every agent can quickly make the machine feel heavy. This is where a terminal TUI earns its place.

The value of a TUI is not that it looks more technical. The value is that it stays light when several small jobs need to run at the same time.

Browser work is usually easier to supervise in a GUI

Some tasks naturally belong in a browser.

Opening an admin dashboard, filling a form, uploading images, checking a preview, or confirming whether a page was saved are all visual tasks.

For that kind of work, I prefer a GUI. The agent can see the page change, and I can take over when needed.

There is still an important boundary here. Login, CAPTCHA, payment, security prompts, and final publish actions should remain human-confirmed.

My current rule

I usually mix both.

For exploration and context-heavy work, I start in a GUI. For scoped parallel execution, logs, tests, and long-running small tasks, I use a TUI. For browser operations and publishing flows, I return to a GUI.
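
Written as code, the rule of thumb looks roughly like this. It is a sketch, not a tool I run: the flag names, the action labels, and the function names are assumptions made for the example.

```python
# Actions that should stay human-confirmed, per the boundary described above.
HUMAN_CONFIRMED = frozenset(
    {"login", "captcha", "payment", "security_prompt", "final_publish"}
)


def choose_interface(*, scoped: bool, parallel: bool,
                     needs_visual_context: bool, in_browser: bool) -> str:
    """Return "gui" or "tui" for a task, following the rule of thumb."""
    # Browser work and visually heavy work stay in a GUI.
    if in_browser or needs_visual_context:
        return "gui"
    # Scoped jobs that can run alongside others go to a TUI.
    if scoped and parallel:
        return "tui"
    # Exploration and unscoped work default to a GUI.
    return "gui"


def needs_human_confirmation(action: str) -> bool:
    """True when the action must not be completed by the agent alone."""
    return action in HUMAN_CONFIRMED
```

The order of the checks is the design choice: visual and browser needs win over parallelism, so a scoped task that happens in a browser still goes to a GUI.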

TUI is not old-fashioned. GUI is not a beginner mode. Both are useful when the task matches the interface.

Read the task first, then choose the interface.

Originally published at https://kunpeng-ai.com/en/blog/gui-vs-tui-ai-coding-agent-workflow/

Green Tests Are Evidence, Not Approval

kunpeng-ai-lab — Sat, 09 May 2026 05:20:23 +0000

Many teams are starting to use more than one AI coding agent.

One agent writes code. Another agent reviews. A human owner makes the final call.

That sounds reasonable, but without a shared process it can become unreliable very quickly.

The Executor may test its own work. The Reviewer may only check that tests are green. The Owner may receive a confident summary without durable evidence.

That is the problem ACS tries to solve.

ACS, short for Agent Collaboration SOP, is a vendor-neutral, file-first workflow for multi-agent engineering collaboration.

The core principle is:

Green tests are evidence, not approval.

Passing tests matter. But they do not prove that scope was respected, UI was inspected, docs match the actual files, public output was redacted, or the change is safe to release.

Why Green Tests Are Not Enough

Tests answer specific questions. Approval answers a broader question: should this change move forward?

Green tests do not automatically prove that:

the requested scope was respected;
the UI was opened and visually inspected;
screenshots exist where visual evidence is needed;
documentation and handoff notes match the actual files;
the implementation did not introduce architecture drift;
public output has been redacted;
the change is safe to release, merge upstream, or share publicly.

If a human teammate submitted a change with no clear handoff, no review evidence, no scope notes, and no release-risk assessment, most engineering teams would not treat "the tests passed" as enough.

AI-agent work should not get a weaker standard just because the summary sounds confident.

Owner, Executor, Reviewer

ACS separates three roles:

Owner: the human decision-maker responsible for goals, scope, release decisions, upstream PR boundaries, and business constraints.
Executor Agent: responsible for implementation, self-testing, evidence collection, and handoff.
Reviewer Agent: responsible for independent review across scope, architecture, tests, screenshots, evidence, redaction, and release risk.

The key rule is simple:

The executor does not approve itself.

An Executor can and should run tests. It can and should summarize what it changed. It can and should collect evidence.

But approval requires an independent check and a human decision.
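
A minimal sketch of that rule as a gate, in Python. The record fields and blocker messages are hypothetical and are not taken from the ACS repository; the sketch only shows that green tests are one input among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical summary of one change moving through the three roles."""
    executor: str
    tests_green: bool
    reviewer: Optional[str] = None
    reviewer_passed: bool = False
    owner_approved: bool = False


def approval_blockers(change: ChangeRecord) -> list:
    """List what still blocks approval; an empty list means it can proceed."""
    blockers = []
    if not change.tests_green:
        blockers.append("tests are not green")
    if change.reviewer is None:
        blockers.append("no independent review")
    elif change.reviewer == change.executor:
        blockers.append("executor cannot review its own work")
    elif not change.reviewer_passed:
        blockers.append("reviewer has not passed the change")
    if not change.owner_approved:
        blockers.append("owner has not approved")
    return blockers


print(approval_blockers(ChangeRecord(executor="agent-a", tests_green=True)))
# prints ['no independent review', 'owner has not approved']
```

A change with green tests and nothing else still has two blockers, which is the whole principle in one line of output.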

From Chat Logs to Durable Files

Chat is useful while work is happening. It is a weak long-term engineering record.

Chat threads can be compressed. They can lose context. They can be separated from the exact repository state they were discussing. They can be hard for a later agent to inspect.

ACS prefers file-first handoff.

Typical ACS artifacts include:

Executor handoff
Reviewer report
Evidence ledger
Owner consensus report
Redacted case study
Anti-pattern review

This makes the workflow easier to resume after context compression, model changes, machine changes, or handoff to another agent.

Case Studies and Anti-Patterns

ACS keeps two long-term memory areas:

case-studies/ captures redacted examples of real collaboration.
anti-patterns/ captures recurring failure modes and prevention checklists.

Examples of useful anti-patterns include:

the Executor approves its own work;
the Reviewer only checks whether tests are green;
evidence exists only in chat;
UI review happens without screenshots;
handoff notes drift away from the actual files;
public materials are shared without redaction.

The goal is to turn repeated mistakes into reusable team memory.

Public Sharing Needs a Redaction Gate

Public examples are useful, but they must be safe.

AI agents can accidentally include sensitive details in handoffs, reports, issues, PR descriptions, blog drafts, and case studies.

Before publishing a case study, remove:

customer names;
private repository URLs;
local absolute paths;
tokens, cookies, API keys, and webhooks;
private chat logs;
unpublished business information.

The point is not to hide the engineering lesson. The point is to preserve the lesson without leaking what should remain private.
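
Part of that gate can be automated with a simple scan before a human reads the draft. The sketch below is an assumption-heavy illustration: the three patterns are examples I chose for this post, they are far from complete, and they cannot detect customer names, private repository URLs, or business information, which still need a human check.

```python
import re

# Illustrative patterns only; a real gate needs a longer, project-specific list.
PATTERNS = {
    "windows_path": re.compile(r"\b[A-Za-z]:\\\S+"),
    "home_path": re.compile(r"/(?:home|Users)/\S+"),
    "credential": re.compile(
        r"(?i)\b(?:api[_-]?key|token|secret|cookie)\b\s*[:=]\s*\S+"
    ),
}


def redaction_findings(text: str) -> list:
    """Return (line number, pattern label) pairs for lines that need review."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


draft = "Ran cargo check in D:\\work\\repo\nAPI_KEY=abc123\nAll checks passed."
print(redaction_findings(draft))
# prints [(1, 'windows_path'), (2, 'credential')]
```

An empty result does not mean the draft is safe to publish. It only means these particular patterns did not fire, so the scan works as a first filter in front of the human review, not as a replacement for it.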

Open Source

ACS is open source, and practical contributions are welcome:

redacted case studies;
anti-pattern examples;
reviewer report improvements;
evidence ledger refinements;
examples from different agent tools and team setups.

GitHub:

https://github.com/kunpeng-ai-lab/agent-collaboration-sop

Full article:

https://kunpeng-ai.com/en/blog/agent-collaboration-sop-acs-case-library/?utm_source=blog_referral&utm_medium=referral&utm_campaign=acs-case-library-202605&utm_content=ending_cta

Multi-agent engineering does not become reliable just because more agents are involved.

It becomes reliable when execution, review, evidence, and human approval are separated clearly enough to be inspected.