<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Andrew Kew</title>
    <description>The latest articles on Forem by Andrew Kew (@thegatewayguy).</description>
    <link>https://forem.com/thegatewayguy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3895707%2F446a1c4a-0cef-467b-8849-b16d5ada0e04.png</url>
      <title>Forem: Andrew Kew</title>
      <link>https://forem.com/thegatewayguy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thegatewayguy"/>
    <language>en</language>
    <item>
      <title>Who Owns the Code Claude Wrote? The Legal Mess No One's Talking About</title>
      <dc:creator>Andrew Kew</dc:creator>
      <pubDate>Wed, 29 Apr 2026 09:53:40 +0000</pubDate>
      <link>https://forem.com/thegatewayguy/who-owns-the-code-claude-wrote-the-legal-mess-no-ones-talking-about-3map</link>
      <guid>https://forem.com/thegatewayguy/who-owns-the-code-claude-wrote-the-legal-mess-no-ones-talking-about-3map</guid>
      <description>&lt;p&gt;Anthropic accidentally published 512,000 lines of Claude Code's source in a routine update. Before sunrise, mirrors were up on GitHub. Before breakfast, someone had rewritten the whole thing in Python. Then came 8,000 DMCA takedowns — for code that Anthropic's own lead engineer said was predominantly written by Claude itself.&lt;/p&gt;

&lt;p&gt;Can you issue a DMCA takedown for code that copyright law may not protect? Nobody had a clean answer. That's the problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If Claude Code was, by Anthropic's own lead engineer's admission, predominantly written by Claude itself, does Anthropic even own it?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The same question applies to your codebase.&lt;/p&gt;

&lt;h2&gt;Three risks in your codebase&lt;/h2&gt;

&lt;p&gt;Three separate legal risks are colliding right now, and most engineers are only dimly aware of any of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copyrightability&lt;/strong&gt; — the US Copyright Office and the DC Circuit (whose Thaler ruling stood after the Supreme Court declined review in March 2026) are clear: AI-generated work without meaningful human authorship is not copyrightable. Code you accepted verbatim from Claude may sit in the public domain in everything but name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Work-for-hire&lt;/strong&gt; — your employment contract almost certainly already assigned anything you build at work to your employer. AI-assisted or not, that doctrine applies. Worse: if your employer licenses Claude Code and you use it for a side project, a broad IP clause may reach that too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPL contamination&lt;/strong&gt; — AI tools are trained on mountains of copyleft-licensed code. If the model reproduced a substantial verbatim chunk of GPL code in its output and you shipped it commercially, you may have a copyleft violation you can't see. "I didn't know" is not a defense.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why it matters now&lt;/h2&gt;

&lt;p&gt;The legal edges are actively moving. &lt;em&gt;Allen v. Perlmutter&lt;/em&gt; (600 detailed prompts + Photoshop edits — still unresolved) will be the closest thing yet to a ruling on how much human involvement is "enough." &lt;em&gt;Doe v. GitHub&lt;/em&gt; in the Ninth Circuit is asking whether Copilot reproduces licensed code without attribution — it's already changed industry behavior: Copilot added duplicate detection filters; M&amp;amp;A due diligence now routinely includes an AI codebase license scan.&lt;/p&gt;

&lt;p&gt;The place where unsettled law becomes concrete today isn't court — it's acquisition due diligence and fundraising, where investors are already asking these questions as a condition of closing.&lt;/p&gt;

&lt;h2&gt;What to do&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Run a license scan.&lt;/strong&gt; FOSSA, Snyk Open Source, or Black Duck. It takes an afternoon and costs less than the first hour of a copyright dispute. If you're shipping a commercial product and haven't done this, you're operating on assumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Document human creative decisions as you go.&lt;/strong&gt; "Restructured Claude's module architecture, rejected initial state management approach, rewrote error handling from scratch" is legal evidence. "Add rate limiting module" is not. Export your prompt logs from agentic sessions where you made architectural calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Read your IP clause before you build anything on the side.&lt;/strong&gt; Search your employment contract for "intellectual property," "IP assignment," or "work product." The phrase to watch: &lt;em&gt;"any software created with the assistance of company-licensed tools."&lt;/em&gt; If your employer licenses Claude Code, that clause may reach your weekend project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Check your Anthropic plan.&lt;/strong&gt; Consumer/Pro plans have narrower IP indemnification than API/Enterprise. If you're shipping commercially on the free tier, the gap is real.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://legallayer.substack.com/p/who-owns-the-claude-code-wrote" rel="noopener noreferrer"&gt;Legal Layer — Who Owns the Code Claude Wrote?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✏️ Drafted with KewBot (AI), edited and approved by Drew.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>GitHub Copilot drops flat-rate billing. The token era has arrived.</title>
      <dc:creator>Andrew Kew</dc:creator>
      <pubDate>Tue, 28 Apr 2026 14:36:42 +0000</pubDate>
      <link>https://forem.com/thegatewayguy/github-copilot-drops-flat-rate-billing-the-token-era-has-arrived-3pb3</link>
      <guid>https://forem.com/thegatewayguy/github-copilot-drops-flat-rate-billing-the-token-era-has-arrived-3pb3</guid>
      <description>&lt;p&gt;Effective June 1, 2026, GitHub Copilot drops its flat-rate request model and moves to &lt;strong&gt;GitHub AI Credits&lt;/strong&gt; — token-based billing where every chat message, agentic run, and CLI call draws from a shared credit pool. New sign-ups for Pro, Pro+, and Student plans are already paused as of April 20.&lt;/p&gt;

&lt;p&gt;One AI credit = $0.01. How many credits an interaction burns depends on which model you're talking to and how many tokens flow through the session.&lt;/p&gt;

&lt;h2&gt;What actually changed&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The unit of billing shifts from requests to tokens.&lt;/strong&gt; Previously, Copilot plans sold "premium requests" — 300/mo on Pro, 1,000/mo on Enterprise. Now it's credits consumed by token volume × model price.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code completions stay free.&lt;/strong&gt; Inline completions and next edit suggestions are explicitly excluded from AI Credits and remain unlimited on all paid plans. This change is only about interactive features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does get billed:&lt;/strong&gt; Copilot Chat, Copilot CLI, the cloud agent, Copilot Spaces, Spark, and third-party coding agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Included credits per plan:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copilot Business: 1,900 credits/user/month&lt;/li&gt;
&lt;li&gt;Copilot Enterprise: 3,900 credits/user/month&lt;/li&gt;
&lt;li&gt;Promotional for existing customers (June–September 2026): 3,000 / 7,000 respectively&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Credits are pooled&lt;/strong&gt;, not individual buckets. 100 Business users = 190,000 shared credits. Power users draw more; light users offset it. Overages can either be allowed (charged at per-credit rates) or hard-blocked until the next billing cycle.&lt;/p&gt;
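
&lt;p&gt;To make the arithmetic concrete, here's a rough back-of-envelope sketch in Python. The pool figures (one credit = $0.01, 1,900 credits per Business seat) come from the announcement; the per-interaction credit costs are placeholder guesses, since real burn depends on model choice and token volume.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREDIT_USD = 0.01
SEATS = 100
INCLUDED_PER_SEAT = 1_900            # Copilot Business allocation
pool = SEATS * INCLUDED_PER_SEAT     # 190,000 shared credits/month

# Placeholder usage profile: light chat across all seats, plus a minority
# of power users running long agentic sessions on frontier models.
est_usage = (
    SEATS * 300 * 0.4       # ~300 chat queries/seat, a fraction of a credit each
    + 20 * 60 * 150.0       # 20 heavy users, 60 agent runs each, ~150 credits/run
)
overage = max(0.0, est_usage - pool)
print(f"pool={pool:,} used={est_usage:,.0f} overage={overage:,.0f} credits")
print(f"overage cost if allowed: ${overage * CREDIT_USD:,.2f}")
&lt;/code&gt;&lt;/pre&gt;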

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;This is what the flat-rate era ending looks like.&lt;/p&gt;

&lt;p&gt;GitHub is exposing the underlying model cost structure directly to teams — the same economics that any raw API user has always dealt with. The implication: the cost of Copilot is now a function of &lt;em&gt;how&lt;/em&gt; your team uses it. A quick chat query is a fraction of a credit. A long agentic session on a frontier model burns significantly more.&lt;/p&gt;

&lt;p&gt;Teams running agents, Copilot Spaces, or heavy multi-turn workflows will hit the ceiling differently than teams using Copilot mostly for completions. That difference was always there at the model layer — it's now visible on your bill.&lt;/p&gt;

&lt;p&gt;Budget controls exist at four levels: enterprise, org, cost-center, and individual user. Setting a user budget of $0 cuts off access entirely. Hard stops are available. This is a fundamental change in how you plan and govern AI tooling spend — and it's coming fast.&lt;/p&gt;

&lt;h2&gt;What to do&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On Business or Enterprise?&lt;/strong&gt; Calculate your per-user credit allocation (1,900 or 3,900/mo) against your actual usage patterns. The promotional credits (3,000/7,000) give breathing room through September — use that window to baseline real consumption before the promos expire.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running agents or Copilot Spaces heavily?&lt;/strong&gt; Those are the high-burn features. They behave nothing like chat; model the cost explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On Free/Pro/Pro+?&lt;/strong&gt; Your plan is transitioning too. GitHub is contacting affected customers — watch your inbox for migration comms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineering leaders:&lt;/strong&gt; Get budget controls configured before June 1. Decide now: hard stop on overages, or allow additional spend with a cap?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://thenewstack.io/github-copilot-usage-billing/" rel="noopener noreferrer"&gt;The New Stack&lt;/a&gt; · &lt;a href="https://docs.github.com/en/copilot/concepts/billing/usage-based-billing-for-organizations-and-enterprises" rel="noopener noreferrer"&gt;GitHub Docs&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✏️ Drafted with KewBot (AI), edited and approved by Drew.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>devops</category>
      <category>developer</category>
    </item>
    <item>
      <title>The five loops between AI coding and AI engineering</title>
      <dc:creator>Andrew Kew</dc:creator>
      <pubDate>Mon, 27 Apr 2026 08:51:40 +0000</pubDate>
      <link>https://forem.com/thegatewayguy/the-five-loops-between-ai-coding-and-ai-engineering-doa</link>
      <guid>https://forem.com/thegatewayguy/the-five-loops-between-ai-coding-and-ai-engineering-doa</guid>
      <description>&lt;p&gt;One developer. No team. Just two AI coding agents running in parallel terminal sessions.&lt;/p&gt;

&lt;p&gt;Four months later: 81% PR acceptance, 91% test coverage, bugs going from report to merged fix in roughly thirty minutes.&lt;/p&gt;

&lt;p&gt;It wasn't a better model. It was what the codebase learned to measure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The intelligence in an AI-assisted codebase lives less in the model and more in the loops the codebase wraps around it."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What actually changed&lt;/h2&gt;

&lt;p&gt;KubeStellar Console — a multi-cluster Kubernetes management dashboard in the CNCF Sandbox — was the proving ground. Five rungs of the &lt;strong&gt;AI Codebase Maturity Model&lt;/strong&gt; emerged from that experience, tracing the path from agentic honeymoon to near-autonomous development loop:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Instructed&lt;/strong&gt; — Externalise what you keep correcting: a &lt;code&gt;CLAUDE.md&lt;/code&gt;, PR conventions, a rejection-reasons guide — together they covered ~90% of the reasons AI PRs were being rejected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Measured&lt;/strong&gt; — Tests aren't a correctness layer; in an autonomous workflow, they're the &lt;em&gt;trust layer&lt;/em&gt; — 32 nightly suites, 91% coverage, acceptance rates logged by category. (A flaky test doesn't annoy you here — it quietly corrupts every merge decision downstream.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Adaptive&lt;/strong&gt; — Once you're measuring, let the automation adjust itself: categories with low acceptance rates get deprioritised; CI cycles shift to what's actually landing.&lt;/p&gt;
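
&lt;p&gt;A minimal sketch of rungs two and three (not the KubeStellar project's actual tooling): log every AI PR outcome by category, then let acceptance rates decide where agent cycles go next. The 50% threshold is illustrative.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict

# Rung two: measure. One counter pair per PR category.
outcomes = defaultdict(lambda: {"accepted": 0, "rejected": 0})

def log_pr(category, accepted):
    outcomes[category]["accepted" if accepted else "rejected"] += 1

def acceptance_rate(category):
    stats = outcomes[category]
    total = stats["accepted"] + stats["rejected"]
    return stats["accepted"] / total if total else 0.0

def next_priorities(threshold=0.5):
    # Rung three: deprioritise categories the agents keep getting wrong.
    keep = [c for c in outcomes if acceptance_rate(c) &amp;gt;= threshold]
    return sorted(keep, key=acceptance_rate, reverse=True)
&lt;/code&gt;&lt;/pre&gt;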

&lt;p&gt;&lt;strong&gt;4. Self-sustaining&lt;/strong&gt; — The codebase becomes the operating manual; issues get triaged, fixed, tested, and queued before the maintainer looks at them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Question, don't command&lt;/strong&gt; — "Why didn't you catch this?" beats "fix this bug": the second gets a patch; the first gets a root cause, a new test, and a whole class of future failures blocked.&lt;/p&gt;

&lt;h2&gt;The lesson&lt;/h2&gt;

&lt;p&gt;The model is not the differentiator.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The model is a commodity component, and swapping one for another is a weekend of work. Rebuilding the surrounding feedback system is a quarter of the work."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What matters: instruction files, test suites, acceptance metrics, workflow rules. That's the intelligence infrastructure. Teams optimising for model selection are optimising the wrong variable.&lt;/p&gt;

&lt;p&gt;For open source maintainers specifically, this reframes the burnout problem. If the codebase encodes enough judgment that agents can handle triage and generate PRs, maintainers shift from daily operators to system architects.&lt;/p&gt;

&lt;h2&gt;What to do&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Still in "write prompts, review output" mode?&lt;/strong&gt; That's the first rung. Normal starting point. Ask: what's the most common reason you reject AI output? Write it down. That's rung two.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Have tests but still getting drift?&lt;/strong&gt; Determinism first. Flaky tests are catastrophic in autonomous workflows — fix them before you build anything on top.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging acceptance rates by category?&lt;/strong&gt; You're probably ready for adaptive weighting. Don't automate before you can measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leading engineering?&lt;/strong&gt; Stop optimising for which model you're using. Ask which feedback loop is missing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href="https://thenewstack.io/ai-codebase-maturity-model/" rel="noopener noreferrer"&gt;Beyond prompting: How KubeStellar reached 81% PR acceptance with AI agents&lt;/a&gt; — The New Stack&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✏️ Drafted with KewBot (AI), edited and approved by Drew.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>AI agents are opaque. Jaeger v2 + OTel GenAI conventions are the fix.</title>
      <dc:creator>Andrew Kew</dc:creator>
      <pubDate>Sun, 26 Apr 2026 21:58:36 +0000</pubDate>
      <link>https://forem.com/thegatewayguy/ai-agents-are-opaque-jaeger-v2-otel-genai-conventions-are-the-fix-48b8</link>
      <guid>https://forem.com/thegatewayguy/ai-agents-are-opaque-jaeger-v2-otel-genai-conventions-are-the-fix-48b8</guid>
      <description>&lt;p&gt;AI agents are distributed systems. They fan out across LLM calls, tool invocations, memory lookups, and multi-step reasoning loops — often asynchronously. But until recently, the observability tooling hadn't caught up. You'd get logs, maybe a dashboard, but no trace of what actually happened across a full agent run.&lt;/p&gt;

&lt;p&gt;That's the gap Jaeger v2 is positioned to close — and it's not a stretch.&lt;/p&gt;

&lt;h2&gt;What actually changed in Jaeger v2&lt;/h2&gt;

&lt;p&gt;Jaeger v2, released in late 2024, didn't just add features. It replaced its entire internal architecture with the OpenTelemetry Collector framework as the core foundation.&lt;/p&gt;

&lt;p&gt;What that means in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native OTLP ingestion.&lt;/strong&gt; No more translation layer from OTLP → Jaeger internal format. Telemetry flows in as-is, with no data loss from conversion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single binary, OTel-native config.&lt;/strong&gt; The old &lt;code&gt;jaeger-agent&lt;/code&gt;, &lt;code&gt;jaeger-collector&lt;/code&gt;, &lt;code&gt;jaeger-ingester&lt;/code&gt;, &lt;code&gt;jaeger-query&lt;/code&gt; split is gone. One binary, configured via the same YAML model as OTel Collector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to the full OTel Collector ecosystem.&lt;/strong&gt; Tail-based sampling, span-to-metric connectors, PII filtering processors, Kafka pipelines — all available without Jaeger maintaining separate implementations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tail-based sampling&lt;/strong&gt;, previously hard to retrofit, is now first-class via the upstream OTel contrib processor.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture shift means Jaeger v2 inherits everything OTel ships — including the new GenAI semantic conventions.&lt;/p&gt;
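
&lt;p&gt;In practice, "natively OTLP" means a service's standard OTel exporter can point straight at the Jaeger v2 binary. A minimal Python sketch, assuming Jaeger v2 is listening on its default OTLP gRPC port (4317); the service name is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Ship spans straight to Jaeger v2 over OTLP/gRPC, no translation layer.
provider = TracerProvider(resource=Resource.create({"service.name": "agent-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
&lt;/code&gt;&lt;/pre&gt;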

&lt;h2&gt;The GenAI conventions: tracing AI agents properly&lt;/h2&gt;

&lt;p&gt;OpenTelemetry is now actively developing semantic conventions specifically for AI workloads. These define how to represent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model spans&lt;/strong&gt; — individual LLM inference calls (token counts, model name, latency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent spans&lt;/strong&gt; — the higher-level reasoning loops and orchestration steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Events&lt;/strong&gt; — prompt inputs, completions, tool call results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt; — token usage, latency distributions, error rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And coverage is already provider-specific: OpenAI, Anthropic, AWS Bedrock, and Azure AI Inference all have dedicated conventions. There's even a draft for &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; — so tool calls via MCP-compatible servers can be traced as first-class spans.&lt;/p&gt;

&lt;p&gt;These conventions are still in Development status, but the instrumentation is shipping now. Libraries like LangChain, LlamaIndex, and OpenAI's own SDKs are beginning to emit OTel-compatible telemetry. Jaeger v2 — being natively OTLP — can receive all of it.&lt;/p&gt;

&lt;h2&gt;Why this matters for teams building agents&lt;/h2&gt;

&lt;p&gt;The classic distributed tracing use case is: trace a request across microservices, find the slow hop, fix it. The AI agent version is: trace a user prompt → agent planning span → LLM call → tool invocation → second LLM call → final response. Across potentially different services, with retries, branching, and non-determinism.&lt;/p&gt;

&lt;p&gt;Without proper trace context propagation, this is a black box. With OTel GenAI conventions + Jaeger v2, you get the full picture — latency per LLM call, token consumption, which tool calls fired and how long they took, where the reasoning went sideways.&lt;/p&gt;
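
&lt;p&gt;Here's a minimal sketch of that chain as nested spans, using span and attribute names from the GenAI semconv draft (still in Development, so expect churn). The planner and tool are hypothetical stand-ins, and the token counts would come from the provider's response:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from opentelemetry import trace

tracer = trace.get_tracer("agent-demo")

def run_agent(user_prompt):
    # Top-level agent span: the reasoning loop wrapping everything below.
    with tracer.start_as_current_span("invoke_agent planner") as agent_span:
        agent_span.set_attribute("gen_ai.operation.name", "invoke_agent")

        # Model span: one LLM inference call, with token usage attached.
        with tracer.start_as_current_span("chat gpt-4o") as llm_span:
            llm_span.set_attribute("gen_ai.operation.name", "chat")
            llm_span.set_attribute("gen_ai.request.model", "gpt-4o")
            llm_span.set_attribute("gen_ai.usage.input_tokens", 412)
            llm_span.set_attribute("gen_ai.usage.output_tokens", 96)

        # Tool span: a tool invocation traced as its own child.
        with tracer.start_as_current_span("execute_tool search_docs") as tool_span:
            tool_span.set_attribute("gen_ai.operation.name", "execute_tool")
            tool_span.set_attribute("gen_ai.tool.name", "search_docs")
        # Second LLM call and final response follow the same pattern.
&lt;/code&gt;&lt;/pre&gt;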

&lt;p&gt;That's debugging capability that didn't exist in a standardised form until now.&lt;/p&gt;

&lt;h2&gt;What to do&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Already on Jaeger v1?&lt;/strong&gt; Check the &lt;a href="https://www.jaegertracing.io/docs/2.4/external-guides/migration/" rel="noopener noreferrer"&gt;v1→v2 migration guide&lt;/a&gt;. The architecture shift is real but the storage backends are backward-compatible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building AI agents?&lt;/strong&gt; Start instrumenting with OTel GenAI semconv now, even in Development status. You'll be ahead of the curve when it stabilises, and Jaeger v2 will ingest it today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using LangChain/LlamaIndex/OpenAI SDKs?&lt;/strong&gt; Check their OTel instrumentation status — several already support it or have experimental packages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not on Jaeger?&lt;/strong&gt; The GenAI conventions are backend-agnostic. Any OTLP-compatible backend (Grafana Tempo, Honeycomb, etc.) can receive this telemetry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://thenewstack.io/jaeger-v2-ai-observability/" rel="noopener noreferrer"&gt;The New Stack&lt;/a&gt; · &lt;a href="https://medium.com/jaegertracing/jaeger-v2-released-09a6033d1b10" rel="noopener noreferrer"&gt;Jaeger v2 release post&lt;/a&gt; · &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" rel="noopener noreferrer"&gt;OTel GenAI semantic conventions&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✏️ Drafted with KewBot (AI), edited and approved by Drew.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>observability</category>
      <category>opentelemetry</category>
      <category>devops</category>
    </item>
    <item>
      <title>GPT-5.5 is in the API. Don't just bump the version string.</title>
      <dc:creator>Andrew Kew</dc:creator>
      <pubDate>Sat, 25 Apr 2026 10:41:37 +0000</pubDate>
      <link>https://forem.com/thegatewayguy/gpt-55-is-in-the-api-dont-just-bump-the-version-string-1boj</link>
      <guid>https://forem.com/thegatewayguy/gpt-55-is-in-the-api-dont-just-bump-the-version-string-1boj</guid>
      <description>&lt;p&gt;GPT-5.5 dropped in the OpenAI API this week — more token-efficient, better at agentic workflows, already live on Vercel's AI Gateway.&lt;/p&gt;

&lt;p&gt;But the first line of OpenAI's own &lt;a href="https://developers.openai.com/api/docs/guides/latest-model" rel="noopener noreferrer"&gt;migration guide&lt;/a&gt; is a warning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Treat it as a new model family to tune for, not a drop-in replacement."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the part worth paying attention to.&lt;/p&gt;

&lt;h2&gt;What actually changed&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Two variants:&lt;/strong&gt; &lt;code&gt;gpt-5.5&lt;/code&gt; for agentic coding and multi-step tool workflows; &lt;code&gt;gpt-5.5-pro&lt;/code&gt; for demanding multi-pass work where quality beats latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More efficient reasoning.&lt;/strong&gt; Same quality, fewer tokens. Compounds fast in long agent runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning effort defaults to medium.&lt;/strong&gt; Previously high. OpenAI's advice: start at medium, only push to &lt;code&gt;high&lt;/code&gt; if evals show a measurable gain. Higher effort can actually make outputs worse on tasks with weak stopping criteria.&lt;/p&gt;
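
&lt;p&gt;A minimal sketch of pinning that default explicitly, using the OpenAI Python SDK's Responses API; double-check the exact parameter surface against the migration guide and your SDK version:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-5.5",
    # Start at the new default; only raise to "high" if your evals show a gain.
    reasoning={"effort": "medium"},
    input="Summarise the failing test output and propose a minimal fix.",
)
print(resp.output_text)
&lt;/code&gt;&lt;/pre&gt;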

&lt;p&gt;&lt;strong&gt;Outcome-first prompts work better.&lt;/strong&gt; Describe what you want, not how to get there. Skip step-by-step unless the sequence is non-negotiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better tool use.&lt;/strong&gt; More precise argument selection on large tool surfaces. Fewer hallucinated calls.&lt;/p&gt;

&lt;h2&gt;The key migration advice&lt;/h2&gt;

&lt;p&gt;Start clean. OpenAI explicitly says to begin from the smallest prompt that preserves your product contract — don't carry over an old prompt stack and expect it to work.&lt;/p&gt;

&lt;p&gt;Every instruction you added to paper over a quirk in the previous model is now debt.&lt;/p&gt;

&lt;h2&gt;What to do&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not using it yet?&lt;/strong&gt; No urgency. &lt;code&gt;gpt-5.4&lt;/code&gt; isn't going anywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluating?&lt;/strong&gt; Spin up a fresh system prompt, benchmark against your actual use cases, run ablations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On Vercel AI Gateway?&lt;/strong&gt; Already available: &lt;code&gt;openai/gpt-5.5&lt;/code&gt; or &lt;code&gt;openai/gpt-5.5-pro&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running agents?&lt;/strong&gt; This is where the upgrade is most worth it — efficiency gains and tool use improvements compound on longer runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ Full details in OpenAI's &lt;a href="https://developers.openai.com/api/docs/guides/latest-model" rel="noopener noreferrer"&gt;Using GPT-5.5 guide&lt;/a&gt; and &lt;a href="https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5" rel="noopener noreferrer"&gt;Prompting Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✏️ Drafted with KewBot (AI), edited and approved by Drew.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>api</category>
      <category>llm</category>
    </item>
    <item>
      <title>Harness bugs, not model bugs</title>
      <dc:creator>Andrew Kew</dc:creator>
      <pubDate>Fri, 24 Apr 2026 14:51:12 +0000</pubDate>
      <link>https://forem.com/thegatewayguy/harness-bugs-not-model-bugs-1f4e</link>
      <guid>https://forem.com/thegatewayguy/harness-bugs-not-model-bugs-1f4e</guid>
      <description>&lt;p&gt;For six weeks, developers have been complaining that Claude got worse. You've seen the posts — "Claude Code is flaky", the AI-shrinkflation discourse.&lt;/p&gt;

&lt;p&gt;Yesterday, Anthropic shipped a &lt;a href="https://www.anthropic.com/engineering/april-23-postmortem" rel="noopener noreferrer"&gt;postmortem&lt;/a&gt;. Three unrelated bugs, all in the Claude Code &lt;em&gt;harness&lt;/em&gt;. The model weights and the API were never touched.&lt;/p&gt;

&lt;p&gt;That distinction is the whole point.&lt;/p&gt;

&lt;h2&gt;What actually broke&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Default reasoning effort got dialled down.&lt;/strong&gt; A UX fix dropped Claude Code's default from "high" to "medium" in early March. Users noticed it felt dumber. Reverted April 7.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A caching optimisation dropped prior reasoning every turn.&lt;/strong&gt; Supposed to clear stale thinking once per idle session; a bug made it fire on every turn. Claude kept executing without memory of why. Surfaced as forgetfulness and repetition. Fixed April 10.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A verbosity system prompt hurt coding quality.&lt;/strong&gt; "Keep responses under 100 words." Internal evals missed a 3% regression on code. Caught by broader ablations. Reverted April 20.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this was a model change. The weights didn't move. The API was never in scope.&lt;/p&gt;

&lt;h2&gt;The lesson&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The model is not the product.&lt;/strong&gt; What your users experience is &lt;code&gt;model + harness + system prompt + tool wiring + context management + caching&lt;/code&gt;. Each layer has its own bugs. When someone says "Claude got worse," the weights are usually the last thing that changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API-layer products were unaffected.&lt;/strong&gt; If you're building directly against the Messages API, none of these bugs touched you. This is why "am I on Claude Code, or am I on the raw API?" matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Eval passed" ≠ "no regression."&lt;/strong&gt; The verbosity prompt passed Anthropic's initial evals. Only a broader ablation — removing lines one at a time — caught the 3% drop. Fixed eval suites miss behavioural drift; ablations catch it.&lt;/p&gt;

&lt;h2&gt;What to actually do&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On Claude Code?&lt;/strong&gt; Update to v2.1.116 or later; all three fixes have shipped. Usage limits got reset as an apology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On the API directly?&lt;/strong&gt; Nothing to do. Stay on whatever model you were on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shipping your own harness on top of a frontier model?&lt;/strong&gt; Read the postmortem twice, then audit your prompt + caching + context-management pipeline for the same silent-failure modes. The bugs Anthropic described are exactly the ones every harness reinvents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;The meta-lesson is boring and important: most of the quality variance lives between "the model" and "the thing your user sees." Ship good harnesses.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✏️ Drafted with KewBot (AI), edited and approved by Drew.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>anthropic</category>
      <category>postmortem</category>
    </item>
  </channel>
</rss>
