Forem: Kody from Kodus

Bitbucket AI Code Review Tools: top options for 2026

Kody from Kodus — Sat, 25 Apr 2026 16:28:00 +0000

The volume of pull requests is exploding, in part because a lot of code is now generated by AI. This combination has turned code review into a real bottleneck. Manual review cannot keep up with this pace, and when time gets tight, feedback becomes shallow. AI code review tools for Bitbucket show up exactly to deal with this. They help filter simpler issues early on, before reaching human review. This allows reviewers to focus on what actually matters, like logic and architectural decisions.

The market is full of options promising to improve this process. This article compares the main tools that work with Bitbucket:

Kodus (Open Source)
CodeRabbit
Qodo
Bito
CodeAnt AI
Bitbucket AI by Atlassian.

We will look at the differences in how they connect to repositories, the quality of their suggestions, how much they understand code context, and how they charge for usage. The goal is to give you enough ground to decide what makes more sense for your team.

What to look for in an AI code review tool

Choosing an AI tool is not just about comparing feature lists. It needs to fit into your workflow without breaking what already works. Before looking at the options, it helps to be clear on what actually matters.

Code understanding: Does the AI only look at the diff or does it understand the repository as a whole? Good suggestions depend on knowing team patterns, past changes, and architectural decisions.

Adapting to how your team works: Can it adapt to your team’s rules? Being able to define your own rules, whether through configuration, natural language, or feedback, is what separates something generic from a tool that actually helps with review.

Pricing: The pricing model directly impacts cost and how far you can scale. There are plans per user, per volume of code, credit-based systems, and the BYOK model, where you use your own keys and pay the LLM provider directly.

Comparing AI code review tools for Bitbucket

Each tool handles code review in a different way. The best choice depends on what matters most to your team, whether that is cost control or a deeper level of code analysis.

Integration and workflow

How the tool fits into the pull request flow determines whether it actually helps or just becomes another step in the process.

Bitbucket AI, being native to Atlassian, is the best integrated. It lives inside the Bitbucket interface and uses context from Jira tickets. This is its main strength. For teams already using the Atlassian ecosystem, setup is simple and the experience is consistent. There is no need to leave the tool to use it.

Most of the other tools, including Kodus, CodeRabbit and CodeAnt AI, also connect directly to the PR. They show up as an automated reviewer, leaving inline comments just like any team member. Setup is usually quick, often taking just a few minutes to install the app in Bitbucket and grant repository access.

The main difference is that independent tools require a separate account and configuration to manage. Bitbucket AI is already included in your Atlassian subscription. If your team uses GitHub or GitLab alongside Bitbucket, tools like Kodus or CodeRabbit keep the same experience across all of them.

In the case of Kodus, you can control when the review runs. It can start automatically when a PR is opened, when new commits are pushed, or when someone triggers it manually. It also tracks only new changes, instead of reviewing everything from scratch every time the PR is updated. This avoids a common issue in AI review tools, where comments start to become repetitive or disconnected as the PR evolves.

Context and comment quality

An AI review tool is only as good as what it suggests. If it generates too much noise, it gets in the way instead of helping. Context is what makes the difference.

Tools that only look at the diff tend to miss context. CodeAnt AI tries to solve this by analyzing the entire codebase, which helps identify architectural inconsistencies or issues in how new and existing code interact. It also brings “Steps of Reproduction” for bugs, which helps when validating reported issues.

Qodo stands out in independent benchmarks for its high precision. It uses a multi-agent system that builds a deeper understanding of the code and performs semantic search to find similar patterns across thousands of repositories. It works well in larger companies with complex systems.

CodeRabbit builds a "code graph" to map dependencies and keeps a semantic index of the code. This helps it understand what already exists and check if new changes are consistent. Bito follows a similar path with its "AI Architect," building a knowledge graph of the entire codebase, which is useful in microservices environments.

Kodus takes a different approach, learning from your team’s past pull requests. It analyzes previous comments and code changes to understand team patterns. This makes its suggestions more aligned with real context. Instead of just pointing out a possible issue, it can explain why that might be a problem in that specific code and suggest a fix following the same patterns the team already uses.

It also tracks the history of the review itself. Instead of repeating suggestions or commenting on points that were already resolved, Kodus focuses on what actually changed. Feedback stays consistent with the current state of the PR and avoids unnecessary noise.

Customization and control

Generic rules are a good starting point, but every team works differently. The ability to adjust how the AI behaves is what makes the difference over time.

Bitbucket AI is more limited here. Rules come from Jira ticket acceptance criteria. This helps ensure the code follows business requirements, but it does not give much direct control over code standards.

Other tools offer more explicit configuration. CodeRabbit has a "learnings" system where you write preferences in natural language. Bito supports custom guidelines via dashboard or a .bito.yaml file in the repository.

Kodus has the most flexible rule mechanism. It allows defining rules in natural language, but also supports more advanced logic. For example, you can create a rule that flags a deprecated library only in certain microservices, or require a specific error handling pattern in new endpoints.

You can also adjust how the review shows up day to day. You can control severity levels, group comments, limit suggestions, and define what should be ignored. This helps avoid noisy PRs and keeps focus on what actually matters for the team.

Pricing models

Pricing varies a lot across these tools.

CodeRabbit, Bito and CodeAnt AI mainly use a per developer per month model. CodeRabbit costs $60 per dev/month on monthly, or $48 per dev/month yearly, but it comes with PR review limits, which can hurt teams with high volume. Bito costs $25 per dev/month on monthly, or $20 per dev/month yearly. CodeAnt AI is around $30 per dev/month on monthly, or $24 per dev/month yearly.

In the case of Qodo, the Teams plan costs around $30 per dev/month yearly, or $38 per dev/month monthly, which puts it among the more expensive options in this comparison.

Bitbucket AI uses a credit-based system. You buy a monthly volume and each action consumes part of it. It can be cheaper for irregular usage, but can be unpredictable in more active months.

Kodus follows a different model with "Bring Your Own Key" (BYOK). You pay the platform $10 per dev/month on monthly, or $8 per dev/month yearly, but you pay token usage directly to the LLM provider (like OpenAI, Google, or Anthropic). This brings two benefits. First, you choose which model to use, and can switch to a newer, better, or cheaper one without being locked in. Second, the total cost is usually lower because there is no markup on model usage.

This model also avoids locking you into a single provider or a fixed cost per PR, which gives more predictability as pull request volume grows.

Choosing the right tool for your team

All six tools can work for teams using Bitbucket. The better question is what kind of tradeoff your team is willing to make.

Bito makes sense if you want something quick to adopt and simple to use. CodeRabbit fits better if you are looking for deeper analysis in review. Qodo works well for teams that want a more polished solution. CodeAnt AI is a better fit if you want to combine review, security, and CI in one place.

Kodus stands out for teams that want more control. Control over review quality, noise level, model choice, and cost. This becomes more visible as PR volume grows and the team needs to adjust how review works day to day, instead of just accepting the default behavior of the tool.

Most popular open source AI tools among developers in 2026

Kody from Kodus — Fri, 24 Apr 2026 16:23:37 +0000

I’ve seen a lot of developers looking for good open source AI tools. It makes sense. We want to understand how things work under the hood, control our own setup, and keep code away from third-party services. There are quite a few options out there, and it’s not always clear which tool solves which problem. This is an overview of the tools I’ve seen teams actually use, covering everything from running models on your own machine to automating code reviews.

Open source AI tools for code review and workflow

These tools go beyond generating code snippets; they try to automate larger parts of the software development lifecycle.

OpenHands

OpenHands is an agentic framework built to handle end-to-end engineering tasks. You give it a high-level goal, like "fix this GitHub issue" or "add a new API endpoint," and it creates a plan, writes the code, and tries to execute it. It runs in an isolated environment, which means it has access to a shell and file system to get the job done.

It’s more experimental than the other tools. It was designed to automate complex, multi-step tasks that would normally require full developer attention, delegating entire chunks of work to an autonomous agent instead of doing pair programming.

When it’s useful: You’re exploring automation of complex engineering tasks and are willing to invest time in setting up and guiding a more autonomous system.

Kodus

Kodus operates in a different, and often overlooked, part of the development process: code review. Most tools help you write code faster, but Kodus helps ensure that the code being merged actually has quality. It integrates with GitHub, GitLab, and other platforms to act as an AI reviewer in your pull requests.

Its strength is how it learns from the existing codebase and from your team’s review conventions. It doesn’t just look for generic issues. Instead, it brings feedback on logic, security, and performance that actually makes sense within the specific patterns of your project. You can also define custom rules in natural language, helping the team maintain certain standards.

Since it’s model-agnostic, you can connect it to any OpenAI-compatible endpoint, including local models running via Ollama. That gives you flexibility in cost, performance, and privacy. Unlike code generation tools that create code from scratch, Kodus looks at diffs and gives feedback. It’s a different task that requires actually understanding the existing code.

When it’s useful: Your team wants to improve the quality and consistency of code review. You need an automated reviewer that understands your team’s specific standards, not just generic best practices.

Open source AI models

Before using most developer tools, you usually need a way to run models locally. This keeps everything private, gives you control, and helps avoid API costs.

Ollama

Ollama made running large language models (LLMs) locally actually work for most developers. It packages open models like Llama 3, Mistral, and Gemma into a single command-line tool. You run ollama run llama3 and you already have a local inference server with a REST API ready to go.

It handles the annoying parts of model management, like quantization and GPU setup, so you can just use the model. Many other tools in this list can connect to an Ollama instance, which makes it a common building block for a local AI setup.

When it’s useful: You need a straightforward way to run open models on your machine or on a shared team server. It’s usually the first thing you set up in a self-hosted AI workflow.

Open WebUI

If Ollama is the backend, Open WebUI provides a self-hosted frontend. It gives you a clean web interface, similar to ChatGPT, but connected to local models via Ollama or other compatible APIs. It works well for teams that need a central, private place to test different models and prompts without relying on a public service.

You can extend and configure it for different roles and access levels. It can be used to test model responses or as an internal chat for the engineering team, keeping data and interactions inside your own infrastructure.

When it’s useful: Your team needs a shared, browser-accessible chat interface for local models.

Open source AI tools for writing and editing code

This category looks at the core of development: writing, editing, and cleaning up code. These tools usually live inside the editor or the terminal.

Tabby

Tabby is a self-hosted code assistant, basically an open source alternative to GitHub Copilot. Its main function is code completion. It becomes really useful because it can connect to your Git repositories. It analyzes your codebase to offer more relevant suggestions, aligned with your project’s patterns and conventions.

Since it’s self-hosted, all inference runs on your own infrastructure, which is essential for organizations with strict privacy requirements. The setup takes a bit more effort than a cloud service, but the level of control usually justifies it.

When it’s useful: You want a self-hosted autocomplete tool that learns from your own code. Keeping data private is an important concern.

Continue

Continue is an open source extension for VS Code and JetBrains; it works as an assistant inside the editor. You can chat with it, ask it to edit or generate files, and debug code. It works with both locally running models via Ollama and cloud APIs. One interesting feature is the ability to define reusable "slash commands" to automate common development tasks for your team.

The idea is to go beyond a simple chat window, with features that understand terminal history and project files. It sits somewhere between basic autocomplete and more autonomous agents.

When it’s useful: Your team mainly uses VS Code or JetBrains and wants an assistant integrated into the editor that can be customized for specific workflows.

Aider

Aider is aimed at developers who spend most of their time in the terminal. It’s a command-line chat tool that helps you work with AI to edit code across multiple files. What really sets it apart is the Git integration. Every change made by the AI automatically becomes a commit, keeping a clean history of what was done. You can easily review diffs and undo any change you don’t like.

This approach makes the interaction with AI feel like real pair programming. You give it a task, it writes the code and commits it, and you review it like you would with another person. It’s a very different workflow from IDE-based tools.

When it’s useful: You’re comfortable in the terminal and want an AI assistant that works directly with your Git repository.

Choosing the right tool for the job

The decision depends on which part of your workflow you want to improve.

To run models with full control, start with Ollama as the base and add Open WebUI if you need a shared interface.
To write code faster, Tabby is a solid option for self-hosted autocomplete with context, while Continue offers an assistant experience inside the IDE. Aider works best if you live in the terminal.
To automate full tasks, OpenHands shows what autonomous agents can do, although it requires more setup and guidance.
To improve code review quality and consistency, Kodus was built specifically for that. It focuses on analyzing changes within the context of your project, which is a very different problem from code generation.

Most teams I’ve seen end up combining these tools. They might use Ollama to run models locally, Tabby for autocomplete, and Kodus to ensure that the code being written follows quality standards before merge.

The Future Of Software Engineering according to Anthropic

Kody from Kodus — Mon, 30 Mar 2026 20:48:00 +0000

The initial productivity gains from AI code assistants are starting to come with a cost. The speed is undeniable. You can generate hundreds of lines of boilerplate code or even a complex algorithm in seconds. The problem shows up later, in the pull request. Reviewing AI-generated code feels different from reviewing a colleague’s work. You can’t ask for its reasoning or discuss trade-offs. You’re left looking at a block of code that is syntactically correct and often seems logical, but whose assumptions are a mystery. The gap between how fast we can generate code and how long it takes to verify it is the main challenge of bringing AI into our work, and it shows how our roles need to change. Anthropic’s research suggests a path forward, a future of software engineering where the engineer’s primary role is directing systems, not writing code.

This is a fundamental shift in where we apply our expertise.

The growing gap between generation and verification

We’re hitting a limit with the current generation of AI tools. They are excellent at speeding up tasks we already know how to do. The bottleneck is no longer writing code; it’s making sure the generated code is correct, maintainable, and safe to merge.

The speed vs. control problem

A junior engineer submits a 500-line pull request. You can review it by inferring intent, correcting logic, and teaching better patterns. A 500-line pull request coming from an AI is all or nothing. You can accept it, reject it, or spend more time editing it than it would take to write it from scratch. The tool gives you speed, but you give up control and understanding. This leads to a process where engineers generate code and then spend an equivalent amount of time reverse engineering the logic to write tests and confirm the behavior matches what they originally wanted.

From implementation details to specification failures

Our current code review process focuses on implementation. We check style, performance, and logical errors. But when a bug shows up in AI-generated code, the root cause is usually a flaw in the original prompt or a misunderstanding by the model, not a simple coding mistake. The conversation has to move to a higher level. The question is no longer “is this for loop correct?” but “did my prompt describe all edge cases accurately?” The developer’s focus shifts from writing code to writing an unambiguous specification that a machine can execute.

Managing complex generated systems

The problem gets worse when you move from isolated functions to generating entire services. You end up with a large amount of code that no one on the team fully understands. Who owns it?

When something breaks in production at 3 AM, who has enough context to debug it quickly? The codebase starts to become harder to understand, and knowledge about the system erodes over time.

Anthropic’s perspective: the engineer as an AI system architect

Anthropic’s view is that this discomfort is a sign of a temporary phase. They propose a future where engineers are architects and validators of AI-driven development systems, rather than programmers focused on implementation. The value shifts away from the amount of code written and toward how we define problems and evaluate solutions.

A shift in value: from implementation to specification and validation

The core idea is that an engineer’s primary output will no longer be code. It will be a combination of two things:

1. A precise, machine-readable specification of the problem. This is not a Jira ticket. It’s a formal definition of inputs, outputs, behaviors, and constraints.
2. A complete validation and verification setup. This includes tests, property-based checks, and performance benchmarks that can certify, without human review, that the generated solution meets the specification.

In this model, the AI agent’s job is to find any implementation that satisfies the specification and passes the tests. The engineer’s job is to make the specification and tests so solid that any passing implementation is, by definition, correct.

AI agents as collaborators

This turns AI from a simple tool (like a compiler) into an active collaborator. It’s a system that takes structured, high-level direction and executes the repetitive work of implementation. The creative part of engineering, breaking down complex problems into logical and verifiable parts, remains deeply human. The mechanical part of translating that into syntax is handed off to the machine.

What this means for our daily work

If this idea is right, several of our practices will need to change. The structure of teams, daily routines, and how we define “done” will all be affected.

Code reviews become specification reviews

The pull request process tends to change. Instead of reviewing 500 lines of generated code, the focus shifts to the specification and the tests that support that solution.

The review is no longer centered only on readability or small implementation details. The main point becomes the clarity of the specification, whether it leaves room for misinterpretation, and whether the tests properly cover the defined scenarios, including more extreme cases.

The generated code might not even be the main focus. It becomes more of a piece of evidence that a valid solution exists for that specification.

Architecture as orchestration of AI agents

Architectural design also moves up a level. Instead of designing class hierarchies, you design systems of interacting AI agents. A diagram might show an “authentication agent” with a spec for handling JWTs and security policies, a “data processing agent” with a spec for a pipeline, and the APIs that define how they connect. The architect’s role is to define boundaries and success criteria for each part, and let the AI build and connect them.

Debugging starts with the specification

When a bug appears in production, the first step is not attaching a debugger to the generated code. It’s reproducing the failure in the verification environment. If you can write a test that demonstrates the bug, there are two possibilities:

1. The implementation is wrong. The AI agent failed to generate code that satisfies the specification. The solution is to report the failure and have the agent generate a new version that passes the stronger test suite.
2. The specification is wrong. The code correctly implements an ambiguous or incomplete spec. The solution is to fix the spec, strengthen the tests, and regenerate the component.

In this scenario, technical debt shifts toward poorly defined specifications and insufficient test coverage.

Where this vision is incomplete or optimistic

This idea is strong, but in practice things are messier. The model assumes new projects, where you can write good specifications from the start.

How do you apply this to a legacy codebase with no documentation and few tests? Creating a clear specification for a system no one fully understands is often harder than just fixing a bug.

Many software problems, especially in product work, are not mathematically precise. Requirements are discovered along the way. You build, show it to users, and adjust. A system that demands perfect specifications upfront doesn’t fit well with this kind of exploratory work.

And models are not compilers. They are non-deterministic. The same prompt can produce different results. This creates a new class of bugs where something works one day and breaks the next after a seemingly unrelated regeneration. Handling this requires a level of tooling and discipline we are only starting to build. This vision works better for well-defined problems like sorting algorithms or protocol parsers. It is less convincing for complex, user-facing products where the “spec” is always evolving.

What to do if Anthropic is even partially right

We don’t have to wait for this future to arrive to start adapting. If the most important work is moving from implementation to specification and validation, we can start now.

1. Treat specifications like code. Start writing more detailed and formal descriptions of component behavior before coding. Use more structured formats than a README, like TLA+ for system thinking or well-defined OpenAPI specs for APIs. Review the spec with the same rigor as code.
2. Invest in automated verification. Go beyond simple unit tests. Use property-based testing to validate rules across a wide range of inputs. Use mutation testing to find weaknesses in your test suite. The goal is to build a setup that gives you strong confidence in correctness, whether the code was written by a human or AI.
3. Practice instructing the machine. Knowing how to tell an AI agent what to do is a different skill from programming. It requires clarity of thought, understanding the model’s limits, and the ability to break problems into prompts that produce verifiable results. Start practicing this with current tools. When using an assistant, focus less on the generated code and more on refining your instructions until the output becomes consistently correct.

The engineer of the future may write less code, but will need to think with more structure than ever. The work is not going away. It is just moving up a level in the stack.

Will Engineers Be Replaced by AI? What’s Actually Changing in Software Engineering

Kody from Kodus — Mon, 30 Mar 2026 12:46:00 +0000

The question of whether engineers will be replaced by AI is not new, but this time it feels different. Previous automation focused more on build and deploy. Now AI goes straight into the code, generating functions and classes that look correct at first glance.

This touches what many of us considered our job and starts to create a constant concern about the future.

The answer is not a simple yes or no. While the cost of producing a line of code is dropping close to zero, the cost of understanding a complex system is not. The real work in software development is shifting. It is becoming less about typing and more about validation, integration, and making good decisions.

Where AI helps today

AI coding assistants do speed up some tasks. They are good at generating a first version of common patterns, writing boilerplate for a new component, or creating unit tests. Give them a well-defined problem with little context, and they can produce a solution in seconds.

I use them to understand unfamiliar code in legacy systems or to convert data from one format to another. In these more isolated tasks, the tool generates a solution, I review it, adjust what is needed, and move forward faster than if I had done it from scratch.

The limits of a quick draft

The code AI generates is usually a good starting point, but it is rarely a complete solution. It lacks the deep, implicit context of your project. It does not know about the technical debt in LegacyUserService, the upcoming database migration, or the specific coding conventions the team agreed on last quarter.

The output is clean, but without context. Our work is now more about reviewing, correcting, and integrating AI-generated code. We are the ones responsible for fitting that generic block into our specific and nuanced system.

The new problems in our workflow

While AI removes much of the repetitive work, it also introduces new problems that are harder to notice. The code it generates often looks correct, which makes the errors less obvious.

It may generate a function that follows the code style well but ignores error handling that the system requires to operate safely. It can also introduce a security vulnerability by using a library in a way that seems correct but is already known to cause issues.

This creates an additional mental load because every suggestion needs to be verified. We have to understand what the code does and also anticipate what it fails to do. The risk is accepting a block of code we do not fully understand, which is a fast path to accumulating technical debt. The bugs that come from this may only appear months later and will be much harder to trace. We are shipping more code faster, which increases the pressure on the review process.

The skills that matter more

As generating code becomes cheaper, the value of other skills increases. These are the parts of the job that AI does not do.

An engineer’s value comes from architectural design, which involves understanding dependencies between services, planning data flows, and making high-level decisions about system structure. AI can write a function, but it cannot design a distributed system that balances trade-offs between consistency, availability, performance, and cost.

It also comes from the ability to translate ambiguity. Turning a vague business need into a concrete technical plan involves asking questions and negotiating scope. This is a human process of building shared understanding. Complex debugging is another example, like tracing a production issue across multiple systems, logs, and dashboards. This requires a deep mental model of how everything fits together. Finally, communication and mentorship remain essential to explain technical trade-offs to product and set technical direction for the team.

Faster code generation makes a good engineer more valuable. An experienced engineer can use AI to increase output, test ideas, and build prototypes quickly, while applying system knowledge to ensure the final result makes sense.

Our role is to provide context

Our value is shifting. We are less focused on writing code and more on understanding the system context. Our job is to connect business requirements with technical constraints and ensure the code remains sustainable in the long term.

We spend less time writing a sorting algorithm and more time making sure the right data enters the system in the right way. Our role is to connect parts of the system that AI cannot see.

The focus shifts from the syntax of a loop to the coherence of the system as a whole.

How to adapt

This shift requires a conscious change in focus.

You need to treat AI as a very fast junior engineer that has no memory of past conversations. Give clear and specific instructions and small, well-defined tasks. Then critically review every line it generates. Do not fully trust it.

Build the habit of evaluating AI suggestions against your project’s constraints. Ask whether it follows your style guide, whether it introduces a dependency you do not want, or how it will behave under load.

Deepen your understanding of system design. Spend time reading architecture documents and analyzing how services in your company interact. The more you understand the system as a whole, the better you can guide AI-generated code to fit into it.

Focus on the human parts of the job. Get better at explaining complex ideas, negotiating technical plans, and working with your team. The hard problems have always been deciding what code to write. That part of the job is not going away.

AI Fatigue for Developers: Managing Cognitive Overload from Code Assistants

Kody from Kodus — Sun, 29 Mar 2026 08:44:00 +0000

The constant stream of suggestions from AI code assistants is creating a new kind of mental tax. The problem isn't prompt engineering or model accuracy. It's a deeper issue of control and focus, leading to what some engineers are calling AI fatigue.

When your IDE is constantly suggesting entire blocks of code, your job changes from creating to validating. You become a full-time reviewer for a junior developer who never sleeps, never learns your project’s specific context, and never gets tired. This model of passive assistance has a real, non-obvious cost. It trades the focused effort of building for the scattered effort of auditing, which can compromise the system’s integrity and the deep thinking that good engineering requires.

The silent cost of constant suggestions

Code assistants see a local window of your code, but they have no architectural awareness. They can generate a function that correctly implements an algorithm, but they don’t know about the new data access pattern the team agreed to last month or the long-term plan to deprecate a library. The developer is left to catch these subtle, critical deviations.

Shifting the review burden to the engineer

Every AI suggestion is a tiny pull request. You have to stop, read the suggestion, parse its logic, check its correctness, and make sure it aligns with the project's architecture. This is a real context switch that interrupts your train of thought.

When you write code from scratch, the logic flows from your mental model of the system. When you review an AI suggestion, you first have to reverse-engineer the AI's logic and then map it back to your own. This validation cycle, repeated dozens or hundreds of times a day, fragments your attention. The cognitive load isn't reduced, it’s just changed from a focused block of creative work into a scattered series of validation checks.

Why passive assistance erodes critical thinking

Deep work in software engineering requires holding a complex problem in your head. You build a mental model of how components interact, how data flows, and where things might fail. Constant suggestions from an AI assistant actively work against this. The stream of code snippets keeps your attention at the surface level, focused on the next few lines instead of the overall structure.

This continuous interruption prevents the sustained focus you need to see a better abstraction, question a requirement, or spot a potential performance bottleneck. The AI optimizes for local correctness, while an experienced engineer optimizes for the system’s global health. By offloading the "easy" parts, we risk losing the context that helps with the hard parts.

AI fatigue: A new source of engineering debt

The speed gains from AI assistants are immediate and easy to measure. The architectural drift and fuzzy ownership that come with them are not. This asymmetry creates a new, quiet form of engineering debt.

The subtle drift in code ownership

When you accept an AI suggestion, a small part of the code's "why" is lost. The code is there and it works, but the reasoning behind its structure and the alternatives that were discarded exist only in the model's latent space. You, the author, don't have the same depth of ownership because you didn't go through the process of creating it.

When it comes time to refactor or debug that code six months later, that lack of deep context is a liability. The team has a block of code that is technically sound but feels architecturally foreign. No one feels a strong sense of ownership, which makes it harder to evolve or fix.

When speed obscures architectural intent

An AI assistant will happily generate code that violates your team's established patterns if it saw a different, more common pattern in its training data. It might use a direct database call where you've built a repository layer, or implement custom state logic in a component that should be using a centralized store.

These small deviations seem harmless by themselves. They get the immediate task done faster. Over time, they break down the architectural coherence. The system becomes a patchwork of different patterns, making it harder to understand, maintain, and test. The speed you gained on the first commit is paid back with interest during every future maintenance cycle.

Defining AI boundaries: A control-first approach

The answer isn't to abandon these tools. We need to shift from passive, continuous assistance to explicit, on-demand use. The developer must be in control, deciding when and how to ask for help.

Explicit ways to use AI

Instead of a constant stream of suggestions, we can use AI for specific, high-value tasks.

Scaffolding on demand. Use the AI for boilerplate. Ask it to generate a new gRPC service with stubs, a new component with a test file, or a CI/CD pipeline from a template. This is a one-shot generation task that saves setup time without interfering with the core logic.
Generation from your patterns. For well-defined, repetitive tasks, show the AI examples of your own code. Give it a few existing data models and their repository classes, then ask it to generate a new repository for a new model. This constrains the AI to your project's specific conventions.
Help with refactoring. Use AI for mechanical changes on a selected block of code. Tasks like converting a `for` loop to a `map`, extracting a method, or changing a Promise chain to async/await are well-suited for automation. The scope is small and your intent is clear.

Team policies for code assistant output

To manage the code that comes from AI, teams need simple conventions. This is about managing risk and maintaining clarity.

Flag AI-generated code in reviews. A simple comment like `// AI-generated, reviewed for correctness and style` or a pull request label tells the reviewer to apply a different kind of scrutiny. The focus should be less on syntax and more on architectural fit.
Check for architectural compliance. The main job of a human reviewer for AI-assisted PRs is to be an architectural backstop. Does this code use the right data layer? Does it follow our error handling standards? Does it introduce dependencies we are trying to remove?
Document AI use in pull requests. A brief note in the PR description about which parts were AI-generated helps build a history. If you start seeing bugs from AI-generated code, this data becomes useful for refining team policies.

Measuring AI’s impact on architecture and reliability

Productivity claims for AI often focus on lines of code or commit frequency. These metrics miss the point. The real measure of a tool’s effectiveness is its impact on our mental state, the system’s reliability, and its long-term maintainability.

Evaluating the real cognitive load

We need to measure the total effort, not just the time it takes to write the first draft.

Compare time spent. For a given feature, track the full cycle time. Does the time saved in generation get eaten up by validation, debugging, and refactoring?
Track bugs from AI code. When filing a bug, add a field to identify if the code was known to be AI-generated. Over time, you can see if there is a correlation between AI use and certain types of defects, especially subtle logic or integration bugs.
Ask developers how they feel. Do they feel more focused or more fragmented? Do they feel like they are spending more or less time in a state of deep work? Anonymous surveys can provide honest feedback on the perceived mental cost.

Prioritizing deep work over suggestion streams

The goal of any developer tool should be to get out of the way so engineers can solve hard problems. AI assistants, in their current always-on form, often do the opposite. They pull your attention to the surface and encourage a reactive workflow.

By establishing clear boundaries and using AI for specific, targeted tasks, we can reclaim control. The goal is to make AI a tool you pick up for a job and then put down, not a constant presence that reshapes how you think. The most valuable work in engineering still happens during quiet, uninterrupted stretches of deep thought, and our tools should protect that focus.

Context Engineering vs Prompt Engineering: the shift in how we build AI systems.

Kody from Kodus — Sat, 28 Mar 2026 19:46:00 +0000

You receive a pull request coming from an AI. The code looks clean, follows the prompt, and all unit tests pass. Then you notice it uses a library the team deprecated last quarter. And a design pattern that violates the service’s architecture document. The code looks right, but it is completely wrong in the context of your system. This is the absolute limit of prompt engineering. The whole "context engineering vs. prompt engineering debate" comes down to a practical problem: our AI tools keep delivering code that requires careful manual fixes.

Prompt engineering works well for isolated tasks. Building software is not a sequence of isolated tasks. It is a chain of decisions constrained by existing code, team habits, and business rules. The problem is not a poorly written prompt. The problem is that the model has no idea what is happening outside its small window. Asking a better question does not help when the model cannot see the rest of the codebase.

The costs of prompt engineering at scale

Using prompt engineering for anything larger creates a fragile, high-maintenance system. You get stuck in a loop of tweaking prompts to fill gaps in the model’s knowledge, and then everything breaks when the model gets updated or the problem becomes harder.

A maintenance nightmare

When an AI system depends only on the prompt, the system becomes hard to maintain and easy to break. Engineers write absurdly complex prompts to get the right behavior, and those prompts turn into a mess that is impossible to maintain, full of assumptions about how the model "thinks".

This fails in a few ways. A small change in the output means you have to go hunting for which prompt to modify, which is even worse when prompts are chained. When a provider releases a new model version, your carefully tuned prompts can break. An instruction that worked perfectly yesterday can produce a completely different format or a logical error today. And when something fails, it is hard to debug. You end up guessing whether the problem was the prompt or the lack of context, because you cannot easily reproduce the exact state the model was in.

You can only optimize a prompt so far

There is a limit to what you can achieve by only refining a text prompt. Building software requires awareness, not just a good instruction. A prompt cannot contain a project’s dependency graph, the reasoning behind an old architectural decision, or how the team prefers to handle asynchronous operations.

You end up with code that is technically correct, but in practice is useless. Error handling is a perfect example. A model will generate generic `try/catch` blocks because it knows nothing about structured logging, error types, or the system’s metrics patterns. The code works, but it is incomplete and does not fit, which means a developer will need to fix it. Without system awareness, AI produces unpredictable results that make people lose trust in the tool.

Context Engineering vs. Prompt Engineering: a different way to think

We need to stop trying to cram everything into the prompt. We should focus on designing the environment the model works in, not just optimizing the instruction. This means building systems to provide the model with the specific and explicit information it needs to make decisions that actually fit.

Context engineering is the work of designing, building, and maintaining the systems that collect, filter, and provide this information to the model.

More than just a prompt

An interaction with an LLM should not be `prompt -> output`. It should be `(context + prompt) -> output`. The prompt is just a small part of a much larger package. This operational context can include data, like relevant code from other files, database schemas, or API contracts. It can also include tools, which are functions the model can call to get more information (like running a linter or checking user permissions). It can even include state, like which files the user has open or which commands they just ran.

This completely changes the work. We are architecting an AI environment, not just writing prompts. The real work is deciding which information needs to be explicitly provided and what the model can be expected to know. A code style guide is explicit context. Python syntax is implicit knowledge.

Context engineering is system design

This approach changes how we build with AI. Instead of endlessly tweaking text, we move to building structured and predictable components. A context engineer thinks about the whole system, while a prompt engineer focuses on a single interaction.

A prompt engineer tries to improve a response. They might change a prompt from "write a function that does X" to "acting as a senior software engineer, write a pure function that does X, follow functional programming principles, and include property-based tests." This might give you a better answer in that specific case.

A context engineer wants to make the entire system more consistent and reliable. They build the infrastructure that automatically finds the team’s functional programming principles in the wiki, pulls examples of existing property-based tests from the code, and provides all of that as context. The prompt can stay simple. The system becomes more reliable because the model’s decisions are now based on real project data.

The real intelligence is not in the model. It is in the system that feeds it. Building AI products that are sustainable depends on this.

Principles for building with context engineering

This requires a more disciplined way of passing information to models. It looks a lot like system design, which is a good sign.

Design your system to deliver context

The first step is to treat context as a core part of your system architecture. The code that fetches the database schema should not be mixed with the code that formats the prompt. You should create separate modules or services that only provide specific pieces of context. This makes the system much easier to test and maintain.

Whenever possible, provide context as JSON or YAML, not just as a large block of text. This helps the model interpret the information more reliably. For example, provide the style guide as a JSON object of lint rules instead of pasting raw text.

You also need versioning for your context. API schemas change, documents get updated. Your system should be able to point to specific versions of these sources. This is the only way to reliably reproduce a past generation for debugging.

Thinking in layers of context

It helps to think of context as a stack of layers. Each layer provides a different type of information. This helps prioritize and filter what you send to the model, which matters for staying within token limits and avoiding noise.

A context stack for a coding task could look like this:

Layer 0 (Global): The model’s built-in knowledge of a programming language.
Layer 1 (Organization): Company engineering standards or preferred libraries.
Layer 2 (Project): Architecture patterns for this project, lint rules, and dependency list.
Layer 3 (Local): The content of the current file and other related files the user has open.
Layer 4 (Dynamic): Real-time feedback from a compiler or test runner.

A feedback loop is already embedded in this idea. If the model generates code that fails a lint check, that failure becomes dynamic context for the next attempt. The system can self-correct using immediate and factual feedback.

Applying context engineering in practice

To make this work, you need to build the infrastructure to manage context as code.

Managing context with code

Orchestration tools can help, but a well-defined set of functions or microservices also works. The point is to have a programmable way to assemble and provide context. An orchestrator can call different context providers depending on the user’s request, assemble the final package of information, and send it to the model.

You also need to validate and monitor your context. Before sending information to the model, check it. Does the API schema parse? Does the file path exist? Keep an eye on the quality of your context sources. Outdated or incorrect context is worse than no context.

Finally, the system needs to adapt the context to the task. A request to refactor a function needs different information than a request to write a new database migration. Your code needs to be smart enough to fetch the right context for each type of work.

A checklist for building with context

When you are designing a new AI feature, ask these questions:

What information is needed? Identify every piece of information the model needs to make a good decision (source code, docs, schemas, git history, team patterns).
What is the scope? For a specific task, what is the boundary of the context? The current file, the package, the entire repository? How do you define this in code?
How do we know the context is good? How do you ensure the context is correct and up to date? What is the fallback if a source is unavailable?
How does it stay updated? When and how is the context updated? On every request, on a time interval, or after an event like a git commit?
Can we debug it? Are you logging exactly which versioned context was sent to the model along with the prompt? Can you perfectly reproduce a past generation

Answering these questions shifts the work from guessing the magic words in a prompt to building a software system that is reliable and easy to debug. Small improvements in a prompt lead to small improvements in the output. Improving the context changes the reliability of the entire system.

What Is Context Engineering and How to Apply It in Real Systems

Kody from Kodus — Fri, 27 Mar 2026 22:43:00 +0000

An AI code assistant generates a function to handle user uploads. The code looks correct, compiles, and passes all tests. You merge it. Two days later, a high-priority bug comes in. Files from premium plan users are being processed by the standard, slower queue. The generated code called the generic function `enqueue_job()` because it had no idea that a utility `priority_enqueue_job()` existed for specific user roles. The code was correct, but wrong for the system.

This is the ceiling that most teams hit with LLMs. The model’s reasoning ability is good, but it is almost completely unaware of your system’s operational reality. Fixing this requires a system-level approach. This is context engineering: the process of selecting and structuring the right information to give to the model, so the output is not just plausible, but correct within your specific environment.

The cost of disconnected models

When a model operates with incomplete context, it produces outputs that are plausible, but wrong. These errors are tricky because they pass static analysis and even basic unit tests. They are integration and logic bugs that reveal a gap between the model’s world and the reality of the codebase.

I have seen this happen in a few ways. A model suggests adding `axios` to a Node.js service, without knowing there is already an internal, hardened HTTP client with tracing and built-in error handling that is required for all network calls. An LLM refactors a method in Python to gain efficiency, changing the data type of a rarely used return value from `list` to `generator`. The local module’s unit tests pass, but a downstream service that consumes this output now fails at runtime because it expects to be able to call `len()` on the result. Or a model writes a database query that works perfectly in isolation, but omits a required `WHERE tenant_id = ?` clause because it is not aware of the system’s multi-tenant architecture.

In each case, a developer needs to manually step in, figure out what the model missed, and rerun the task with the missing information. This manual process of re-engineering context for each request is a productivity tax that does not scale. It is what turns a “10x” tool into a 1.1x tool with a high maintenance cost.

What is context engineering

Prompt engineering focuses on refining the instructions given to a model, on properly formulating the command. Context engineering is about providing the operational information required for the model to execute that command correctly. A prompt engineer works on the `System:` message. A context engineer builds the data pipelines that populate the `User:` message with everything the model needs to know.

This goes beyond simply throwing data into the prompt. It is about identifying the critical pieces of information that define the operational boundaries of a task. This information should be treated as a first-class architectural concern, not a last-minute detail.

Main operational context boundaries include:

Relevant data points: Instead of the entire database schema, provide the schemas of the tables related to the user’s request. Instead of all API endpoints, deliver the OpenAPI specs of the services involved.
User interaction history: What did the user just do? What error did they just see? What is their role and what permissions does that grant?
System state variables: Current feature flags, API rate limits of a dependent service, or the load on a database replica. This information is volatile and exists outside the codebase.

Treating context as a detail is why so many AI integrations feel fragile. Treating it as part of the system architecture is how you make those integrations reliable.

How to manage context

A systematic approach to context means identifying what matters, designing how to deliver it, and keeping the information up to date.

First, you need to map the categories of information a model needs to execute a task correctly. User context includes ID, permissions, preferences, and recent activity. Is the user an admin or a standard user? Are they on a free or enterprise plan? This often defines which business rules apply. Domain context is your business logic and constraints, like “Orders above $10,000 require a manual approval step”. Operational context is the real-time state of your system, like API rate limits or active feature flags. This is the most dynamic type of context. Finally, interaction context is the state of the current session, like previous questions in a chat or the error message from the last failed test.

Once you know what you need, you have to get it to the model. There are a few patterns, each with different trade-offs in performance and complexity. The simplest method is direct parameter passing, where you include short-lived context like a `session_id` directly in the API call. For more persistent information, a context store like Redis can store user profiles or permission sets, which your application retrieves before calling the model. To fetch information from large volumes of text like documentation or the codebase, Retrieval-Augmented Generation (RAG) can find the most relevant chunks to include. Some context can even be inferred from system events, like automatically injecting information about a high-latency microservice.

The main trade-off is between completeness and performance. A full RAG query can add 500ms of latency to a request. A larger context increases token cost and can introduce noise that worsens model performance. The goal is to provide the minimum sufficient context, not the maximum possible.

Outdated context is as bad as no context. A model that thinks a deprecated function is still in use will generate code that is already broken. Just like your API schemas, the structure of the context you provide also changes, so you should version your context objects. You need to monitor how stale the information is and define expiration limits. This should not be manual. Connect this to your CI/CD pipelines. When documentation is updated, a post-commit hook should trigger reindexing. When a new microservice is deployed, its OpenAPI spec should be automatically published to your context store.

Putting it into practice

Building context-aware systems means designing services where context assembly is a primary responsibility. Instead of each feature calling the LLM directly, you can have a central service that gathers information from different sources and provides it to other services.

This also means designing for failure. What happens when a context source is not available? The system should degrade in a controlled way. A code generation tool that cannot access the full codebase might refuse to perform a complex refactor and offer a simpler, safer alternative. It should signal that the response is based on incomplete information.

You need to measure the impact. Track how often the model’s output is accepted without changes versus when it requires manual edits. Instrument your systems to log when the lack of a specific type of context leads to an error. This data will show where context matters most and where it is worth investing in improving your retrieval pipelines.

Teams that treat AI as a prompt engineering problem quickly get stuck in a loop of tweaking instructions and manually fixing plausible but wrong outputs. They hit a reliability ceiling.

Teams that treat AI as a systems integration problem, a context problem, will build the infrastructure to provide models with the information they need to be actually useful. They will unlock a higher level of performance and reliability.

Scaling DevOps Culture: From Improvised Scripts to Platform Engineering

Kody from Kodus — Fri, 27 Mar 2026 12:41:00 +0000

The collection of scripts, manual configurations, and unwritten rules that helped your company get started will eventually begin to hold you back. What once felt improvised and efficient becomes slow and fragile as the team grows. This is a predictable breaking point in an organization’s DevOps culture. The systems that helped a team of ten people move quickly now create friction for a team of fifty. Operational work spreads out, slowing down feature development until you are spending more time dealing with custom deploy logic than writing code.

The costs of “good enough” automation

Early automation is usually just about being practical. You write a script that solves the immediate problem, set up the CI job that gets the build running, and move on. This works for a while, but the cumulative weight of these one-off solutions eventually starts pulling the entire engineering team down.

The limits of ad hoc scripts and tribal knowledge

The first signs of trouble are inconsistencies. Team A’s services run on a slightly different base image than Team B’s because they were configured months apart. One developer’s local environment works perfectly, while a new hire spends their first week fighting with dependencies because a setup script is outdated. It is a direct tax on productivity.

These improvised and undocumented setups end up creating recurring problems. Staging environments start drifting away from production as manual changes accumulate. As a result, deploys become less predictable.

Engineers end up getting pulled into tasks that have no direct connection to the product, such as debugging infrastructure, manually provisioning resources, or investigating deployment failures caused by differences between environments.

Pipelines start depending on specific scripts run locally or CI configurations that are difficult to understand and maintain. And new engineers face a steeper learning curve, not only to understand the codebase, but also to figure out which tools and processes they need to use in order to work on the system.

When “you build it, you run it” hits a limit

The principle of “you build it, you run it” is a great way to create a sense of ownership. A product team is responsible for its service, from code all the way to production. In a small company, this works well. At scale, the problems start to show.

When you have five, ten, or twenty teams all following this principle on their own, duplication appears. Each team builds its own Terraform modules for the same S3 bucket configuration. Each team creates its own alerts for CPU usage. Each team writes its own deploy pipeline for a standard web service.

This model creates bigger problems. The company pays for the same infrastructure and CI/CD work repeatedly, solved in slightly different ways by each team. Without a centralized approach, ensuring that all services follow security best practices or compliance standards becomes almost impossible. The burden ends up falling on individual teams that may not have the required expertise. Product engineers are forced to become specialists in Kubernetes, cloud networking, and observability tools just to keep their features running. Their attention gets split between building the product and managing its operational details.

Platform engineering as an evolution of DevOps culture

The answer is to evolve the environment where developers work while keeping the principle of ownership. That is the idea behind platform engineering. It changes the question from “How do we help each team run their own infrastructure?” to “How do we provide a platform that makes running infrastructure simple and consistent for everyone?”

Going beyond tool-centric DevOps

Many organizations see DevOps only as a set of tools such as CI/CD servers, infrastructure-as-code files, and monitoring dashboards. A platform approach is more about offering shared services and clear workflows.

Instead of simply giving developers raw access to cloud provider tools, a platform offers a higher level of abstraction. A developer should not need to write a complex CI/CD pipeline from scratch. They should be able to add a simple configuration file in the repository that connects to a standard pipeline managed centrally.

This means treating your internal infrastructure as a product. Your developers are your users. The success of the platform is measured by how much faster and more reliably they can deliver value to real customers. Self-service is a huge part of this. A developer should be able to create a new testing environment or check service logs without opening a ticket and waiting for another team.

Redefining roles and responsibilities in a scaling DevOps culture

This shift changes team structure and responsibilities. A common and effective pattern is to create a dedicated platform engineering team. This team is different from a traditional operations team that only acts as a gatekeeper. Its main job is to improve the developer experience and make engineers more effective.

The platform team concentrates the operational knowledge of the system. They build and maintain the core infrastructure, the CI/CD systems, and the observability stack. They also provide the tools and services that product teams use every day.

This creates a collaborative relationship. The platform team builds the standard path for development, and product teams use that standard to move faster. Product teams remain responsible for their own services, but the heavy lifting of the underlying infrastructure is already solved for them. That way they can focus on business logic, while the platform team ensures the infrastructure is secure and reliable.

Building your internal developer platform

Creating a platform is an ongoing process of identifying what slows developers down and building solutions to fix it.

Defining platform capabilities

Start by identifying the most common needs and problems across your engineering teams. The first platform services usually cover a few core areas.

You can offer standardized CI/CD pipelines, with reusable templates or workflows that handle build, testing, security analysis, and deployment for common application types. This way the developer only needs to define what is specific to their service.

Another important area is environment management, offering a simple way to create and destroy development, testing, and staging environments that stay consistent with production.

It is also worth centralizing logs, metrics, and traces. Services can be instrumented automatically with standard configurations, giving teams immediate visibility without requiring a lot of manual setup.

Finally, provide a secure way to manage application secrets and adopt strong security practices by default, such as secure network rules and IAM policies.

Adopting a product mindset for internal tools

Your platform will only succeed if developers actually use it. You cannot simply build it and assume they will show up. You need to treat it like any other product.

That means collecting feedback constantly. Talk to developers, run quick surveys, and create open office hours to answer questions and understand how they work and where they struggle. Use that information to decide what to build next.

Prioritize initiatives that solve real and recurring problems for multiple teams, not just ideas that seem technically interesting. Whenever possible, try to quantify the impact. Ask whether a new tool will save each developer an hour per week or reduce deployment failures by 50%.

It is also important to measure success by tracking adoption of platform services and developer feedback. Look at metrics such as deployment frequency, lead time for changes, and mean time to recovery. A good platform should improve these numbers.

Principles for a platform team

The culture of a platform team determines whether it becomes a help or an obstacle. Some principles tend to work well for successful platform teams.

Focus on helping developers, not controlling them. The platform should provide a standard path, with strong tools and well-supported patterns, but it should also allow alternatives when a team has a good reason to do something differently.

Treat platform services as internal products with clear APIs, good documentation, and easy-to-find materials. Build something small that already delivers value, put it into use, and evolve it based on feedback from teams. The idea is to avoid long cycles trying to build a “perfect” solution that may not solve what developers actually need.

It is also important to communicate continuously. Announce new features, changes, and deprecations through internal demos, posts, or newsletters. Teams should always know what the platform offers today and what is changing.

Making the transition without causing disruption

The goal is to evolve infrastructure gradually without needing to stop product development for a year to rebuild it.

Start with the biggest problem, the thing that generates the most complaints or wastes the most time. It might be inconsistent local environments or the manual process for creating a new service. Solve that first.

For each new platform service you build, provide clear instructions and support to help teams migrate from the old improvised way of working. Sometimes that means creating tools that automate parts of the migration.

The first team that adopts your new CI/CD pipeline and cuts deploy time in half will likely become a strong advocate for the change. Results like that help build momentum and encourage other teams to adopt the new approach.

AI-Generated Code Requires a Different Code Review Process

Kody from Kodus — Thu, 26 Mar 2026 19:40:00 +0000

Code review for AI-generated code is different. A pull request can look syntactically perfect, pass all local tests, and still be wrong in a way that is hard to notice.

Our review habits, built over years of reading code written by other people, are not prepared for this. We are used to looking for logic errors or style issues. What changes now is that AI can generate hundreds of lines of code that look correct at first glance, but were built on the wrong assumptions.

This changes where the bottleneck in software development sits. Writing code is no longer the slowest part. Verifying what was generated is.

When a developer can generate large volumes of code, the reviewer’s job shifts from fixing mistakes to validating intent. The cost of a superficial review changes as well. It stops being a small bug and can become an architectural flaw, a security risk, or a performance issue that only appears in production.

The costs of trusting AI-generated code

The immediate productivity gains from AI are clear. What comes after is not always. We are starting to see new types of problems in code that looks correct at first glance, but was built with a fragile or incorrect understanding of the system.

Ignoring failures in AI outputs

AI-generated code often looks complete. It generates functions with docstrings, adds basic error handling, and follows the general syntax of the codebase. At first glance everything looks right, but important details may be missing.

The code is usually written for a generic problem, not for our specific operational context. It may also lack the extra checks that a more experienced engineer would normally add based on experience. For example, it might not consider the case where a downstream service returns a malformed object or times out under load, because those failures are specific to our system, not to the training data. The generated code might even include a `try/catch` block for a network failure, but it will not validate the payload of a successful but corrupted response.

Another common issue is the introduction of obscure or non-standard libraries. An AI model may solve a problem using a niche package it saw in training data, adding a new maintenance burden and a new security surface without the developer noticing. The code works, but now the team is responsible for a dependency they never chose.

Why current code review processes fail

Our code review practices were built around one assumption: there is a human author whose reasoning can be questioned. We review code by looking at logic and maintainability, trusting that the author has a mental model of the system. AI-generated code breaks that assumption.

The illusion of correctness

The biggest challenge is that AI code looks correct. Often it is cleaner and more consistent in style than code written by a junior developer. That aesthetic can put reviewers into a false sense of security. We look for bugs that are obvious, but miss the ones that are strategic.

An AI can generate a perfectly functional data transformation script. The reviewer confirms that it works with the sample data. What goes unnoticed is that the script loads the entire dataset into memory, a solution that works with a test file of 100 records but will crash the server when it runs against the production database with 10 million records. The code is not technically buggy, but it is operationally unviable.

This gets worse when the code contains many repeated sections. An AI can produce a 200-line controller that appears to follow the team’s REST patterns. Somewhere inside that code there may be a direct database query bypassing the service layer and its validation logic.

A human reviewer, seeing familiar patterns, may move quickly through the code and miss the architectural violation. There is no authorial intent to question, only an output to validate. You cannot trace the machine’s “reasoning” because it does not exist.

Ignoring security and performance regressions

AI models are trained on public code, including examples with known vulnerabilities and inefficient patterns. Because of that, they can repeat those solutions without evaluating the impact they introduce.

A 2021 Stanford study showed that developers using AI assistants were more likely to write insecure code than those who did not use them.

AI may suggest a deprecated encryption algorithm because it appeared in older training data. It may generate code vulnerable to a regular expression denial of service attack (ReDoS) by suggesting a complex regex pattern copied from a public forum. These are not simple mistakes. They are inherited vulnerabilities that linters and basic tests often fail to detect.

Performance regressions are also common and difficult to identify. AI tends to solve the immediate problem without considering the performance impact on the system as a whole. It may generate a solution that processes items in a list using nested loops, resulting in O(n²) complexity. This passes a unit test with 10 items but nearly stops the application with 10,000.

A human developer, with context about the system’s scale, would probably avoid this. AI does not have that context.

The same happens with error handling. In one function, the AI may use exceptions based on training examples. In another, it may return null or error codes, creating inconsistent behavior and a more fragile system.

Adapting code review for the AI era

To deal with these new risks, we need to shift the focus of code review from code correction to code verification. The question becomes: “Does this code do the right thing, for the right reasons, within the constraints of our system?” Every block of AI-generated code should be treated as if it came from a new developer who has no idea how your project works.

Prioritize intent over syntax

The review process needs to start before you look at the code. The first questions from the reviewer to the person who created the change should be: “What prompt did you use exactly?” and “What problem were you trying to solve?” This reframes the review around what the code is supposed to do.

First, verify whether the generated code actually solves the intended problem. It is common for AI to solve a similar but subtly different problem. The solution may be correct for the prompt but wrong for the business requirement.

Next, check whether the AI implementation fits into the system’s architecture. If the task was to add a simple validation rule, did it correctly modify an existing service, or did it generate a new class and fragment the logic?

Finally, mentally trace the data flow. In any non-trivial function, follow the data from input to output. What happens if an input is null? If a string is empty or contains unusual characters? If a network call fails? This deliberate tracing forces deeper analysis than a simple read-through.

A checklist for reviewing AI-generated code

To make this systematic, teams should adopt a verification-focused checklist for any pull request that contains a large amount of generated code. This moves the review away from subjective judgment and into a structured process.

Does it understand our business domain? AI has no domain context and will fill gaps with generic assumptions. For example, did it assume a user has only one email address when our system allows several?
Are the tests actually good? If the AI generated the tests, do they only cover the happy path? AI-generated tests are a starting point, but they rarely cover edge cases or failure modes specific to your system. The expectation for test coverage in AI-generated code should be higher, not lower.
Is it secure? Treat the code as untrusted input. Did it introduce new dependencies, and were they evaluated? Does it handle user input safely? Does it use approved cryptographic libraries?
Is it efficient? What is the algorithmic complexity of the generated functions? Does it access data inside a loop? Is it memory efficient? The reviewer is now also responsible for the performance analysis the generator ignored.
Can a human maintain this? Is the code easy to understand? Did the AI choose an algorithm that is more complicated than necessary? Is the code documented to explain why it works this way, not only what it does? The person who used the prompt needs to be able to explain the output. If they cannot, do not merge it.
Does it follow the project’s standards? AI does not know your specific architectural standards, preferred libraries, or error-handling strategies. The reviewer needs to make sure those standards are respected, checking whether the generated code integrates well with the rest of the system instead of introducing inconsistent logic.

This approach demands more from the reviewer. The reviewer’s role stops being just checking style or syntax. They need to understand whether the code actually makes sense within the system.

Otherwise, the codebase begins to accumulate subtle mistakes and problems that only appear later in production.

The speed of AI code generation is a major advantage, but it only works with strong code review processes.

How to Manage Dependencies and Packages in Large-Scale Projects

Kody from Kodus — Thu, 26 Mar 2026 10:35:00 +0000

When a single dependency update in one service causes a runtime failure in another, that is not an accident. It is what happens when lack of coordination becomes the norm. Any large project that treats dependency management as a local decision made by each team will eventually reach this point. The problem becomes a burden on the engineering team, showing up as build failures and hours spent debugging version conflicts instead of shipping features.

The impact of dependencies in large systems

How this affects engineering speed

Uncontrolled dependencies cost time. When Team A uses version 1.2 of a library and Team B’s service pulls version 1.3 through a transitive dependency, the build system may not flag any issue. At runtime, however, a small difference in the API can trigger NoSuchMethodError exceptions that are difficult to trace. Debugging this requires engineers to understand not only their own code, but also the dependency graphs of several loosely connected services.

This complexity slows down the build process. Resolving conflicting dependency trees is expensive for CI/CD pipelines. More dependencies mean larger artifacts. A service that should be a 50MB container grows to 500MB because it includes three different HTTP clients and two JSON parsing libraries, each with their own transitive dependencies. Every developer pays this tax on every build.

Security and compliance risks that go unnoticed

Every package you add is a new attack surface. A vulnerability in a library three levels deep in your dependency tree is just as exploitable as one in your own code. Without a central view, teams often do not even know they are using a vulnerable package until it is too late. The response turns into an emergency state where engineers run through multiple repositories trying to find affected services and apply patches.

License compliance is another blind spot. A developer might include a library without realizing its license conflicts with the company’s legal rules. Manually auditing licenses across thousands of dependencies is impossible. Without automated checks, you are exposed to legal risk, and fixing it later often requires expensive rewrites to replace the problematic dependency.

The case for more opinionated dependency management

Why individual package choices by teams cause problems

Giving every team full autonomy to choose their dependencies creates divergence that slows down the entire organization. When different teams pick different libraries for the same task, such as logging or database access, shared knowledge disappears. An engineer moving between teams has to relearn basic tools. Solutions developed in one part of the organization do not transfer.

Maintenance is usually the biggest problem. When a security flaw appears in a library used by several teams, each of them has to stop what they are doing, understand the patch, and deploy it. If a central library releases a new major version with breaking changes, this work repeats in every team.

This distributed effort is far less efficient than addressing the problem once, in a coordinated way by a platform team.

How to give autonomy without breaking system stability

We need to frame the question of developer choice more carefully. Instead of thinking only about which package solves the problem fastest, the team also needs to consider which option is more stable for the system. Autonomy means being able to ship features without worrying that the platform will behave unpredictably.

The short-term convenience of adding a new dependency without review creates a long-term cost paid by everyone. That cost appears as weekend hours responding to incidents or weeks of work fixing a vulnerability. Adding a bit of friction at the start, such as requiring new dependencies to be evaluated, prevents much larger problems later. You are choosing the stability of the whole system instead of the local optimization of a single team.

Establishing a dependency governance model

Choosing approved packages and versions

A good governance model starts with an approved list of libraries for common tasks. Instead of ten teams choosing ten logging libraries, you standardize one or two. This list works as the default path for development. It provides clear, supported options that have already been evaluated for security and license compliance.

This list should also come with clear versioning guidelines. For example, you might decide that all services must use the same minor version of a framework to avoid compatibility problems. This can be enforced using internal package registries or mirrors. They act as an intermediary between your developers and public repositories, allowing you to host validated versions of packages. This creates a central control point that prevents unapproved packages from entering the system.

Automating security and compliance checks

Human review does not scale, so your dependency policy needs to be automated. Integrate vulnerability and license scanners directly into your CI pipeline. A build should fail if it introduces a dependency with a known vulnerability or an incompatible license.

This automation turns security and compliance into part of the development cycle. Developers receive immediate feedback on pull requests and can fix issues before merge. Security debt does not accumulate, and the policy becomes a mandatory check for everyone instead of a document that nobody reads.

Strategies for managing transitive dependencies

The dependencies you declare are only a small part of the story. Transitive dependencies, the packages your dependencies rely on, make up most of your node_modules folder. Managing them is essential to maintain stability.

You should pin exact versions in lockfiles. Every package manager generates a file such as package-lock.json or yarn.lock that records the exact version of each dependency. Committing this file to the repository makes every build reproducible, whether on a developer’s laptop or on the CI server. This eliminates “works on my machine” problems.

You also need an explicit override policy. Sometimes a transitive dependency has a vulnerability, but the direct dependency has not yet been updated to fix it. You need a way to override the version of that transitive dependency. This should be a temporary and documented fix that you track and remove once the main package is updated.

Finally, regularly audit the entire dependency graph. This can reveal excessive packages and show where libraries can be consolidated. It also gives a high-level view of the third-party surface of your system.

Practical approaches for managing dependencies at scale

Tools and practices for large projects

In a monorepo, use a package manager that enforces a single version of any dependency across all projects. This makes version conflicts impossible by definition. If one project needs to upgrade a library, all other projects that use it are upgraded at the same time, forcing a single coordinated change.

For any type of repository, automated dependency update bots can reduce the work required to keep packages up to date. These bots open pull requests to update dependencies and usually include release notes. This turns updates into a simple task of “review and merge”, preventing your dependencies from becoming dangerously outdated. To add new dependencies, a review process with a platform team can ensure that new additions follow your standards and do not introduce risks.

Building a shared dependency policy

Having these rules documented makes your approach clear. This document should explain the “why” behind the decisions and be updated as things change.

Your policy should define how upgrades are handled. You might decide to apply minor and patch updates quarterly, while planning major version upgrades as separate projects. This creates a predictable rhythm for maintenance.

It should also define deprecation paths. When a standard library is replaced, give teams a clear deadline for migration, along with documentation that helps with the process. Set a firm date to remove the old library from the approved list.

The policy also needs to cover incident response. When a zero-day vulnerability like Log4Shell is announced, what happens? The policy should specify who evaluates the impact and who coordinates the remediation effort. Having that plan before you need it helps a lot.

How to implement DevOps without creating more complexity

Kody from Kodus — Wed, 25 Mar 2026 21:40:00 +0000

Most large DevOps projects fail to deliver results. They usually start with a new tool or with a top-down directive to “be more agile,” but they rarely solve the real problems that are slowing down software delivery. Teams end up with complex CI/CD pipelines that only automate a broken process, or with Infrastructure as Code that provisions inconsistent environments. Instead of faster and more reliable releases, everyone just ends up frustrated. Everything goes wrong because people start from the solution instead of starting from a specific and expensive problem.

The mismatch in adopting new practices

The desire to adopt a new tool is often the starting point. A platform team builds an internal deployment system, only to discover that product teams won’t use it. The reason is usually simple. The new tool solves the platform team’s problem, but creates new problems for the product team. Maybe it requires a major rewrite of the application’s deployment logic or imposes a workflow that simply doesn’t fit how the team operates.

That is why migrations done all at once usually fail. Forcing every team to adopt a new CI system or a standard Kubernetes platform by a specific date almost always creates resistance. Teams with stable legacy systems are pushed to do high-risk work with low return. Teams with tight deadlines see the directive as a distraction. If they don’t see an immediate benefit to their own work, the new system is just extra overhead.

Many times we also measure success in a way that is disconnected from reality. Deployment frequency is a popular metric, but it can be misleading. A team might deploy 20 times a day, but if the lead time from commit to production is still five days because of slow manual QA and long review cycles, the real bottleneck is still there. You only sped up the final, automated step. Real improvement comes from measuring lead time, change failure rate, and mean time to recovery (MTTR), which show the health of the delivery process as a whole.

Why good intentions fail: understanding resistance

When people resist change, there are usually good technical or organizational reasons behind it. No one is against something new just for the sake of opposing it.

Established workflows are hard to change. A senior engineer who knows exactly how to manually deploy a critical service tends to see a new automated system as a risk. The current process, even if it is slow, is predictable. A new pipeline that the team does not fully understand yet can fail in ways that are hard to diagnose. The resistance comes from the need for stability.

Skill gaps are another major obstacle. You cannot ask a backend team that has always depended on a central ops team to suddenly start writing and maintaining its own infrastructure configuration. That requires training, time to learn, and a manager willing to accept a few mistakes along the way. Without that time and support, teams will go back to the old methods because they are faster and safer in the short term.

A lack of clear leadership sponsorship can also kill any new initiative. If engineering managers do not protect the team’s time to learn and adapt, this work will always be pushed aside in favor of features. When leadership celebrates feature releases but ignores engineering improvements, the message is clear. The initiative dies from neglect.

Focus on outcomes, not practices

Instead of saying “We need to adopt Infrastructure as Code,” ask: “What is the most expensive problem in our delivery process?” Cost is not just money. It is developer time, delayed releases, and production incidents.

Look for places where a small change can generate a large impact.

An unstable end-to-end test environment that takes hours to spin up might be a bigger problem than deployment speed. Fixing that can free up more developer time than any new CI tool.
A manual database schema migration process that requires coordination between three people is an obvious bottleneck. Automating that single step might be the most valuable project you can take on.

This way, engineering work becomes tied to something the business actually values. Reducing the time to fix a production bug from four hours to 15 minutes is a clear win for everyone. It is a much simpler conversation than discussing the abstract benefits of a specific tool.

Define and measure success

To get buy-in for this type of work, you need to connect the initiative to measurable metrics. Before starting anything, establish a baseline.

If you cannot measure the problem, you cannot prove that you solved it.

For a problem like “Staging environments are always broken and out of sync with production”:

Baseline: It takes two days to provision a new staging environment. We receive 25 support requests per month related to staging issues.
Goal: A developer can provision a new production-like environment in less than 30 minutes. Staging-related requests drop by 90%.

For a problem like “Hotfixes for critical bugs take hours to reach production because of manual tests and release checklists”:

Baseline: Our mean time to recovery (MTTR) for P0 incidents is 4.5 hours.
Goal: We can get a hotfix into production within 20 minutes after the code is merged. Our MTTR drops below one hour.

When you communicate these numbers to stakeholders, an internal engineering project starts to be seen as a visible business improvement. The conversation stops being about cost and becomes about investment.

A step-by-step approach to improving

A successful rollout is a sequence of small wins, not a single massive project. First, understand where your teams are today. Some may have excellent CI setups, while others still deploy manually via FTP. A single plan for everyone will fail. The idea is to find the biggest bottleneck for a team or service and solve that point. Then you find the next bottleneck.

A simple way to move forward

Here is a path to get started.

Step 1: Find the biggest source of delay.

Sit down with a team and map the entire process from commit to production. Where does work get stuck? Waiting for code review? A QA environment? Manual approval from another team? Identify the biggest waiting time. For example, a team may realize that their two-week sprints are always delayed because getting a new database instance from the DBA team takes, on average, four days. That is where you start.

Step 2: Define a specific and measurable goal.

Based on the delay you identified, define a clear outcome. Using the database example, the goal might be: “Any developer on the team can provision a new database for testing in less than 10 minutes without opening a ticket.”

Step 3: Choose the smallest change that works.

What is the simplest tool or process change that achieves the goal? Maybe you do not need a full self-service cloud platform. The first step could be a set of standardized and versioned scripts, reviewed and approved by the DBA team. This moves the process from a manual ticket-based flow to an automated code-based flow. That is Infrastructure as Code used as a solution to a specific problem, not as an end in itself. In the same way, you can introduce CI simply by automating the unit tests that everyone should already be running locally.

Step 4: Run a pilot with one team.

Choose a team that feels the pain and is willing to experiment. Do not start with the most critical system or the most skeptical engineers. You want a quick win to learn from the process and generate momentum. That pilot team becomes your first success case.

Step 5: Measure, learn, and repeat.

After the pilot, go back to the baseline metrics.

Did you reach the goal?

Did the change create new problems?

Maybe the self-service database scripts worked, but now developers forget to deprovision them and costs are rising. That is just a new problem to solve. This feedback cycle is what actually drives improvement over time.

How to maintain progress

As more teams adopt new practices, the risk of fragmentation appears. If every team builds its own deployment pipeline or writes its own infrastructure modules, you create a maintenance nightmare. This is where some patterns come in.

The goal of governance is to make the right way the easiest way.

This usually becomes the responsibility of a platform team or internal engineers who build improvements for others to use.

Shared pipeline templates: Provide preconfigured CI/CD templates for common application types (such as a Go backend or a React frontend). A team can have a secure and efficient pipeline running in minutes instead of weeks.
Reusable infrastructure modules: Create a library of versioned IaC modules for standard resources such as databases, caches, and load balancers. This ensures consistent and security-approved configurations.
Clear ownership: Define who is responsible for each part. Does the product team own the application pipeline end-to-end? Does the platform team own the build infrastructure? Lack of clarity about responsibilities leads to systems that no one maintains.

This approach gives teams the freedom to move quickly using the standard paths, while the platform team ensures stability for the entire organization.

You avoid both the chaos of everyone doing their own thing and the bottleneck of a central Ops team that has to approve every change. The only way to keep this working is to keep listening to what teams need and constantly improve the standard paths they use in their day-to-day work.

Platform Engineering Best Practices for Building Internal Platforms That Scale

Kody from Kodus — Wed, 25 Mar 2026 16:32:32 +0000

Many companies see the cost of their internal developer platform skyrocket. The goal of Platform Engineering is to unify tools and accelerate delivery, but things rarely go as planned. Soon, teams start struggling with inconsistent tools. Some give up and keep their own customized setups, in a silent resistance to the platform. Feature delivery practically grinds to a halt, even with all the money invested in the central system. This is what happens when a platform creates more problems than it solves.

When Platform Engineering initiatives stall

The first sign of trouble is when the platform, which was supposed to be an accelerator, starts slowing everyone down. The failure is usually in the concept, not the code. The platform engineering team often makes decisions that seem to make sense at first, but turn into problems over time.

The dilemma of the product mindset for internal platforms

People love to say that you should apply a "product mindset" to an internal platform, but that advice often backfires. The team starts building for imaginary use cases instead of observing how developers actually work. They create complicated features based on what they think developers should want, while ignoring what they clearly need.

A platform built in isolation reflects the platform team’s ideal version of development, not the way the rest of the company actually builds software. You end up with a tool that solves theoretical problems while ignoring the repetitive day-to-day tasks developers deal with. The platform may offer a perfect one-click deploy for a certain type of service, but if 90% of the company’s services don’t fit that model, the effort was wasted.

Abstraction debt and rigid design

Good platforms use abstraction to reduce complexity. Bad platforms use abstraction to hide important decisions. When abstraction goes too far, it hides exactly the details engineers need to debug and tune performance.

The developer tries to understand why the service is slow but cannot access infrastructure configurations, network rules, or resource limits. They have no visibility into what is really happening.

This rigidity traps teams. If someone needs a database version that is not offered by the platform, or a specific sidecar for observability, there is nowhere to go. The platform’s design blocks basic technical choices and they lose the ability to manage their own services. They become completely dependent on the platform team even for small changes, which turns that team into a permanent help desk.

Practical ways to build a good platform

To avoid these stalls, the approach needs to change. Stop building a monolithic product and start providing a layer that helps developers do their work.

Focus on developer experience and workflow

A useful platform is a usable platform. The focus needs to be on the main developer workflows, or “journeys.” First, map the most common and critical tasks. This might include scaffolding a new service, running tests in a CI environment, deploying a change to staging, or accessing production logs. These are the flows that should be simplified first.

Then measure what really matters. Track adoption rates of platform components, but also keep an eye on developer satisfaction. Simple surveys or regular office hours provide direct feedback on what is working. If adoption is low, find out why developers are choosing other tools. The goal is for developers to be able to do their own work without opening a ticket. For that, you need clear documentation, well-defined standards, and interfaces such as API, CLI, or UI that allow them to create resources and access metrics without depending on another team.

The platform as a support layer

A good platform does not solve every problem. It solves common, undifferentiated problems so development teams do not have to. It should feel less like a restrictive system and more like a set of prepared paths.

That means standardizing things like Kubernetes clusters, IAM roles, or VPC networks. The platform provides clear boundaries and well-chosen defaults, giving developers a safe and efficient starting point. It offers a standard set of tools for CI/CD, observability, and secrets management, so teams do not have to research and configure everything from scratch.

But it also provides escape hatches for teams with specific needs.

Iterative development and continuous feedback

All-at-once platform launches almost always go wrong. They arrive late, over budget, and when they finally ship, the problems they were meant to solve have already changed.

A better approach is to launch MVPs. Deliver the smallest possible improvement that creates value for a small group of developers.

Create clear communication channels, such as a dedicated Slack channel or user forums, to collect immediate feedback. This feedback loop should drive priorities. The platform roadmap should come directly from the needs of your internal customers, not from a big predefined vision.

Breaking down common anti-patterns

Recognizing and actively dismantling bad habits is just as important as adopting good practices.

The idea that “if you build it, the team will use it”

A platform engineering team cannot simply launch a new tool and expect it to be adopted. This kind of thinking comes from a lack of internal communication and onboarding. Often, after the initial launch, user feedback is ignored while the platform team moves on to the next feature without checking whether the first one was actually useful. Adoption requires ongoing effort, clear documentation, and a simple explanation of the value for development teams.

Monolithic platforms and dependencies

Building the entire platform as a single system creates a huge single point of failure and limits your technological choices. If the whole deploy system is tightly coupled to a specific CI vendor, switching becomes almost impossible.

A better design uses loosely coupled components with well-defined APIs. This allows individual parts of the platform to be updated or replaced without disrupting everything else, which reduces the cost of swapping any isolated component.

The Platform Engineering team as a support desk

When a platform lacks self-service capabilities or has unclear boundaries, the platform team becomes a support queue. They spend the day handling operational tasks such as provisioning access, debugging application-specific deploy issues, or manually configuring resources.

This work consumes all their time, preventing improvements to the platform itself. It is a vicious cycle: a difficult platform generates more support requests, which leaves less time to make it easier to use.

A way to guide platform team decisions

To stay on track, a platform team needs a simple way to guide its choices.

Assess developer needs, not just technical specifications

Start with user research. Talk to developers. Map their current workflows, from local machine to production.

The goal is not to ask which features they want, but to observe what slows them down.

Prioritize work based on a combination of impact (how much time an improvement would save) and frequency (how many developers face this problem).

Define clear boundaries and responsibilities

Be explicit about what the platform provides and what development teams are expected to own. This contract prevents confusion and finger-pointing.

Platform responsibility: The platform engineering team may be responsible for the Kubernetes control plane, CI runner infrastructure, and base container images.

Development team responsibility: The development team is responsible for application code, its dependencies, pipeline configuration, and production monitoring.

Define clear service level objectives for platform components. If the platform provides a shared database, what are its availability and latency guarantees?

Finally, create clear ways for teams to contribute to or extend the platform. An inner source model can be a powerful way to scale platform development and ensure it meets diverse needs.

Measure value through adoption, efficiency, and satisfaction

A platform’s success comes from its impact, not its technical complexity.

Adoption: Track usage metrics of platform components. How many services are using the standardized CI pipeline? How many teams have migrated to the new logging system?

Efficiency: Quantify time saved. This can be measured through metrics such as “commit-to-production time” or by calculating the reduction in time spent on manual operational tasks.

Satisfaction: Run regular surveys with your users. A simple Net Promoter Score (NPS) or a more detailed survey can provide useful qualitative data about where the platform is succeeding or failing.

A successful internal platform is not the one with the most features. It is the one developers voluntarily choose to use because it makes their work simpler and faster.