<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sunil Kumar</title>
    <description>The latest articles on Forem by Sunil Kumar (@ailoitte_sk).</description>
    <link>https://forem.com/ailoitte_sk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3399044%2F140ae951-3470-44c8-b8a1-78e72d26066b.jpg</url>
      <title>Forem: Sunil Kumar</title>
      <link>https://forem.com/ailoitte_sk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ailoitte_sk"/>
    <language>en</language>
    <item>
      <title>Why Startups Are Beating FAANG at AI Shipping Speed: It's Not About Hiring More ML Engineers</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Wed, 08 Apr 2026 06:47:48 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/why-startups-are-beating-faang-at-ai-shipping-speed-its-not-about-hiring-more-ml-engineers-4219</link>
      <guid>https://forem.com/ailoitte_sk/why-startups-are-beating-faang-at-ai-shipping-speed-its-not-about-hiring-more-ml-engineers-4219</guid>
      <description>&lt;p&gt;The US has 1.5 million unfilled software engineering positions projected through 2028. Senior ML engineers command $250,000–$350,000 in total comp. And you're a Series A startup trying to hire your first ML engineer while competing against Amazon and OpenAI.&lt;/p&gt;

&lt;p&gt;This is not a sourcing problem. This is a structural market problem.&lt;/p&gt;

&lt;p&gt;The startups moving fastest on AI aren't winning because they hired better. Most of them changed the model entirely.&lt;/p&gt;

&lt;h2&gt;The Real Cost of the ML Hiring Process&lt;/h2&gt;

&lt;p&gt;Let's do the actual math that most teams don't calculate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time cost:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average time to hire a senior ML engineer: 6+ months&lt;/li&gt;
&lt;li&gt;Weeks of ML roadmap blocked: 26&lt;/li&gt;
&lt;li&gt;Features not shipped: dependent on roadmap, but almost certainly significant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Financial cost:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recruiter fee (typically 20–25% of first-year salary): $50,000–$80,000&lt;/li&gt;
&lt;li&gt;Engineering manager time on interviews (40+ hours): $5,000–$10,000 at loaded cost&lt;/li&gt;
&lt;li&gt;First-year total comp: $250,000–$350,000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Year 1 cost to hire and employ: $300,000–$440,000&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The hidden cost:&lt;/strong&gt; 26 weeks during which your AI roadmap wasn't moving and your competitors' was.&lt;/p&gt;

&lt;h2&gt;What "AI-First Engineering" Actually Means (Not Just Copilot)&lt;/h2&gt;

&lt;p&gt;There's a meaningful distinction between a developer who uses GitHub Copilot and an engineer who builds in an AI-first workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard developer + Copilot:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autocomplete and basic code suggestion&lt;/li&gt;
&lt;li&gt;Incremental velocity improvement: 1.5–2×&lt;/li&gt;
&lt;li&gt;Still requires significant manual implementation for novel problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI-first engineering workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Planning → AI-assisted architecture review + spec generation
Implementation → Multi-agent code generation with human review gates
Testing → AI-generated test cases + automated evaluation loops
Debugging → Semantic search across codebase + AI root cause analysis
Documentation → Auto-generated from code + human refinement
Code review → AI pre-review flags issues before a human reviewer sees them
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The velocity difference for appropriate tasks: 10–20× vs. a traditional workflow. Not everywhere — but on the high-repetition, high-specification tasks that make up 60–70% of AI feature development, the gap is significant.&lt;/p&gt;

&lt;h2&gt;What "Productive in 2 Weeks" Actually Requires&lt;/h2&gt;

&lt;p&gt;For an external engineer to be genuinely productive on your codebase within 2 weeks, specific conditions need to be true:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Week 1 onboarding checklist:
□ Complete codebase access + architecture walkthrough (day 1–2)
□ Development environment setup with AI tooling configured (day 1)
□ First PR in review by end of week 1
□ Daily stand-up overlap with your team (30 min minimum)
□ Clear first deliverable scoped before they start

Week 2 velocity check:
□ PRs being merged without major rework
□ Asking domain questions, not tooling questions
□ Contributing to technical decisions, not just implementing specs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If these conditions aren't met, "productive in 2 weeks" is marketing language. Verify the specific team's onboarding process before committing.&lt;/p&gt;

&lt;h2&gt;The Economics Comparison&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Option A: US ML Engineer Hire
├── Time to start: 6+ months
├── Year 1 fully-loaded cost: $300K–$440K
├── Ongoing: $250K–$350K/year
├── Risk: They leave for FAANG in 18 months
└── Equity dilution: Typically 0.1–0.5% for senior ML hire

Option B: AI-First Team (pre-vetted, productivity-focused)
├── Time to start: 2 weeks
├── Monthly cost: $25K–$40K/month
├── Annualized: $300K–$480K (comparable)
├── But: No 6-month wait, no recruiter fee, no ramp time
└── And: Scales up/down with your roadmap needs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The total cost over 12 months can be similar. The velocity difference in months 1–6 is not.&lt;/p&gt;
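&lt;p&gt;The arithmetic behind that claim is worth checking directly. A minimal sketch using the table's own estimates (all figures in USD thousands; the variable names are mine, not from the comparison above):&lt;/p&gt;

```python
# Sanity-check the annualized figures from the comparison, in USD thousands.

# Option A: US ML engineer hire, year 1 (low, high) estimates
recruiter_fee = (50, 80)    # 20-25% of first-year salary
interview_time = (5, 10)    # engineering manager time at loaded cost
total_comp = (250, 350)     # first-year total compensation
year1_hire = tuple(a + b + c for a, b, c in zip(recruiter_fee, interview_time, total_comp))
# Sums to (305, 440), matching the quoted $300K-$440K range

# Option B: AI-first team billed monthly, annualized
monthly = (25, 40)
year1_team = tuple(12 * m for m in monthly)
# Sums to (300, 480), matching the quoted $300K-$480K range

print(year1_hire, year1_team)
```

The ranges overlap almost entirely, which is the point: the decision hinges on the 6-month start delay, not on total spend.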

&lt;h2&gt;When Hiring Still Makes Sense&lt;/h2&gt;

&lt;p&gt;This isn't an argument against hiring ML engineers. It's an argument against letting the hiring process be the bottleneck for your AI roadmap.&lt;/p&gt;

&lt;p&gt;Hire when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building proprietary ML systems that require deep, continuous domain expertise&lt;/li&gt;
&lt;li&gt;You have stable, long-term ML infrastructure work (not feature development)&lt;/li&gt;
&lt;li&gt;You're at Series B+ and building an internal ML platform&lt;/li&gt;
&lt;li&gt;The specific work requires on-site access or regulatory clearance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use an AI-first team when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to start shipping AI features in weeks, not months&lt;/li&gt;
&lt;li&gt;Your ML needs are feature-driven (RAG, agents, integrations, inference pipelines)&lt;/li&gt;
&lt;li&gt;Your roadmap is evolving, and you need flexibility to scale the team up or down&lt;/li&gt;
&lt;li&gt;The opportunity cost of a 6-month hiring cycle is unacceptable&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Engineering Question Worth Asking&lt;/h2&gt;

&lt;p&gt;If you're evaluating the AI-first team approach, ask: what does their onboarding process look like? How do they handle knowledge transfer? What's the escalation path when a technical decision requires domain expertise you haven't transferred yet?&lt;/p&gt;

&lt;p&gt;The answer to those questions tells you more about whether the model works than any velocity claim.&lt;/p&gt;

&lt;p&gt;What's your team's current approach when ML hiring is taking too long? Curious whether others have found alternative models that worked.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sunil writes about production AI engineering from the &lt;a href="https://www.ailoitte.com" rel="noopener noreferrer"&gt;Ailoitte&lt;/a&gt; team — &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;AI-first engineering teams&lt;/a&gt; for startups building AI products.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>career</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why AI Systems Break in Production (And the 5 Architecture Decisions That Prevent It)</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Tue, 07 Apr 2026 05:59:33 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/why-ai-systems-break-in-production-and-the-5-architecture-decisions-that-prevent-it-3048</link>
      <guid>https://forem.com/ailoitte_sk/why-ai-systems-break-in-production-and-the-5-architecture-decisions-that-prevent-it-3048</guid>
      <description>&lt;p&gt;After working on production AI systems across &lt;a href="https://www.ailoitte.com/financial-software-development/" rel="noopener noreferrer"&gt;fintech&lt;/a&gt;, &lt;a href="https://www.ailoitte.com/healthcare-software-development/" rel="noopener noreferrer"&gt;healthcare&lt;/a&gt;, and &lt;a href="https://www.ailoitte.com/solutions/saas-app-development/" rel="noopener noreferrer"&gt;SaaS&lt;/a&gt;, I've seen this pattern repeat so consistently that it now has a name in our team: &lt;strong&gt;the week-6 demo gap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The AI demo worked perfectly. Six weeks after launch, users started reporting wrong outputs. Nobody could explain why, because the system was never built to explain why.&lt;/p&gt;

&lt;p&gt;Here's what causes it, and the 5 architecture decisions that prevent it.&lt;/p&gt;

&lt;h2&gt;The Demo Is Not the Product&lt;/h2&gt;

&lt;p&gt;Every AI demo uses carefully selected examples where the system performs well. Production users are unpredictable — they hit exactly the edge cases the demo never surfaced.&lt;/p&gt;

&lt;p&gt;This isn't dishonesty on the part of the development team. It's the natural result of showcasing a system under optimal conditions rather than operating it under production conditions.&lt;/p&gt;

&lt;p&gt;The gap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Demo inputs&lt;/strong&gt;: curated, cleaned, representative of the "easy 80%"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production inputs&lt;/strong&gt;: unpredictable, messy, often the "hard 20%" that breaks the system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The 5 Architecture Decisions That Determine the Outcome&lt;/h2&gt;

&lt;h3&gt;1. Eval Framework — Built Before Application Code&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Minimal eval framework structure
&lt;/span&gt;&lt;span class="n"&gt;eval_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test_set_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./eval/production_samples_500.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;precision_at_5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;factual_accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format_compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;regression_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;precision_at_5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# max allowed drop before blocking
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;factual_accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format_compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_eval_sample_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;  &lt;span class="c1"&gt;# 2% of production calls sampled
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this: you make a change, it looks good on 10 examples, you ship it. Two weeks later: users report a regression that wasn't in your 10 examples. With this: every change is validated against 500 representative labelled examples before shipping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build this in week 1. Not week 10.&lt;/strong&gt;&lt;/p&gt;
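&lt;p&gt;The &lt;code&gt;regression_threshold&lt;/code&gt; values in the config only matter if something enforces them. A minimal sketch of the gate that consumes them (the &lt;code&gt;check_regressions&lt;/code&gt; function and the baseline/current metric dicts are illustrative, not from the config above):&lt;/p&gt;

```python
# Regression gate: block a deploy when any metric drops more than allowed.
# `baseline` and `current` are metric dicts from eval runs over the same
# labelled test set; the names and numbers here are illustrative.

def check_regressions(current: dict, baseline: dict, thresholds: dict) -> list[str]:
    """Return descriptions of metrics whose drop from baseline exceeds the threshold."""
    failures = []
    for metric, max_drop in thresholds.items():
        drop = baseline[metric] - current[metric]
        if drop > max_drop:
            failures.append(f"{metric}: dropped {drop:.3f} (max allowed {max_drop})")
    return failures

baseline = {"precision_at_5": 0.82, "factual_accuracy": 0.91, "format_compliance": 0.99}
current = {"precision_at_5": 0.75, "factual_accuracy": 0.90, "format_compliance": 0.99}
thresholds = {"precision_at_5": 0.05, "factual_accuracy": 0.03, "format_compliance": 0.02}

failures = check_regressions(current, baseline, thresholds)
if failures:
    print("BLOCK DEPLOY:", failures)  # precision_at_5 dropped 0.07, over the 0.05 limit
```

Wire this into CI so the check runs on every prompt, retrieval, or model change, not just code changes.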

&lt;h3&gt;2. Confidence Thresholding — Route Low-Confidence Outputs&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_ai_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculate_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serve_with_caveat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disclaimer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Note: This response may have lower accuracy on this specific query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fallback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t have confident information on this specific question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system should know when it doesn't know. This is not an optional quality-of-life feature. In regulated industries (fintech, healthcare), a system that presents guesses as facts with equal confidence is a compliance risk.&lt;/p&gt;
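&lt;p&gt;The routing code leaves &lt;code&gt;calculate_confidence&lt;/code&gt; undefined, and there's no single right implementation. One common approach, assuming your model provider returns per-token log-probabilities (not all do), is the geometric mean of token probabilities; this sketch operates on the raw log-probabilities you'd extract from the response:&lt;/p&gt;

```python
import math

# One illustrative way to score confidence, assuming the model API exposes
# per-token log-probabilities. Geometric mean of token probabilities
# = exp(mean log-probability); an empty response scores 0.0.

def calculate_confidence(token_logprobs: list[float]) -> float:
    """Map a sequence of token log-probabilities to a 0-1 confidence score."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# A response where every token had probability ~0.9 scores ~0.9 overall:
logprobs = [math.log(0.9)] * 20
print(round(calculate_confidence(logprobs), 2))
```

Whatever signal you use, calibrate the 0.85 and 0.70 thresholds against your own eval set rather than copying them.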

&lt;h3&gt;3. Graceful Degradation — Design Every Failure Path&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ai_feature_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;validate_response_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fallback_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format_invalid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;get_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;CONFIDENCE_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fallback_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fallback_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fallback_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rate_limited&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;log_ai_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fallback_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unexpected_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fallback_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Always return something useful, never break silently.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Taking longer than expected. Please try again in a moment.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t have enough information to answer this confidently.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rate_limited&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High demand right now. Please retry in 30 seconds.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format_invalid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unable to process response. Please rephrase your question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unexpected_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Something went wrong. Our team has been notified.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;AIResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fallback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The failure path needs as much design attention as the success path.&lt;/strong&gt; In most systems we audit, failure handling is an afterthought.&lt;/p&gt;

&lt;h3&gt;4. Retrieval Quality Monitoring — Separate from Generation Quality&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RetrievalMonitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_retrieval_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Track separately from generation quality
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks_returned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence_retrieval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved_chunks&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;alert_on_degradation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;recent_events&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_events_in_window&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window_minutes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;low_confidence_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;recent_events&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence_retrieval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recent_events&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;low_confidence_rate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;gt;15% of queries returning low-confidence retrievals
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieval quality degraded: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;low_confidence_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; low-confidence rate in last &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;window_minutes&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retrieval and generation fail independently. A system can have good generation quality on easy queries and silently terrible retrieval on hard queries. End-to-end metrics don't surface this. &lt;strong&gt;You need separate retrieval monitoring.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Model Version Pinning — No Surprise Breaking Changes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ai_config.yaml&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-2024-08-06"&lt;/span&gt;  &lt;span class="c1"&gt;# Pinned — not "gpt-4o" (auto-updates)&lt;/span&gt;
    &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini-2024-07-18"&lt;/span&gt;
  &lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small"&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;

&lt;span class="na"&gt;deployment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;auto_upgrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;change_management&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;required&lt;/span&gt;
  &lt;span class="na"&gt;test_before_upgrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Latest" is not a production model version. Pin everything. Test model upgrades in staging with your eval suite before promoting to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The One Question That Tests All 5
&lt;/h2&gt;

&lt;p&gt;Before signing any AI development contract, ask the vendor:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Can you run your demo on 20 inputs I select, including our messiest real-world examples?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Teams that have built for production say yes immediately.&lt;br&gt;&lt;br&gt;
Teams that have only built impressive demos reach for qualifications: "we'd need to clean it first," "that's a slightly different use case," "we'll address that in phase 2."&lt;/p&gt;

&lt;p&gt;Those qualifications are the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;th&gt;Prevention&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No eval framework&lt;/td&gt;
&lt;td&gt;Build it before week-1 application code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No confidence handling&lt;/td&gt;
&lt;td&gt;Implement thresholding + routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No graceful degradation&lt;/td&gt;
&lt;td&gt;Design every failure path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No retrieval monitoring&lt;/td&gt;
&lt;td&gt;Separate retrieval metrics from generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model version surprises&lt;/td&gt;
&lt;td&gt;Pin all model versions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What production AI gap has been hardest to catch before it affected users? Drop your experience in the comments; it's genuinely useful to compare patterns across different domains.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sunil writes about production AI engineering from the &lt;a href="https://www.ailoitte.com" rel="noopener noreferrer"&gt;Ailoitte&lt;/a&gt; team, which &lt;a href="https://www.ailoitte.com/startup-mvp-velocity" rel="noopener noreferrer"&gt;builds 12-week&lt;/a&gt; &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;AI Velocity Pod&lt;/a&gt; engagements for fintech, healthcare, and SaaS companies.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>architecture</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building Model-Agnostic AI Architecture: The Pattern That Future-Proofs Your System</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Mon, 06 Apr 2026 06:13:26 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/building-model-agnostic-ai-architecture-the-pattern-that-future-proofs-your-system-4o84</link>
      <guid>https://forem.com/ailoitte_sk/building-model-agnostic-ai-architecture-the-pattern-that-future-proofs-your-system-4o84</guid>
      <description>&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;AI model prices dropped 80% between 2023 and 2025. Three major model releases meaningfully changed the capability/cost tradeoff. New providers entered the market. Existing providers deprecated model versions with six months' notice.&lt;/p&gt;

&lt;p&gt;If your application calls &lt;code&gt;openai.chat.completions.create()&lt;/code&gt; directly in 40 places, every one of these market changes is a refactoring project.&lt;/p&gt;

&lt;p&gt;Here's the architecture pattern we use across every production AI system we build at Ailoitte.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 4-Layer Model-Agnostic Pattern
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1: The Unified Interface
&lt;/h3&gt;

&lt;p&gt;Application code never imports a provider SDK directly. Every AI call goes through a single interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;AICallConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;classification&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;generation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;extraction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;embedding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;speed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;quality&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;NormalisedAIResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="nl"&gt;modelUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;callAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AICallConfig&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;routeToProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;normaliseResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No provider import. No model name. No API key. Just your interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Configuration-Driven Routing
&lt;/h3&gt;

&lt;p&gt;The routing config lives in a JSON file — not in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"routing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"classification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"quality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-3-5-sonnet-20241022"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"speed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"generation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"quality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-3-5-sonnet-20241022"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"speed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"extraction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"quality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"speed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_chain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-3-5-sonnet-20241022"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallback_trigger"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"rate_limit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server_error"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Changing the model for any task type = updating this JSON. No code deployment.&lt;/p&gt;
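&lt;p&gt;On the application side, resolving a model from this config stays a pure data lookup. A sketch in Python (the shipped TypeScript version is the same one-line lookup), using a trimmed copy of the config above:&lt;/p&gt;

```python
import json

# Trimmed copy of the routing config; in production this is loaded from
# the JSON file and can be hot-reloaded without a deployment.
ROUTING_JSON = """
{
  "routing": {
    "classification": {"cost": "gpt-4o-mini", "quality": "claude-3-5-sonnet-20241022", "speed": "gpt-4o-mini"},
    "generation":     {"cost": "gpt-4o-mini", "quality": "claude-3-5-sonnet-20241022", "speed": "gpt-4o"}
  },
  "fallback_chain": ["gpt-4o", "claude-3-5-sonnet-20241022", "gpt-4o-mini"]
}
"""

def resolve_model(config, task, priority):
    # Swapping a model for a task type is a config edit, not a code change.
    return config["routing"][task][priority]

config = json.loads(ROUTING_JSON)
print(resolve_model(config, "generation", "quality"))  # prints claude-3-5-sonnet-20241022
```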

&lt;h3&gt;
  
  
  Layer 3: Response Normalisation
&lt;/h3&gt;

&lt;p&gt;OpenAI, Anthropic, and other providers return different response shapes. The normalisation layer abstracts this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normaliseResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;OpenAIResponse&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;AnthropicResponse&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;GeminiResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;providerName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AICallConfig&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;NormalisedAIResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total_tokens&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;modelUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_latencyMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// injected by provider wrapper&lt;/span&gt;
      &lt;span class="na"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;modelUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_latencyMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Add more providers here without touching application code&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 4: Automatic Fallback Routing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;routeToProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AICallConfig&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;routingConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadRoutingConfig&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;primaryModelName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;routingConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;providerRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;primaryModelName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;healthCheck&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// lightweight check&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isRetryableError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Route to next in fallback chain&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;getNextFallbackProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;primaryModelName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;routingConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fallback_chain&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Result
&lt;/h2&gt;

&lt;p&gt;One client wanted to migrate 60% of their API calls from GPT-4 to Claude 3.5 Sonnet when Anthropic's pricing dropped significantly. Because they'd built with this pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Updated the routing config JSON (2 minutes)&lt;/li&gt;
&lt;li&gt;Ran the eval set against the new routing (2 hours)&lt;/li&gt;
&lt;li&gt;Confirmed quality parity on their specific tasks (passed)&lt;/li&gt;
&lt;li&gt;Deployed the config change (10 minutes)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Zero application code changes. Monthly savings: $9,200.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup Cost
&lt;/h2&gt;

&lt;p&gt;This pattern adds approximately 3–5 days of initial engineering work, depending on how many providers you want to support. In the current AI market, that investment pays back within the first provider change you need to make.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written by Sunil, Content Lead at &lt;a href="https://www.ailoitte.com" rel="noopener noreferrer"&gt;Ailoitte&lt;/a&gt; — we &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;build production AI&lt;/a&gt; systems for fintech, healthcare, SaaS, and logistics companies. We publish technical content on what actually works in production AI, not just what tutorials teach.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>webdev</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Token Cost Optimization in Production LLMs: 3 Approaches With Real Numbers</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Thu, 02 Apr 2026 07:15:22 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/token-cost-optimization-in-production-llms3-approaches-with-real-numbers-40hm</link>
      <guid>https://forem.com/ailoitte_sk/token-cost-optimization-in-production-llms3-approaches-with-real-numbers-40hm</guid>
      <description>&lt;p&gt;&lt;code&gt;We were burning $4,100/month on inference for one fintech client. Here's the three-part stack that cut it to $1,560, without touching the model.&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
LLM inference costs are the silent budget killer of production AI. You see a demo that costs pennies to run. You ship it, users arrive, the corpus grows, query complexity rises — and suddenly you're looking at a cloud bill that nobody planned for.&lt;/p&gt;

&lt;p&gt;We hit this on a fintech client's internal compliance Q&amp;amp;A system. At launch: ~2,000 queries/day, average prompt length 1,800 tokens, GPT-4 for everything. Monthly inference bill: $4,100. Three months post-launch: 6,000 queries/day, with the average prompt ballooning to 2,400 tokens from accumulated context. Projected bill: $13,000/month. Nobody had modelled usage growth.&lt;/p&gt;
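&lt;p&gt;The projection is simple input-token arithmetic, which is exactly why it should have been modelled up front. A sketch (input tokens only; output tokens and the exact GPT-4 rate are assumptions that push the real bill higher):&lt;/p&gt;

```python
def monthly_input_cost(queries_per_day, avg_prompt_tokens, price_per_1k, days=30):
    # Input-token spend only; output tokens add on top of this.
    return queries_per_day * (avg_prompt_tokens / 1000.0) * price_per_1k * days

# At launch: 2,000 queries/day at 1,800 prompt tokens each
launch = monthly_input_cost(2000, 1800, 0.03)  # about $3,240 before output tokens
# Three months later: 6,000 queries/day at 2,400 prompt tokens each
grown = monthly_input_cost(6000, 2400, 0.03)   # about $12,960, the ~$13,000 projection
```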

&lt;p&gt;Here's the three-layer optimization stack we implemented, with exact numbers from that engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;01 Prompt compression — trim the fat before it hits the model&lt;/strong&gt;&lt;br&gt;
The most direct lever: reduce the token count of every prompt before it reaches the inference endpoint. This sounds obvious. Most teams don't do it because the naive approach (just truncate) destroys quality. The right approach uses semantic compression.&lt;/p&gt;

&lt;p&gt;We used LLMLingua from Microsoft Research, a small model that compresses prompts by removing tokens that are statistically low-information relative to the query, while preserving semantic content. On our fintech client's prompts, we achieved 38% compression with less than 3% degradation in answer quality on the golden dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd39lis5tn4yv0g14rg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd39lis5tn4yv0g14rg5.png" alt=" " width="679" height="775"&gt;&lt;/a&gt;&lt;br&gt;
The latency cost of compression is ~120ms on the CPU. For our use case (internal tool, not real-time), this was acceptable. If you're building a customer-facing product where P95 latency matters, benchmark this carefully — it may not always be worth it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;✓ On 2,400-token average prompts, 38% compression saves ~912 tokens per query. At $0.03/1K tokens (GPT-4), that's $0.027/query. At 6,000 queries/day: ~$162/day, ~$4,860/month, from compression alone.&lt;/p&gt;
&lt;/blockquote&gt;
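&lt;p&gt;The arithmetic in that callout is easy to sanity-check yourself. A quick sketch using the same figures quoted above (prompt size, compression rate, pricing, and query volume are all taken from this post):&lt;/p&gt;

```python
# Sanity check of the compression savings math, using the numbers above.
AVG_PROMPT_TOKENS = 2400
COMPRESSION_RATE = 0.38          # 38% compression from LLMLingua
PRICE_PER_1K = 0.03              # GPT-4 input pricing assumed in the post
QUERIES_PER_DAY = 6000

tokens_saved = AVG_PROMPT_TOKENS * COMPRESSION_RATE            # ~912 tokens/query
saving_per_query = round(tokens_saved / 1000 * PRICE_PER_1K, 3)
saving_per_day = saving_per_query * QUERIES_PER_DAY
saving_per_month = saving_per_day * 30

print(round(tokens_saved), saving_per_query,
      round(saving_per_day), round(saving_per_month))
```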

&lt;p&gt;&lt;strong&gt;02 Intelligent model routing — not everything needs GPT-4&lt;/strong&gt;&lt;br&gt;
The second insight sounds simple, but gets skipped: most queries in a production system don't require your most expensive model. Simple factual lookups, short-answer questions, classification tasks — these can be handled by a cheaper model with no perceptible quality difference to the user.&lt;/p&gt;

&lt;p&gt;We built a lightweight router that classifies incoming queries by complexity before they hit the inference endpoint. Simple queries go to GPT-3.5-turbo (or equivalent). Complex, multi-hop, or reasoning-heavy queries go to GPT-4. The classification itself is done with a fine-tuned small model (300M parameters) that adds ~15ms of latency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxybi6r5mbsxe5m2orjm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxybi6r5mbsxe5m2orjm0.png" alt=" " width="577" height="807"&gt;&lt;/a&gt;&lt;br&gt;
In our fintech client's query distribution, 61% of queries were classifiable as "simple" (lookup, boolean, date-retrieval). Routing those to GPT-3.5-turbo cut cost per query by ~93% on that segment. Blended cost reduction: ~57%.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠ Do not use a threshold below 0.80 for your complexity classifier. At 0.70, we saw too many complex queries slipping through to the cheaper model, which produced noticeably lower quality answers on multi-part compliance questions. Trust the uncertainty — if it's not clearly simple, route up.&lt;/p&gt;
&lt;/blockquote&gt;
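&lt;p&gt;A minimal sketch of that routing shape. The keyword classifier here is a stand-in for the fine-tuned 300M-parameter model (which returned a label plus a confidence); model names, markers, and scoring are illustrative, not the production implementation:&lt;/p&gt;

```python
# Complexity-based model routing with a confidence floor. Anything not
# clearly simple routes up to the expensive model.
SIMPLE_MARKERS = ("what is", "when", "who", "is there", "date of")

def classify(query):
    """Return (label, confidence). Stub for a real complexity classifier."""
    q = query.lower()
    hits = sum(1 for marker in SIMPLE_MARKERS if marker in q)
    confidence = min(0.6 + 0.25 * hits, 0.99)
    return ("simple" if hits else "complex"), confidence

def route(query, threshold=0.80):
    """Pick the cheap model only when the classifier is confidently 'simple'."""
    label, confidence = classify(query)
    if label == "simple" and confidence >= threshold:
        return "gpt-3.5-turbo"
    return "gpt-4"      # trust the uncertainty: route up by default

print(route("What is the KYC document retention period?"))
print(route("Compare our AML duties across EU and US entities"))
```

&lt;p&gt;Note the asymmetry: a misrouted simple query costs a few extra cents, while a misrouted complex query costs answer quality, so the default path is the expensive model.&lt;/p&gt;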

&lt;p&gt;&lt;strong&gt;03 Semantic caching — stop paying for identical questions&lt;/strong&gt;&lt;br&gt;
In any production deployment with hundreds or thousands of users, a meaningful percentage of queries are semantically identical even if lexically different. "What's the KYC requirement?" and "Can you explain the know-your-customer process?" are the same query. Without a cache, you pay full inference cost for both.&lt;/p&gt;

&lt;p&gt;Semantic caching embeds incoming queries and compares them against a cache index. If a semantically similar query exists (above a cosine similarity threshold), you return the cached response. No model call required.&lt;/p&gt;
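&lt;p&gt;The control flow fits in a few lines. In a real deployment the embedding comes from a sentence-encoder and the index lives in a vector store; the bag-of-words vector below is a toy stand-in so the sketch is self-contained:&lt;/p&gt;

```python
# Toy semantic cache: embed the query, scan for a similar past query,
# return the cached response on a hit (no model call).
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())    # toy stand-in for a real embedding

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []                   # list of (embedding, response)

    def lookup(self, query):
        qv = embed(query)
        for ev, response in self.entries:
            if cosine(qv, ev) >= self.threshold:
                return response             # cache hit: zero inference cost
        return None

    def store(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.8)
cache.store("what is the kyc requirement",
            "KYC requires identity verification before onboarding.")
print(cache.lookup("what is the kyc requirement please"))
```

&lt;p&gt;The similarity threshold is the whole game here: too low and users get stale or wrong cached answers, too high and the hit rate collapses. Tune it against your golden dataset, not by feel.&lt;/p&gt;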

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhp3c205l0ypcwh283k6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhp3c205l0ypcwh283k6.png" alt=" " width="599" height="868"&gt;&lt;/a&gt;&lt;br&gt;
On the fintech compliance system, cache hit rate stabilised at 34% after two weeks. That's 34% of queries returning a cached answer with zero inference cost. Combining all three approaches (compression, routing, and caching), here's what the numbers looked like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l15yq3bmp42ruccx6df.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l15yq3bmp42ruccx6df.png" alt=" " width="594" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;→ Implementation order matters&lt;/strong&gt;&lt;br&gt;
If you're implementing these on an existing system, do them in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Semantic caching first&lt;/strong&gt; — zero infrastructure complexity, immediate payoff on any system with repeated query patterns. Measurable in 72 hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model routing second&lt;/strong&gt; — requires building or fine-tuning a classifier, but the ROI is significant if your query distribution is mixed-complexity (most are).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt compression third&lt;/strong&gt; — most engineering effort, requires calibration against your golden dataset. Worth it at scale, but don't start here.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;✓ Before you implement any of these: instrument everything. If you don't have per-query token counts, model selection, and cache hit rate logged today, you're flying blind. Add logging first. Optimize second.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If your team is burning more than $1,000/month on inference and you haven't implemented semantic caching yet, that's your fastest win. The model routing classifier takes longer to build but pays back disproportionately if your query mix is heterogeneous.&lt;/p&gt;

&lt;p&gt;What optimization approaches is your team using? Drop a comment; I'm specifically curious whether anyone's had success with speculative decoding or prefix caching at the infrastructure level.&lt;/p&gt;

&lt;p&gt;We run &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;production AI delivery&lt;/a&gt; engagements at &lt;a href="https://www.ailoitte.com/" rel="noopener noreferrer"&gt;Ailoitte&lt;/a&gt;. If you're wrestling with runaway inference costs, the architecture decisions are usually fixable without changing the model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>devops</category>
      <category>performance</category>
    </item>
    <item>
      <title>Why RAG Pipelines Fail at Production Scale (And What We Fixed)</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Wed, 01 Apr 2026 11:07:35 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/why-rag-pipelines-fail-at-production-scale-and-what-we-fixed-18mf</link>
      <guid>https://forem.com/ailoitte_sk/why-rag-pipelines-fail-at-production-scale-and-what-we-fixed-18mf</guid>
      <description>&lt;p&gt;&lt;code&gt;5 failure modes we hit building 12+ production RAG systems, and the architectural fixes that actually worked.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I've spent the last 14 months building production AI systems for fintech, healthcare, and SaaS clients. Of the 12+ RAG pipelines we've shipped, every single one failed in production in ways it never did in staging.&lt;/p&gt;

&lt;p&gt;Not broke. Failed. Silently degraded. Answered confidently and wrong. Retrieved the right document but extracted the wrong passage. Worked at 10 queries per minute and collapsed at 100.&lt;/p&gt;

&lt;p&gt;Here's what we kept hitting, and what we fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;01 Naive chunking destroys retrieval quality&lt;/strong&gt;&lt;br&gt;
The default in most RAG tutorials is fixed-size chunking: split every document into 512-token chunks, embed them, done. It works in demos. In production, it silently kills accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: semantic meaning doesn't respect token boundaries. A contract clause that spans 600 tokens gets split in the middle. A medical report with a critical finding in the second half of a paragraph gets separated from its context. The retriever finds half the answer, and the LLM hallucinates the rest.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;❌ Fixed-size chunking at 512 tokens: retrieval precision dropped to 54% on our healthcare client's policy documents after go-live.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;What we switched to: a parent-child chunking strategy with semantic boundary detection.&lt;/p&gt;
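&lt;p&gt;The core of the pattern is simple: embed and match on small child chunks, but hand the LLM the full parent section the match came from. A sketch (boundary detection here is a plain paragraph split and the retriever is toy keyword overlap; the production version used semantic boundary detection and vector search):&lt;/p&gt;

```python
# Parent-child chunking: children are the retrieval unit, parents are the
# generation unit, so the LLM always sees the complete semantic section.

def build_index(document, child_size=40):
    """Index of (child_text, parent_text) pairs; children remember parents."""
    index = []
    for parent in document.split("\n\n"):
        words = parent.split()
        for i in range(0, len(words), child_size):
            child = " ".join(words[i:i + child_size])
            index.append((child, parent))
    return index

def retrieve(index, query):
    """Match on children (toy keyword overlap), return the parent section."""
    qterms = set(query.lower().split())
    def overlap(entry):
        return len(qterms.intersection(entry[0].lower().split()))
    return max(index, key=overlap)[1]

doc = ("Clause one covers data retention periods for client records.\n\n"
       "Clause two covers KYC verification steps and escalation paths.")
print(retrieve(build_index(doc, child_size=5), "KYC verification steps"))
```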

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3eo878ad5b5114w9qq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3eo878ad5b5114w9qq1.png" alt=" " width="800" height="695"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Retrieval precision went from 54% to 81% on the same document set. The LLM gets the full semantic unit, not a fragment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;02 The wrong embedding model for your domain&lt;/strong&gt;&lt;br&gt;
Most teams default to text-embedding-ada-002 or a generic SBERT model. These are fine for general English. They're inadequate for financial filings, clinical notes, or legal language.&lt;/p&gt;

&lt;p&gt;We had a fintech client whose RAG system was scoring 0.87 cosine similarity on retrieved passages, but the answers were wrong 40% of the time. The model was retrieving chunks that were superficially similar in language but semantically different in context. "Risk" in a compliance document does not mean the same thing as "risk" in an earnings call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: switch to a domain-adapted or domain-fine-tuned embedding model. For finance, BGE-financial or FinBERT embeddings. For clinical, ClinicalBERT or BioBERT as an embedding base. For general enterprise, a hybrid approach:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxey8kx54yct9g49z2y9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxey8kx54yct9g49z2y9.png" alt=" " width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;The instruction asymmetry matters. BGE models were trained with different prefixes for queries vs documents. Skip it, and you lose 8–12% recall on domain-specific content.&lt;/code&gt;&lt;/p&gt;
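&lt;p&gt;Concretely, for the English BGE v1/v1.5 retrieval models the documented convention is a fixed instruction prefix on the query side only; passages are embedded bare. A sketch of the asymmetry (the &lt;code&gt;model.encode&lt;/code&gt; interface mirrors sentence-transformers, and the stub model just echoes its input so the example runs; always confirm the exact instruction string against your checkpoint's model card):&lt;/p&gt;

```python
# BGE query/passage asymmetry: prefix queries, embed passages bare.
BGE_QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def encode_query(model, query):
    return model.encode(BGE_QUERY_PREFIX + query)   # instruction ON for queries

def encode_passage(model, passage):
    return model.encode(passage)                    # no instruction for documents

class EchoModel:
    """Stand-in for a real embedding model; returns its input unchanged."""
    def encode(self, text):
        return text

model = EchoModel()
print(encode_query(model, "what is the KYC requirement?"))
```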

&lt;p&gt;&lt;strong&gt;03 No reranking layer — cosine similarity isn't relevance&lt;/strong&gt;&lt;br&gt;
Vector similarity retrieves semantically proximate chunks. But proximity ≠ relevance to the specific question. You need a reranker.&lt;/p&gt;

&lt;p&gt;Without a reranker, the top-k retrieved chunks are sorted by embedding similarity, which doesn't account for query-specific intent, negation, or specificity. We consistently saw the most relevant chunk sitting at position 4 or 5 in the retrieval output, behind noisier but "closer" matches.&lt;/p&gt;
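&lt;p&gt;The reranking step itself is little code. The scorer below is a toy lexical-overlap stand-in so the sketch runs end to end; in production this would be a cross-encoder call (e.g. sentence-transformers' &lt;code&gt;CrossEncoder.predict&lt;/code&gt; over query/chunk pairs):&lt;/p&gt;

```python
# Re-sort vector-retrieved candidates by query-specific relevance.
# cross_encoder_score is a toy; swap in a real cross-encoder in production.

def cross_encoder_score(query, chunk):
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q.intersection(c)) / max(len(q), 1)

def rerank(query, chunks, top_k=5):
    ranked = sorted(chunks, key=lambda ch: cross_encoder_score(query, ch),
                    reverse=True)
    return ranked[:top_k]

candidates = ["risk appetite statement for trading desks",
              "kyc requirements for onboarding new clients",
              "office seating and desk booking policy"]
print(rerank("kyc requirements for new clients", candidates, top_k=2))
```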

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsqyfgkpmunudnc9dzta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsqyfgkpmunudnc9dzta.png" alt=" " width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;04 Context window mismanagement at scale&lt;/strong&gt;&lt;br&gt;
At low query volumes, stuffing 8 retrieved chunks into the prompt works. At production scale with concurrent requests, you hit three problems: cost explosion, latency spikes, and more insidiously, the Lost in the Middle problem.&lt;/p&gt;

&lt;p&gt;Research consistently shows that LLMs have lower recall for information buried in the middle of long contexts. If your most relevant chunk ends up at position 3 of 8 in the context, the model may not weight it appropriately.&lt;/p&gt;

&lt;p&gt;Our production pattern now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve 20 candidates from the vector store&lt;/li&gt;
&lt;li&gt;Rerank to top 5&lt;/li&gt;
&lt;li&gt;Apply context compression to reduce token count by ~60%&lt;/li&gt;
&lt;li&gt;Place the most relevant chunk first and last (primacy + recency bias in LLMs)&lt;/li&gt;
&lt;/ul&gt;
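&lt;p&gt;The last bullet is the least obvious, so here is one way to read it: with chunks already sorted best-first by the reranker, put the top chunk at the start of the context and the runner-up at the end, so nothing critical sits in the middle. A sketch (the exact placement policy is a judgment call, not a fixed recipe):&lt;/p&gt;

```python
# Edge-biased context ordering: strongest chunk first, second-strongest
# last, weaker chunks in the middle (input assumed sorted best-first).

def order_for_context(chunks):
    if len(chunks) > 2:
        return [chunks[0]] + chunks[2:] + [chunks[1]]
    return list(chunks)

print(order_for_context(["best", "second", "third", "fourth", "fifth"]))
# ['best', 'third', 'fourth', 'fifth', 'second']
```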

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbry395u3v261x5oixdu4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbry395u3v261x5oixdu4.png" alt=" " width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;05 No evaluation infrastructure — flying blind&lt;/strong&gt;&lt;br&gt;
This is the one that hurts the most to admit: most of the RAG systems we inherited had zero evaluation framework. They were shipped, deemed "working" based on informal testing, and degraded silently over weeks as the document corpus grew or the query distribution shifted.&lt;/p&gt;

&lt;p&gt;You need three things before you go to production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A golden dataset&lt;/strong&gt; — 50–100 question/answer pairs manually verified against your document corpus&lt;br&gt;
&lt;strong&gt;RAGAS metrics&lt;/strong&gt; — faithfulness, answer relevancy, context precision, context recall&lt;br&gt;
&lt;strong&gt;A weekly eval run&lt;/strong&gt; — automated, tracked in a dashboard, with alerts if any metric drops more than 5%&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxidlfm4zw0hzfehwhlz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxidlfm4zw0hzfehwhlz.png" alt=" " width="800" height="697"&gt;&lt;/a&gt;&lt;br&gt;
&lt;code&gt;✅ Once you have RAGAS running, you can actually compare chunking strategies, embedding models, and reranker configs quantitatively. It turns RAG tuning from guesswork into engineering.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the fixed architecture looks like&lt;/strong&gt;&lt;br&gt;
After applying all five fixes on a healthcare SaaS client's policy document RAG system, here's what the numbers looked like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw6kp2a0jij0owpf2pwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw6kp2a0jij0owpf2pwj.png" alt=" " width="791" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full production RAG stack now looks like: semantic chunking → domain-adapted embeddings → hybrid search (vector + BM25) → cross-encoder reranking → context compression → LLM with structured output + RAGAS eval loop.&lt;/p&gt;

&lt;p&gt;Each layer adds ~50–150ms of latency. The tradeoff is worth it when the cost of a hallucinated answer in a healthcare or fintech context is a support ticket, a compliance issue, or a lost contract.&lt;/p&gt;

&lt;p&gt;If you've hit any of these, or if your RAG system works great in staging and degrades in production, drop a comment. I'm collecting failure patterns across verticals right now and would love to hear what you're seeing.&lt;/p&gt;

&lt;p&gt;We run a technical AI delivery practice called &lt;a href="https://www.ailoitte.com/" rel="noopener noreferrer"&gt;Ailoitte&lt;/a&gt;. If you're rebuilding a broken RAG pipeline and want to talk architecture, reach out.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mlops</category>
      <category>architecture</category>
      <category>devops</category>
    </item>
    <item>
      <title>How We Ship Production AI in 12 Weeks: The Architecture That Actually Works</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Thu, 26 Mar 2026 07:17:30 +0000</pubDate>
      <link>https://forem.com/ailoitte_ai/how-we-ship-production-ai-in-12-weeks-the-architecture-that-actually-works-370n</link>
      <guid>https://forem.com/ailoitte_ai/how-we-ship-production-ai-in-12-weeks-the-architecture-that-actually-works-370n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;If you've tried shipping an AI feature to production recently, you know the gap between "demo works in staging" and "prod-stable under real load" is enormous.&lt;br&gt;
This post is about the architecture decisions that close that gap, specifically, the five engineering phases we've converged on after shipping production AI across 14+ industries. No fluff, just the decisions that matter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The 4 Engineering Failure Modes That Kill AI Timelines&lt;/strong&gt;&lt;br&gt;
Before the framework, the failure modes. These are not theoretical; every one of them has caused a production incident or a blown timeline in the last 18 months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Token cost explosions in agentic loops&lt;/strong&gt;&lt;br&gt;
Single-turn LLM calls are predictable. Agentic loops, where an AI takes sequential actions, calls tools, and iterates, are not. Without per-workflow token budgets, you're running an infinite loop on a metered connection.&lt;br&gt;
Here's what unguarded agentic architecture looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k0jlfo6stq34kmjdkjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k0jlfo6stq34kmjdkjj.png" alt=" " width="669" height="227"&gt;&lt;/a&gt;&lt;br&gt;
We diagnosed a production chatbot burning $400/day per enterprise client. Nobody noticed until month 3, by which point the feature was destroying margin in real time. The fix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29hnicqvfv148tnjd6au.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29hnicqvfv148tnjd6au.png" alt=" " width="732" height="470"&gt;&lt;/a&gt;&lt;/p&gt;
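&lt;p&gt;The guard itself is small; what matters is that it is scoped per workflow and enforced inside the loop, not checked after the bill arrives. A sketch (class and function names are illustrative, not a real SDK):&lt;/p&gt;

```python
# Per-workflow token budget guard for agentic loops: every model/tool call
# draws down a budget, and the loop stops instead of spending indefinitely.

class TokenBudgetExceeded(Exception):
    pass

class WorkflowBudget:
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        self.used += tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"workflow used {self.used} of {self.max_tokens} tokens")

def run_agent_loop(budget, steps):
    """`steps` yields (tokens_consumed, done) pairs from each tool call."""
    for tokens, done in steps:
        budget.charge(tokens)   # raises before the next call if over budget
        if done:
            return "completed"
    return "exhausted"
```

&lt;p&gt;In practice you would also log the drawdown per step, so the decision logs described below the observability section show cost alongside每 action. The exception path can degrade gracefully (return a partial answer, or fall back to a single-turn call) instead of failing hard.&lt;/p&gt;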

&lt;p&gt;&lt;strong&gt;2. RAG without domain boundaries&lt;/strong&gt;&lt;br&gt;
The naive RAG setup: dump all your enterprise data into a vector store, let the LLM retrieve whatever it wants. This produces authoritative hallucinations, outputs that are coherent, confident, and wrong because they're blending context from unrelated domains.&lt;/p&gt;

&lt;p&gt;Domain-Driven Design applies directly to AI service layers. The principle: an AI workflow accesses only the data collections relevant to its task category. Full stop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6cdrg1gy93tw548pbkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6cdrg1gy93tw548pbkg.png" alt=" " width="670" height="297"&gt;&lt;/a&gt;&lt;br&gt;
The benefits compound: smaller context windows (lower cost), easier compliance auditing (you know exactly what data informed every decision), and a dramatically reduced hallucination surface area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No observability in production&lt;/strong&gt;&lt;br&gt;
You are not done shipping when the feature passes staging tests. Production AI requires active monitoring that most teams treat as a post-launch concern. It isn't.&lt;/p&gt;

&lt;p&gt;The minimum viable observability stack for production AI:&lt;br&gt;
• &lt;strong&gt;Hallucination detection&lt;/strong&gt; — compare outputs against retrieved source context; flag divergence above a threshold&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Drift detection&lt;/strong&gt; — monitor output distribution over time; model behavior changes as training data ages&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;HITL checkpoints&lt;/strong&gt; — for high-stakes decisions (loan approvals, patient triage, compliance flags), human review before action&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Decision logs&lt;/strong&gt; — structured record of: input, retrieved context, model output, confidence score, action taken. Forensic trail for every decision&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m0fgrvon9qc25hauosn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m0fgrvon9qc25hauosn.png" alt=" " width="667" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Single-provider lock-in&lt;/strong&gt;&lt;br&gt;
The LLM landscape shifts quarterly. Lock-in to a single provider is technical debt that compounds with every model release you can't migrate to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 5-Phase Delivery Framework&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgoxbq7iugj71ell9grj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgoxbq7iugj71ell9grj.png" alt=" " width="672" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Billing Model Is an Architectural Decision&lt;/strong&gt;&lt;br&gt;
This sounds like a business detail. It isn't. The billing model determines every engineering incentive in the engagement.&lt;br&gt;
Under hourly billing: no structural reason to ship faster, optimize token costs, or build durable monitoring. Every inefficiency is revenue. Every extra sprint is billable.&lt;/p&gt;

&lt;p&gt;Under outcome-based contracts: speed becomes a margin driver. Token optimization saves the delivery team money. Durable architecture reduces support load. Every incentive aligns with delivery quality.&lt;/p&gt;

&lt;p&gt;The market data: seat/hourly AI pricing dropped from 21% to 15% of engagements in 2025, while outcome-based contracts surged from 27% to 41%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One More Thing: The Compounding Data Moat&lt;/strong&gt;&lt;br&gt;
Every production AI deployment generates proprietary training signals, correction patterns, user interactions, and edge cases. These compound.&lt;/p&gt;

&lt;p&gt;An enterprise that deployed in Q1 has 3 quarters of proprietary production data by Q4. A competitor still in planning cycles has none. That data gap doesn't close with a better model selection. It closes slowly, with earlier deployment.&lt;/p&gt;

&lt;p&gt;The fastest path to closing it is shipping. This is the whole argument for &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;Velocity PODs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your current production AI stack?&lt;br&gt;
Specifically curious what others are using for observability and hallucination detection in production. &lt;br&gt;
LangSmith? Custom? Something else? Drop it in the comments.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>devops</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why Your $600K AI Hiring Cycle Is Costing You More Than Just Money</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Wed, 25 Mar 2026 07:52:47 +0000</pubDate>
      <link>https://forem.com/ailoitte_ai/why-your-600k-ai-hiring-cycle-is-costing-you-more-than-just-money-314i</link>
      <guid>https://forem.com/ailoitte_ai/why-your-600k-ai-hiring-cycle-is-costing-you-more-than-just-money-314i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;82% of enterprises are running active AI PoCs. Fewer than 4% reach production-wide deployment. The gap isn't talent or budget, it's delivery architecture.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I want to talk about something most AI delivery postmortems won't say out loud: &lt;strong&gt;the traditional hire-and-build model is structurally broken for AI systems in 2026.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because the engineers aren't good. Because the incentive structures, team compositions, and billing models were designed for a world where software systems were deterministic.&lt;/p&gt;

&lt;p&gt;AI systems aren't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Math Behind the $600K Figure
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;senior AI/ML engineer&lt;/a&gt; in 2026 costs $180K+ base. Recruiter fee at 20%: $36K. Time-to-hire in the current market: 3–6 months. Onboarding ramp on LLM-specific tooling: another 1–3 months.&lt;/p&gt;

&lt;p&gt;Now build your minimum viable AI delivery team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI/LLM Engineer: ~$180K&lt;/li&gt;
&lt;li&gt;MLOps Specialist: ~$160K&lt;/li&gt;
&lt;li&gt;Data Engineer: ~$140K&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;That's $480K/year in salaries alone — before tooling, cloud costs, or the first PR is merged.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before a single production model has been trained on your domain data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Capability-Delivery Chasm (Why PoCs Fail in Production)
&lt;/h2&gt;

&lt;p&gt;Here's a pattern every AI engineer reading this has probably seen:&lt;/p&gt;

&lt;p&gt;PoC in sandbox → Works in demo → Breaks on production load&lt;/p&gt;

&lt;p&gt;The PoC was built fast, by generalists learning LLM orchestration on the job, optimizing for demo performance rather than production stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's missing at handoff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucination monitoring&lt;/li&gt;
&lt;li&gt;Token cost guardrails&lt;/li&gt;
&lt;li&gt;Drift detection&lt;/li&gt;
&lt;li&gt;Audit trail / HITL checkpoints for regulated decisions&lt;/li&gt;
&lt;li&gt;Observability stack&lt;/li&gt;
&lt;li&gt;Model-agnostic architecture (so you're not locked to one provider)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't afterthoughts. In production AI, these ARE the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compute Waste Problem (3–10x Cost Multiplier)
&lt;/h2&gt;

&lt;p&gt;This one stings because it's invisible until the cloud bill arrives.&lt;/p&gt;

&lt;p&gt;Generalist developers default to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full-context retrieval on every query&lt;/li&gt;
&lt;li&gt;No prompt caching&lt;/li&gt;
&lt;li&gt;Unstructured prompts that balloon token usage&lt;/li&gt;
&lt;li&gt;No cost ceiling monitoring per workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One agentic workflow without token guardrails can generate a $50K monthly API bill overnight. A real healthcare SaaS deployment we audited had $11K/month in unnecessary API spend traced directly to unstructured prompts and full-context retrieval on every call.&lt;/p&gt;

&lt;p&gt;The fix was architectural, not model-related. Applied in the first sprint.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI POD Actually Is (vs. &lt;a href="https://www.ailoitte.com/blog/understanding-it-staff-augmentation/" rel="noopener noreferrer"&gt;Staff Aug&lt;/a&gt;)
&lt;/h2&gt;

&lt;p&gt;The term "&lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;AI POD&lt;/a&gt;" gets used loosely, so let me be precise:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI POD = pre-assembled, cross-functional delivery unit&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI/LLM Engineer&lt;/li&gt;
&lt;li&gt;MLOps Specialist&lt;/li&gt;
&lt;li&gt;Data Engineer&lt;/li&gt;
&lt;li&gt;Domain Architect&lt;/li&gt;
&lt;li&gt;QA Specialist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Contracted on &lt;strong&gt;defined deliverables with production-stable AI as the exit criterion&lt;/strong&gt;. Not hours. Not headcount. Outcomes.&lt;/p&gt;

&lt;p&gt;The key distinction from staff augmentation: a POD ships the monitoring stack, observability layer, and IP transfer as &lt;strong&gt;required deliverables&lt;/strong&gt;, not optional line items.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Delivery Sequence That Actually Works
&lt;/h2&gt;

&lt;p&gt;Start with data, not models:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Data Landscape Audit&lt;/strong&gt;&lt;br&gt;
Map every silo. Define ingestion architecture. Identify what the AI can touch and what it shouldn't. Skipping this step produces confident hallucinations, the worst kind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Domain-Driven Service Boundaries&lt;/strong&gt;&lt;br&gt;
Apply DDD to the AI service layer. Tight boundaries reduce hallucination surface area, attack surface, and make compliance auditing tractable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Model-Agnostic RAG Build&lt;/strong&gt;&lt;br&gt;
Build the retrieval layer on open frameworks such as LangChain or LlamaIndex. The LLM landscape shifts every quarter. Locking into a single provider is compounding technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Token Optimization + Guardrails&lt;/strong&gt;&lt;br&gt;
Prompt caching, structured retrieval, cost ceiling monitoring, and token budget guardrails per workflow. This is what separates a POD from a staff aug arrangement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Observability Stack + IP Transfer&lt;/strong&gt;&lt;br&gt;
Hallucination monitoring, drift detection, HITL checkpoints, automated decision logs. Full IP transfer: every model, config, and codebase is handed over, and the client retains everything.&lt;/p&gt;
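
&lt;p&gt;To make the observability piece concrete, here is a minimal, hypothetical sketch of an automated decision log with a naive drift check. Field names and the threshold are assumptions; a real system would score outputs with an evaluator rather than accept scores as inputs.&lt;/p&gt;

```python
# Illustrative sketch of an automated decision log plus a crude drift signal.
# All names and thresholds are invented for demonstration.
import json
import statistics
from datetime import datetime, timezone


class DecisionLog:
    """Append-only log of model decisions with a rolling drift check."""

    def __init__(self, baseline_score: float, drift_threshold: float = 0.15):
        self.baseline = baseline_score
        self.threshold = drift_threshold
        self.entries = []

    def record(self, prompt: str, output: str, score: float) -> dict:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,
            "output": output,
            "score": score,
        }
        self.entries.append(entry)
        return entry

    def drifting(self, window: int = 5) -> bool:
        # Flag drift when recent mean quality falls well below baseline:
        # the cue to route outputs to a human-in-the-loop checkpoint.
        recent = [e["score"] for e in self.entries[-window:]]
        return bool(recent) and self.baseline - statistics.mean(recent) > self.threshold


log = DecisionLog(baseline_score=0.90)
for s in (0.88, 0.70, 0.68, 0.65, 0.60):
    log.record("summarize ticket", "...", s)
print(json.dumps({"drifting": log.drifting()}))
```

The log doubles as the audit trail for compliance reviews, which is why a POD treats it as a deliverable rather than an add-on.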

&lt;h2&gt;
  
  
  The Billing Model Problem
&lt;/h2&gt;

&lt;p&gt;Under hourly billing, the vendor has no structural incentive to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ship faster&lt;/li&gt;
&lt;li&gt;Optimize token costs&lt;/li&gt;
&lt;li&gt;Build monitoring layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every extra hour is revenue. Every inefficiency is a billable line item. AI work is non-linear; an optimized prompt can replace forty API calls. Hourly billing rewards the forty-call path.&lt;/p&gt;

&lt;p&gt;Outcome-based billing resolves this. The POD is contracted to ship a production-stable system. Token efficiency and monitoring aren't optional; they're part of what "shipped" means.&lt;/p&gt;

&lt;p&gt;The question isn't whether to use AI. That decision was made two years ago.&lt;/p&gt;

&lt;p&gt;The question is: &lt;strong&gt;how many more 6-month delivery cycles can you absorb while a competitor ships quarterly?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devops</category>
      <category>career</category>
    </item>
    <item>
      <title>Why AI-Native Engineers Move Faster</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Tue, 24 Mar 2026 09:50:11 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/why-ai-native-engineers-move-faster-19p2</link>
      <guid>https://forem.com/ailoitte_sk/why-ai-native-engineers-move-faster-19p2</guid>
      <description>&lt;p&gt;It's not about typing speed or smarter shortcuts. AI-native engineers have quietly rewired how they think about building software — and the velocity gap it creates is hard to overstate.&lt;/p&gt;

&lt;p&gt;Speed is a funny thing in software. Everyone claims to move fast. Startups put it in their values decks; engineering leaders talk about "shipping culture" in every all-hands. But speed isn't really about hustle — it's about how much of your time is spent on problems that actually require you. An &lt;strong&gt;&lt;a href="https://www.ailoitte.com/" rel="noopener noreferrer"&gt;AI-native engineer&lt;/a&gt;&lt;/strong&gt; has found a way to shrink that other category to almost nothing.&lt;/p&gt;

&lt;p&gt;Let me explain what that actually looks like in practice, because the difference isn't subtle once you see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  They Don't Start With Code — They Start With Context
&lt;/h2&gt;

&lt;p&gt;A conventional engineer opens their IDE and starts building. An AI-native engineer opens a blank document and starts thinking out loud. Before writing a single function, they've articulated the problem domain, the edge cases, the constraints, the trade-offs they're willing to accept. That thinking gets fed into their AI workflow — and what comes back isn't just code, it's code that understands the situation.&lt;/p&gt;

&lt;p&gt;This sounds like extra work. It isn't. It's front-loading the thinking that would otherwise happen messily in the middle of debugging sessions at 11pm. The time saved downstream is enormous; the clarity gained is worth it on its own.&lt;/p&gt;

&lt;p&gt;Regular developers often skip this step because they've always been able to get away with it. When you're writing everything manually, the act of writing forces you to think. With AI in the loop, that forcing function disappears — and engineers who haven't replaced it with something deliberate end up generating fast, plausible-looking code that doesn't quite fit the actual problem. Speed without clarity is just expensive confusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Boilerplate Is Someone Else's Problem Now
&lt;/h2&gt;

&lt;p&gt;Here's something most product leaders don't fully appreciate: a shocking percentage of engineering time, even at strong teams, gets eaten by work that is essentially mechanical. Setting up project structure, wiring authentication, writing CRUD endpoints, configuring CI pipelines — this stuff has to be done, but it doesn't require creativity. It requires patience and familiarity.&lt;/p&gt;

&lt;p&gt;An AI-native engineer using tools like Cursor, Claude, or GitHub Copilot treats all of that as generated. Not approximate, not "good enough to edit" — actually generated, reviewed, and shipped. What used to take a full sprint of careful, manual work now takes a focused afternoon.&lt;/p&gt;

&lt;p&gt;The leverage isn't in writing code faster. It's in spending almost no time on code that doesn't require human judgment.&lt;/p&gt;

&lt;p&gt;That freed-up capacity doesn't disappear. It goes toward architecture decisions, product thinking, edge case analysis — the work that actually determines whether a product is good. You know what that looks like at the team level? One AI-native engineer with good instincts can cover ground that previously required two or three people. Not because they're superhuman, but because they've stopped doing the things that don't need them.&lt;/p&gt;

&lt;h2&gt;
  
  
  They've Learned to Think at the Right Altitude
&lt;/h2&gt;

&lt;p&gt;There's a concept in aviation called situational awareness — knowing where you are, where you're going, and what's likely to go wrong, all at once. Great engineers have always needed something like that. AI-native engineers have developed an additional layer: they know which altitude of abstraction to operate at in any given moment.&lt;/p&gt;

&lt;p&gt;Sometimes that means asking an AI to generate an entire module from a spec. Sometimes it means using it to stress-test a decision by generating counterarguments. Sometimes it means ignoring it entirely because the problem is subtle and requires genuine human judgment. The calibration matters. Engineers who treat AI as "always helpful" or "never trustworthy" both get it wrong — what works is knowing the difference, and that instinct takes time and actual reps to build.&lt;/p&gt;

&lt;p&gt;This is why experience with these tools compounds in a way that's genuinely hard to replicate quickly. The engineer who has spent a year in this workflow has developed hundreds of small intuitions about where AI reasoning goes sideways, what prompting patterns produce useful output versus plausible garbage, and when to push the model harder versus when to just write the thing yourself. You can't shortcut that with a workshop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Iteration Cycles Collapse — and That Changes Everything
&lt;/h2&gt;

&lt;p&gt;Honestly, this might be the biggest thing. Software development has always been an iterative process — you build something, it doesn't quite fit, you change it, repeat. The question is how long each loop takes.&lt;/p&gt;

&lt;p&gt;For a traditional developer, a significant change in direction can mean days of rework. For an AI-native engineer, it often means an hour of thoughtful re-prompting and review. This doesn't just save time — it changes the psychology of building. When iteration is cheap, you're willing to try things you'd otherwise rule out as "too risky to build." You experiment more. You throw out bad ideas faster because testing them doesn't cost much.&lt;/p&gt;

&lt;p&gt;For founders and product leaders, this has a direct translation: your AI-native engineering team will show you more working options, faster. They'll find the right answer through iteration rather than upfront planning. In a market that rewards speed and adaptability, that's not a nice-to-have. It's a genuine edge.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Does This Mean for You?
&lt;/h2&gt;

&lt;p&gt;If you're building or scaling a product right now, the composition of your engineering team matters more than it ever has — and "experience" looks different than it did even two years ago. The engineer with ten years of Python expertise who hasn't rethought their workflow and the engineer with four years who's deeply integrated AI into how they build are not equally positioned. Context matters, and the context has changed.&lt;/p&gt;

&lt;p&gt;That's not a knock on experience. Deep technical knowledge still matters enormously — an AI-native engineer who can't read and reason about the code they're reviewing is a different kind of liability. What's changed is the additional question you now need to ask: has this person rebuilt how they work, or just added a few tools on top of an unchanged process?&lt;/p&gt;

&lt;p&gt;The ones who've done the former move differently. You can see it in how they scope work, how they talk about problems, how fast they get from zero to something real. The gap is growing. And companies that recognize it early tend to end up on the right side of it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nativeengineer</category>
      <category>programming</category>
    </item>
    <item>
      <title>Ailoitte’s AI Velocity Pods</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Thu, 19 Mar 2026 12:38:13 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/ailoittes-ai-velocity-pods-413g</link>
      <guid>https://forem.com/ailoitte_sk/ailoittes-ai-velocity-pods-413g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uzqid96hn7g429ag5yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uzqid96hn7g429ag5yo.png" alt=" " width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's be honest with each other for a second. You've been in that meeting. The one where the project is already three months late, the agency is still "tracking against milestones," and somehow the invoice for another 800 hours just landed in your inbox. You're paying for time. And time, as it turns out, is not the same thing as progress. &lt;/p&gt;

&lt;p&gt;That experience is not your problem. It's a structural flaw baked into the way software has been built for the last two decades, a model that quietly rewards slowness, because every hour of delay is another hour billed. &lt;/p&gt;

&lt;p&gt;Something is cracking that model wide open. And the world's most powerful technology leaders have started saying it out loud. If you haven't heard about AI Velocity Pods yet (specifically Ailoitte's AI Pods), you're about to understand why this topic has developers, founders, and CTOs talking everywhere from LinkedIn to Davos. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our goal isn't to sell you time; it's to sell you the solution in the minimum amount of time required. &lt;br&gt;
Sunil Kumar · CEO &amp;amp; Founder, Ailoitte &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Before We Dive In: Here's What the Giants Are Actually Saying&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;This isn't one company's marketing pitch. The shift toward AI-augmented, velocity-first engineering is being validated at the highest levels of global technology. These are the people who built the tools that make AI Pods possible, and they're all saying the same thing. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso5jov1pqjwo2e42g377.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso5jov1pqjwo2e42g377.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;br&gt;
The world's most powerful technology leaders — all pointing at the same inflection point. &lt;/p&gt;

&lt;p&gt;Amodei isn't a hype merchant. He's the former VP of Research at OpenAI, the person who helped build GPT-2 and GPT-3, and arguably one of the most technically credible voices in the industry. When he says AI is rewriting the rules of software development, it carries real weight. &lt;/p&gt;

&lt;p&gt;Nadella has been restructuring Microsoft around this conviction, pushing executives to work faster and leaner and publicly stating that AI is now responsible for about 30% of Microsoft's code. &lt;/p&gt;

&lt;p&gt;And yet, here's the thing none of these statements fully answers: knowing that AI accelerates coding, and having a structured, governed model to actually deliver AI-augmented outcomes, are two completely different things. That gap is exactly where &lt;a href="https://www.ailoitte.com/ai-velocity-pods" rel="noopener noreferrer"&gt;Ailoitte's AI Velocity Pods&lt;/a&gt; come in. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, What Even Is an AI Pod?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;An &lt;a href="https://www.cisco.com/" rel="noopener noreferrer"&gt;AI Pod&lt;/a&gt;, at its core, is a small, tightly structured team where human engineers work alongside AI tools, not as a side experiment but as a fundamental part of how work gets done. The "pod" framing is borrowed from how modern software systems package their components: self-contained, modular, and independently deployable. &lt;/p&gt;

&lt;p&gt;BCG's 2025 C-suite survey confirmed that AI remains the top strategic priority for enterprise leaders, with modular, embedded AI teams delivering the clearest return on investment. Gartner data suggests roughly 85% of businesses have already adopted or are actively planning to adopt a pod-based model for their engineering work. &lt;/p&gt;

&lt;p&gt;This isn't a trend that's arriving. It's a trend that arrived, and companies still running engineering on the old model are already feeling the drag. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Old Way Was Quietly Broken&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xdguitnd2no2qw3fm74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xdguitnd2no2qw3fm74.png" alt=" " width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The traditional agency model: built to reward time spent, not outcomes delivered. &lt;/p&gt;

&lt;p&gt;Here's what the traditional model looked like — and maybe still looks like in your organization right now: &lt;/p&gt;

&lt;p&gt;You hire an agency or staff an augmentation team. You get people — usually junior-to-mid level, billed by the hour. The faster they work, the less money the agency makes. So the incentive, never spoken aloud but always present, is to move at a comfortable pace, expand scope where possible, and keep those seats occupied as long as they can. &lt;/p&gt;

&lt;p&gt;Management overhead balloons. You spend ten, fifteen hours a week just chasing updates, sitting in status calls, reviewing pull requests that should have been caught by QA three steps earlier. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Billable-hour models reward time spent rather than outcomes delivered. The incentive structure of traditional agencies is fundamentally misaligned with what clients actually need. &lt;br&gt;
Ailoitte Manifesto · ailoitte.com/manifesto&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a criticism of the people working inside that system. Many of them are talented. The problem is the model itself. When the financial incentive of your vendor is structurally opposed to your desire for speed, you're in trouble before a single line of code is written. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enter Ailoitte's AI Velocity Pods: And Why They're Different&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ailoitte, the Bangalore-headquartered AI development company building software for startups and enterprises across 22+ countries, looked at this broken model and decided to do something most agencies wouldn't dare: actively work to bill their clients for fewer hours. &lt;/p&gt;

&lt;p&gt;Their solution is the AI Velocity Pod, a structured engineering unit built from the ground up around AI-augmented workflows, senior architectural oversight, and outcome-based delivery. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Velocity Pods: What's Actually Inside&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felafm29n6rdkwiittyjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felafm29n6rdkwiittyjz.png" alt=" " width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — The Brains&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;At the top of every pod is a senior software architect using Claude (Anthropic's AI model) integrated into their Cursor IDE to architect, refactor, and reason through complex business logic at roughly 5× the speed of manual coding approaches. The modern engineer here is what Ailoitte describes as a conductor of high-intelligence agents. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Claude + Cursor IDE · Custom .cursorrules&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 — The Pipeline&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Every commit goes through Agentic QA, AI agents that write and run end-to-end tests based on PR descriptions, catching regressions before they become problems. Ailoitte uses custom ".cursorrules" files and proprietary datasets, ensuring the AI generates code that fits the project's architecture from day one. &lt;/p&gt;
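
&lt;p&gt;For readers unfamiliar with the mechanism: a .cursorrules file is plain-text instructions that Cursor feeds to the model alongside your code. The fragment below is entirely hypothetical (it is not Ailoitte's file), but it shows the kind of project-specific constraints such a file can encode.&lt;/p&gt;

```text
# Hypothetical .cursorrules fragment -- illustrative only
You are working in a Python monorepo with a layered architecture.
- Generated code must respect existing module boundaries; do not add
  new top-level packages.
- Every new endpoint needs a matching test under tests/ before merge.
- Prefer explicit typing; do not introduce new third-party dependencies
  without approval.
```

Rules like these are how generated code "fits the project's architecture from day one" rather than arriving as generic boilerplate.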

&lt;p&gt;&lt;em&gt;Agentic QA · End-to-end coverage · Zero regressions&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — The Infrastructure&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Every pod operates in a dedicated VPC environment with enterprise-grade IP protection by default. DevOps and infrastructure automation are baked in from day one, so delivery velocity doesn't degrade as the codebase grows. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;SOC2 Compliant · ISO 27001:2013 · Dedicated VPC&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Velocity Pods: Specifications &amp;amp; What You Actually Get&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb13l1exv5yluap8znvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyb13l1exv5yluap8znvn.png" alt=" " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3y8vbnskyh01dk3xh3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3y8vbnskyh01dk3xh3i.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Microsoft research finds that developers complete tasks 20–55% faster with AI assistance. Ailoitte's 5× claim sits at the high end, but it becomes credible when you factor in architecture-level decisions made by senior engineers, automated QA, and AI-generated boilerplate handled by Claude. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Velocity Pods Benefits: The Part That Actually Changes Your Work&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You Stop Being a Project Manager&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Ailoitte's autonomous pod model brings management overhead down to approximately two hours per week. Engineers who are product-aware and self-directing handle coordination inside the pod, freeing you to focus on strategy. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You Get Predictability&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Outcome-based delivery means projects are structured around milestones and real business goals. Variable hourly billing means the final cost is always a guess. With a fixed-cost pod, it isn't. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You Keep Your IP Secure&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;VPC isolation means your codebase doesn't sit alongside another client's work on shared infrastructure. For enterprise buyers, this is often the detail that closes the conversation. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Team Can Actually Focus&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;When you're not spending fifteen hours a week managing offshore tickets, those hours go back into product strategy, customer development, and everything else that moves the business. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you go forward 24 months from now, it's possible that most developers are not coding.&lt;br&gt;
Matt Garman · CEO, Amazon Web Services&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;AI Delivery Pods vs AI Developer Pods: Understanding the Distinction&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;An AI Delivery Pod is focused on the complete output: shipped product, deployed feature, measurable business result. An AI Developer Pod refers specifically to the engineering execution layer. Ailoitte's Velocity Pod model integrates both. It's the whole car, not just the engine. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I've got a big idea — go work on this for a couple of days. True software engineering task delegation is finally here.&lt;br&gt;
Sam Altman · CEO, OpenAI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;AI Velocity Pods in India: A Growing and Important Story&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqg3yfyxzhm4nac6z6wz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqg3yfyxzhm4nac6z6wz.png" alt=" " width="800" height="310"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;⭐ Forbes — India's Top Innovative AI Companies 2025&lt;br&gt;
⭐ Times of India — Most Trusted IT Service Provider&lt;br&gt;
⭐ IBT — Best Software Development Company 2025 &lt;/p&gt;

&lt;p&gt;Ailoitte is ISO 27001:2013 and ISO 9001:2015 certified. The Indian tech market has historically competed on cost. What Ailoitte is doing with the Velocity Pod model is competing on a different dimension entirely: speed-to-outcome. The pitch isn't "we're cheaper per hour." It's "our total cost of delivery is lower because we ship faster."  &lt;/p&gt;

&lt;p&gt;For Indian startups, this is a model worth studying deeply. Ailoitte's &lt;a href="https://www.ailoitte.com/startup-mvp-velocity" rel="noopener noreferrer"&gt;Startup MVP Velocity&lt;/a&gt; track is specifically designed for pre-Series A founders who need speed-to-market. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Velocity Pods Comparison: How Does This Stack Up?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm12651zvojxyu8rq8kqe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm12651zvojxyu8rq8kqe.png" alt=" " width="800" height="444"&gt;&lt;/a&gt;&lt;br&gt;
Ailoitte is not the only company thinking about AI Pods. Globant launched their own AI Pods as a subscription service. Relevance Lab has published on AI Pod models for enterprise software factories. JetRuby runs compact AI PODs of 2–5 engineers augmented by AI agents. &lt;/p&gt;

&lt;p&gt;What distinguishes Ailoitte's Velocity Pod specifically: a manifesto-level commitment to outcome pricing over hourly billing; Claude + Cursor IDE with custom .cursorrules (a documented technical architecture, not a vague AI promise); a 7-day onboarding-to-first-commit public commitment; VPC-isolated security architecture built in; and an India-plus-US operational structure for both cost efficiency and enterprise compliance. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the Economics Real?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The $15,000 per month fixed price is significant. But the comparison isn't against a single freelance developer. A traditional agency engagement for a mid-complexity product typically runs $25,000+ per month. Then add management overhead: if your time is worth $200/hour and you're spending 15 hours a week coordinating, that's roughly another $12,000/month of your own time. Add security audits, QA resourcing, and the cost of rework from inadequate test coverage. However you run the numbers, the direction of travel is the same: AI-augmented teams produce more. The question is whether your agency is structured to pass those gains to you or to keep them. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who Should Actually Be Looking at This?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Startup Founders Building an MVP&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;If you're pre-Series A and your core constraint is speed-to-market, Ailoitte's Startup MVP Velocity track is worth a serious conversation. Garry Tan at YC has said that for roughly a quarter of the Winter 2025 batch, 95% of the code was AI-generated. That's the competitive environment you're entering. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product Leaders at Mid-Market Companies&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;If your product roadmap is consistently slipping because your team doesn't have bandwidth, compare the Velocity Pod model seriously against traditional staff augmentation. The management load difference alone may justify the switch. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CTOs at Enterprise Organizations&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The security architecture (VPC isolation, SOC2 compliance, and ISO certifications) puts AI Velocity Pods in a category that can clear enterprise procurement. Combined with legacy refactoring services, there's a credible path to pod-based delivery for legacy systems that are costing you in technical debt. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agencies Watching This Space&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The billable hour model has a shelf life. Understanding how outcome-based, AI-augmented delivery works, whether through a partner relationship or building the capability internally, is not optional anymore. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Uncomfortable Question at the End of All This&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The constraints that have limited how fast you can build software have been structural, not technical. The code was never the bottleneck the way billing structures made it appear. &lt;/p&gt;

&lt;p&gt;For decades, software development was priced like a factory floor: labor hours times hourly rate. This made sense when every line of code genuinely required a human to type it. That world is ending. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The modern engineer isn't just a writer of code; they are a conductor of high-intelligence agents.&lt;br&gt;
Ailoitte Manifesto · ailoitte.com/manifesto&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ailoitte's Velocity Pods are one early, well-articulated version of what that future looks like in practice. They won't be the only one. But right now, they're among the most clearly documented and direct about their methodology. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsks0xdo7ylhyoc9f0q0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsks0xdo7ylhyoc9f0q0.png" alt=" " width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's Talk: Drop Your Thoughts Below&lt;/strong&gt; &lt;/p&gt;

</description>
      <category>aipods</category>
      <category>aivelocitypods</category>
      <category>ailoitte</category>
      <category>ai</category>
    </item>
    <item>
      <title>OpenAI Loses 1.5M Subscribers in 48 Hours After Altman Deal</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Thu, 05 Mar 2026 05:26:12 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/openai-loses-15m-subscribers-in-48-hours-after-altman-deal-4119</link>
      <guid>https://forem.com/ailoitte_sk/openai-loses-15m-subscribers-in-48-hours-after-altman-deal-4119</guid>
      <description>&lt;p&gt;OpenAI is facing backlash after agreeing to let the US Department of Defense use its AI models on a classified government network, The Times of India reports. A boycott-tracking site cited in the story claims more than 1.5 million users left ChatGPT in under 48 hours after the announcement—an estimate first flagged by Forbes. The tracker ties this to multiple controversies, including OpenAI’s reported work with Immigration and Customs Enforcement (ICE), a reported $25 million political donation by OpenAI president Greg Brockman, and the Pentagon arrangement.&lt;/p&gt;

&lt;p&gt;Rival &lt;strong&gt;AI company Anthropic&lt;/strong&gt;, the report adds, had declined to provide “unrestricted” government access to its models, and some users are switching to Claude. Over the weekend, Claude reportedly rose to the top of App Store rankings, overtaking ChatGPT. OpenAI has not publicly confirmed the claimed subscriber losses.&lt;/p&gt;

&lt;p&gt;For users considering a move, the article outlines how to export ChatGPT data (Settings → Data controls → Export data). It also notes chat deletion can take up to 30 days, and some data may be retained for legal or security reasons. Anthropic also suggests a “memory” transfer step: prompt ChatGPT to list stored memories in a single code block, then paste the edited list into Claude.&lt;/p&gt;

</description>
      <category>openai</category>
      <category>opensource</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Memory-First AI Agents: A Strategic 90-Day Enterprise Growth Plan</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Mon, 02 Mar 2026 12:25:31 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/memory-first-ai-agents-a-strategic-90-day-enterprise-growth-plan-128j</link>
      <guid>https://forem.com/ailoitte_sk/memory-first-ai-agents-a-strategic-90-day-enterprise-growth-plan-128j</guid>
      <description>&lt;h2&gt;
  
  
  Why Enterprises Are Struggling with AI at Scale
&lt;/h2&gt;

&lt;p&gt;Many enterprises invest heavily in AI, yet their agents forget context, repeat mistakes, and fail to improve over time. The real issue isn’t intelligence — it’s memory.&lt;/p&gt;

&lt;p&gt;Without structured, long-term memory, AI agents behave like new interns every single day. They can respond, but they cannot learn. They can automate, but they cannot evolve.&lt;/p&gt;

&lt;p&gt;Memory-first architecture changes that completely. It allows AI agents to retain context, understand user behavior, and make progressively smarter decisions. For enterprises, this is not just a technical upgrade — it’s a competitive advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Are Memory-First AI Agents?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Memory-first AI agents are designed with persistent memory layers at their core. Instead of treating memory as an afterthought, these systems prioritize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context retention across sessions&lt;/li&gt;
&lt;li&gt;Long-term knowledge storage&lt;/li&gt;
&lt;li&gt;User preference tracking&lt;/li&gt;
&lt;li&gt;Decision improvement over time&lt;/li&gt;
&lt;/ul&gt;
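As a rough illustration of "context retention across sessions," a persistent memory layer can start as little as a store that outlives the process. The sketch below is hypothetical (class and file names are made up, not from any specific framework):

```python
import json
from pathlib import Path

class AgentMemory:
    """Minimal persistent memory layer: state survives across sessions via a JSON file."""

    def __init__(self, path="agent_memory.json"):
        self.path = Path(path)
        # Reload whatever earlier sessions stored; start empty on first run.
        self.store = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, user_id, key, value):
        # Record a fact or preference under the user's own namespace, then persist.
        self.store.setdefault(user_id, {})[key] = value
        self.path.write_text(json.dumps(self.store))

    def recall(self, user_id, key, default=None):
        # Retrieve previously stored context, even in a brand-new session.
        return self.store.get(user_id, {}).get(key, default)

memory = AgentMemory()
memory.remember("user_42", "preferred_channel", "email")
print(memory.recall("user_42", "preferred_channel"))  # email
```

A new process reading the same file sees the same memories, which is exactly the "new intern every day" problem going away. Production systems would of course swap the JSON file for a database or vector store.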

&lt;p&gt;This approach transforms AI from a reactive tool into a strategic digital asset. When your agents remember customer intent, operational patterns, and historical outcomes, performance compounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 90-Day Enterprise Growth Plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scaling memory-first AI agents doesn’t require years — it requires clarity and execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 (Days 1–30): Audit &amp;amp; Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify memory gaps in current AI systems&lt;/li&gt;
&lt;li&gt;Define structured memory models (vector, database, hybrid)&lt;/li&gt;
&lt;li&gt;Align AI use cases with business KPIs&lt;/li&gt;
&lt;/ul&gt;
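For the "vector" memory model mentioned above, retrieval is typically similarity search over embedded memories. A toy sketch with hand-rolled cosine similarity (the snippets and vectors are invented; a real system would use an embedding model and a vector database):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Each memory pairs a text snippet with its embedding vector.
memories = [
    ("Customer prefers invoices in PDF", [0.9, 0.1, 0.0]),
    ("Escalate outages to the on-call lead", [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, k=1):
    # Return the k stored memories most similar to the query embedding.
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve([0.85, 0.15, 0.05]))  # ['Customer prefers invoices in PDF']
```

The "hybrid" option in practice combines this kind of semantic lookup with exact-match database queries for structured facts like IDs and thresholds.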

&lt;p&gt;&lt;strong&gt;Phase 2 (Days 31–60): Build &amp;amp; Integrate&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement persistent memory layers&lt;/li&gt;
&lt;li&gt;Integrate CRM, ERP, and internal data systems&lt;/li&gt;
&lt;li&gt;Deploy controlled pilot programs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 (Days 61–90): Optimize &amp;amp; Scale&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Measure performance improvements&lt;/li&gt;
&lt;li&gt;Refine memory retrieval mechanisms&lt;/li&gt;
&lt;li&gt;Expand deployment across departments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within 90 days, enterprises move from fragmented AI experiments to scalable, intelligent systems that actually improve over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Emotional Advantage: Trust &amp;amp; Continuity
&lt;/h2&gt;

&lt;p&gt;Customers feel the difference when AI remembers them. Teams feel the difference when automation reduces friction instead of creating confusion.&lt;/p&gt;

&lt;p&gt;Memory-first AI agents build trust, consistency, and operational intelligence. They don’t just complete tasks; they strengthen relationships and accelerate growth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to Build Smarter AI Agents?&lt;/strong&gt;&lt;br&gt;
If your enterprise AI feels stuck, it’s time to rethink the foundation.&lt;/p&gt;

&lt;p&gt;Start building &lt;strong&gt;&lt;a href="https://www.ailoitte.com/library/ai-agents-that-remember/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=ebook_memory_agents&amp;amp;utm_content=kajal_devto" rel="noopener noreferrer"&gt;memory-first AI agents&lt;/a&gt;&lt;/strong&gt; that learn, adapt, and scale with your business. The next 90 days could redefine how your organization uses AI: not just as automation, but as a long-term growth engine.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>This Playbook Tackles the Biggest Problem with AI Agents</title>
      <dc:creator>Sunil Kumar</dc:creator>
      <pubDate>Fri, 27 Feb 2026 13:00:58 +0000</pubDate>
      <link>https://forem.com/ailoitte_sk/this-playbook-tackles-the-biggest-problem-with-ai-agents-2j20</link>
      <guid>https://forem.com/ailoitte_sk/this-playbook-tackles-the-biggest-problem-with-ai-agents-2j20</guid>
      <description>&lt;p&gt;Most AI agents today are built around short-term context. They can respond intelligently within a session, but they don’t retain structured memory across interactions in a governed way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That creates real operational issues:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Repeated customer questions&lt;br&gt;
Inconsistent support decisions&lt;br&gt;
Loss of user preferences&lt;br&gt;
No awareness of past failures&lt;br&gt;
No learning from previous outcomes&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a business, this translates into:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Slower resolution times&lt;br&gt;
Lower customer satisfaction&lt;br&gt;
Reduced customer lifetime value&lt;br&gt;
Higher support costs&lt;br&gt;
It’s not that AI agents can’t reason.&lt;br&gt;
It’s that they can’t remember safely and consistently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Memory Matters in Modern Businesses
&lt;/h2&gt;

&lt;p&gt;In real-world environments — healthcare, fintech, SaaS support, internal copilots — continuity is everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imagine:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A healthcare IT assistant that forgets compliance rules between sessions.&lt;/li&gt;
&lt;li&gt;A fintech agent that doesn’t recall risk thresholds applied earlier.&lt;/li&gt;
&lt;li&gt;A customer support bot that fails to remember a high-value client’s history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not just inconvenient.&lt;br&gt;
That’s operational risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliable memory improves:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Decision speed&lt;br&gt;
Agents don’t need to re-evaluate everything from scratch. They build on prior knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficiency&lt;br&gt;
Fewer repeated questions. Fewer escalations. Less manual correction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Confidence in automation&lt;br&gt;
Teams trust systems that behave predictably.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customer lifetime value&lt;br&gt;
When customers feel understood and remembered, retention improves naturally.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Memory is not a technical feature.&lt;br&gt;
It’s a business multiplier.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI Agents Are Solving Today (When Built Correctly)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When designed with governed memory, AI agents can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track user preferences across sessions&lt;/li&gt;
&lt;li&gt;Maintain consistent approval thresholds&lt;/li&gt;
&lt;li&gt;Remember previous support decisions&lt;/li&gt;
&lt;li&gt;Adapt workflows based on historical outcomes&lt;/li&gt;
&lt;li&gt;Escalate intelligently when patterns repeat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference is subtle but powerful.&lt;/p&gt;

&lt;p&gt;Instead of acting like a chatbot, the agent behaves like a teammate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It understands context.&lt;/li&gt;
&lt;li&gt;It respects boundaries.&lt;/li&gt;
&lt;li&gt;It avoids hallucinating into memory.&lt;/li&gt;
&lt;li&gt;It keeps user data separate from organizational logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where most implementations go wrong.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They store too much. Or too little.&lt;/li&gt;
&lt;li&gt;They mix user memory with system rules.&lt;/li&gt;
&lt;li&gt;They lack expiration policies.&lt;/li&gt;
&lt;li&gt;They forget governance entirely.&lt;/li&gt;
&lt;/ul&gt;
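One way to avoid those failure modes, sketched below, is to keep user memory in a separate namespace from organizational rules and attach an expiration policy to every user-scoped entry. All names, rules, and TTLs here are illustrative assumptions, not a prescribed design:

```python
import time

# Organizational logic lives apart from user memory and is never written to by the agent.
SYSTEM_RULES = {"max_refund_usd": 500}

class GovernedMemory:
    """User-scoped memory with per-entry expiration (a simple governance policy)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.user_store = {}  # user_id -> {key: (value, expires_at)}

    def remember(self, user_id, key, value):
        # Every user-scoped entry carries an expiry timestamp.
        self.user_store.setdefault(user_id, {})[key] = (value, time.time() + self.ttl)

    def recall(self, user_id, key):
        value, expires_at = self.user_store.get(user_id, {}).get(key, (None, 0))
        if time.time() > expires_at:
            # Expired (or never stored): forget it rather than act on stale context.
            self.user_store.get(user_id, {}).pop(key, None)
            return None
        return value

gm = GovernedMemory(ttl_seconds=3600)
gm.remember("u1", "last_ticket", "T-1001")
print(gm.recall("u1", "last_ticket"))  # T-1001 while the entry is fresh
```

Storing "what matters, reliably and safely" then becomes a policy decision about which keys are allowed, how long they live, and who may read them, rather than an afterthought.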

&lt;p&gt;&lt;strong&gt;When memory is structured properly, three things happen:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Workflows stop breaking&lt;br&gt;
Agents don’t reset every time. Processes feel continuous.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Teams move faster&lt;br&gt;
Less manual intervention. Less correction. Less rework.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customers feel recognized&lt;br&gt;
Recognition drives loyalty. Loyalty drives lifetime value.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Over time, this compounds.&lt;/p&gt;

&lt;p&gt;Faster decisions lead to shorter sales cycles.&lt;br&gt;
Better personalization increases conversion rates.&lt;br&gt;
Reduced friction improves retention.&lt;/p&gt;

&lt;p&gt;Memory isn’t about storing everything.&lt;br&gt;
It’s about storing what matters — reliably and safely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Playbook Exists
&lt;/h2&gt;

&lt;p&gt;We kept seeing the same pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Businesses adopt AI agents.&lt;/li&gt;
&lt;li&gt;Initial excitement fades.&lt;/li&gt;
&lt;li&gt;Trust erodes because of inconsistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.ailoitte.com/library/ai-agents-that-remember/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=ebook_memory_agents&amp;amp;utm_content=kajal_devto" rel="noopener noreferrer"&gt;This playbook tackles the biggest problem with AI agents&lt;/a&gt;&lt;/strong&gt;: they don’t remember reliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It explains:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The difference between short-term and long-term memory&lt;/li&gt;
&lt;li&gt;How to separate user memory from organizational rules&lt;/li&gt;
&lt;li&gt;How to avoid “memory hallucinations”&lt;/li&gt;
&lt;li&gt;How to implement boundaries without slowing performance&lt;/li&gt;
&lt;li&gt;How to design for continuity without increasing compliance risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s written for operators, product leaders, and CTOs who want clarity — not theory.&lt;/p&gt;

&lt;p&gt;If you want to understand how AI agents actually work, the playbook explains it clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Quick Note on Confidence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We stand behind our services with our 100% Satisfaction Guarantee.&lt;/p&gt;

&lt;p&gt;We genuinely want you to be happy with our work. If for any reason you don’t like something, we’ll work with you to make it right or we will refund your money. It’s that simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because trust is built the same way good AI systems are built: with consistency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If unreliable memory has been holding your AI agents back, this is where clarity begins.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
