Forem: Gaurav Talesara

Why Software Isn’t Built for AI Agents

Gaurav Talesara — Sat, 02 May 2026 12:52:03 +0000

The next users of your software won’t be humans.
They’ll be agents.

And most software today is completely unprepared for that.

Right now, AI agents are:

Browsing websites
Filling forms
Clicking buttons
Navigating dashboards

That’s not scale. That’s a workaround.

We’re forcing machines to behave like humans—because our systems were never designed for anything else.

The Core Problem

Modern software is built around a simple assumption:

A human will be sitting in front of a screen.

That assumption drives everything:

UI-heavy workflows
Step-by-step interactions
Documentation meant to be read, not executed

But agents don’t need visual interfaces.
They need interfaces they can reason about and execute against programmatically.

Where Current Systems Break for Agents

Let’s break this down from a systems perspective.

1. UI-First Architecture

Most SaaS products expose functionality through:

Dashboards
Forms
Buttons

Agents interacting with these:

Rely on scraping or automation layers
Break when UI changes
Lack reliability

2. Non-Deterministic Outputs

Agents need:

Structured responses
Predictable schemas

Instead, they get:

HTML pages
Inconsistent API responses
Unstructured data

3. Human-Centric Authentication

Current flows:

OAuth screens
Email verification
CAPTCHA

These are friction points for agents trying to:

Discover tools
Authenticate
Execute tasks autonomously

4. Documentation Isn’t Machine-Readable

Docs today are:

Written for humans
Scattered across pages
Hard to parse programmatically

Agents need:

Structured capability descriptions
Executable contracts
Clear input/output expectations

APIs Alone Are Not the Answer

A common assumption is:

“We already have APIs, so we’re agent-ready.”

That’s not true.

Most APIs today are:

Too generic
Often inconsistent
Not designed for autonomous decision-making

Agents need more than endpoints.

They need:

Action schemas (what can be done, not just how)
Deterministic contracts (guaranteed output structure)
Capability discovery (what tools exist and when to use them)

What “Agent-First Software” Actually Looks Like

If we design systems for agents as first-class users, the architecture changes.

1. Machine-Readable Interfaces

Instead of UI-first:

Structured APIs with strict schemas
Tool definitions with clear contracts
Standardized input/output formats
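
A minimal sketch of what a tool definition with a strict contract might look like. The tool name create_invoice, its fields, and the validateInput helper are hypothetical illustrations, not an existing standard or product API.

```javascript
// Hypothetical tool definition: a name, a description an agent can reason
// about, and a strict input schema (all names here are illustrative).
const createInvoiceTool = {
  name: "create_invoice",
  description: "Create a draft invoice for an existing customer.",
  input: {
    customerId: { type: "string", required: true },
    amountCents: { type: "number", required: true },
    memo: { type: "string", required: false },
  },
};

// Check an input object against the schema and return a list of problems.
// An empty list means the call satisfies the contract.
function validateInput(tool, input) {
  const errors = [];
  for (const [field, rule] of Object.entries(tool.input)) {
    const value = input[field];
    if (value === undefined) {
      if (rule.required) errors.push(`missing field: ${field}`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`wrong type for ${field}: expected ${rule.type}`);
    }
  }
  // Reject fields the schema does not declare, so the contract stays strict.
  for (const field of Object.keys(input)) {
    if (!Object.prototype.hasOwnProperty.call(tool.input, field)) {
      errors.push(`unknown field: ${field}`);
    }
  }
  return errors;
}

console.log(validateInput(createInvoiceTool, { customerId: "c_1", amountCents: 1200 }));
console.log(validateInput(createInvoiceTool, { amountCents: "12", note: "hi" }));
```

The first call returns an empty list. The second reports a missing field, a wrong type, and an unknown field, which is the kind of explicit feedback an agent can act on without reading a UI.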

2. Programmatic Onboarding

Instead of:

Signup → verify → explore

Agents should:

Discover → authenticate → execute

This requires:

Auto-provisioned credentials
Machine-readable pricing/limits
Capability endpoints

3. Permissioned Execution

Agents need controlled autonomy:

Scoped access tokens
Role-based permissions
Execution boundaries
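
As a rough illustration of scoped tokens and execution boundaries, the sketch below checks an action against a token before it runs. The scope strings, the maxAmountCents limit, and the authorize function are assumptions made for this example, not part of any real auth system.

```javascript
// Hypothetical scoped token: scope names and the limit are illustrative.
const agentToken = {
  agentId: "agent-42",
  scopes: ["invoices:read", "invoices:create"],
  maxAmountCents: 50000, // execution boundary for a single action
};

// Decide whether an action is allowed before it reaches the backend.
function authorize(token, action) {
  if (!token.scopes.includes(action.scope)) {
    return { allowed: false, reason: `missing scope ${action.scope}` };
  }
  if ((action.amountCents ?? 0) > token.maxAmountCents) {
    return { allowed: false, reason: "amount exceeds execution boundary" };
  }
  return { allowed: true };
}

console.log(authorize(agentToken, { scope: "invoices:create", amountCents: 1200 }));
console.log(authorize(agentToken, { scope: "invoices:delete" }));
console.log(authorize(agentToken, { scope: "invoices:create", amountCents: 90000 }));
```

Returning a reason alongside the decision gives the agent something structured to react to, rather than a generic failure.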

4. Deterministic Execution Layer

Every action should be:

Predictable
Retry-safe
Observable
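
One common way to make an action retry-safe is an idempotency key: the same key always yields the same result, and the side effect runs only once. The sketch below assumes an in-memory store and a made-up chargeCustomer function; a production system would persist the keys.

```javascript
// In-memory store of completed actions, keyed by idempotency key.
// A real system would persist this; the Map is only for illustration.
const completed = new Map();
let chargesMade = 0;

// Hypothetical side effect standing in for a real payment call.
function chargeCustomer(amountCents) {
  chargesMade += 1;
  return { chargeId: `ch_${chargesMade}`, amountCents };
}

// Execute an action at most once per key; a retry returns the stored result.
function executeOnce(idempotencyKey, action) {
  if (completed.has(idempotencyKey)) {
    return { ...completed.get(idempotencyKey), replayed: true };
  }
  const result = action();
  completed.set(idempotencyKey, result);
  return { ...result, replayed: false };
}

const first = executeOnce("order-1001", () => chargeCustomer(2500));
const retry = executeOnce("order-1001", () => chargeCustomer(2500));
console.log(first.chargeId === retry.chargeId, chargesMade); // true 1
```

An agent that times out and retries the same step gets the original result back instead of charging the customer twice.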

5. Observability for Agents

Traditional logs aren’t enough.

We need:

Decision tracing
Tool-call lineage
Cost per execution
Latency breakdowns
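
Tool-call lineage, latency, and cost can be captured by wrapping each tool. This is a sketch under simple assumptions: traces live in an array, and costPerCallUsd is a hypothetical flat price rather than a real billing figure.

```javascript
// Collected trace records; a real system would ship these to a tracing backend.
const traces = [];
let nextCallId = 1;

// Wrap a tool so every call records lineage, latency, and a cost estimate.
// `costPerCallUsd` is a hypothetical flat price used only for illustration.
function traced(toolName, fn, costPerCallUsd) {
  return function (parentCallId, ...args) {
    const callId = nextCallId++;
    const startedAt = Date.now();
    let status = "ok";
    try {
      return fn(...args);
    } catch (error) {
      status = "error";
      throw error;
    } finally {
      traces.push({
        callId,
        parentCallId, // which earlier call led to this one (null for a root call)
        toolName,
        status,
        latencyMs: Date.now() - startedAt,
        costUsd: costPerCallUsd,
      });
    }
  };
}

const lookupCustomer = traced("lookup_customer", (id) => ({ id, plan: "pro" }), 0.001);
const customer = lookupCustomer(null, "c_1");
console.log(customer, traces);
```

Because each record carries a parentCallId, the chain of calls behind an outcome can be reconstructed afterwards, which plain request logs usually cannot do.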

A Practical Agent-System Architecture

A simplified flow looks like this:

Agent
  ↓
Planner (decides what to do)
  ↓
Tool Registry (what tools are available)
  ↓
Execution Layer (calls APIs/tools)
  ↓
Response Validator (ensures correctness)
  ↓
Memory (stores context + learnings)

Each layer is critical:

Planner → reasoning
Tool registry → discoverability
Execution → action
Validator → reliability
Memory → continuity

This is very different from traditional request-response systems.
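
The layers above can be sketched in a few lines. Everything here is a simplified stand-in: the planner is a keyword rule rather than a model, and the get_order_status tool and its output are invented for illustration.

```javascript
// Tool registry: which tools exist, how to run them, how to check their output.
const registry = {
  get_order_status: {
    run: ({ orderId }) => ({ orderId, status: "shipped" }),
    validate: (out) => typeof out.status === "string",
  },
};

// Planner: a keyword rule stands in for model-driven reasoning.
function plan(goal) {
  const match = goal.match(/order (\w+)/);
  return match ? { tool: "get_order_status", input: { orderId: match[1] } } : null;
}

const memory = []; // stores past steps for later context

function runAgent(goal) {
  const step = plan(goal);
  if (!step) return { ok: false, error: "no plan" };
  const tool = registry[step.tool];
  if (!tool) return { ok: false, error: `unknown tool ${step.tool}` };
  const output = tool.run(step.input); // execution layer
  if (!tool.validate(output)) { // response validator
    return { ok: false, error: "invalid tool output" };
  }
  memory.push({ goal, step, output }); // memory
  return { ok: true, output };
}

console.log(runAgent("Where is order A17?"));
```

Unlike a single request-response call, the validator sits between execution and memory, so a malformed tool result is rejected before it can influence later steps.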

Where the Opportunity Is

Most people today are focused on:

“How do we build better agents?”

But the bigger opportunity is:

“How do we build better systems for agents to operate on?”

Every major category is open:

CRM → agent-native workflows
Payments → programmable financial actions
Support → autonomous resolution systems
Analytics → queryable, structured insights

Not as add-ons.
But as core design principles.

What Most People Get Wrong

❌ “APIs are enough”

They’re not.
Agents need structured, reliable, discoverable systems.

❌ “Just add AI on top”

That creates brittle layers, not scalable systems.

❌ “Agents will replace software”

No.
Agents will consume software differently.

The Shift That’s Coming

We’re moving from:

Human-first software → to
Agent-first systems

This isn’t a feature upgrade.

It’s a paradigm shift in how software is designed and consumed.

Final Thought

The companies that win won’t be the ones with the smartest agents.

They’ll be the ones:

Agents prefer to use.

👋 If You’re Building in This Space

I’m currently working on:

Agent-based systems
Automation architectures
AI-native SaaS workflows

If you’re exploring similar problems or thinking about building agent-first products, I’d be interested to exchange ideas.

Engineers Won’t Just Have Salaries - They’ll Have Token Budgets

Gaurav Talesara — Mon, 13 Apr 2026 18:40:27 +0000

Introduction

There’s a subtle shift happening in how software is being built.

It’s not loud.
It’s not fully standardized.
But it’s already visible if you look closely.

We are moving from a world where:

Engineering output was limited by human effort

To a world where:

Output is increasingly limited by how much AI you can effectively use

And that introduces a new concept most teams are not yet fully prepared for:

Token budgets.

What’s Changing Right Now

If you zoom into how modern engineering teams are working:

AI tools are no longer optional — they’re embedded in daily workflows
Engineers are generating, reviewing, and iterating faster than ever
The bottleneck is shifting from writing code to orchestrating systems

Some early signals:

Companies are beginning to track AI usage per employee
AI costs are becoming a visible line item in engineering budgets
Token consumption is growing at an unpredictable pace

This isn’t theoretical.

It’s already happening in pockets of the industry.

The Real Constraint Has Changed

Traditionally, engineering constraints looked like this:

Developer bandwidth
System architecture
Infrastructure scaling

Now there’s a new constraint emerging:

Effective AI utilization

Two engineers are no longer equally productive if:

One uses AI occasionally
The other builds workflows, agents, and automation around it

The second engineer is operating with leverage.

And that leverage is powered by tokens.

Why Token Budgets Will Emerge

Right now, most companies are in an experimental phase:

Pay-as-you-go AI usage
No clear limits
Costs that are hard to predict

This doesn’t scale.

As usage increases, companies will need:

Cost control
Predictability
Fair distribution of resources

The natural evolution?

Allocated token budgets per engineer or team

Just like:

Cloud budgets
API rate limits
SaaS seat allocations
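
As a rough idea of how an allocated budget might be enforced, here is a small sketch. The TokenBudget class and the numbers are hypothetical and not based on any vendor's billing API.

```javascript
// Per-engineer monthly token budget; the numbers are made up for illustration.
class TokenBudget {
  constructor(monthlyLimit) {
    this.monthlyLimit = monthlyLimit;
    this.used = 0;
  }

  // Record usage if it fits; otherwise refuse so the caller can escalate.
  spend(tokens) {
    if (this.used + tokens > this.monthlyLimit) {
      return { approved: false, remaining: this.monthlyLimit - this.used };
    }
    this.used += tokens;
    return { approved: true, remaining: this.monthlyLimit - this.used };
  }
}

const budget = new TokenBudget(1000000);
console.log(budget.spend(400000)); // { approved: true, remaining: 600000 }
console.log(budget.spend(700000)); // { approved: false, remaining: 600000 }
```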

Tokens = The New Productivity Unit

We’re used to measuring productivity through:

Output (features shipped)
Velocity (story points, sprints)
Efficiency (time to deliver)

But AI introduces a different layer.

Now, productivity is increasingly tied to:

How effectively you can convert tokens into outcomes

Not all token usage is equal.

Some engineers waste tokens on low-value prompts
Others build reusable systems that compound output

This is where the real differentiation will happen.

The Rise of the “AI-Orchestrating Engineer”

The best engineers in the next phase won’t just:

Write clean code
Design scalable systems

They will:

Design agent workflows
Optimize token usage vs output
Build systems that act, not just respond

In other words:

They will orchestrate intelligence.

What This Means for Engineering Leaders

If you’re leading teams today, this shift has implications:

1. Budgeting will change

AI costs will move from “tools” to core infrastructure spend

2. Hiring signals will change

You won’t just evaluate:

Coding ability

You’ll evaluate:

AI leverage
System thinking
Automation mindset

3. Internal tooling will evolve

Teams will build:

Internal agents
Workflow automation systems
Token-efficient pipelines

Why This Isn’t Mainstream Yet

It’s important to stay grounded.

Most companies today:

Do NOT have formal token budgets
Are still figuring out pricing and limits
Are experimenting without clear governance

This is still early-stage behavior.

But the direction is clear.

The Bigger Shift: From Software to Systems

Today’s companies are built on:

APIs
Databases
Services

Tomorrow’s companies will increasingly rely on:

Agents
Workflows
Token pipelines

And that changes how value is created.

Final Thought

We’re not just adding AI to existing systems.

We’re redefining how work gets done.

The question is no longer:

“How fast can your team build?”

It’s:

“How effectively can your team deploy intelligence?”

And in that world—

Tokens become leverage.

Closing

This shift isn’t fully visible yet.

But it’s already in motion.

The teams that understand it early will have an advantage that compounds over time.

If you're building or leading engineering teams right now—

How are you thinking about AI usage?

As a tool…

Or as infrastructure?

Before You Deploy AI-Generated Code: A Production Checklist

Gaurav Talesara — Sat, 14 Mar 2026 05:29:49 +0000

AI can generate working code in seconds. Tools like ChatGPT, Claude, and GitHub Copilot have dramatically accelerated development.

But generating code is not the same as shipping production-ready software.

AI-generated code often introduces hidden issues: outdated dependencies, inefficient logic, security risks, and architecture problems. Before deploying AI-generated code to production, engineers should review it carefully.

This article outlines a practical checklist to validate AI-generated code before moving it to production.

1. Dependency and Package Validation

AI frequently suggests libraries without verifying their current status. Some packages may be deprecated, insecure, or poorly maintained.

Before deploying, validate all dependencies.

Things to check:

Verify package versions
Ensure packages are actively maintained
Lock dependency versions
Remove unnecessary libraries

Useful commands for Node.js projects:

npm audit
npm outdated

Tools that help with dependency validation:

Snyk
Dependabot
OWASP Dependency Check

These tools can detect vulnerable dependencies and recommend secure versions.
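
For the "lock dependency versions" check, a quick script can flag version ranges. This is only a sketch: findUnpinned is a hypothetical helper that inspects version strings, and it does not replace a lockfile or an audit tool.

```javascript
// Flag dependency versions that are ranges or tags rather than exact versions.
// Exact means MAJOR.MINOR.PATCH with an optional prerelease suffix.
function findUnpinned(dependencies) {
  const exact = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;
  return Object.entries(dependencies)
    .filter(([, version]) => !exact.test(version))
    .map(([name, version]) => `${name}@${version}`);
}

// Example input shaped like the "dependencies" field of a package.json.
const deps = { express: "^4.19.2", pino: "9.0.0", lodash: "latest" };
console.log(findUnpinned(deps)); // [ 'express@^4.19.2', 'lodash@latest' ]
```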

2. Vulnerability and CVE Scan

Many open-source libraries contain known vulnerabilities. AI-generated code may unknowingly include these dependencies.

Before production deployment, perform a vulnerability scan.

Things to check:

Known CVEs in dependencies
High or critical severity vulnerabilities
Security advisories from package maintainers

Recommended tools:

Snyk
Trivy
OWASP Dependency Check

Address critical vulnerabilities before moving forward.

3. Check for Broken Code

AI-generated code may appear correct but fail in real scenarios.

Common problems include:

Missing imports
Incorrect API usage
Poor edge case handling
Null or undefined errors

Static analysis tools can help detect these issues early.

Useful tools:

ESLint
TypeScript type checking
Static code analyzers

Example:

npm run lint
tsc --noEmit

These checks ensure the codebase is structurally sound.

4. Performance Review

AI-generated code may not always be optimized. In many cases, it produces inefficient queries or unnecessary loops.

Common performance issues include:

N+1 database queries
Repeated API calls
Large unpaginated responses
Inefficient loops

Example of inefficient logic:

// One query per user: N users means N database round trips.
for (const user of users) {
  await db.getOrders(user.id);
}

Improved approach:

const userIds = users.map((user) => user.id);
const orders = await db.getOrdersForUsers(userIds);

Optimizing performance early prevents scaling issues later.
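
A batched query returns one flat list, so the rows still need to be grouped per user. The sketch below shows that step with an in-memory stand-in for the database; getOrdersForUsers and the sample data are assumptions for illustration.

```javascript
// Hypothetical in-memory data standing in for an orders table.
const orders = [
  { id: 1, userId: "u1" },
  { id: 2, userId: "u2" },
  { id: 3, userId: "u1" },
];
let queryCount = 0;

// Stand-in for a single batched query (for example WHERE user_id IN (...)).
async function getOrdersForUsers(userIds) {
  queryCount += 1;
  return orders.filter((order) => userIds.includes(order.userId));
}

// One query for all users, then group the rows by user in memory.
async function loadOrdersByUser(users) {
  const rows = await getOrdersForUsers(users.map((user) => user.id));
  const byUser = new Map(users.map((user) => [user.id, []]));
  for (const row of rows) byUser.get(row.userId).push(row);
  return byUser;
}

loadOrdersByUser([{ id: "u1" }, { id: "u2" }]).then((byUser) => {
  console.log(byUser.get("u1").length, byUser.get("u2").length);
});
```

Users with no orders still get an empty list, so callers do not need a separate missing-key check.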

5. Scalability Validation

Code that works locally may fail under production load. AI-generated code often lacks scalability considerations.

Key things to verify:

Stateless architecture
Proper database indexing
Rate limiting for APIs
Background job processing for heavy tasks

For Node.js systems, queues are often used to handle asynchronous workloads.

Common tools include:

Redis (often used as a queue backend)
BullMQ (a Redis-based queue library)
RabbitMQ

This ensures that the system can handle increased traffic and workload.
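
Rate limiting is one of the easier items to prototype. Below is a minimal fixed-window limiter, assuming a single process and in-memory state; createRateLimiter is a hypothetical helper, and a multi-server deployment would need a shared store.

```javascript
// Fixed-window rate limiter kept in memory (illustrative only).
function createRateLimiter(limit, windowMs) {
  const windows = new Map(); // clientId -> { start, count }
  return function allow(clientId, now = Date.now()) {
    const current = windows.get(clientId);
    // Start a new window on the first request or once the old one has expired.
    if (!current || now - current.start >= windowMs) {
      windows.set(clientId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}

const allow = createRateLimiter(2, 1000);
console.log(allow("a", 0), allow("a", 10), allow("a", 20), allow("a", 1000));
// true true false true
```

The now parameter is passed explicitly here only to make the behavior easy to follow; in normal use it defaults to the current time.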

6. Reliability and Error Handling

Production systems must handle failures gracefully.

AI-generated code may miss important reliability patterns such as retries or proper error handling.

Important checks include:

Proper try-catch blocks
Retry mechanisms for external services
Circuit breakers
Graceful fallback responses

Example:

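async function processPayment() {
  try {
    // Return the result on success so the caller can use it.
    return await paymentService.process();
  } catch (error) {
    // Log the failure, then degrade gracefully instead of crashing.
    logger.error(error);
    return fallbackResponse();
  }
}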

Reliable systems anticipate failure and handle it properly.
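
For the retry item in the checklist, a small helper with exponential backoff is often enough to start with. This is a sketch: withRetry, its default values, and flakyCall are illustrative, not recommendations for any particular service.

```javascript
// Retry an async operation with exponential backoff.
// `attempts` and `baseDelayMs` are illustrative defaults.
async function withRetry(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // The delay doubles each time: base, 2x base, 4x base, and so on.
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Hypothetical flaky call that fails twice, then succeeds.
let calls = 0;
async function flakyCall() {
  calls += 1;
  if (calls < 3) throw new Error("temporary failure");
  return "ok";
}

withRetry(flakyCall, { baseDelayMs: 10 }).then((result) => console.log(result, calls));
// ok 3
```

Retries are only safe for operations that can be repeated without side effects, so a payment call like the one above would also need an idempotency mechanism.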

7. Logging and Observability

Observability is essential for production systems. AI-generated code rarely includes production-level logging.

Before deployment, ensure that the system has proper visibility.

Important components:

Structured logging
Request tracing
Error monitoring
Alerts for system failures

Popular tools include:

Winston
Pino
Prometheus
Grafana
Datadog

Good observability allows teams to detect and resolve issues quickly.
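
Structured logging mostly means emitting one machine-parseable record per event. The sketch below builds such a line by hand to show the shape; logLine and the field names are assumptions, and in practice a library such as Pino or Winston would do this work.

```javascript
// Build one JSON log line with a request ID so entries can be correlated.
function logLine(level, message, fields = {}) {
  return JSON.stringify({
    time: new Date().toISOString(),
    level,
    message,
    ...fields,
  });
}

console.log(logLine("error", "payment failed", { requestId: "req-123", orderId: "o-9" }));
```

Because every entry is valid JSON with consistent keys, log tooling can filter by requestId or level without fragile text parsing.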

Final Production Checklist

Before deploying AI-generated code to production, confirm the following:

Dependencies are validated
Vulnerability scans are completed
Code passes static analysis
Performance issues are addressed
Scalability considerations are reviewed
Reliability and error handling are implemented
Logging and monitoring are enabled

Conclusion

AI has dramatically accelerated the speed of software development.

However, faster code generation also increases the risk of shipping insecure or unstable systems. Generating code is only the first step. The real responsibility lies in validating that code before it reaches production.

AI can write code. Engineers must ensure that code is secure, reliable, and production-ready.

The Next Leap in RAG Isn’t a Better Model - It’s Better Retrieval

Gaurav Talesara — Sun, 01 Mar 2026 14:09:56 +0000

For the last two years, most Retrieval-Augmented Generation (RAG) systems have followed the same architecture:

Chunk → Embed → Store in Vector DB → Similarity Search → Inject into LLM

This pipeline works.

But it also has a fundamental limitation:

Similarity does not always equal relevance.

And that’s where the next evolution of RAG begins.

The Core Problem with Vector-Based RAG

Traditional RAG relies on embeddings and vector similarity. The assumption is simple:

If two pieces of text are semantically similar in vector space, they are relevant.

In real-world production systems, this breaks down.

1. Arbitrary Chunking Breaks Context

Documents are split into fixed-size chunks.
Cross-references get separated.
Tables and structured sections lose meaning.
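
A tiny example makes the problem concrete. The chunkFixed function and the sample sentence are made up for illustration; real pipelines usually chunk by tokens, but the failure mode is the same.

```javascript
// Split text into fixed-size character chunks, ignoring sentence boundaries.
function chunkFixed(text, size) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

const text = "Revenue grew 12%. See Appendix G for revenue breakdown.";
console.log(chunkFixed(text, 30));
```

With a chunk size of 30, the reference "Appendix G" is cut in half: the first chunk ends with "Appendix" and the second starts with " G", so neither chunk carries the full cross-reference.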

2. Similarity Is Not Logical Relevance

A chunk might be semantically close but logically unrelated to the question.

This becomes especially problematic in:

Financial reports
Legal documents
Research papers
Large enterprise PDFs

3. Retrieval Is Passive

Vector search retrieves the “closest” chunks.
It does not reason about where it should look.

Enter Vector-Less Page Indexing

A new approach is emerging: vector-less indexing, also described as reasoning-based retrieval.

One open-source implementation gaining attention is PageIndex:

https://github.com/VectifyAI/PageIndex

Instead of embedding everything into vector space, this method:

Builds a structured index similar to a smart table of contents
Organizes documents hierarchically using a tree structure
Uses LLM reasoning to navigate the structure
Follows cross-references across sections

The retrieval flow becomes:

Query → Reason → Navigate → Select → Answer

Instead of:

Query → Embed → Match → Return

This is a significant architectural shift.

Why This Improves Accuracy

In structured documents, relevance is often positional and logical, not just semantic.

For example:

“See Appendix G for revenue breakdown”
“Refer to Section 4.2 for risk disclosure”
“As discussed in the previous quarter”

Vector similarity alone struggles with these patterns.

A structured tree index allows the system to:

Understand document hierarchy
Traverse sections intelligently
Maintain context across related nodes
Treat retrieval as a planning problem

Retrieval becomes active navigation rather than passive matching.
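
To make the idea of navigation concrete, here is a toy sketch. It is not PageIndex's implementation or API: the document tree is invented, and a keyword-overlap score stands in for the LLM reasoning step.

```javascript
// A tiny document tree, like a table of contents with summaries.
// The report content is invented for illustration.
const doc = {
  title: "Annual Report",
  summary: "company overview finance risk",
  children: [
    {
      title: "4. Risk",
      summary: "risk disclosure market legal",
      children: [
        { title: "4.2 Risk Disclosure", summary: "risk disclosure details", text: "Key risks.", children: [] },
      ],
    },
    { title: "Appendix G", summary: "revenue breakdown by region", text: "Revenue by region.", children: [] },
  ],
};

// Stand-in for LLM reasoning: count query words found in a node's summary.
function score(node, query) {
  const words = query.toLowerCase().split(/\s+/);
  return words.filter((word) => node.summary.includes(word)).length;
}

// Walk down the tree, choosing the best-matching child at each level.
function navigate(node, query, path = []) {
  const here = [...path, node.title];
  if (node.children.length === 0) return { path: here, text: node.text };
  const best = node.children.reduce((a, b) => (score(b, query) > score(a, query) ? b : a));
  return navigate(best, query, here);
}

console.log(navigate(doc, "revenue breakdown").path);
console.log(navigate(doc, "risk disclosure").path);
```

The result includes the path taken through the hierarchy, so the answer can cite where in the document it came from, something a flat list of similar chunks does not provide.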

Does This Replace Vector Search?

Not entirely.

Vector search remains powerful for:

Unstructured knowledge bases
FAQs
Customer support bots
General semantic retrieval

For highly structured documents, reasoning-based indexing may outperform traditional embedding-based RAG.

In practice, hybrid systems combining structured indexing and vector search may become the dominant approach.

Final Thoughts

The future of RAG will not be defined by larger models or faster embeddings.

It will be defined by how intelligently we retrieve context.

As systems move toward production-grade reliability, indexing strategy may matter more than embedding choice.

If you are building serious RAG systems, it may be time to rethink:

Your chunking strategy
Your indexing layer
Your retrieval architecture

Retrieval is evolving from vector similarity to intelligent navigation.

Before You Build Anything, Make Your Idea Visible

Gaurav Talesara — Mon, 23 Feb 2026 18:50:48 +0000

Most early-stage products don’t fail because of bad engineering.
They fail because the idea was never clarified before development started.

One thing I’ve learned working with startups is this:

Before you build anything, you need to make your idea visible.

It sounds simple, but this step is often skipped - and skipping it creates confusion, wasted effort, and expensive rework later.

The Common Pattern I See

Many founders and early teams do one of two things:

Keep the idea in their head
Jump straight into development

In both cases, clarity is missing.

Developers start building.
Features get added.
Scope expands.
Assumptions go untested.

And then a few weeks later, everyone realizes:

“This isn’t what we meant.”
“This isn’t scalable.”
“This isn’t what users actually need.”

The issue usually isn’t execution.

It’s that the idea was never made visible.

I’ve seen this pattern in both early MVPs and scaling products, and it’s surprisingly consistent.

What Does “Make It Visible” Actually Mean?

It doesn’t mean creating perfect architecture diagrams.
It doesn’t mean over-engineering.
It doesn’t mean spending weeks planning.

It means answering a few critical questions visually:

How does a user enter the system?
What actions can they take?
What data moves where?
What absolutely needs to exist in version one?
What can wait?

This can be as simple as:

A rough user flow
A basic system sketch
A simple data movement diagram
A lightweight interactive prototype

The goal is not perfection.

The goal is clarity.

Why This Step Changes Everything

When you can see the product, several things happen:

1. Assumptions Become Visible

Hidden assumptions surface quickly when you map flows.

2. Scope Becomes Controlled

You start identifying what’s essential and what’s noise.

3. Technical Decisions Improve

Architecture becomes intentional instead of reactive.

4. Validation Gets Easier

It’s much easier to show something tangible and ask:

“Does this solve your problem?”

5. You Avoid Building the Wrong Thing

And that’s where most time is wasted.

I’ve seen teams save weeks - sometimes months - just by doing this step properly.

You Don’t Need Heavy Tools

Today, there are simple ways to do this:

Structured brainstorming
Basic flow mapping
Simple system views
AI-assisted breakdown of ideas
Lightweight visual prototypes

The important part isn’t the tool.

It’s the thinking.

Clarity before code.

A Simple Starting Framework

When I approach a new idea, I usually think in this order:

User Flow – What does the user actually experience?
System Flow – What needs to happen behind the scenes?
Version One Filter – What is absolutely required for the first usable version?
Constraint Check – What can break? What will scale? What can wait?

This doesn’t take weeks.

Sometimes it takes a few focused hours.

But it completely changes the quality of execution.

Final Thought

Most ideas don’t fail because of bad development.

They fail because they were never clarified before development started.

Before you hire.
Before you code.
Before you build.

Make the idea visible.

Curious -
Do you usually visualize your ideas before building, or jump straight into execution?

If you’d like, I can write a follow-up post on:

The exact step-by-step process I use to break ideas into flows
Or the lightweight tools I use to turn raw ideas into something interactive

Let me know 👇

NVIDIA’s Open-Source Voice AI Is Quietly Changing Customer Support

Gaurav Talesara — Mon, 26 Jan 2026 11:47:17 +0000

For years, AI voice systems promised to transform customer support.

In reality, most businesses ran into the same issues:

Robotic conversations
High latency
Expensive, closed platforms
Little to no control over customization

That’s starting to change.

With NVIDIA’s PersonaPlex 7B, open-source voice AI has crossed an important threshold — real-time, natural conversations are finally practical for real businesses.

And this shift isn’t just technical.
It’s operational.

What Is PersonaPlex 7B (in simple terms)

PersonaPlex 7B is an open-source, speech-to-speech AI model.

Unlike traditional voice bots that rely on multiple steps (speech → text → LLM → text → speech), PersonaPlex operates using a single, unified pipeline.

What this enables:

The AI can listen and respond at the same time
Conversations feel more natural and human
Latency is low enough for real customer interactions
Voice and persona can be customized without heavy fine-tuning

In short:
It behaves less like a bot and more like a real agent.

Why This Matters for Businesses (Not Just Engineers)

This isn’t about replacing support teams.

It’s about giving businesses a new first layer of interaction.

With modern AI voice agents, startups and companies can:

Offer 24×7 customer support
Instantly handle repetitive and common questions
Reduce response time without increasing headcount
Support customers across time zones
Experiment without locking into expensive SaaS platforms

This is especially impactful for:

SaaS companies
Marketplaces
E-commerce brands
Fintech and logistics businesses
Internal IT or HR helpdesks

Where AI Voice Agents Actually Work Today

AI voice agents are most effective when used intentionally.

Some practical, real-world use cases include:

Customer support triage (FAQs, order status, basic troubleshooting)
Inbound sales inquiries
Appointment scheduling
Onboarding and walkthroughs
Internal employee support
After-hours support coverage

In these scenarios, AI doesn’t replace humans —
it removes friction before humans need to step in.

The Real Challenge Isn’t the Model

The model being open source is the easy part.

The real work — and real value — comes from:

Training the agent on business-specific knowledge
Designing the right voice personality and tone
Ensuring low-latency, real-time performance
Integrating with CRMs, ticketing systems, and workflows
Handling edge cases and smooth handoffs to humans

This is where most businesses struggle — and where thoughtful implementation matters.

Open Source Changes the Game

Because PersonaPlex is open source:

Businesses keep control of their data
There’s no vendor lock-in
Customization is possible
Infrastructure decisions stay flexible

For startups, this means:

Faster experimentation
Lower long-term costs
More control over the customer experience

We’re moving from “AI voice demos” to production-ready systems.

What This Means Going Forward

AI voice agents are no longer a future concept.

They’re becoming a practical business tool — especially for teams that want to scale support without scaling complexity.

Companies that explore this early won’t just save costs.
They’ll design better customer experiences.

As with every platform shift:

Early understanding matters more than hype
Implementation matters more than tools

Final Thoughts

Voice AI is quietly entering a new phase.

Not flashy.
Not perfect.
But finally usable.

If you’re building or operating a business that handles customer conversations, this is a space worth understanding deeply — sooner rather than later.

Let’s Connect

I’ve been working hands-on with open-source AI voice agents and real-world integrations.

If you’re evaluating this for your product or business:

LinkedIn: https://www.linkedin.com/in/gaurav-talesara-8099ba147
Email: gaurav@ciphernutz.com

Happy to exchange notes or walk through real use cases.

PostgreSQL Didn’t Fail at Scale - My Architecture Assumptions Did

Gaurav Talesara — Fri, 23 Jan 2026 19:36:11 +0000

When I read “Scaling PostgreSQL to power 800 million ChatGPT users”, I didn’t read it as a Postgres success story.

I read it as a reality check.

Because like many developers building SaaS products, I’ve caught myself thinking:

“This will work for now… but later we’ll need something more scalable.”

Later usually means:

sharding
multiple databases
complex infra
future-me’s problem

Turns out, future-me might be overthinking it.

Building SaaS Makes You Fear Scale Too Early

While working on SaaS-style platforms (hiring tools, dashboards, internal systems), PostgreSQL is often the first thing people want to replace.

Not because it’s failing.
But because we assume it will fail.

The OpenAI post forced me to pause and ask:

If Postgres can survive ChatGPT traffic, what exactly am I afraid of?

The Database Wasn’t the Hero — Discipline Was

What impressed me wasn’t the scale.
It was the restraint.

One primary database
Reads pushed aggressively to replicas
Bad queries treated like production bugs
Write-heavy or non-core data moved out

Postgres wasn’t used as a junk drawer.
It was used as a core system with clear boundaries.

That’s something I don’t always do in my own projects.

Reads Are the Silent Cost Killers

Most SaaS apps are read-heavy:

dashboards
candidate profiles
activity timelines
analytics views

Yet we design everything as if writes are the main concern.

This story reminded me:

Scaling isn’t about handling more writes — it’s about protecting the primary from reads.

Once that clicks, architecture decisions become simpler.
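
In application code, protecting the primary can start with a routing rule as simple as the sketch below. The connection objects and pickTarget are hypothetical stand-ins; this is not how OpenAI routes queries, only an illustration of the idea.

```javascript
// Hypothetical connection targets; real pools would come from a database driver.
const primary = { name: "primary" };
const replicas = [{ name: "replica-1" }, { name: "replica-2" }];
let nextReplica = 0;

// Route statements: plain reads go round-robin to replicas, the rest to primary.
function pickTarget(sql) {
  const isRead = /^\s*select\b/i.test(sql);
  if (!isRead) return primary;
  const target = replicas[nextReplica % replicas.length];
  nextReplica += 1;
  return target;
}

console.log(pickTarget("SELECT * FROM candidates").name);
console.log(pickTarget("UPDATE candidates SET stage = 'hired' WHERE id = 1").name);
```

A real router would need more care: replicas can lag behind the primary, so reads that must see a just-committed write, and reads inside write transactions, should still go to the primary.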

Simplicity Is Not Laziness

I used to think:

“A simple architecture means it won’t scale.”

Now I’m starting to believe the opposite.

Simple systems:

are easier to debug
fail more gracefully
survive longer than clever ones

OpenAI didn’t avoid complexity forever.
They just earned the right to add it later.

What I’m Taking Back to My Own SaaS Work

After reading this, my mindset changed:

I trust PostgreSQL more
I fear premature sharding less
I care more about query quality than new tech
I think harder before adding “just in case” infrastructure

Not everything needs to be distributed.
Not everything needs to be clever.
Most things need to be boring and reliable.

Final Thought

PostgreSQL didn’t scale because it’s magical.

It scaled because engineers respected its limits
and designed around them instead of fighting them.

That’s probably the real lesson for anyone building SaaS today.