Forem: Pavan Belagatti

Context is what you are missing in your AI Agents

Pavan Belagatti — Thu, 14 May 2026 07:38:24 +0000

The biggest flex for developers today is not about coding faster.
It is designing efficient agentic systems & workflows.

For almost thirty years, software engineering optimized around a single bottleneck: Human in loop at every stage of the SDLC.

Every tool - IDEs, linters, autocomplete, even early Copilot - assumed a developer was driving and the machine was assisting. Agentic engineering inverts that assumption. The developer becomes the orchestrator; the agents become the drivers. How cool is that, right?

Look at the SDLC on the left side of this diagram below. Every stage that used to define a senior engineer's week - translating a Jira ticket into a spec, writing the feature, reviewing the PR, fixing the failing test, deploying to staging, triaging the 3 a.m. alert, is now a candidate for delegation.

Not because agents are smarter than your best engineer, but because they're tireless, parallelizable, and increasingly competent at bounded tasks.

The system on the right is where the leverage compounds. A single agent running Plan → Code → Review → Ship is useful.

A fleet of specialized agents - Coder, Reviewer, Tester, Deployer - coordinated by an orchestrator and wired into your real environment (GitHub, CI, Datadog, the codebase) is a force multiplier. This is the shift from "AI as a faster autocomplete" to "AI as a team of junior engineers that never sleep."

MCP has quickly become the standard for connecting AI agents to the tools where real work happens - Salesforce, AWS, Atlassian, Notion, and a growing ecosystem of 1,000+ servers.

But here's the uncomfortable truth: connecting agents to tools is easy, doing it at enterprise scale is not.

When every AI client - Claude, ChatGPT, Gemini, Copilot - wires up its own direct connections to every MCP server it needs, you end up with an N×M sprawl of point-to-point integrations. No central visibility, no unified access control. Credentials scattered across clients, no audit trail when something goes wrong. It works in a demo, it collapses in production.

The path forward is an MCP Gateway or a connector - a single control plane between your AI clients and the broader MCP ecosystem.
One layer that handles:

Identity & Access Control: who and which agent can do what
Secret Management: credentials never leak into agent context
Metrics & Audit Logs: full observability of every tool call
Content Filtering: guardrails on what flows in and out
Composite MCP Servers: curated tool bundles for specific use cases

On one side, your MCP hosts and custom agents built on any framework.
On the other, a governed registry of approved servers ready to plug in.

MCP unlocks agent capability, a gateway/connector is what makes it safe, scalable, and enterprise-ready.

You might have built really efficient AI Agents, MCP servers and Skills, but if you don't have a proper context layer to support them, you are wasting your time.

Yes, whenever I talk to developers, they tell me the same problem - Their agents failing and not providing the contextually relevant responses. They used the skills and even routed everythig through MCP server, but still.

It is not having a generic vector database. It is not about LLMs, you might still be using highly efficient LLMs. The problem is, your systems are starving for context because you aren't providing them with enough context.

See, context layer may not be so useful if you are building a generic chatbot or even little advanced RAG application BUT, when you like to automate your engineering or developer workflows, that is where you need a proper context layer. Because, you can't leave everything to the LLM to figure out because LLMs are generic entities and hallucinate and mess up your workflows.

Hence, when it comes to developer workflows, always have an efficient context layer in place. If you wanna know more about how to get started with a context layer for your developer workflows, ping me. Let's talk :)

I hated 3 a.m. calls, so I automated incident response using AI workflows!

Pavan Belagatti — Mon, 11 May 2026 06:55:23 +0000

Agentic Engineering becomes very real the moment a production alert wakes me up at 3:00 a.m. The alert says the checkout service is down. Revenue is impacted. Orders are failing. And now the clock is ticking.

In a typical setup, the first part of incident response is not really problem-solving. It is context hunting. I open PagerDuty for the alert, Datadog for metrics and logs, GitHub to check recent deployments, AWS to inspect infrastructure, and Slack to figure out who owns the service right now. By the time I gather enough information to start diagnosing the issue, 30 minutes are already gone.

That is the core problem Agentic Engineering solves. Engineers usually know how to troubleshoot. What slows them down is that the context they need is scattered across too many tools, and nobody has stitched those tools together into a useful workflow.

That is where an agentic engineering platform like Port comes in. Instead of forcing me to jump between systems, it keeps a live context layer of services, deployments, incidents, infrastructure, owners, and dependencies. Then AI agents use that context to triage incidents, correlate likely causes, surface ownership, and propose next actions in seconds.

Why incident response breaks down in modern engineering teams

Most incident workflows fail long before root cause analysis starts.

The failure is usually operational fragmentation. Every team has great tools, but each tool only answers one slice of the problem:

PagerDuty tells me what fired
Datadog tells me what the system is doing
GitHub tells me what changed
AWS tells me what the infrastructure looks like
Slack tells me who might know what is going on

Individually, these are useful. Together, without orchestration, they create toil.

I end up doing repetitive work under pressure:

tab switching
copy-pasting links and IDs
searching for service ownership
guessing whether a recent deployment caused the incident
manually building a timeline from disconnected signals

This is why Agentic Engineering matters. It is not just about adding AI to DevOps. It is about giving AI the right operational context so it can take useful action inside engineering workflows.

What Agentic Engineering actually looks like in incident response

When I talk about Agentic Engineering, I am talking about systems that do more than summarize text or answer generic questions.

An agentic workflow for incident response should be able to:

ingest the alert automatically
understand which service is affected
correlate the alert with recent deployments
identify the owning team or on-call engineer
pull relevant runbooks and service context
assess severity and business impact
suggest remediation options
send a clean incident summary into collaboration tools like Slack

That is a huge shift.

Instead of spending the first 30 minutes gathering information, I can start with a ready-made triage report. Humans still stay in control of the key decisions, but the boring and repetitive context assembly gets automated.

The foundation: a live context lake for engineering data

The reason this works is that Port maintains what is essentially a live context lake across the engineering stack.

That includes things like:

services
deployments
incidents
owners
infrastructure

Once that operational context is centralized, AI agents can reason across systems instead of treating each tool as an isolated island.

This is one of the most practical expressions of Agentic Engineering I have seen. The AI is not operating blindly. It has access to structured engineering context, which makes its output far more relevant.

Walking through an AI-powered incident triage workflow

Inside Port, I can go to the self-service area and create or trigger actions. In this case, the workflow I care about is incident triage automation.

The action is straightforward: an AI-powered incident triage uses Port's AI agent to analyze incidents, query the catalog, and send formatted results to Slack.

To simulate a realistic production issue, I trigger an incident with the title:

Checkout service returning 500 errors
Once I hit Start Triage, the workflow begins immediately.

What the workflow does behind the scenes

The sequence is simple but powerful:

Fetch incident details
Run AI triage analysis
Update the incident with triage results
Send the formatted results to Slack

This is exactly what Agentic Engineering should feel like. I trigger a workflow once, and the platform performs the repetitive coordination across systems automatically.

What the triage output looks like in Slack

Once the analysis is complete, the incident summary lands in Slack with the kind of structure that is actually useful during an outage.

The triage report includes:

Incident title: checkout service returning 500 errors
Urgency: high
Priority: P1
Service: checkout
Severity: mission critical
Business impact: 30% order failure

That alone already saves time, because the incident has been normalized into a shared operational summary.

But the more interesting part is the context it adds. The system can show insights from Port, identify potentially affected downstream or upstream services, and propose next steps.

In this example, the frontend service is also affected because of the checkout incident. And the suggested actions are concrete, not vague:

roll back the order service deployment immediately
review order API contract changes
process checkout integration checks
monitor error rates after rollback
check integration test coverage between services

This is where Agentic Engineering stops being a buzzword and starts becoming operational leverage. The platform is not just telling me that something is broken. It is helping me reason about what changed, what is impacted, and what I should do next.

Built-in remediation options make the workflow actionable

A good incident summary is helpful. An actionable incident summary is better.

In the Slack message, I also get remediation options such as:

Remediate with Claude
Investigate in Port
Roll back deployment
Update status page

That matters because incident response is a chain of decisions. If the triage output is separated from the next action, engineers still lose time moving between tools. Agentic Engineering works best when diagnosis and execution are connected.

I can choose the right level of automation depending on the situation. If human review is needed, I investigate further. If the rollback path is clear, I can move quickly. If customer communication is necessary, the status page update is right there.

Humans remain in control, but the system removes the coordination burden.

Investigating the incident inside Port

When I click Investigate in Port, I get a more detailed incident workspace.

This page pulls together the key pieces of information I need:

incident title
severity
description
impact
triage summary
business impact
root cause hypothesis
an internal communication message
supporting reports and details

This is a much better starting point than opening five browser tabs and trying to build the story manually.

Using Port Chat to analyze the incident across tools

The most powerful part of this workflow is what happens next.

Inside the incident page, I can open Port Chat and connect the relevant systems and agents. In this example, I enable connectors for:

Datadog
AWS
GitHub

Then I can ask a natural language question like:
Can you please analyze what's happening here with this incident?

Because Port already has the incident context and now also has access to monitoring, infrastructure, and code history, the chat is not answering in isolation. It is reasoning across the actual systems involved.

This is another important principle of Agentic Engineering: agents become far more useful when they can traverse the environment instead of being restricted to a single static prompt.

Why this is different from a generic AI assistant

A generic assistant might help me brainstorm likely causes of 500 errors.

An agentic engineering assistant can:

check which services are related to the incident
inspect recent deployments
look at pull requests that may have introduced breaking changes
reason about cloud infrastructure and service dependencies
return a focused investigation summary tied to the incident

That difference is everything.

The investigation report: root cause, history, and recommendations

After gathering context from the connected systems, Port Chat returns a comprehensive analysis.

The report includes a broad set of useful sections, such as:

Incident overview
Root cause analysis
Recent deployments
Related pull requests
Why checkout is failing if order was deployed
Hypotheses
Historical context
Affected services
Recommendations

That is exactly the kind of report I want during a high-pressure production issue.

I do not just want isolated data points. I want an organized explanation of what likely happened, what changed recently, what dependencies are involved, and what actions are sensible right now.

This is where Agentic Engineering shines. It compresses the time between signal and understanding.

What makes this a self-healing workflow

The phrase self-healing can sometimes sound overly ambitious, so I like to be precise about what it means here.

It does not mean the platform magically fixes every issue on its own with no oversight.

It means the workflow can automate a significant part of the operational response:

collecting the right context
triaging the incident
identifying probable causes
highlighting affected systems
presenting remediation options
supporting rollback or communication paths

In some environments, that may even extend to executing well-defined remediations after approval. In others, it will function as a copilot that accelerates decision-making. Either way, the engineering team gets to spend less energy on operational friction and more energy on actual resolution.

Why Agentic Engineering matters beyond incidents

Although this example focuses on incidents, the broader lesson is about engineering workflows in general.

Anywhere there is repeated context gathering, dependency tracing, or multi-tool coordination, Agentic Engineering can help. Incident management is just one of the clearest and most painful use cases because the cost of delay is obvious.

When a P1 incident hits, every minute matters. Faster triage means:

less downtime
less revenue loss
less stress for the on-call engineer
clearer communication across teams
more consistent operational responses

And importantly, this kind of system scales knowledge. The platform can surface runbooks, ownership information, and historical patterns that would otherwise live in scattered tools or in the head of the most experienced engineer on the team.

The practical takeaway

If your current incident process depends on an engineer manually collecting context from half a dozen systems before they can even begin diagnosing the problem, you do not have an incident response problem alone. You have a workflow design problem.

Agentic Engineering addresses that by connecting systems, preserving context, and letting AI agents execute structured operational tasks on top of that foundation.

What I like about the Port approach is that it keeps humans in control while removing the worst part of on-call work: the frantic scramble for context in the middle of the night.

Instead of spending 30 minutes figuring out what changed, who owns the service, and what might be affected, I can start with a triaged incident, a business impact summary, a root cause hypothesis, affected services, and recommended actions.

That is not just automation for the sake of automation. That is useful engineering leverage.

Final thoughts

Agentic Engineering is one of those ideas that sounds futuristic until you see it applied to a very real problem like incident response.

The value is immediate:

faster context gathering
faster triage
better incident summaries
clear remediation paths
less operational toil

For developers and platform teams, that is a big deal. Production incidents will always happen. The question is whether the first half hour is spent hunting for information or acting on it.

That is the promise of Agentic Engineering, and in this workflow, it is already practical.

If I can turn a 3:00 a.m. alert from a chaotic tab-switching exercise into a guided response with real context and actionable recommendations, that is a win for everyone on call.

This is How I Automated My GitHub PRs with AI Agents & Agentic Workflows!

Pavan Belagatti — Mon, 04 May 2026 07:32:04 +0000

If you want to Automate GitHub PRs, the real goal is not just adding another bot comment to a pull request. The goal is to give reviewers the context they usually have to gather manually: who owns the service, whether it is deployed, whether basic repository standards are in place, and whether the change looks safe to merge.

A useful AI pull request workflow can do exactly that. When a PR opens, it can sync metadata from GitHub, pull operational and ownership context from an internal developer platform, send that context to an LLM, and return a structured review summary plus a risk level. That reduces blind approvals and cuts down on repetitive reviewer questions.

This guide explains how to Automate GitHub PRs using GitHub Actions, Port, a lightweight webhook server, and an LLM such as GPT-4. It also covers what this kind of workflow should evaluate, why a middleware service is needed, and what mistakes to avoid.

What it means to automate GitHub PRs

To Automate GitHub PRs, I am talking about a workflow where opening a pull request triggers an automated review pipeline. Instead of checking only the code diff, the system looks at the broader service context and then posts a structured result back to the PR.

That result can include:

Service ownership
Repository readiness signals, such as a README or CODEOWNERS presence
Scorecard or compliance status
Deployment status, such as staging and production workloads
An AI-generated summary
A risk level, such as low, medium, or high
Suggested action items when something is missing

This is different from a traditional static code review bot. The value comes from combining code events with operational context from systems outside GitHub.

Why teams want to automate GitHub PRs

Most pull request delays are not caused by code syntax alone. They come from uncertainty.

Reviewers often need answers to questions like:

Who owns this service?
Is this service already running anywhere?
Is the repository production-ready?
Does it follow the team’s baseline standards?
Is there enough context to approve safely?

Without automation, someone has to hunt for that information across GitHub, deployment systems, internal docs, and team ownership records. That takes time and usually leads to either delayed merges or weak review quality.

When you Automate GitHub PRs with AI and catalog data, reviewers get a structured starting point within seconds.

What a good automated PR review should check

If you want to build a useful system and not just a noisy one, focus on checks that help humans make better decisions.

1. Ownership

The review should identify the responsible team or service owner. This helps route questions quickly and gives confidence that the change belongs to a known part of the platform.

2. Repository hygiene

Basic project files matter. A README and CODEOWNERS file are simple indicators that the repository follows expected practices. These signals are easy to include and often useful in readiness checks.

3. Scorecard or standards compliance

A scorecard can represent repository quality or policy compliance. In the demonstrated setup, the scorecard level acts as one of the inputs used to judge pull request readiness.

4. Deployment context

Whether a service is deployed to staging or production changes how risky a PR feels. A change to an actively deployed service deserves different attention than a repo that is not yet in use.

5. Risk assessment

The output should classify the PR in a simple, scannable way. A low, medium, or high risk label works well because it gives the reviewer an immediate signal.

6. Summary and action items

The review should not stop at a label. It should explain why the PR was marked a certain way and list any missing prerequisites.

Architecture to automate GitHub PRs

A practical architecture for this workflow has four parts:

GitHub to detect PR activity
Port to hold and expose context about services, scorecards, workloads, and PR entities
A webhook server to coordinate API calls and write results back
An LLM to produce the structured review verdict

The flow works like this:

A developer opens a pull request in GitHub.
A GitHub Action runs and syncs PR data into Port.
Port detects the new PR entity and triggers an automation.
The automation calls a publicly reachable webhook endpoint.
The webhook server fetches related context from Port.
The server sends that context to the LLM.
The LLM returns a structured verdict.
The server posts a review comment to GitHub and writes the summary and risk level back into Port.

Why is Port useful in this workflow

Port acts as the context layer. It is where service metadata, ownership, scorecards, workloads, and pull request entities can live together in a catalog.

That matters because an LLM alone does not know:

Which team owns a given service
Whether the repo has certain governance files
Whether the service is deployed in staging or production
What the latest scorecard or policy status is

By connecting GitHub as a data source and modeling those related entities in a catalog, Port can provide the context the AI needs to produce a more useful PR review.

In this setup, the pull request becomes an entity that can be enriched with fields such as:
AI review summary
AI risk level
Run history
Audit data

How to automate GitHub PRs step by step

Step 1: Connect GitHub to your internal developer platform

Start by integrating GitHub so your platform can detect repositories and pull request activity. In the demonstrated pattern, GitHub is connected as a data source inside Port.

This connection allows pull request details to be synced and associated with the right service or repository metadata.

Step 2: Create a GitHub Action that syncs PR data

The automation begins in GitHub. You need a workflow file that runs on pull request activity and sends the relevant information into Port.

At minimum, the sync should include:

PR number
Title
Branch
Repository
Associated service, if available
Status

This is the event bridge that lets you Automate GitHub PRs with richer catalog-based context instead of relying on code diff events alone.

Step 3: Model the related entities in Port

The automated review is only as good as the context available. The useful entities in this design include:

Service entity with team, ownership, and repository details
Scorecard entity with pass or fail style readiness indicators
Workload entity showing staging and production deployment information
Pull request entity that gets enriched with AI results

If these relationships are incomplete, your AI verdict will be weaker.

Step 4: Add a Port automation to trigger the review

Once the PR entity appears in Port, an automation should fire automatically. This automation sends the event to your webhook server.

That trigger is the handoff from catalog event detection to the external processing logic.

Step 5: Run a webhook server as middleware

This part is essential. Port can trigger workflows and call webhooks, but the actual review process requires a custom layer that can:

Authenticate with APIs
Fetch multiple related entities
Build a prompt with structured context
Call the LLM
Post a GitHub comment
Write fields back into Port

In the demonstrated implementation, this middleware is a lightweight Python application running continuously in the cloud.

That always-on endpoint matters because local development servers are not reliable for production automation.

Step 6: Deploy the middleware somewhere with a permanent public URL

A cloud deployment platform such as Railway works well for this. The important requirement is a stable HTTPS endpoint that Port can call every time a PR event occurs.

If the server is not always available, the automation chain breaks.

Step 7: Send context to the LLM and request a structured verdict

The webhook server should gather the relevant Port data and send it to the LLM in a structured way. The desired output should also be structured, ideally as JSON.

The resulting verdict can include:

Overall approval recommendation
Risk level
Short review summary
Missing requirements
Action items

Structured outputs are much easier to write back into systems and display consistently.

Step 8: Write the result back to GitHub and Port

Finally, the middleware should:

Post a human-readable comment to the PR in GitHub
Update the PR entity in Port with the AI summary
Set the AI risk level field in Port
Record success in the automation run or audit log

This gives both developers and platform teams a clear trail of what happened.

What the PR comment should look like

A good automated PR comment is short, structured, and focused on decision support.

It should answer these questions quickly:

Who owns the service?
What does the scorecard or readiness status say?
Where is the service deployed?
What is the AI verdict?
Are any action items required?

A comment that simply says “looks good” is not enough. A useful automated review should give a reviewer enough context to decide what to inspect next.

Using AI agents and self-service actions

One notable part of this setup is that platform actions and AI agents can be created inside Port itself. That makes it easier to operationalize workflows like:

PR readiness review
PR summary generation
Risk analysis
Other engineering actions such as ticket creation or health reporting

This matters if you want your pull request automation to be part of a larger internal developer platform rather than a standalone script.

Common mistakes when you automate GitHub PRs

Relying only on the code diff
If the AI sees only the changed files, it cannot reason about deployment status, ownership, or baseline readiness. The context layer is what makes the review valuable.

Posting unstructured comments
A long generic paragraph is hard to scan. Use a consistent template with ownership, readiness, deployment, verdict, and action items.

Skipping the middleware layer
Trying to connect everything directly often becomes limiting. A custom webhook server is useful because it can orchestrate multiple API calls and handle bidirectional updates.

Hosting the server locally
For continuous automation, the endpoint must be publicly reachable all the time. A local laptop is not a stable production service.

Overtrusting the AI output
Even if you Automate GitHub PRs, the output should support human review, not replace it entirely. The AI is helping summarize context and flag risk, not acting as the final approver in every case.

Using incomplete catalog data
If service ownership is wrong or workload data is outdated, the PR review will reflect those gaps. Data quality matters as much as prompt quality.

You can automate your developer workflows using Port.io

What this setup is best for

This approach is especially useful for teams that already manage service metadata in a developer platform and want faster, more informed pull request reviews.

It is a strong fit when:

You have many services and ownership is not always obvious
Reviewers frequently ask for operational context before approving
You want PRs enriched with platform metadata automatically
You already use GitHub Actions and can add webhook-based automations

It is less useful if your environment has no structured service catalog yet. In that case, the first step is improving metadata, not adding AI.

A practical checklist to automate GitHub PRs

Use this checklist if you want to implement the same pattern:

Connect GitHub as a data source
Create PR sync automation with GitHub Actions
Model service, scorecard, workload, and PR entities
Create a Port automation triggered by PR creation
Deploy a public webhook server
Fetch Port context inside the server
Send structured context to an LLM
Post the verdict to GitHub
Write summary and risk fields back to Port
Review data quality regularly

Final takeaway

If you want to Automate GitHub PRs in a way that actually helps reviewers, focus on context first and AI second. The most useful automation does not just analyze changed code. It brings together service ownership, readiness signals, deployment status, and a structured verdict in one place.

A setup built with GitHub Actions, Port, a cloud-hosted middleware service, and an LLM can turn pull request reviews from a context-hunting exercise into a faster, better-informed workflow. Done well, this approach gives every PR a head start before a human reviewer even begins.

You can automate any of your developer workflows using Port.io

Developer's Guide to Agent Skills (Hands-On Tutorial)!

Pavan Belagatti — Mon, 20 Apr 2026 06:03:37 +0000

Agent Skills are suddenly everywhere in the AI engineering world, and for good reason. They solve a very real problem: AI agents may be smart, but they still know nothing about your organization unless you explicitly teach them. They do not automatically understand your internal workflows, your service catalog, your production readiness rules, or the exact steps needed to fix recurring issues.

That is where Agent Skills come in. They give your AI agent reusable knowledge, structured instructions, and workflow-specific context so it can do meaningful work instead of acting like a generic chatbot with tool access.

If you have been hearing about skills.md files, MCP servers, Claude, Copilot, and custom agent workflows, this is the missing mental model. Once you get it, the whole ecosystem makes a lot more sense.

Why Agent Skills are getting so much attention

One quick way to understand whether a concept matters is to look at search interest. The term Agent Skills has been climbing fast, especially in recent months. That is usually a sign that people are not just curious, they are actively trying to use something in real projects.

And it makes sense. This is not a niche concept only for AI researchers. Developers, platform teams, engineering managers, and AI engineers can all benefit from it because Agent Skills increase both capability and efficiency for AI agents.

A lot of the early buzz is tied to Claude because Anthropic introduced the concept as an open standard. But the idea is bigger than one model or one company. The important part is that a skill can travel across platforms, which makes it much more useful than a one-off prompt hidden in one tool.

How we got here: from function calling to MCP to Agent Skills

To really understand Agent Skills, it helps to place them in the broader evolution of AI agents interacting with the outside world.

1. Function calling

The first big step was function calling, also known as tool calling. This was when large language models started invoking external tools through a predefined JSON schema. A classic example is something like get weather data for a city.

That was useful, but it had clear limitations:

Manual wiring everywhere. Every function had to be described and connected by hand.
Error handling was your job. If something failed, the system did not really know how to recover intelligently.
Scaling was painful. Every new capability increased developer overhead.

So function calling gave models access to tools, but not much autonomy or reusable workflow intelligence.

2. Model Context Protocol (MCP)

Then came Model Context Protocol, or MCP. This made it much easier to connect AI agents to external tools and data sources through a standard protocol.

The easiest way to think about MCP is as a USB-like standard for AI systems. Instead of custom integrations for every tool, you get a cleaner, more interoperable plug-and-play model. That is why so many companies are now building MCP servers for their own systems and workflows.

MCP was a major leap because it standardized access.
But access alone is not enough.

3. Agent Skills

This is where Agent Skills become important. If MCP gives your AI agent access to external tools and data, Agent Skills teach the agent what to do with those tools and data.

That is the core idea.

Instead of giving an agent only tool access, you package repeatable workflows, domain knowledge, trigger conditions, and repair playbooks into reusable skill files. The agent can then reason through a task in a more structured and specialized way.

Each stage in this evolution shifts more agency from the developer to the system:

Function calling gave the model tool access.
MCP standardized access to tools and data.
Agent Skills gave the model reusable capability and workflow intelligence.

That is why this feels like a truly agentic progression.

What Agent Skills actually are

Agent Skills are folders of instructions and supporting files that package a repeatable workflow, specialized knowledge, or a new capability for your AI agent.

On the surface, that might sound like saved prompts. But they are more than that.

A good skill does not just store text. It defines:

When the skill should activate
What the agent should do step by step
What reference data the agent should use
What remediation playbooks or actions it can follow

So instead of copy-pasting a giant prompt every time you want an agent to do something specialized, you write that capability once and reuse it across sessions and tools.

This is exactly what makes Agent Skills powerful. They turn a general-purpose model into something much closer to a reliable specialist.

Example: Custom deployment skill

Here's an example of a custom skill for deploying services in your organization:

{
  "identifier": "deploy-to-production",
  "title": "Deploy to Production",
  "properties": {
    "description": "Guide for deploying services to production. Use when users ask to deploy, release, or promote a service to production.",
    "instructions": "# Deploy to Production\n\nFollow these steps to deploy a service to production:\n\n## Step 1: Verify prerequisites\n\n- Check that all tests pass.\n- Verify the service has a production-readiness scorecard score above 80%.\n- Confirm the service owner has approved the deployment.\n\n## Step 2: Run the deployment\n\nExecute the deployment action for the target service and environment.\n\n**Example input:**\n- Service: `payment-service`\n- Environment: `production`\n\n**Expected output:**\n- Deployment initiated successfully.\n- Action run ID returned for tracking.\n\n## Step 3: Verify deployment\n\n- Check the action run status.\n- Verify the service is healthy in production.\n- Monitor for any alerts in the first 15 minutes.\n\n## Common edge cases\n\n- If tests are failing, do not proceed with deployment.\n- If scorecard score is below threshold, recommend remediation steps first.\n- If deployment fails, check logs and suggest rollback if needed.",
    "references": [
      {
        "path": "references/deployment-runbook.md",
        "content": "# Deployment Runbook\n\n## Pre-deployment checklist\n\n- [ ] All CI checks pass\n- [ ] Code review approved\n- [ ] QA sign-off received\n\n## Rollback procedure\n\nIf deployment fails:\n1. Revert to previous version\n2. Notify on-call team\n3. Create incident ticket"
      },
      {
        "path": "references/common-errors.md",
        "content": "# Common Deployment Errors\n\n## ImagePullBackOff\nCause: Container registry authentication failed.\nFix: Verify registry credentials.\n\n## CrashLoopBackOff\nCause: Application fails to start.\nFix: Check application logs and configuration."
      }
    ],
    "assets": [
      {
        "path": "assets/deployment-config.yaml",
        "content": "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: {{ service_name }}\nspec:\n  replicas: 3\n  strategy:\n    type: RollingUpdate"
      }
    ]
  }
}

The open standard behind Agent Skills

Agent Skills were originally created by Anthropic and released as an open standard on December 18, 2025, along with the specification and SDK. The standard is now governed as a cross-platform specification at agentskills.io.

The practical implication is huge. A skill created for Claude is not trapped inside Claude. The same skill can work across multiple AI platforms that adopt the standard, including tools like OpenAI Codex, Gemini CLI, GitHub Copilot, Cursor, VS Code, and others.

That portability is what makes this more than another product feature. It is infrastructure for reusable agent behavior.

Why LLMs need Agent Skills in the first place

LLMs are great at general conversation, brainstorming, and broad reasoning. But when workflows become complex, they often become inconsistent. They forget details, miss edge cases, or answer too generically because they do not have the right context.

This becomes painfully obvious in cases like:

Analyzing internal service health
Understanding organization-specific scorecards
Applying a company's engineering rules
Generating precise remediation steps
Working across tools like GitHub, issue trackers, and internal platforms

Agent Skills help bridge that gap. They move the model from passive chat behavior to active, specialized execution grounded in your real systems and workflows.

A practical example: building an Agent Skill with Port.io

To make this concrete, consider a real workflow built around Port.io.

Port is an agentic internal developer platform that helps teams automate engineering workflows. It acts as a central place where developers can see services, ownership, scorecards, readiness, and other operational data without bouncing between a dozen different tools.

In this example, Port's MCP server is connected so the AI agent can access live data from a Port account. Once connected, the agent can pull information such as:

Services in the catalog
Blueprints in the organization
Production readiness states
Scorecard pass/fail data

That gives the agent raw access. Then Agent Skills provide the behavior and context needed to make that access useful.

The three-file structure of this Agent Skill

The example skill is built around a production readiness workflow and uses three main files.

1. skills.md

This is the brain and trigger mechanism of the skill.
It includes:

The skill name
Description
Metadata like author and version
Activation keywords
Instructions for how the agent should behave

In this case, the skill is focused on Port readiness. The description includes keywords such as scorecard, level B, and branch protection so the agent knows when to activate the skill.

It also defines the workflow for diagnosing failures, understanding readiness levels, generating PR descriptions, and suggesting fixes.

2. references/scorecard-state.md

This file contains the factual reference data.
It acts like a snapshot of the actual Port catalog, including the current state of services and scorecard rules. In the example, it includes data for six services and their pass/fail status against readiness rules.

This matters because it stops the agent from answering in vague terms. Instead of saying, "You may need better branch policies," it can say, "This specific service is failing because branch protection is missing and no recent PR activity exists."

3. assets/fix-checklist.md

This file is the remediation playbook.
It gives the agent a step-by-step checklist for fixing failures, such as:

Assigning the correct team
Enabling branch protection
Setting code owners
Ensuring recent PR freshness

So if the reference file tells the agent what is wrong, the checklist tells it how to fix it.

What this skill enables the agent to do

Once these files are in place and the Port MCP server is connected, the AI agent becomes dramatically more useful.

It can answer questions like:

What services are in my Port catalog?
What blueprints exist in my organization?
Why is the travel service failing its scorecard?
Which service is closest to reaching level B?
Write a PR description for Agentic AI explaining the readiness impact.
Assign Agentic AI to the AI team.

And importantly, it can answer these without forcing you to paste all the context into every new conversation.

That is the practical magic of Agent Skills. Context is packaged once, then reused repeatedly.

Understanding Port readiness in this example

The skill in this setup revolves around production readiness in Port.

Port readiness is basically a grading system that tells you how production-ready a service is. The levels include things like A, B, C, and F, depending on how many scorecard rules are satisfied.

In the example workflow, several services are currently at level C. The agent can inspect the rules, explain why a service is still at level C, and tell you what must be done to move it up to level B.

Typical requirements for moving from level C to level B include:

Assigning a team
Enabling GitHub branch protection
Pushing a recent PR

Because the skill has both the scorecard state and the remediation checklist, it can map those rules directly into actionable next steps.

How the interaction feels in practice

After connecting the MCP server and loading the skill into a coding environment like GitHub Copilot agent mode in VS Code, you can work conversationally.

You can ask:

Why is the prompt engineering service failing?
What team is assigned to this service?
How can all my services reach level B?
Can you push a simple PR to this service?

The agent then checks the skill instructions, pulls the relevant facts from the reference file, uses the checklist for remediation guidance, and responds in a way that is specific to your setup.

In the example, the agent can even update team assignments in the scorecard state and suggest exact actions needed to improve readiness.

This is a big shift from normal chatbot usage. Instead of asking broad questions and getting broad answers, you are interacting with an agent that understands your environment and your operational rules.

Why this is more powerful than prompts alone

A long prompt can tell an agent a lot of things once. But it is still fragile.

Prompts are easy to lose, hard to standardize, and difficult to reuse cleanly across teams and platforms. They also tend to degrade over time as workflows evolve.

Agent Skills solve that by separating responsibilities:

The skill file defines behavior and triggers
The reference file provides facts and current state
The checklist file provides action plans

That structure makes the whole system more maintainable, shareable, and predictable.

It also makes it easier to build agents that do not just know tools, but know how your organization actually works.

The bigger takeaway

The important idea here is not just Port, Claude, or one specific tutorial setup. The bigger takeaway is that Agent Skills are a reusable layer of organizational intelligence for AI agents.

You can imagine applying the same pattern to many other internal workflows:

Incident triage
Release readiness
Security policy checks
Onboarding flows
Documentation enforcement
Infrastructure review

As long as the agent has access to the right tools and data through something like MCP, skills can teach it how to reason and act within that domain.

What makes Agent Skills so compelling right now

There are three reasons Agent Skills feel especially important right now.

AI agents are everywhere, but most are still generic.
MCP gives agents access, but not domain behavior.
Teams need reusable workflows, not prompt improvisation every time.

That combination creates the perfect environment for skills to become a foundational pattern.

If the first wave of AI was about generating text, and the second wave was about calling tools, this next wave is about packaging expertise so agents can repeatedly perform meaningful work.

Final thoughts

Agent Skills are one of the clearest signs that AI tooling is maturing from demos into operational systems.

They let you encode workflows once, connect them to real systems, and reuse them across platforms. In practical terms, that means your AI agent can stop acting like an outsider and start behaving like a teammate who understands your stack, your rules, and your goals.
That is the real leap here.

MCP gives your agent the keys. Agent Skills teach it how to drive.
If you want to explore this approach hands-on, the Port-based production readiness example is a great model: connect your data source, define the skill behavior in skills.md, add factual reference state, add a remediation checklist, and then let the agent work against your real environment.

Once you see that flow in action, it becomes obvious why Agent Skills are getting so much attention.

BTW, Try Port for Free!

DORA Report Takeaways + Build Your Own DORA Metrics Dashboard with MCP!

Pavan Belagatti — Thu, 09 Apr 2026 13:04:10 +0000

The question of whether to adopt AI in software development has already been answered. According to the 2025 DORA AI Capabilities Model - based on research from nearly 5,000 technology professionals and over 100 hours of qualitative analysis - close to 90% of developers are already using AI in their day-to-day work. What remains unresolved is not adoption, but effectiveness. Many organizations have equipped their developers with powerful AI tools, yet struggle to translate individual productivity gains into meaningful business outcomes. This disconnect is at the heart of DORA’s latest research.

The report introduces a critical insight: AI is an amplifier. It does not inherently improve systems; instead, it magnifies the strengths and weaknesses that already exist. High-performing teams become faster and more effective, while struggling teams often see their inefficiencies scale. This reframes the entire conversation around AI adoption. Success is no longer about choosing the right tools—it is about building the right foundations. Understanding and investing in these foundations is what determines whether AI becomes a competitive advantage or just another layer of complexity.

The Core Insight: AI Alone Doesn’t Improve Performance

One of the most important findings from DORA is that AI adoption, on its own, has only a modest impact on organizational performance. While developers may experience significant gains in speed and efficiency, these improvements often fail to propagate through the rest of the system. Instead, they are absorbed by bottlenecks in testing, security reviews, approvals, and deployment pipelines. This creates a situation where teams appear to move faster locally, but the overall system remains constrained.

This phenomenon highlights a fundamental truth: software delivery is a system, not a collection of individual tasks. Optimizing one part of the system without addressing the rest leads to imbalances rather than improvements. If data is fragmented, workflows are unclear, or processes are overly complex, AI will simply accelerate these issues. Teams may generate more code, but that code will still face the same downstream friction.

DORA’s research makes it clear that meaningful improvements only emerge when AI is paired with strong technical and cultural capabilities. These capabilities ensure that gains at the individual level can flow through the entire value stream, ultimately impacting organizational performance. Without them, AI remains an isolated productivity tool rather than a transformative force.

Clear and Communicated AI Stance

A clear and communicated AI stance is one of the most foundational capabilities identified by DORA. In many organizations, ambiguity around AI usage creates uncertainty, which in turn slows adoption and increases risk. Developers often fall into two extremes: either they avoid using AI due to fear of violating policies, or they use it freely without understanding the boundaries. Both scenarios lead to suboptimal outcomes.

DORA emphasizes that an effective AI stance must be both comprehensible and communicated. It should clearly define what is expected, what is permitted, and how AI can be safely used within the organization. This clarity provides psychological safety, allowing developers to experiment and adopt AI tools with confidence. Importantly, the stance does not need to be overly restrictive or overly permissive—it simply needs to be well-defined and consistently applied.

The impact of this capability is significant. Organizations with a clear AI stance see improvements in individual effectiveness, organizational performance, and software delivery throughput, while also reducing friction. This is because developers are no longer second-guessing their decisions or navigating uncertainty. Instead, they can focus on using AI effectively within a known framework, which ultimately leads to better outcomes across the board.

Healthy Data Ecosystems

DORA identifies healthy data ecosystems as one of the most impactful capabilities for successful AI adoption. AI systems rely heavily on data, and the quality of that data directly influences the quality of outcomes. Organizations with high-quality, accessible, and well-integrated data see significantly stronger benefits from AI compared to those with fragmented or unreliable data systems.

A healthy data ecosystem is characterized by three key attributes: data must be trustworthy, easily accessible, and unified across the organization. When these conditions are met, AI can operate with the context it needs to produce meaningful and accurate outputs. However, when data is siloed or inconsistent, AI tends to generate results that reflect those inconsistencies, often leading to confusion and rework.

DORA also highlights that poor data environments lead to what can be described as “localized productivity gains.” Developers may work faster with AI, but their output gets slowed down or corrected later in the process due to data-related issues. This prevents organizations from realizing true end-to-end improvements. Investing in data quality, governance, and accessibility is therefore not just a data initiative—it is a prerequisite for making AI effective at scale. Without it, AI becomes a force multiplier for bad data rather than a driver of better outcomes.

AI-Accessible Internal Data

Closely related to healthy data ecosystems is the concept of making internal data accessible to AI systems. DORA distinguishes between simply having good data and ensuring that AI tools can effectively use that data. This capability focuses on connecting AI systems to internal sources such as codebases, documentation, and organizational knowledge. When AI operates without access to internal context, it remains a general-purpose assistant. It can provide useful suggestions, but those suggestions lack specificity and alignment with the organization’s unique systems and practices. In contrast, when AI is connected to internal data, it becomes significantly more effective, offering insights and outputs that are tailored to the organization’s environment.

DORA’s findings show that this capability has a strong positive impact on both code quality and individual effectiveness. Teams that enable AI to access internal data experience more relevant outputs and fewer errors, which reduces rework and improves overall efficiency.

However, this capability also comes with responsibility. Poor-quality or outdated data can lead to poor AI outputs at scale. Organizations must ensure that the data being exposed to AI is accurate, up-to-date, and well-maintained. This reinforces the importance of strong data governance and continuous data hygiene as part of AI adoption strategies.

Strong Version Control Practices

As AI increases the speed and volume of code generation, version control becomes more critical than ever. DORA’s research highlights that AI-assisted development introduces a level of unpredictability, as generated outputs can vary in quality and correctness. This makes it essential for teams to have strong version control practices in place to manage risk effectively.

Frequent commits and the ability to rollback changes are particularly important. DORA found that these practices amplify the positive effects of AI adoption. Frequent commits create a clear and traceable history of changes, making it easier to identify issues and isolate problems. Rollback mechanisms provide a safety net, allowing teams to quickly revert changes when something goes wrong.

This capability enables teams to experiment with AI-generated code without compromising system stability. It transforms version control from a passive tool into an active safeguard that supports safe and continuous development. In an AI-assisted environment, version control is not just about tracking changes—it is about enabling controlled experimentation. Teams that invest in strong version control practices are better positioned to harness the benefits of AI while minimizing the associated risks.

Working in Small Batches

Working in small batches is a long-standing best practice in software development, and DORA reinforces its importance in the context of AI. While AI enables developers to generate large amounts of code quickly, large changes are inherently more difficult to review, test, and integrate. This increases the likelihood of errors and slows down the overall delivery process.

DORA’s research shows that teams working in small batches experience better product performance and reduced friction, even if their perceived individual productivity is slightly lower. Smaller changes are easier to validate, easier to deploy, and less likely to introduce instability into the system.

This capability acts as a counterbalance to the speed introduced by AI. It ensures that rapid code generation does not lead to uncontrolled complexity. Instead, it channels that speed into manageable, incremental improvements. By focusing on small, testable units of work, teams can maintain a steady flow of value while minimizing risk. This approach aligns with the broader goal of turning individual productivity gains into consistent and reliable system-level performance.

User-Centric Focus

DORA’s findings around user-centric focus are particularly striking. The report shows that AI adoption can have dramatically different outcomes depending on whether teams are aligned with user needs. Teams with a strong user-centric focus see improvements in performance, while those without it can actually experience declines.

This highlights a critical point: AI amplifies direction, not just speed. If teams are focused on delivering user value, AI helps them do it faster and more effectively. However, if teams are focused on output rather than outcomes, AI accelerates the production of features that may not deliver real value. Maintaining a user-centric approach requires continuous alignment with user needs. This includes integrating user feedback into development processes, measuring success based on outcomes rather than outputs, and ensuring that development efforts are guided by clear user goals.

In an AI-driven environment, developers must take on a more active role in ensuring that generated outputs align with user expectations. This requires a shift in mindset from simply building features to delivering meaningful outcomes.

Quality Internal Platforms

The final capability identified by DORA is the presence of high-quality internal platforms. These platforms play a critical role in enabling AI adoption at scale by providing standardized workflows, reducing friction, and ensuring consistency across teams. DORA’s research shows that the impact of AI on organizational performance is heavily influenced by the quality of internal platforms. When platforms are well-designed and provide a seamless developer experience, AI-driven improvements can propagate throughout the organization. When platforms are lacking, these improvements remain isolated.

Internal platforms serve as the infrastructure that supports modern software development. They provide the tools, processes, and guardrails that allow teams to build, test, and deploy software efficiently and safely. In the context of AI, they ensure that generated outputs can move smoothly through the delivery pipeline.

By reducing complexity and standardizing processes, internal platforms enable teams to focus on delivering value rather than managing infrastructure. This makes them a key enabler of successful AI adoption.

From AI Adoption to Agentic Workflows

As organizations mature across these capabilities, a broader shift begins to emerge. AI is no longer limited to assisting developers at the code level—it starts to participate in workflows across the software development lifecycle. Tasks such as generating changes, validating outputs, and triggering processes become increasingly automated. This shift can be understood as a move toward more agent-assisted or semi-autonomous workflows, where AI systems operate within defined guardrails to support end-to-end processes.

However, this evolution is only possible when the foundational capabilities identified by DORA are in place. Without strong data, version control, and platforms, introducing automation at the workflow level increases risk rather than reducing it. With the right foundations, however, it enables a new level of efficiency and consistency in software delivery.

The Final Shift: From AI Adoption to Platform Orchestration

As organizations mature across these capabilities, the challenge shifts from adoption to orchestration. Having the right practices in place is no longer sufficient - teams need a central layer that connects systems, enforces workflows, and maintains consistency across the entire SDLC. This is where the quality of your internal platform becomes the defining variable. AI embedded within a strong platform multiplies output. AI layered on top of a weak one multiplies chaos.

The IDP Imperative: Why Your Platform Is the Make-or-Break Variable

The numbers are hard to ignore. According to the DORA report, 90% of organizations already report using an internal developer platform. Gartner projects that 85% of platform engineering teams will have IDPs by 2028, and 80% of large engineering organizations will have dedicated platform teams by 2026. But here is the critical nuance DORA surfaces: having a platform is not enough. Platform quality is the make-or-break variable for AI ROI. When platform quality is high, AI adoption has a strong and measurable positive impact on organizational performance. When it is low, that impact is negligible — no matter how sophisticated the AI tools in use.

This is where the conversation shifts from platform engineering to agentic engineering. The next generation of IDPs cannot simply manage services and workflows - they need to power a shared environment where humans and AI agents run the software development lifecycle together. That requires four critical capabilities: a rich, holistic context lake that correlates data across all environments, services, tools, and policies in real time; orchestration and automation that supports code, low-code, and AI-enabled workflows with governed execution; embedded guardrails and governance with RBAC, confidence thresholds, and human-in-the-loop approval gates; and unified measurement and optimization across DORA metrics, AI impact, and custom standards.

Port.io is built for exactly this. As an agentic developer portal, Port goes beyond traditional IDP functionality by embedding AI workflows directly into the platform layer - giving developers not just visibility and self-service, but intelligent automation that operates within defined guardrails. The result is not just faster developers. It is a system where humans stay in control, teams consistently ship value, and AI incidents stop derailing delivery.

You know, you can build the DORA dashboard inside your Port account to see your engineering performance. See below through a simple walkthrough. Signup to Port right now and start measuring your engineering performance.

This Is How I Automated My Dev Workflow with MCPs - GitHub, Notion & Jira (And Saved Hours)

Pavan Belagatti — Thu, 02 Apr 2026 06:16:48 +0000

AI agents are no longer a novelty - they’re becoming a practical way to speed up engineering work. But there’s a catch: agents don’t do anything useful unless they can access your real systems securely - documentation, tickets, code, deployment details, and operational logs.

That’s where MCP (Model Context Protocol) changes the game. MCP provides a standard way to connect AI systems to external tools and data sources. Yet, once you actually start wiring MCP into an organization, a new problem appears: managing many MCP servers, many permissions, and many integrations across teams - without turning your platform into a fragile routing monster.

This is the gap Port fills. Port acts as a unified, governed interface where your MCP servers live - so developers and AI agents get one entry point, consistent permissions, and connected engineering context.

The core problem with agentic workflows: “Everything is separate”

Most teams have the same reality behind the scenes:

Your documentation is in Notion.
Your code is in GitHub.
Your work tracking lives in Jira (and runbooks may be in Confluence).
Your operational signals are in tools like Sentry or Dynatrace.

When an AI agent (or even a human developer) needs to answer a question like “How do we deploy this service?” it’s not one tool—it’s a chain of tools:

deployment pipeline details
cluster information
team context
runbook/documentation steps
recent failures and quality signals

Without a unifying layer, you end up building custom integrations and custom “routing logic” to decide what tool answers which part of the question.

How MCP helps - and what it doesn’t solve

MCP is like an industry “connector standard” for AI. Instead of inventing new adapters for each tool, you can expose capabilities through MCP servers. This lets agents access external systems in a consistent way.

But even with MCP, you still face an organization-level bottleneck:

you might have multiple MCP servers (Notion MCP, GitHub MCP, Jira MCP, etc.)
each MCP server has its own permissions model
you need a way to ensure users only see what they’re allowed to access
you need a way to keep knowledge consistent as systems change

In other words, MCP solves connectivity. Your org still has to solve governance, orchestration, and “one entry point” experience.

Port’s idea: one governed interface for all your MCP servers

Port positions itself as that unified layer. Think of it as a single, governed gateway sitting in the middle:

Developers connect once (to Port’s MCP server/entry point).
Port routes requests to the correct external MCP servers.
Port enforces permissions so users and agents see only allowed tools/data.
Port consolidates engineering knowledge into a connected experience.

The image below shows the “before vs after” framing—fragmented tool access vs Port’s unified gateway approach.

The practical outcome: you stop managing a scattered web of integrations and start scaling agentic AI across the organization.

This approach shifts from “routing queries” to building connected engineering context. Port doesn’t just pass questions along. The platform synthesizes information across your connected systems into a persistent knowledge graph.

What this means in day-to-day engineering?

SDLC data from one tool can be connected to technical docs from another.
GitHub commit context can be tied to related tickets and discussions.
Agents can analyze patterns (deployments, bottlenecks, quality gaps) using a consistent interface.

So instead of writing custom logic like “if question contains X, query tool Y, then parse Z,” you give the agent one source of truth and let Port handle the orchestration.

Example 1: Notion MCP—make runbooks and onboarding instantly usable

Documentation is often treated as a static knowledge base. But agentic engineering changes the expectations: documentation must be queryable and actionable.

When Notion is connected through an MCP server in Port, you can do things like:

Search and fetch onboarding checklists instantly.
Create structured documentation pages automatically.
Generate a deployment runbook for a new service with service owner and monitoring info.

Realistic use case: a developer asks how to deploy an incident response process. Port fetches the correct runbook from Notion and returns it in context, without the user hunting through Notion pages.

Example 2: GitHub MCP—understand code changes without spelunking through history

GitHub isn’t just where code lives. It’s also where context lives:

pull request discussions
commit history
who changed what
why it changed

With GitHub connected via MCP through Port, agents can answer questions like:

“Why did we change the cache logic?”
“What changed in the payment service last week?”
“What’s the root cause suggested by the PR discussion?”

This shifts engineering from “manual archaeology” to “instant, contextual explanations.” The key advantage isn’t just speed—it’s that the explanation includes the surrounding narrative (PR context, owners, and intent), not just raw diffs.

Example 3: Atlassian Rovo MCP (Jira + Confluence)—incident context in one answer

Incident response is where context fragmentation becomes brutally expensive. At 3:00 a.m., no one wants to bounce between tools to gather:

recent incident history (Jira)
runbooks and procedures (Confluence)
team notes and next steps

Port’s approach with the Atlassian MCP bridge (called Atlassian Robo MCP) connects Jira and Confluence content so agents can answer incident questions as one cohesive response.

Outcome: faster triage, fewer “where is the runbook?” moments, and lower meantime to recovery because the agent can pull the needed context immediately.

Example 4: Cross-tool workflows - create and update artifacts across the SDLC

The most compelling part of this architecture is how it enables workflows that span tools. Port becomes the bridge between systems.

In the demonstration flow, the idea looks like this:

Use Notion data to create a service-related page (e.g., “feature release 2.1”).
Ask Port to push that structured information into another system (e.g., GitHub repository updates).
Query related entities (repositories, Jira issues) to enrich the artifact.
Use the same connected context to trigger or guide next steps.

Instead of hand-carrying information from tool to tool, the agent can operate through Port’s unified interface.

The screenshot below shows creating a Notion page (service-related artifact) from Port by using the connected MCP tools.

Managing MCP servers from one place: Port’s dashboard experience

For platform engineers, the operational challenge is real: once MCP exists, you still need a clean way to onboard it for teams.

Port’s dashboard is designed for that governance layer. Instead of asking developers to wire up MCP servers individually, platform engineers add and configure MCP server integrations centrally.

The process is straightforward:

Go to Port’s data sources/catalog area.
Add the MCP server (for example, Notion, GitHub, Atlassian Rovo).
Choose “when to use” guidance so the agent knows what the MCP server is for.
Connect via authentication and approve which tools are available.
Publish so teams can access the unified interface.

This “configuration as experience” matters. Developers shouldn’t need to understand how MCP servers are wired behind the scenes—they just need reliable answers and safe actions.

Governance and permissions: why this matters for scaling

One of the biggest risks in agentic workflows is accidental access. If your AI can query or modify systems, you need guardrails.

Port’s model emphasizes:

Approved tools only (you can restrict destructive actions).
User-level permissions (OAuth-based access aligns with existing account permissions).
Controlled visibility across teams and roles.

This enables scaling MCP across the organization without turning security review into a permanent blocker.

How to think about the “single entry point” advantage

When people compare internal developer platforms and agent tooling, it’s easy to reduce the conversation to “one UI.” Port’s value is more fundamental:

One entry point to access multiple MCP servers.
One governed interface to reduce integration sprawl.
One framework to keep permissions consistent.
One place where engineering context becomes queryable for agents.

That’s why this approach can genuinely make teams more productive rather than just adding another layer of tooling complexity.

Practical rollout checklist: bring MCP to your org without chaos

If you’re planning an MCP-first agentic setup, here’s a pragmatic way to get started with a unified layer like Port:
1) Start with the “high leverage” tools

Notion for docs/runbooks/onboarding
GitHub for code and change context
Jira/Confluence via Atlassian Robo for planning and incidents

2) Decide what actions are safe
Not every agent action needs write permissions on day one. Start with read-only where possible, then expand.

3) Define “when to use” descriptions for each MCP server
This helps the agent select the right tool for the right job—and reduces incorrect queries.

4) Build cross-tool workflows intentionally
Choose one workflow that’s painful today (e.g., incident triage, release note creation, onboarding). Then wire it end-to-end through Port so value is obvious quickly.

5) Keep governance in the platform layer
Developers should not have to manage routing logic, authentication, and tool availability per MCP server. Port should.

MCP becomes scalable when you add the governed layer

MCP makes it possible to connect AI agents to external tools in a standard way. But the real engineering breakthrough comes when you turn many MCP servers into a single, governed interface.

Port’s approach—unifying and orchestrating MCP connections, enforcing permissions, and enabling cross-tool context—helps teams stop switching between tools and start building agentic workflows that actually scale. If you’re exploring MCP for agentic engineering, focus on the “last mile” first: one entry point, governed access, and connected context across your SDLC systems.

Well, Port is free to use. I want you all to experience the power of agentic automation for your dev workflows.

Learn How to Build Reliable RAG Applications in 2026!

Pavan Belagatti — Mon, 19 Jan 2026 07:00:50 +0000

LangChain is a developer framework for connecting large language models with data, tools, and application logic. This guide walks through a practical step-by-step workflow to build a Retrieval-Augmented Generation (RAG) document chat: upload documents, chunk and embed them, store embeddings in a vector database, and serve a chat UI that answers only from retrieved context. Use this as a checklist and hands-on recipe for production-style LLM applications.

Here is my complete hands-on video guide below.

Below is the complete code repo to try

pavanbelagatti / LangChain-RAG-Application

LangChain RAG Application (DocChat Pro)

This repository contains a Retrieval-Augmented Generation (RAG) application built using LangChain, Streamlit, and SingleStore. The app allows you to upload documents (PDF, TXT, or Markdown), automatically chunk and embed them, store embeddings in SingleStore as a persistent vector database, and chat with your documents using a ChatGPT-like interface.

The project demonstrates how LangChain connects document loading, text splitting, embeddings, retrieval, and prompt templates into a reliable AI workflow. It also includes source citations, retrieval debugging, and a reset option for clean demos.

This is a practical, production-style example of building a real AI application—not a toy chatbot.

View on GitHub

How LangChain evolved

Before LangChain, developers used LLMs mainly via standalone prompts. That approach left large gaps: no built-in data connectors, no standard way to persist embeddings, limited support for multi-step logic, and no standardized memory or agent tooling. LangChain was created to fill these gaps by providing composable primitives and patterns for LLM-powered apps.

Key milestones in LangChain's evolution:

Open-source modular library that standardizes document loading, splitting, embeddings, and retrievers.
Agent and chain patterns that let you sequence LLM calls and tool invocations in reproducible workflows.
Integrations with vector databases, hosts, and model providers to avoid vendor lock-in.
Growth in community and tooling, with managed runtimes and observability emerging around LangChain patterns.

Why use LangChain and when it matters

LangChain is a developer framework that makes it easy to build LLM-powered applications by connecting language models to data sources, vector stores, prompts, memory, and tools. It is not an LLM itself; it is the scaffolding that turns LLMs into reliable, maintainable systems.

LangChain is useful when you need LLM responses tied to custom, up-to-date, or proprietary data and when you want predictable, auditable results. Instead of relying purely on prompt tweaks or costly fine-tuning, LangChain helps you assemble components - loaders, splitters, embeddings, vector stores, retrievers, chains, and prompts - into a repeatable pipeline.

Core LangChain components - overview

LangChain organizes common functionality into composable components. Understanding each component helps you design correct, debuggable applications.

LLMs (model interfaces)

The LLM component is a thin adapter that calls a model provider (OpenAI, Anthropic, local models, etc.). LangChain gives a uniform API so you can swap models without rewriting the rest of your app.

Loaders and Indexes

Loaders ingest documents (PDFs, HTML, text, spreadsheets). Index-like modules prepare content for retrieval by preserving metadata and mapping pieces of text to retrievable records.

Text splitters and chunking

Splitters break long documents into chunks sized to fit model context windows. Proper chunking balances context completeness and retrieval precision.

Embeddings

Embedding models convert text chunks and queries into numeric vectors that capture semantic meaning. LangChain wraps embedding providers so you can change models consistently.

Vector stores (vector databases)

Vector stores persist embeddings and support similarity search. LangChain provides connectors for many vector databases and vector-enabled SQL stores.

Retrievers

Retrievers are configurable search layers that use embedding similarity, filters, or hybrid search to fetch relevant chunks for a query.

Chains

Chains are sequences of modular steps: call a retriever, format a prompt, call an LLM, post-process the answer. Chains let you compose robust workflows with predictable behavior.

Agents and tools

Agents combine LLM reasoning with tool execution (APIs, calculators, search). LangChain includes patterns for creating agent loops with toolkits and stopping conditions.

Memory

Memory modules manage conversation state - short-term for session context and long-term for persistent user data. Memory is essential for chat experiences that require context continuity.

Prompt templates

Prompt templates are reusable instruction blueprints. They standardize system messages, user instructions, and context injection to make outputs predictable and auditable.

Tutorial: What we will build?

A typical LangChain RAG pipeline contains these stages. Plan them before writing code:

Document ingestion and metadata extraction.
Text splitting and chunking strategy (size, overlap).
Embedding generation with a chosen embedding model.
Store embeddings in a vector store with metadata.
Query embedding and retrieval (top-K, filters).
Construct a prompt combining retrieved context and user query.
LLM response generation and attribution (sources/similarity scores).

Step 1: Define scope, data, and success criteria

Before coding, decide:

Data types: PDFs, DOCX, HTML, CSV, internal wiki pages.
Latency and scale: number of documents and query QPS.
Accuracy expectations: must answers strictly cite docs or can it hallucinate?
Monitoring: logs for retrieval results, source hits, and LLM outputs.

Step 2: Environment and core libraries

Install the core packages and provider SDKs. Replace provider names with your chosen LLM and vector DB.

pip install langchain streamlit openai singlestoredb[client] tiktoken

Set environment variables securely for API keys and vector DB credentials (do not commit .env to source control).

Step 3: Ingest documents and split into chunks

Goal: convert each input document into coherent chunks that fit the model's context window and preserve meaning.

Recommended splitter settings

Chunk size: 500–1000 tokens (or 800–1200 characters depending on language)
Chunk overlap: 100–200 tokens to preserve context across splits
Prefer semantic boundaries (sections, paragraphs) over fixed-length cuts when possible

Example ingestion pattern (pseudo-real code using LangChain idioms):

Step 4: Create embeddings and store them in a vector database

Convert text chunks into vectors with an embedding model and persist them to a vector store. Choose a persistent vector DB (SingleStore, Pinecone, Milvus, Chroma, etc.) for production.

Important metadata to store with each vector:

source document id or file name
chunk index or position
original text snippet for provenance
timestamp or ingestion batch id

Generic embedding + store pattern:

Notes:

If using a managed vector DB, create the collection/table with proper indexing (HNSW/IVF etc.).
Batch embedding calls to improve throughput and reduce cost.
Store embeddings and text separately if you need to re-embed with another model later.

Step 5: Build the retriever and RAG chain

Core idea: for each user query, run a semantic search against the vector store to retrieve top-k candidate chunks, then pass those chunks plus the query to the LLM with a strict prompt that instructs the model to only use the provided context.

Retriever configuration

Top-k (k): 3–10 depending on average chunk length
Similarity metric: cosine is common for OpenAI embeddings
Filter by metadata: restrict to a document set or date range if needed

Example RAG flow (LangChain style):

Return source documents (or their URLs) to provide citations in the UI and to reduce hallucination risk.

Step 6: Build a simple Streamlit chat UI

Key UI features:

File upload with immediate "Build / Upsert" button
Toggles for chunk size, overlap, top-k, and temperature
Streamed LLM responses plus a sidebar showing retrieved sources and debug info
Button to reset or drop the knowledge base for demos

Minimal Streamlit sketch (abbreviated):

Show sources next to each answer using the metadata stored with vectors.

Step 7: Tune, test, and monitor

Tuning checklist:

Adjust chunk_size and chunk_overlap until retrieved contexts are coherent.
Control the LLM temperature: set to 0.0–0.2 for high factuality.
Adjust top_k: more context can help but increases prompt length and noise.
Implement answer gating: if the highest-similarity result score is below a threshold, refuse to answer or escalate to human review.

Monitoring and logs to add:

Query traces: query, retrieved doc ids, similarity scores.
LLM outputs and tokens used (cost monitoring).
Feedback collection UI to flag incorrect answers and retrain or re-curate data.

Common pitfalls and how to avoid them

Pitfall: Chunking too small. Result: context torn into fragments, leading to wrong or incomplete answers. Fix: increase chunk_size or use semantic splitting.
Pitfall: Chunk overlap too high. Result: duplicate context leading to longer prompts and higher cost. Fix: balance overlap to preserve transitions only.
Pitfall: Not storing provenance. Result: impossible to cite or debug answers. Fix: save source filename, page, and chunk id for each vector.
Pitfall: Open-ended prompts that allow the model to hallucinate. Fix: use strict system prompts and instruct the model to respond "I don't know" when context is insufficient.
Pitfall: Ignoring vector DB scaling. Fix: plan index parameters and re-shard or re-index as dataset grows.

When to choose fine-tuning or retrieval vs prompt engineering

Prompt engineering: low cost, best for short-term tweaks and small scope tasks.
RAG (recommended): best when you need up-to-date, auditable answers tied to documents. It avoids expensive model retraining.
Fine-tuning: choose for enterprise-level domain adaptation where you control the model and cost/latency tradeoffs, or when you need model-level behavior change not achievable with prompts.

Security and governance considerations

Encrypt credentials, enforce least privilege for vector DB access.
Remove or redact sensitive text before storing embeddings when compliance requires it.
Log queries while respecting privacy and retention policies.
Provide an allowlist/denylist for documents or terms if needed.

Troubleshooting examples

Low-quality answers despite relevant docs

Check retriever scores: if similarites are low, embeddings may be mismatched or chunking wrong.
Increase top_k or expand chunk_overlap to provide more context.
Ensure embeddings model and similarity metric align (e.g., OpenAI embeddings work well with cosine).

Model drifts or outdated facts

RAG ensures answers are grounded in indexed docs; re-index documents periodically or on every significant update.
Prefer real-time ingestion for highly dynamic sources.

Practical checklist before launch

End-to-end test with representative queries and documents
Automated unit tests for ingestion and retrieval
Cost forecast for embeddings and LLM usage
Monitoring for retrieval hit-rate and source coverage
Rate limits and graceful degradation for high load

Screenshots and visual debugging

Inspect the UI for upload progress and the vector DB dashboard to verify stored embeddings and metadata.

FAQ

How does LangChain reduce hallucinations?

By combining retrieval (vector search) with generation. The model receives specific, relevant document chunks as context and a strict instruction to answer only from that context. Returning source documents for every answer enables verification and debugging.

Do I need to fine-tune my LLM if I use LangChain?

Not necessarily. For most document-grounded applications, RAG provides strong results without fine-tuning. Fine-tuning is useful if you require model-level behavior changes or want to reduce repeated prompt tokens for very large or high-volume deployments.

What settings matter most for retrieval quality?

Chunk size, chunk overlap, embedding model choice, top-k, and similarity threshold. Also ensure your text splitter preserves semantic boundaries where possible.

Can LangChain switch LLM providers easily?

Yes. LangChain is designed to be provider-neutral: swap LLM and embedding providers by changing the integration class and configuration without rewriting the pipeline logic.

Which vector database should I use?

Choose based on scale and latency needs. For prototypes, lightweight stores FAISS should work. But for production, consider managed or scalable options such as SingleStore. Evaluate costs, persistence, query latency, and SDK maturity.

Summary and next steps

LangChain is a practical framework to build reliable, data-grounded LLM applications. Follow the steps in this guide to ingest documents, create embeddings, persist vectors in a scalable store, and assemble a retriever + LLM pipeline with strict prompts. Focus on chunking, metadata for provenance, and monitoring retrieval quality. Start with a small pilot: upload sample documents, tune chunk settings, and iterate on prompt constraints before scaling.

Ready-to-run components to assemble: a document loader, a robust text splitter, an embeddings layer, a persistent vector store, a retriever, a constrained prompt template, and a lightweight UI. Combine these with monitoring and governance to move from prototype to production.

LangChain vs LangGraph: How to Choose the Right AI Framework!

Pavan Belagatti — Thu, 04 Dec 2025 08:07:26 +0000

Why this comparison matters - LangChain vs LangGraph

I build practical LLM-powered software and have seen two patterns emerge: straightforward, linear pipelines and stateful, agentic workflows. The question "LangChain vs LangGraph" is not academic. It determines architecture, maintenance, and how the system reasons over time.

When I say "LangChain vs LangGraph" I mean comparing two different design philosophies. LangChain is optimized for linear sequences: take input, run one or more LLM calls in order, store or return the result. LangGraph is optimized for graphs: nodes, edges, loops, and persistent state across many steps.

Core idea of LangChain

I use LangChain when the workflow is essentially A then B then C. LangChain provides a standardized framework that saves developers from hard coding integrations, prompt scaffolding, or manual tool orchestration.

Prompt templates - reusable templates that accept variables and generate consistent LLM inputs.
LLM-agnostic connectors - easy swaps between OpenAI, Anthropic, Mistral, Hugging Face models, and more.
Chains - the core abstraction: compose multiple steps so each output feeds the next.
Memory - short-term or long-term conversational context, useful for stateful chat but limited compared to full state machines.
Agents and tools - let models call APIs, calculators, or external services in a structured way.

LangChain makes developers productive fast. For prototyping prompts, building simple RAG systems, or creating a question-answering pipeline that reads from a vector store and returns a single response, LangChain is an efficient choice.

Core Idea of LangGraph

LangGraph is built on top of LangChain concepts but rethinks workflows as graphs. I think of LangGraph when the system must persist complex state, loop, make decisions, or orchestrate multiple specialized agents.

Nodes - discrete tasks: call an LLM, fetch from a database, run a web search, or invoke a summarizer.
Edges - define conditional transitions, parallel branches, or loopback paths.
State - dynamic context that evolves across nodes: messages, episodic memory, and checkpoints.
Decision nodes - native support for conditional logic and routing to specialist agents.

LangGraph treats the application as a state machine. Nodes can loop, revisit earlier steps, and perform multi-turn tool calls. This enables agentic behaviors such as reflection, iterative retrieval, or progressive refinement of answers.

Side-by-side differences - practical checklist for LangChain vs LangGraph

I like to reduce technology choices to a checklist. For "LangChain vs LangGraph" here is the practical comparison I use when deciding which to adopt.

Flow type

LangChain: linear and sequential.
LangGraph: cyclic and graph-based with loops.

State management

LangChain: limited conversational memory.
LangGraph: rich, persistent state across nodes and sessions.

Conditionals and loops

LangChain: simple branching and one-shot tool calls.
LangGraph: built-in conditional edges, loops, and checkpoints.

Complexity and agents

LangChain: well-suited to simple chatbots, RAG, or ETL-like LLM pipelines.
LangGraph: suited to multi-agent systems, autonomous agent behavior, and long-running workflows.

Human in the loop

LangChain: possible but not native.
LangGraph: checkpointing and human-in-the-loop are first-class patterns.

When I weigh "LangChain vs LangGraph", I consider not only current needs but expected future complexity. If the app might grow into a multi-agent orchestration or needs persistent state and retries, starting with LangGraph can save refactors.

When to pick LangChain

I recommend LangChain when you need speed of development and your workflow is straightforward. Typical scenarios include:

Text transformation pipelines: summarize, translate, or extract information and save results.
Prototyping prompts and testing chains quickly.
Single-turn user interactions such as customer support responses.
Basic RAG systems that perform retrieval from a vector store and return a single synthesized answer.

LangChain is excellent for these tasks because it provides plug-and-play components - prompt templates, retrievers, and chain combinators - letting you ship quickly without building orchestration primitives yourself.

When to pick LangGraph

I reach for LangGraph when autonomy, iteration, and state are required. Choose LangGraph when your system needs:

Multi-step decision making that can loop until an exit condition is met.
Routing queries to specialist agents depending on context.
Persistent state across many LLM calls and user interactions.
Sophisticated tool usage, including multi-turn web searches, summarization, and aggregation of external sources.

For example, I built an email drafting agent that retrieves user preferences, consults a calendar, drafts an email, asks for clarifications, and then iteratively refines the draft. That kind of workflow maps naturally to LangGraph.

Hands-on walkthrough - a practical LangChain example

I often demonstrate concepts with a RAG example using a vector store. The LangChain pattern looks like this:

Install the required packages and configure API keys.
Create prompt templates that accept variables such as "objective" and "topic".
Initialize an LLM or local model connector via Hugging Face, OpenAI, or other providers.
Store documents in a vector database and create a retriever.
Build a retrieval-augmented generation chain that retrieves context and synthesizes answers.

This pattern stays linear: retrieve relevant docs then generate an answer. It suits many FAQ bots, documentation assistants, and single-pass pipelines. The code is compact and easy to iterate on, which is one of the core advantages when comparing "LangChain vs LangGraph".

Hands-on walkthrough - a practical LangGraph example

Now imagine the same task but with the added need to fetch fresh web results when the local corpus lacks recent information. A LangGraph workflow looks like this:

Load static content into a vector store from URLs or documents.
Create graph nodes: retrieve, web search, decision, and generate.
Define state: track whether the retrieved results answered the user, store interim summaries, and record tool outputs.
Connect nodes with conditional edges: if local retrieval fails, route to web search; if web search yields too many noisy results, ask clarifying questions; loop back as needed.
Run the graph and allow it to iterate until a stop condition is met, then return the final synthesis.

This pattern enables multi-turn tool use and agentic reasoning. In my tests, asking a LangGraph agent about "latest AI developments this month" triggers a web search node when the local knowledge is stale. The agent fetches, summarizes, and checks whether the summary is adequate before presenting it. That behavior highlights the distinction when comparing "LangChain vs LangGraph".

Common patterns and anti-patterns

Over time I found patterns that help decide between "LangChain vs LangGraph". Use them as heuristics.

Pattern: Start simple - If the problem is single-pass, build with LangChain to validate your prompts quickly.
Pattern: Evolve to graph - If your single-pass pipeline accumulates conditionals and stateful checkpoints, refactor into a LangGraph graph incrementally.
Anti-pattern: Premature complexity - Avoid implementing a full graph when no loops or persistent state are needed. Over-engineering reduces clarity and increases maintenance cost.
Anti-pattern: One-off tool calls - If you need repeated or multi-stage tool orchestration, a linear chain becomes fragile. LangGraph's native edges and state are better suited.

Example architecture templates

Here are two templates I reuse frequently depending on the "LangChain vs LangGraph" decision.

Template A - LangChain RAG pipeline

User query → Retriever → LLM prompt → Result → Store conversation (optional)
Good for document Q&A, help centers, and chatbots where each request is largely independent.

Template B - LangGraph agentic pipeline

User query → Retrieve → Decision node (sufficient?) → If no, Web search node → Summarize → Reflect/loop → Final generate → Persist episodic memory
Good for dynamic information requests, research assistants, and multi-agent workflows that need iterative reasoning.

Practical tips for migration and scaling

If you start with LangChain and need to migrate to LangGraph, I recommend the following:

Identify the branching points in your LangChain where decision logic begins to appear.
Extract prompt templates and retrievers as independent modules that can be used by graph nodes.
Introduce a lightweight state store so node outputs can be persisted across invocations.
Replace monolithic chains with nodes that encapsulate a single responsibility: retrieval, web search, summarization, or validation.

Scaling a LangGraph system requires operational considerations: durable state storage, idempotency of nodes, observability of edges, and human checkpoints for expensive actions. Planning for those early prevents surprises when workflows become long-running.

Final decision guide - quick checklist

When I decide between "LangChain vs LangGraph", I run through this checklist:

Is the workflow single-pass? Choose LangChain.
Does it require looping or complex decisioning? Choose LangGraph.
Will the system need to call multiple tools over time? Lean LangGraph.
Are you prototyping or exploring prompts? Start with LangChain.
Do you expect long-term sessions and persistent context? LangGraph is preferable.

Closing thoughts

Both frameworks share a common goal: make building with LLMs easier. The difference is architectural intent. LangChain shines for linear orchestration and rapid prototyping. LangGraph shines for stateful, agentic, and cyclic workflows that require coordination, persistence, and multi-turn tool usage.

When I evaluate "LangChain vs LangGraph" for a product, I balance time to ship against future complexity. If you expect your system to become an autonomous assistant or coordinator, start with a graph mindset and migrate components in. If you need a fast, maintainable pipeline today, LangChain will likely serve you well.

LangChain goes like this - A then B then C, follows a pre-defined path. LangGraph on the other hand, follows a dynamic path. It starts with A, then it decides if it needs B or C. It can go to C directly depending on the scenario. Loop, and repeat until the goal is satisfied.

If you want to reproduce the examples I described, begin with prompt templates and a small vector store for LangChain. For LangGraph, model nodes as single-responsibility components and define clear state schemas for the data that flows through the graph.

Complete code examples below.

LangChain RAG Tutorial: https://github.com/pavanbelagatti/LangChain-SingleStore-Package
Agentic Workflow Tutorial: https://github.com/pavanbelagatti/LangGraph-Agentic-Tutorial

Below is my complete video on understanding more about LangChain vs. LangGraph.

Transformers: The Magic Engine Behind ChatGPT, Gemini & Every Modern AI Model!

Pavan Belagatti — Mon, 17 Nov 2025 07:52:47 +0000

I want to walk you through one of the most important breakthroughs in modern artificial intelligence. The model family called Transformers changed everything about how machines read, understand, and generate language. In this article I explain why Transformers were invented, how they work, and why they sit at the core of systems like GPT, BERT, LLaMA, Claude, and Gemini. I will start from the basics and build up step by step so you can see the full story from simple neural networks to the powerful attention based architecture that powers today's most generative AI systems.

Why we needed a new architecture?

When I first learned about sequence processing in AI I noticed a consistent pattern. Early neural networks were great at classifying static inputs like images or tabular data. But language is not a static object. Language unfolds as a sequence. Words depend on earlier words and sometimes on words that appeared many steps before. If a model cannot remember or focus selectively across the whole sequence, it will lose important context. That is the problem Transformers were built to solve.

Transformers came into the world to overcome two main limitations. First, earlier models struggled to carry long distance context. Second, those models were often slow to train because they processed tokens one by one. Transformers solved both problems by introducing a powerful mechanism called attention and by processing sequences in parallel. That single change unlocked much larger models, faster training, and far better handling of long context. That is why Transformers now power nearly every large language model and many other AI systems.

Machine learning and deep learning

Image credits: ResearchGate.Net

Let me set the scene by explaining where Transformers sit in the big picture. Artificial intelligence is a broad field. Within it, machine learning is the branch that gives machines the ability to learn from data rather than follow explicitly coded rules. Within machine learning, deep learning is a specialization that uses multi layer artificial neural networks to learn complex patterns from large datasets. Transformers are an architecture within deep learning. They are a specific neural network design that excels at dealing with sequences such as text and speech.

Machine learning has three common learning paradigms that are worth recalling because they influence how models are trained and used.

Supervised learning: The model learns from labeled examples. For example, you show many images labeled cat or not cat. The model learns the mapping from image to label and can then predict on new images.
Unsupervised learning: The model finds structure in unlabeled data. Clustering customers by behavior or learning useful vector representations of words are typical examples.
Reinforcement learning: The model learns by trial and error, maximizing rewards. This is common in game playing or robotics where actions lead to feedback signals.

Artificial neural networks (ANNs) and their limitations

Artificial neural networks, or ANNs, are inspired by the brain. They consist of neurons arranged in layers. Each neuron receives inputs, computes a weighted sum, applies a non linear function, and passes a signal forward. Classic feed forward networks work well for image recognition and many other tasks where the entire input can be treated as a static snapshot.

However feed forward ANNs have a key limitation when it comes to language. They do not have a built-in mechanism to remember earlier words. If you present a sentence to a feed forward network, it sees the sentence as a fixed vector. It does not inherently model sequences or temporal dependencies. Language is not a collection of isolated tokens. Words interact over time. For instance consider the pair dog bites man and man bites dog. The same words appear in both phrases but the meaning is inverted by order. Feed forward methods do not track order naturally. That is why sequence specific models were developed.

Recurrent neural networks (RNNs) and the memory problem

Image credits: GeeksForGeeks

Recurrent neural networks, or RNNs, were the first widely used family of models designed for sequential data. The core idea is intuitive. Rather than treating the input as a static vector, an RNN reads tokens one at a time and maintains a hidden state or memory vector that summarizes what it has seen so far. Each new token updates the hidden state. This memory is then used to predict the next token or the output label. RNNs therefore give the model a way to remember previous context as the sequence unfolds.

RNNs were a major step forward, but they had two serious drawbacks.

Vanishing and exploding gradients. When training RNNs with long sequences, gradients that propagate back through many steps tend to vanish or explode, making it hard to learn long range dependencies. Variants like LSTM and GRU mitigated this, but the core issue remained challenging.
Sequential computation. RNNs process tokens one by one. This sequential nature makes training slow and prevents efficient parallelization on modern hardware. As models grew larger and datasets exploded, this became a severe bottleneck.

So we had a class of models that could remember, but only for a limited number of steps, and they were slow to train. A new idea was needed. That idea is attention.

Attention: the key idea

Image credits: Wikipedia
Attention is a mechanism that allows a model to look selectively at different parts of the input sequence when producing each output. Instead of relying solely on a single hidden state to carry all past information, attention lets the model compute a direct measure of relevance between any two tokens in the sequence. It answers a simple question for every pair of tokens: how much should token A pay attention to token B?

Why is that powerful? Because attention breaks the sequential bottleneck and allows the model to connect distant tokens directly. Consider the sentence The cat sat on the mat and it was fluffy. When interpreting the word it, attention helps the model link it directly to cat even though the tokens between them might be several steps long. This alleviates the forgetting problem that RNNs faced.

A key property of attention is parallelism. Attention computations can be done for all token pairs in parallel. This enables much faster training on modern GPUs and TPUs. Attention also makes it easier to scale to very large models and very long sequences.

Attention is All You Need

That phrase comes from the landmark 2017 paper ‘Attention is All You Need’ that introduced the Transformer architecture. The paper showed that a model built entirely around attention, without recurrent operations, could match or beat prior sequence models on machine translation and other tasks. Crucially, the paper demonstrated that attention based models are faster to train and scale better.

Let's dive into Transformers

At a high level, a Transformer is a neural network architecture that relies primarily on attention mechanisms to process sequences. It replaces the recurrent parts of previous models with attention based blocks and feed forward networks wrapped with normalization and residual connections. Transformers operate on the entire sequence at once and learn relationships between tokens through attention.

A Transformer typically has two major components in the original design: an encoder and a decoder. The encoder reads and builds a representation of the input. The decoder generates the output sequence based on that representation. Many modern variants use only the encoder or only the decoder depending on the task. For example, BERT is encoder only and is used for understanding tasks. GPT models are decoder only and are focused on generation. The general architecture and the attention concept are shared across all these variants.

High level flow

Here is the simplified flow you can keep in mind.

Input tokens are converted into embeddings, numeric vectors that capture meaning.
Positional information is added to embeddings so the model knows token order.
The encoder applies stacked layers of multi head self attention and feed forward networks to produce contextualized representations.
The decoder uses masked self attention to generate tokens step by step while also attending to the encoder outputs to ground generation on the input.
The final decoder output is passed through a linear layer and softmax to convert scores into probabilities for the next token.

Key components of Transformers

To understand Transformers in more detail, I will break down the most important pieces and explain what each does and why it matters.

1. Token embeddings and positional encoding

Text is discrete and machines need numbers. The first step is to convert each token into a vector. Embeddings capture word meaning in continuous space. Similar words or words that appear in similar contexts end up with similar vectors.

Transformers process the entire sequence in parallel, so they need explicit information about token order. That is the role of positional encoding. We add a positional vector to each token embedding. This combined vector tells the model both what the token is and where it is in the sequence. Without positional signals the model would not be able to distinguish dog bites man from man bites dog.

2. Self attention and scaled dot product

The core operation inside Transformers is self attention. For each token we compute three vectors: the query, the key, and the value. Queries and keys are used to compute attention scores that tell us how much one token should attend to another. Values carry the information that will be combined weighted by those attention scores.
Mathematically, we take the dot product of the query for token i with the key for token j, scale the result, and apply softmax across j to get attention weights. Those weights are used to compute a weighted sum of the value vectors, producing a new representation for token i that incorporates information from other tokens. This is done in parallel for all tokens.

3. Multi head attention

Multi head attention means we compute several independent attention operations in parallel and then concatenate their outputs. Each attention head can focus on different types of relationships. For example one head might learn to track subject verb agreement while another head learns to attach pronouns to their referents. Multiple heads give the model richer, more diverse ways to relate tokens.

4. Add and norm

Residual connections and normalization are critical for training deep models. After each attention or feed forward block we add the block input to the block output and normalize the result. This stabilizes gradients and enables training much deeper stacks of layers. Conceptually, add and norm helps the model combine new transformed information with the original signal while keeping the training dynamics stable.

5. Feed forward networks

Each Transformer layer contains a position wise feed forward network. This is a small two layer neural network applied independently to each position. It increases the model capacity by allowing non linear transformation of each token representation. Feed forward layers are applied after attention and help the model refine the contextualized representation.

6. Masked attention in the decoder

When generating sequences autoregressively, the model should not peek at future tokens. The decoder uses masked self attention so each position can only attend to previous positions and itself. This prevents cheating and ensures the model learns to predict the next token from what it has generated so far.

7. Cross attention from decoder to encoder

In the encoder decoder design, the decoder includes attention layers that attend to encoder outputs. This cross attention step lets the decoder use the encoder representation of the input as context while generating output. It is the mechanism by which the decoder grounds its generation on the input sequence.

8. Final linear and softmax

After the decoder produces the final contextualized vectors, a linear projection maps those vectors to vocabulary sized logits. Softmax converts the logits into probabilities over the vocabulary. The highest probability token is chosen as the next output, or a sampling strategy can be used to introduce diversity.

Putting it all together: encoder and decoder

Let me summarize the encoder and decoder roles in concrete terms.

Encoder: Takes the input sequence, converts tokens to embeddings, adds positional information, and applies N stacked layers of multi head self attention followed by feed forward networks. The encoder outputs a set of contextualized vectors, one per input token. Those vectors capture how each token relates to others in the input.
Decoder: Starts with output token embeddings plus positional encoding. It uses masked self attention to process the partial output sequence generated so far. Then it uses multi head cross attention to attend to the encoder outputs. It further refines the combined information with feed forward layers and finally produces logits that are converted to probabilities for the next token.

Repeat these blocks and stack many layers. Each layer refines the representation, enabling complex features and long range dependencies to be captured. That is the power of deep Transformers.

Why Transformers are so effective

I can condense the reasons why Transformers succeeded into a few connected points.

Parallelism. Unlike RNNs, Transformers process all tokens simultaneously. This unlocks massive speedups on GPUs and TPUs, making it feasible to train on very large datasets.
Direct long range interactions. Attention connects any pair of tokens directly, so models can capture relationships across long distances without needing to propagate information through many intermediate steps.
Scalability. Transformers scale well with model size and data. Increasing layers, hidden sizes, and heads generally leads to better performance when sufficient data and compute are available.
Flexibility. The same architecture can be applied to language, vision, audio, and multimodal tasks. The only changes necessary are tokenization and sometimes positional encodings.
Interpretability. Attention weights provide a rough, often useful signal about which tokens a model is focusing on. While not a definitive explanation tool, attention maps give us intuition about the model behavior.

Common analogies to understand attention and Transformers

I like using a few simple analogies to make intuition stick.

Reading a paragraph. When you read a paragraph, you do not reread every previous sentence in order to understand the current sentence. Your mind jumps to the most relevant earlier lines. Attention does the same. It lets the model jump to the most relevant tokens.
Searchlight. Think of attention as a searchlight that shines on relevant words. Multi head attention is multiple searchlights, each tuned to a different pattern such as subject tracking, negation detection, or coreference resolution.
Index cards on a table. Imagine laying all words out as index cards. Instead of stacking them and reading sequentially, you can scan across the table and pick the exact card you need. Transformers make it possible to scan the whole table at once.

Concrete examples

Examples cement understanding. Consider the simple sentence: ‘The cat sat on the mat and it was fluffy’. When the model generates the token, direct connections will allow the model to link it back to the cat token even though several tokens separate them.

Another example is translating a long sentence where the verb in the first clause must agree with a subject in a much later clause. RNNs struggled to retain that subject information across many steps. Transformers handle this by letting the decoder attend directly to the subject token in the encoder outputs.

Finally, consider tasks where relationships are non local. For instance in code generation, a function defined early can be called much later. Attention enables the model to relate the call site and the definition directly.

Variants and modern practice

Although I described the original encoder decoder Transformer, modern systems vary.

Encoder only: Models like BERT use only the encoder. They are trained to produce high quality contextualized representations and are suited for classification, question answering, and feature extraction tasks.
Decoder only: Models like GPT use only the decoder and are trained autoregressively to predict the next token. These models are natural for generation tasks like chat and story writing.
Encoder decoder with modifications: Machine translation and many sequence transduction tasks still use encoder decoder Transformers, often with task specific adjustments.
Sparse and efficient Transformers: Researchers are working on variants that reduce the quadratic cost of attention with respect to sequence length, enabling longer context windows at lower compute cost.

Practical implications

The arrival of Transformers led directly to the era of large language models. Because Transformers scale effectively, researchers built increasingly large models trained on web scale data. Those models exhibit surprising capabilities in translation, summarization, question answering, code generation, and more. A few practical consequences are worth noting.

Foundation models: Large pre trained Transformer based models serve as foundations that can be fine tuned or prompted for many downstream tasks.
Transfer learning: Pre training on large unlabeled corpora followed by supervised fine tuning or prompt engineering unlocked rapid progress across NLP tasks.
Multimodality: Transformers can be extended to multiple modalities simply by changing tokenization. Vision Transformers treat image patches as tokens, enabling a unified architecture across text and vision.
Computation and cost: The flip side of scaling is cost. Training large Transformers is expensive and energy intensive. This has pushed work on efficient architectures, distillation, and parameter efficient fine tuning.

From Transformers to Production: The Role of Data Infrastructure

While Transformers revolutionized how models process language, deploying these systems at scale introduces a critical challenge: managing the embeddings they produce. When models like GPT or BERT convert text into vector representations, those embeddings need to be stored, searched, and combined with enterprise data in real time. This is where specialized data infrastructure becomes essential.

SingleStore addresses this challenge by providing a unified platform that handles both vector embeddings and traditional enterprise data. The platform offers indexed Approximate Nearest Neighbor search that delivers up to 1000x faster vector search performance compared to precise methods, making it practical to search through millions of embeddings in milliseconds.

For generative AI applications, SingleStore enables Retrieval Augmented Generation, a pattern where relevant enterprise data is matched against user queries using semantic search before being sent to language models. This grounds Transformer-based systems in factual, company-specific information and reduces hallucinations.

The platform combines vector similarity search with full-text search, SQL analytics, and support for multiple data types including JSON and time-series data. It integrates with leading AI frameworks like LangChain, OpenAI, Hugging Face, and AWS Bedrock, simplifying the path from prototype to production.

Through SingleStore Notebooks, developers can prototype AI applications using familiar Jupyter-style interfaces while maintaining enterprise-grade security and performance. This bridges the gap between the theoretical power of Transformer architectures and practical deployment requirements that handle real-time data at scale.

Limitations and ongoing challenges

Transformers are powerful, but not perfect. Here are some key limitations and open problems I think about.

Quadratic attention cost: Vanilla attention computes interactions between all token pairs, which scales quadratically with sequence length. For very long contexts this becomes prohibitive.
Data and compute hunger: State of the art performance often requires enormous datasets and massive compute budgets. This limits who can train the largest models from scratch.
Hallucinations and factuality: Generative models can produce fluent but incorrect statements. Attention alone does not guarantee truthfulness.
Interpretability: While attention gives some interpretability, fully understanding why large models produce specific outputs remains challenging.

Summary and final thoughts

In practical terms Transformers brought three major shifts. First they allowed much larger models to be trained efficiently. Second they enabled models to learn complex, long range dependencies that earlier architectures struggled with. Third they provided a flexible framework that can be adapted to many modalities and tasks.

If you take away one point it is this. Attention changed the game. By letting models focus on the most relevant parts of a sequence no matter where they appear, Transformers made machines much better at understanding and generating language.

Know more about Transformers in my in-depth YouTube video.

What is Context Engineering!

Pavan Belagatti — Thu, 16 Oct 2025 09:27:27 +0000

AI systems have evolved so much that anyone can build highly agentic autonomous systems with no-code or low-code platforms/tools. We have come a long way from LLM chatbots to RAG systems to AI agents, but still there is one challenge that persists: context. LLMs are only as good as the information they have at the moment of reasoning. Without the right data, tools and signals, they hallucinate, make poor decisions or simply fail to execute reliably. Your AI systems should be equipped with proper context so that they are highly efficient and deliver value. This is where Context Engineering emerges as a discipline to optimally provide the right context at the right time to your AI systems.

In this article, we’ll dig deeper into the world of context engineering and understand everything about it. Let’s get started.

What is context engineering?

Unlike prompt engineering, which focuses mainly on crafting clever instructions for LLMs, context engineering is the systematic discipline of designing and optimizing the surrounding environment in which AI systems operate. It goes beyond prompts to carefully structure the data, tools, information and workflows that maintain the overall context for an AI system. By doing so, context engineering ensures that tasks are executed not just creatively, but reliably, consistently and intelligently.

At its core, context engineering acknowledges that an LLM by itself knows nothing relevant about a task. Its effectiveness depends on the quality and completeness of the context it receives. This involves curating the right knowledge sources, integrating external systems, maintaining memory across interactions, and aligning tools so the AI agent always has access to what it needs, when it needs it. Small gaps in context can lead to drastically different outcomes — errors, contradictions or hallucinations.

That’s why context engineering is emerging as one of the most critical practices in building robust AI applications. It’s not just about telling the model what to do; it’s about setting up the stage, the rules and the resources so the AI can make better decisions, reason effectively and adapt to real-world complexity.

Prompt engineering vs. context engineering

Context engineering is fundamentally superior to prompt engineering because it addresses the core limitation of AI systems: they only know what you give them.Prompt engineering is like giving someone instructions without any background information, tools or reference materials. You're constantly trying to cram everything into a single question, hoping the AI remembers enough to answer correctly. It's unreliable — the same prompt can produce different results, and there's no way to maintain consistency across interactions or access real-time data.

Context engineering treats the AI as part of a complete system. Instead of relying on clever wording, you architect the entire environment: you integrate knowledge databases so the AI accesses accurate information, connect external tools and APIs so it can perform real actions, implement memory systems so it remembers previous interactions, and establish workflows that ensure consistent, predictable behavior.

The difference is profound. Prompt engineering is about asking better questions. Context engineering is about building better systems. One produces occasionally impressive outputs; the other creates reliable, production-ready applications.

Small gaps in context lead to hallucinations, errors and failures. Context engineering eliminates these gaps systematically, ensuring the AI always has what it needs to make intelligent decisions and deliver consistent results in real-world applications.

RAG vs. context engineering

The RAG pipeline starts with a query from the user. That query is transformed into an embedding, a vector representation that captures semantic meaning. The system then performs a vector search across a knowledge base to find the most relevant pieces of information. Using Top-K retrieval, it selects a handful of the most similar results. These are then “stuffed into context” and fed into the LLM (Large Language Model). While this approach enriches the model with external knowledge, it is often rigid — relying heavily on similarity search and lacking adaptability in how context is used.

On the right, context engineering builds on this idea but adds sophistication. After the query, it introduces a context router that decides how best to process and route the information. This router supports three key processes: selection (choosing the most relevant pieces), organization (structuring information logically), and evolution (adapting and improving context dynamically). These steps produce an optimized context, which is then passed to the LLM.

The difference is clear: RAG fetches and dumps context, while context engineering curates, structures and evolves it, leading to more accurate, reliable and contextually aligned outputs.

The Role of MCP in Context Engineering

Model context protocol (MCP) has been the talk of the town for AI applications as a universal USB to plug & play with any tools & data sources. Instead of working with every API, MCP helps you manage everything in one place. The MCP serves as a critical foundation in context engineering, acting as a standardized intermediary between diverse data sources and AI models to deliver structured, actionable context for intelligent applications.

MCP eliminates the complexity of bespoke integrations by providing a universal interface for databases (such as SQL, NoSQL, and vector stores), APIs, file systems, and external analytics tools. Through its four essential capabilities—standardized interface, context aggregation, dynamic retrieval, and security—MCP seamlessly collects, normalizes, and governs real-time data flow from multiple systems.

Within context engineering, MCP enables dynamic context elicitation: it fetches, assembles, and secures relevant information tailored to the AI model’s current intent or task, vastly improving response relevance and grounding output in real, up-to-date enterprise knowledge. Developers utilize MCP servers to expose organization-specific data and permissions, while AI agents (such as LLMs) connect through MCP clients to intake context in machine-understandable formats, respond to user queries, and adapt outputs based on the latest data.

SingleStore exemplifies the practical power of MCP in AI workflows. Its MCP server bridges LLMs and SingleStore’s high-performance databases, enabling natural language queries, workspace management, SQL execution, and even schema visualization—directly via AI assistants like Claude or development tools. The SingleStore MCP server authenticates with enterprise databases, manages user-specific sessions, enforces access control, and provides seamless, context-rich interactions for both operational and analytical tasks—making it a flagship implementation of context engineering in modern enterprise AI.

Building context-aware workflows with SingleStore

The diagram illustrates a simplified context engineering workflow built around SingleStore as the long-term memory layer. It begins with the user input, which serves as the query or problem statement. The system then performs retrieval and assembly, where relevant context is fetched from SingleStore using vector search and combined with short-term memory such as recent chat history to build a complete, context-rich prompt. This enhanced prompt is then passed to the LLM or AI agent, which processes it, performs reasoning and optionally executes external tool calls to generate a coherent, informed response.

The final stage is write-back memory, where the generated answer, conversation insights and any new knowledge are stored back into SingleStore. This ensures that every new interaction strengthens the system’s contextual understanding over time. The result is a self-improving, context-aware workflow — the essence of context engineering in action.

Context-aware tutorial with SingleStore

Go to SingleStore, create a workspace and a database to hold the context.

Create a new notebook and start working

Step 1: Install required packages & dependencies

pip install openai langchain langchain-community langchain-openai singlestoredb --quiet

Step 2: Import requiredlLibraries and initialize components

from langchain_openai import OpenAIEmbeddings  # works after installing langchain-openai
from langchain_community.vectorstores import SingleStoreDB
from openai import OpenAI

Step 3: Set up SingleStore and OpenAI credentials

SINGLESTORE_HOST = "Add host URL"   # your host
SINGLESTORE_USER = "admin"                     # your user
SINGLESTORE_PASSWORD = "Add your SingleStore DB password"    # your password
SINGLESTORE_DATABASE = "context_engineering"   # your database
OPENAI_API_KEY = "Add your OpenAI API key"
Step 4: Connect to the SingleStore Database
connection_string = f"mysql://{SINGLESTORE_USER}:{SINGLESTORE_PASSWORD}@{SINGLESTORE_HOST}:3306/{SINGLESTORE_DATABASE}"

Step 5: Initialize embeddings and OpenAI client

embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
client = OpenAI(api_key=OPENAI_API_KEY)

Step 6: Initialize the SingleStore vector database

from langchain_community.vectorstores import SingleStoreDB
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)

vectorstore = SingleStoreDB(
    embedding=embeddings,
    table_name="context_memory",
    host=SINGLESTORE_HOST,
    user=SINGLESTORE_USER,
    password=SINGLESTORE_PASSWORD,
    database=SINGLESTORE_DATABASE,
    port=3306
)

Step 7: Insert knowledge into long-term memory

docs = [
    {"id": "1", "text": "SingleStore unifies SQL and vector search in a single engine."},
    {"id": "2", "text": "Context engineering ensures AI agents always have the right context at the right time."},
    {"id": "3", "text": "SingleStore is ideal for real-time RAG pipelines due to low-latency queries."}
]

# Insert into vector DB
vectorstore.add_texts([d["text"] for d in docs], ids=[d["id"] for d in docs])
print("✅ Knowledge inserted into SingleStore")

Step 8: Retrieve relevant context

query = "Why is SingleStore useful for context engineering?"
results = vectorstore.similarity_search(query, k=2)

print("🔹 Retrieved Context:")
for r in results:
    print("-", r.page_content)

Step 9: Build prompt for LLM

from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)

user_input = "Explain context engineering using SingleStore."

context = "\n".join([r.page_content for r in results])

prompt = f"""
You are a helpful AI agent.
User asked: {user_input}
Relevant context from memory:
{context}
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

print("🔹 Agent Answer:\n", response.choices[0].message.content)

Step 10: Store conversation back (short-term → long-term memory)

vectorstore.add_texts([
    f"User: {user_input}", 
    f"Assistant: {response.choices[0].message.content}"
])


print("✅ Conversation stored back into SingleStore for future retrieval")

Step 11: Test retrieval again

followup_query = "What did we discuss earlier about context engineering?"
followup_results = vectorstore.similarity_search(followup_query, k=3)

print("🔹 Follow-up Retrieved Context:")
for r in followup_results:
    print("-", r.page_content)

The complete notebook code is present in this GitHub repository.

The future belongs to context-driven AI

As AI systems become more capable, the real differentiator won’t be bigger models — it will be better context. The ability to deliver the right data, at the right time, in the right format will define how useful and reliable AI truly becomes. Context engineering transforms isolated LLMs into intelligent systems that understand, remember and act with purpose.

By embracing this discipline, developers can move beyond clever prompts and instead build context-aware ecosystems where memory, reasoning and execution work in harmony. Frameworks like LangChain and databases like SingleStore make this vision practical — offering unified storage, hybrid search and high-speed retrieval that bring context to life.

In short, context engineering isn’t just a new buzzword — it’s the backbone of the next generation of AI. The sooner we master it, the closer we get to building AI systems that don’t just respond, but truly understand.

A Hands-On Guide to Model Context Protocol (MCP)!

Pavan Belagatti — Fri, 01 Aug 2025 07:33:56 +0000

In the rapidly evolving AI landscape, one of the most exciting developments is the Model Context Protocol, or MCP. This open-source protocol is transforming how large language models (LLMs) interact with external tools and data sources, enabling smarter, more context-aware AI applications. As someone deeply fascinated by AI and its real-world applications, I want to take you on a detailed journey into MCP — what it is, why it matters, and how you can start building your own MCP-enabled applications, especially using SingleStore as a powerful backend.

Whether you’re a developer, AI engineer, or data scientist, this guide will provide a clear, step-by-step walkthrough and practical insights to help you harness MCP and elevate your AI projects. Let’s dive right in!

What is MCP? An Introduction to Model Context Protocol

MCP stands for Model Context Protocol. At its core, MCP is an open-source standard initially developed by Anthropic to standardize the way AI systems, particularly large language models, interact with external tools and data sources.

Why is this important? Traditional LLMs are incredibly powerful but limited by their training data, which is static and can quickly become outdated. While retrieval-augmented generation (RAG) techniques allow LLMs to access external knowledge bases or documents, they fall short when it comes to interacting with dynamic tools or performing actions beyond reading data. This is where MCP shines.

MCP allows LLMs to access real-world data and applications beyond their initial training datasets. It enables AI agents to perform actions like querying databases, managing projects, or even creating notebooks — all in a standardized, secure, and scalable way.

Image credits Descope

Think of MCP as a universal remote or a USB-C port for AI applications: it provides a universal interface to connect any tool, service, or data source seamlessly to your AI models. This opens the door to building agentic AI applications that can automate complex workflows and interact with multiple external systems effortlessly.

Key Components of MCP

MCP image from the report

To understand how MCP works, it’s crucial to know its three main components:

Hosts: These are AI-powered applications where users interact with the AI, such as cloud desktops, integrated development environments (IDEs), or chatbots. This is your playground where the magic happens.
Clients: These modules exist within the host applications and manage the connections to servers. They act as intermediaries, facilitating communication between the host and the external resources.
Servers: These are wrappers around external tools or data sources, exposing their capabilities to AI applications in a standardized way. Examples include a GitHub server or a SingleStore server, each representing a specific external system that the AI can interact with.

By structuring MCP this way, the protocol ensures modularity and flexibility, making it easy to add or swap out servers without disrupting the overall system.

How MCP Works: A Practical Overview

Credits: MCP workflow image by the report Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

As we already know, the MCP (Model Context Protocol) workflow demonstrates how AI agents and applications can seamlessly access and utilize external resources through a standardized protocol. The process begins when a user submits a prompt (like requesting the latest AAPL stock price via email) to MCP Hosts such as chat applications, IDEs, or AI agents.

These hosts perform intent analysis to understand the request, then communicate through a Transfer Layer that handles the initial request, response, and notifications between clients and servers in a 1:1 relationship. The MCP Servers, which include various services like development tools, databases, and applications (represented by icons for services like GitHub, Gmail, Google Drive, and SQLite), receive these requests and leverage their specific capabilities — including access to Tools, Resources, and Prompts. Based on the request requirements, the servers perform tool selection and orchestration, potentially invoking APIs to access external data sources such as web services, databases, or local files.

The system can also trigger notifications and sampling mechanisms as needed, ultimately delivering the requested information back through the same pathway to fulfill the user’s original request, creating a comprehensive ecosystem where AI applications can securely and efficiently interact with diverse external resources and services.

Much Simpler MCP Flow

When a user interacts with an MCP-enabled AI application, the AI uses the protocol to access external information or trigger actions in other applications.

For example, you could ask the AI to:

Search a database for specific information
Create a task in a project management tool
Add dummy data to your database for testing
Create a notebook environment for data analysis_

These operations happen in real-time, enabling the AI to be context-aware and agentic — meaning it can act autonomously based on the context it has gathered externally.

MCP Through Practical Hands-On

Step 1: Meet SingleStore — The Ideal Database for MCP

For MCP to be truly powerful, it needs a backend that can keep up with real-time data needs and support versatile querying capabilities. This is where SingleStore comes in.

SingleStore is a relational database that supports vector data and hybrid search, making it perfect for RAG applications and serving as a vector database for AI models. Its high performance and real-time capabilities make it an excellent choice for MCP servers.

With SingleStore, you can store, query, and manage your data efficiently, and integrate it seamlessly with MCP to empower your AI applications.

Step 2: Setting Up Your SingleStore MCP Server

Setting up your own SingleStore MCP server is simpler than you might think. Here’s a step-by-step guide to get you started:

Clone the GitHub Repository: The SingleStore MCP server is open source and available on GitHub. The repository includes an installer and the MCP server code, enabling seamless integration.
Prepare Your Environment: Ensure you have Python installed, along with necessary dependencies such as uvicorn. You’ll also need a SingleStore account, which offers a free tier with credits to get you started.
Initialize the MCP Server: Using the repository’s provided commands, run the init command in your terminal or VS Code. This sets up the MCP server quickly and efficiently.
Authenticate with SingleStore: You’ll need to authenticate your SingleStore account to allow the MCP server to access your databases securely.
Connect Your MCP Client: Use an MCP-enabled client like a cloud desktop or chatbot to connect to your SingleStore MCP server. This client will manage interactions between you and the server.

Once connected, you are ready to start interacting with your SingleStore database through MCP!

Step 3: Exploring MCP Server Capabilities with SingleStore

With your SingleStore MCP server up and running, you can now explore various operations that showcase how MCP enhances AI capabilities.

Creating a Database: Start by asking your MCP client to create a new database. For example, you can say, “Create a database named test in my workspace.” The MCP server will handle the request, authenticate your workspace, and create the database for you.

This process demonstrates how MCP abstracts away the complexity of database management and lets you operate with natural language commands.

Adding Dummy Data: Next, you can instruct the MCP client to add dummy data to your new database. The server will:

Create tables such as employees, products, and orders
Populate these tables with sample records
Generate SQL commands behind the scenes to execute these tasks

For instance, the server might create an employees table with columns like first name, last name, email, department, and salary, then insert several sample employee records.

This feature is invaluable for developers and data scientists who want to quickly prototype and test SQL queries or AI workflows without manually setting up data.

Running Queries and Analyzing Data: After populating your database, you can query it using natural language or SQL commands. For example, you might ask for “all employees grouped by department,” and the MCP server will execute the SQL query and return aggregated data like employee counts and average salaries per department.

This capability enables dynamic data exploration and empowers AI agents to provide actionable insights based on real-time data.

Here is the complete step-by-step video tutorial below.

Step 4: Verifying Your Data in SingleStore

It’s always good to verify that the MCP server executed your commands correctly. You can log in to your SingleStore account and navigate to your workspace to check the following:

The newly created database (e.g., test)
The tables created (employees, products, orders)
Sample data inserted into these tables

By viewing the data directly in SingleStore’s dashboard or data studio, you gain confidence that your MCP server is working as expected and that your AI client can interact with the database seamlessly.

Step 5: Automating Workflows with MCP Servers

Beyond simple CRUD operations, MCP servers open the door to automating complex workflows. Here’s what you can do:

Schedule jobs to run at specific intervals
Create and manage notebooks for data analysis
Take snapshots of your database state
Trigger actions in external applications based on AI decisions

By building MCP servers around your favorite tools and services, you can create a unified AI ecosystem where your models not only understand data but also act on it intelligently and autonomously.

Why MCP is a Game-Changer for AI Applications

The promise of MCP lies in its ability to overcome limitations that have traditionally held back LLMs:

Real-Time Context: MCP enables LLMs to access up-to-date information rather than relying solely on static training data.
Tool Integration: LLMs can interact with a wide range of external tools, from databases to project management apps, expanding their usefulness.
Standardization: MCP provides a standardized protocol, meaning developers can build modular, interoperable AI systems without reinventing the wheel.
Agentic AI: With MCP, AI agents can take autonomous actions based on context, opening new horizons for automation and intelligent decision-making.

For anyone serious about building next-generation AI applications, understanding and leveraging MCP is essential.

Final Thoughts and Next Steps

Model Context Protocol is truly a revolutionary step forward in making AI models smarter, more flexible, and more capable of interacting with the real world. By standardizing how AI connects to external data and tools, MCP unlocks new possibilities for building agentic AI applications that can automate workflows, analyze data, and perform tasks autonomously.

Using SingleStore as an MCP server backend provides a robust, high-performance platform that supports complex querying and vector search, making it an ideal partner for MCP-powered AI systems.

If you’re eager to get hands-on, I highly encourage you to visit the SingleStore MCP server GitHub repository, sign up for a free SingleStore account, and try setting up your own MCP server. Experiment with creating databases, adding dummy data, and running queries. This practical experience will deepen your understanding of MCP and prepare you to build powerful AI applications.

Remember, the future of AI is not just about smarter models but about smarter interactions — and MCP is leading the way.

Useful Links to Get Started

Thanks for joining me on this deep dive into MCP. I hope this guide empowers you to explore and innovate with this exciting protocol.

Happy building!

Top AI Coding Assistants Every Developer Should Try!

Pavan Belagatti — Fri, 30 May 2025 08:49:54 +0000

The software development landscape has been revolutionized by AI coding assistants, transforming how developers write, debug, and optimize code. These intelligent tools have evolved from simple auto-completion features to sophisticated AI companions that understand context, generate entire functions, and even explain complex codebases.

With the rapid advancement of large language models and machine learning, today's AI coding assistants offer unprecedented capabilities—from real-time code suggestions and bug detection to automated testing and refactoring. Whether you're a seasoned developer looking to boost productivity or a newcomer seeking guidance, the right AI coding assistant can significantly accelerate your development workflow and code quality.

Let us talk about som eof the best AI coding assistants available in the market right now.

GitHub Copilot

The pioneer in AI code completion that works directly in your IDE, offering intelligent code suggestions and completions across multiple programming languages. It operates within integrated development environments (IDEs) like VS Code, JetBrains, or browser-based platforms, supporting a wide range of programming languages.

GitHub Copilot now offers multiple pricing tiers, starting with a free plan that includes 2,000 auto-completions and 50 premium requests per month. The Pro plan at $10/month provides unlimited completions, while the new Pro+ plan at $39/month offers access to premium models like GPT-4.5 and 1,500 premium requests monthly.

The platform excels in code review assistance, tracking work progress, and suggesting commit descriptions. It's particularly strong for developers working on collaborative projects, with features for organizational license management and IP indemnity for enterprise users. Students and open-source maintainers often qualify for free access, making it accessible to a broader developer community.

Cursor

One of the best AI developer tools in 2025. AI software development, Cursor is a powerful AI-first code editor that provides contextual assistance and code generation capabilities.

Cursor features an innovative Agent Mode for end-to-end task automation, "Cursor Tab" for highly predictive multi-line autocomplete, and powerful context management using .cursorrules for project-specific AI behavior customization. It's one of the most sophisticated and feature-rich AI-powered IDEs available, combining AI tools with extensive manual control and designed for developers who want precision and a wide array of options for code generation, editing, and debugging.

The platform excels at modifying existing code and making context-aware suggestions based on recent changes and linter errors. Its new agent mode can generate code across multiple files, run commands, and automatically determine required context without manual file selection.

Amazon Q Developer

Amazon's entry into AI coding assistants that evolved from CodeWhisperer, integrating with JetBrains IDEs and VS Code via a plugin, and uniquely also provides a CLI agent designed to handle large projects and multiple tasks.

Amazon Q Developer stands out with its enterprise-focused approach, offering specialized capabilities for AWS cloud development and infrastructure management. The platform provides real-time security scanning and vulnerability detection, making it particularly valuable for enterprises prioritizing secure code development.

Its CLI agent is uniquely positioned to handle complex, multi-repository projects and can assist with deployment automation and cloud resource management. The tool integrates seamlessly with Amazon's broader ecosystem, including AWS CodeCommit, CodeBuild, and CodeDeploy, providing a comprehensive development experience for teams already invested in Amazon's cloud infrastructure.

Windsurf

Listed among the best AI developer tools in 2025. AI software development, this is a newer entrant focused on providing comprehensive development assistance.

Windsurf generally has a cleaner UI compared to Cursor's, feeling like comparing an Apple product to a Microsoft one. Unlike Cursor where you usually have to add context manually or tag the codebase, Windsurf automatically analyzes the codebase and chooses the right file to work on.

The platform features advanced capabilities like Cascade, Supercomplete, and Memories, designed to boost developer productivity using the AI flow. Windsurf's step-by-step workflow is intuitive and offers superior automatic context detection. It excels in natural language code editing and provides sophisticated auto-completion that anticipates coding patterns. The editor's memory system learns from your coding style and project patterns, making suggestions increasingly personalized over time.

Aider

One of the top AI developer tools. AI software development, Aider specializes in helping with code modifications and refactoring.

Aider effortlessly integrates into your existing development practices, providing sophisticated features like natural language code editing, smart auto-completion, and context-sensitive recommendations. By anticipating your next coding step and aligning with your unique coding style, it becomes an indispensable tool for legacy code maintenance.

Aider excels in understanding complex codebases and can suggest architectural improvements and code optimization strategies. The platform is particularly valuable for teams working with technical debt, as it can identify areas for improvement and suggest refactoring approaches that maintain functionality while improving code quality. Its command-line interface makes it ideal for developers who prefer terminal-based workflows and batch processing of code modifications.

Cody by Sourcegraph

An AI-powered assistant specifically designed for working with complex or legacy codebases, helping you find what you need in seconds, explains unfamiliar logic, and suggests refactoring improvements.

Cody leverages Sourcegraph's powerful code search and analysis capabilities, making it exceptionally effective for large enterprise codebases with millions of lines of code. The platform provides advanced code graph analysis, allowing it to understand complex dependencies and relationships across multiple repositories.

It excels in explaining unfamiliar code patterns and can provide historical context about code changes and their rationale. Cody's integration with Sourcegraph's code intelligence platform enables it to offer insights about code usage patterns, potential security vulnerabilities, and compliance issues.

The tool is particularly valuable for onboarding new team members to complex projects and for maintaining code quality standards across large development teams.

Tabnine

A popular AI coding assistant that provides intelligent code completions and works across multiple IDEs and programming languages.

Tabnine offers both cloud-based and on-premises deployment options, making it suitable for organizations with strict security requirements. The platform supports over 30 programming languages and integrates with more than 15 IDEs, providing consistent AI assistance regardless of your development environment.

Tabnine's local AI models ensure that sensitive code never leaves your environment, addressing privacy concerns common in enterprise settings. The tool learns from your team's coding patterns and can be trained on your specific codebase to provide more relevant suggestions. Its focus on privacy and security, combined with flexible deployment options, makes it a preferred choice for financial institutions, healthcare organizations, and other highly regulated industries.

Microsoft IntelliCode

Another popular AI coding assistant that enhances Visual Studio and VS Code with AI-powered recommendations.

IntelliCode leverages Microsoft's extensive experience in developer tools and integrates deeply with the Visual Studio ecosystem. The platform uses machine learning models trained on thousands of open-source projects to provide contextually relevant suggestions that go beyond simple auto-completion.

It offers whole-line completions and can suggest variable names, function signatures, and code patterns based on your project's context. IntelliCode's tight integration with Microsoft's development ecosystem includes seamless support for .NET, Azure services, and Microsoft's broader developer toolchain. The tool provides personalized recommendations by learning from your coding habits and team conventions, making it particularly effective for teams standardized on Microsoft technologies.

Powering AI Development with Modern Data Infrastructure

As developers increasingly integrate AI coding assistants into their workflows, the underlying data infrastructure becomes crucial for building custom AI applications.

SingleStore offers a unified platform combining traditional database capabilities with advanced vector processing, making it ideal for developers building AI-powered tools.

With native vector database functionality, real-time performance, and support for RAG (Retrieval-Augmented Generation) applications, SingleStore enables developers to create sophisticated semantic search systems, documentation tools, and context-aware development assistants. Its hybrid search capabilities and SQL integration eliminate the need for multiple specialized databases, simplifying the architecture while delivering enterprise-grade performance for AI applications.

Below is my tutorial on how to build robust RAG systems using Deepseek-R1 & SingleStore.

Pavan Belagatti

Jan 29 '25

Run DeepSeek-R1 Locally & Build RAG Applications!

#deepseek #ai #developer #coding

2903

Comments 27

6 min read

Below is my other tutorial on how to build efficient RAG systems using Llama 4 and SingleStore.

Pavan Belagatti for SingleStore

Apr 8 '25

Learn How to Build Robust RAG Applications Using Llama 4!

#ai #tutorial #opensource #database

Comments 4

5 min read

Conclusion

AI coding assistants have become indispensable tools in modern software development, offering everything from intelligent code completion to sophisticated refactoring capabilities. The platforms covered—from GitHub Copilot's pioneering approach to Cursor's advanced agent mode and specialized tools like Cody for legacy codebases—demonstrate the diverse solutions available for different development needs.

As these tools continue evolving with better context understanding and more powerful AI models, they're reshaping how we approach coding challenges. The key is experimenting with multiple assistants to find the perfect match for your workflow, coding style, and project requirements. Embrace these AI companions to unlock your full development potential.