<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Arshdeep Singh</title>
    <description>The latest articles on Forem by Arshdeep Singh (@iamarsh).</description>
    <link>https://forem.com/iamarsh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3688784%2F4088a929-0b92-4612-b110-631bd5cbe71c.png</url>
      <title>Forem: Arshdeep Singh</title>
      <link>https://forem.com/iamarsh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/iamarsh"/>
    <language>en</language>
    <item>
      <title>How I Added a Knowledge Graph to My AI Architecture Analyzer (and What I Learned About Graph Thinking)</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Tue, 24 Feb 2026 03:50:29 +0000</pubDate>
      <link>https://forem.com/iamarsh/how-i-added-a-knowledge-graph-to-my-ai-architecture-analyzer-and-what-i-learned-about-graph-543</link>
      <guid>https://forem.com/iamarsh/how-i-added-a-knowledge-graph-to-my-ai-architecture-analyzer-and-what-i-learned-about-graph-543</guid>
      <description>&lt;p&gt;So I built &lt;a href="https://tesseric.ca" rel="noopener noreferrer"&gt;Tesseric&lt;/a&gt; to analyze AWS architectures using Claude via Bedrock. You describe your system, it returns structured findings with security risks and remediation steps. Standard stuff.&lt;/p&gt;

&lt;p&gt;Then something weird happened.&lt;/p&gt;

&lt;p&gt;I was testing it with a 3-tier web app description - ALB in front, EC2 instances in the middle, RDS at the bottom. The analysis came back with findings like "EC2 instances in single AZ" and "RDS encryption disabled." Good. But I kept scrolling between findings thinking "wait, which layer was that again?"&lt;/p&gt;

&lt;p&gt;I was mentally reconstructing the architecture diagram from the text I'd just submitted.&lt;/p&gt;

&lt;p&gt;That felt backward. The AI clearly understood my architecture - it was extracting service names, understanding relationships, identifying which components had issues. So why was I stuck staring at a list?&lt;/p&gt;

&lt;p&gt;What if I could &lt;strong&gt;see&lt;/strong&gt; the architecture? Not just read about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Graph: Knowledge Accumulation
&lt;/h2&gt;

&lt;p&gt;Before I get to the fun part, let me rewind to where this started.&lt;/p&gt;

&lt;p&gt;After running a few dozen test reviews, I noticed patterns. Every finding mentioned AWS services. Reviews kept touching the same services. I started asking questions like "if EC2 has issues, does RDS usually have problems too?" or "which services appear together in high-severity findings?"&lt;/p&gt;

&lt;p&gt;I was doing graph traversal in my head on flat JSON responses.&lt;/p&gt;

&lt;p&gt;That's when I added Neo4j. Not for the individual reviews - for cross-analysis pattern detection. The schema is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:Analysis&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:HAS_FINDING&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:Finding&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:INVOLVES_SERVICE&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:AWSService&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The magic trick is &lt;code&gt;MERGE&lt;/code&gt;. In SQL, if EC2 appears in 50 analyses, you have 50 rows. In Neo4j, EC2 becomes &lt;strong&gt;one node&lt;/strong&gt;, and every analysis just adds another relationship to it.&lt;/p&gt;
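&lt;p&gt;To make the &lt;code&gt;MERGE&lt;/code&gt; semantics concrete, here's a minimal Python sketch - an in-memory stand-in, not the actual Neo4j driver code - of node deduplication versus relationship accumulation:&lt;/p&gt;

```python
# MERGE-style accumulation with plain dicts (hypothetical stand-in for
# the Neo4j behavior described above).
services = {}          # one entry per service, no matter how many analyses
relationships = []     # each analysis adds edges, never duplicate nodes

def record_finding(analysis_id, service_name):
    # MERGE semantics: create the service node only if it is missing
    node = services.setdefault(service_name, {"name": service_name})
    relationships.append((analysis_id, "INVOLVES_SERVICE", service_name))
    return node

for i in range(50):
    record_finding("analysis-{}".format(i), "EC2")

print(len(services))        # 1  (one EC2 node)
print(len(relationships))   # 50 (fifty relationships)
```

&lt;p&gt;Fifty analyses, one node: every repeat just adds an edge.&lt;/p&gt;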

&lt;p&gt;Query for service co-occurrence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;s1:&lt;/span&gt;&lt;span class="n"&gt;AWSService&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="py"&gt;r:&lt;/span&gt;&lt;span class="n"&gt;CO_OCCURS_WITH&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;s2:&lt;/span&gt;&lt;span class="n"&gt;AWSService&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;s1.name&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s2.name&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r.count&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;r.count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: "EC2 and RDS co-occur in 23 analyses. RDS and S3 in 17."&lt;/p&gt;

&lt;p&gt;That pattern only emerges from accumulated data. After a few dozen reviews, the graph starts telling you things about architecture patterns you didn't explicitly ask for.&lt;/p&gt;

&lt;p&gt;Graph writes happen async with &lt;code&gt;asyncio.create_task()&lt;/code&gt; so the core analysis stays fast. If Neo4j is down, the user never knows. Graceful degradation, not cascading failures.&lt;/p&gt;
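&lt;p&gt;The fire-and-forget pattern looks roughly like this (a hedged sketch; &lt;code&gt;write_to_graph&lt;/code&gt; is a hypothetical stand-in for the real Neo4j writer):&lt;/p&gt;

```python
import asyncio

graph_writes = []  # stands in for Neo4j so we can observe the async write

# Hypothetical stand-in for the real graph writer.
async def write_to_graph(analysis):
    try:
        graph_writes.append(analysis)  # real code: driver session + MERGE queries
    except Exception:
        pass  # best-effort: a Neo4j outage must never fail the user's review

async def analyze(description):
    result = {"findings": ["EC2 in single AZ"], "input": description}
    # schedule the graph write without awaiting it; the response returns
    # immediately even if the graph database is slow or down
    asyncio.create_task(write_to_graph(result))
    await asyncio.sleep(0)  # one loop tick, only so the demo write lands
    return result

result = asyncio.run(analyze("ALB, EC2, RDS"))
print(len(graph_writes))  # 1
```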

&lt;p&gt;But here's where it got interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Second Graph: Your Architecture, Visualized
&lt;/h2&gt;

&lt;p&gt;Once I had Neo4j working, I kept thinking "this is cool for meta-analysis, but users don't care about cross-analysis patterns for their first review."&lt;/p&gt;

&lt;p&gt;Then it hit me: the AI is already extracting AWS services. It knows EC2 connects to RDS. It knows the ALB sits in front. That's &lt;strong&gt;topology information&lt;/strong&gt;, not just a list.&lt;/p&gt;

&lt;p&gt;What if I asked Claude to return the actual architecture structure?&lt;/p&gt;

&lt;p&gt;I updated the Bedrock prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Extract the architecture topology: which services exist, how they connect, what layer they're in. Return as a structured graph."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It worked. First try.&lt;/p&gt;

&lt;p&gt;Submit "ALB in front of two EC2 instances connecting to an RDS database" and Claude returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"services"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alb-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ALB"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"layer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"presentation"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ec2-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EC2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"layer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rds-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RDS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"layer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"connections"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alb-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ec2-1"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ec2-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rds-1"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I could draw it. React-Flow + dagre for auto-layout. Color-code by layer (blue for frontend, green for backend, purple for data). Add borders based on severity (red if service has HIGH findings, orange for MEDIUM).&lt;/p&gt;
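&lt;p&gt;The styling rules boil down to two lookups. A simplified sketch (field names like &lt;code&gt;service_id&lt;/code&gt; are illustrative, not Tesseric's actual schema):&lt;/p&gt;

```python
# Fill color by layer, border color by the worst severity touching the node.
LAYER_COLORS = {"presentation": "blue", "application": "green", "data": "purple"}

def style_node(service, findings):
    # findings that mention this service
    severities = [f["severity"] for f in findings if f["service_id"] == service["id"]]
    border = None
    if "HIGH" in severities:
        border = "red"
    elif "MEDIUM" in severities:
        border = "orange"
    return {"fill": LAYER_COLORS.get(service["layer"], "gray"), "border": border}

node = style_node(
    {"id": "ec2-1", "layer": "application"},
    [{"service_id": "ec2-1", "severity": "HIGH"}],
)
print(node)  # {'fill': 'green', 'border': 'red'}
```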

&lt;p&gt;The result: you describe your architecture in text or upload a diagram, and Tesseric &lt;strong&gt;draws it back to you&lt;/strong&gt; with visual problem indicators.&lt;/p&gt;

&lt;p&gt;That's the moment users go "oh shit, it actually understood."&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters (The Graph Thinking Part)
&lt;/h2&gt;

&lt;p&gt;Here's what I'm learning about graphs: they change &lt;strong&gt;how you think about your data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In SQL, you model entities. In graphs, you model relationships. That shift completely changes what questions feel natural to ask.&lt;/p&gt;

&lt;p&gt;With the Neo4j knowledge graph, I can ask "show me all analyses where EC2 and RDS both appeared in HIGH severity findings" and get a visual cluster showing that risk pattern across time.&lt;/p&gt;

&lt;p&gt;With architecture topology, I can click an EC2 node and instantly see which findings affect it. Or click a finding card and watch the affected services highlight on the diagram. Bidirectional linking that feels spatial, not tabular.&lt;/p&gt;

&lt;p&gt;The architecture visualization isn't just prettier than a list - it's &lt;strong&gt;spatially encoded&lt;/strong&gt;. You see the ALB at the top, EC2 in the middle, RDS at the bottom, and your brain instantly knows "that's a 3-tier app." You don't have to read it, you just know.&lt;/p&gt;

&lt;p&gt;When three findings all point to the same EC2 node with a red border, you see the critical risk cluster without reading a single word.&lt;/p&gt;

&lt;p&gt;Humans are visual creatures. We're ridiculously good at pattern recognition when information is laid out spatially. A graph lets you leverage that.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; FastAPI + Python 3.11, AWS Bedrock (Claude 3.5 Haiku/Sonnet), Neo4j AuraDB&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 14, TypeScript, React-Flow, dagre for auto-layout&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; ~$0.011 per text review, ~$0.028 per image analysis. Neo4j free tier. Railway + Vercel hosting ~$5/month. At 100 reviews/day, that's $33/month in AI costs.&lt;/p&gt;

&lt;p&gt;React-Flow is genuinely well-designed. Custom nodes, edges, built-in minimap. Got a working graph in a few hours. Dagre handles hierarchical layout for the architecture topology; the knowledge graph uses a force-directed layout.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Realization
&lt;/h2&gt;

&lt;p&gt;I thought I was building a knowledge graph to analyze trends across reviews. That was useful.&lt;/p&gt;

&lt;p&gt;But the architecture topology visualization - showing users their own system drawn back to them with problem indicators - that's the feature that makes people go "whoa."&lt;/p&gt;

&lt;p&gt;It's the difference between "here's a list of issues" and "here's your architecture, and here's exactly where the problems are."&lt;/p&gt;

&lt;p&gt;Spatial thinking beats text lists. Graphs give you that spatial encoding almost for free.&lt;/p&gt;

&lt;p&gt;If you're building something with interconnected entities - services, users, dependencies, networks - you're probably fighting SQL when a graph would make those queries trivial.&lt;/p&gt;

&lt;p&gt;And if you're analyzing architectures or anything with structure? Don't just return text. Draw it. Show it. Let people see the shape of the problem.&lt;/p&gt;

&lt;p&gt;That's when AI stops feeling like a chatbot and starts feeling like it actually understands.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://tesseric.ca" rel="noopener noreferrer"&gt;tesseric.ca&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Describe your AWS architecture or upload a diagram. Watch it draw your system back to you with visual problem indicators. Then explore the knowledge graph to see how services and findings connect across analyses.&lt;/p&gt;

&lt;p&gt;Built with AWS Bedrock, Neo4j AuraDB, FastAPI, Next.js, and React-Flow. Open source. Brutally honest feedback mode included.&lt;/p&gt;




&lt;p&gt;Check out the code at &lt;a href="https://github.com/iamarsh/tesseric" rel="noopener noreferrer"&gt;github.com/iamarsh/tesseric&lt;/a&gt;&lt;/p&gt;

</description>
      <category>neo4j</category>
      <category>aws</category>
      <category>nextjs</category>
      <category>graphdatabase</category>
    </item>
    <item>
      <title>Building a Config Drift Detector for AWS (with Snapshots, Lambdas, and a Next.js Dashboard)</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Mon, 19 Jan 2026 04:14:01 +0000</pubDate>
      <link>https://forem.com/iamarsh/building-a-config-drift-detector-for-aws-with-snapshots-lambdas-and-a-nextjs-dashboard-1k49</link>
      <guid>https://forem.com/iamarsh/building-a-config-drift-detector-for-aws-with-snapshots-lambdas-and-a-nextjs-dashboard-1k49</guid>
      <description>&lt;p&gt;Configuration drift is one of those problems that seems minor—until it isn’t.&lt;/p&gt;

&lt;p&gt;A “temporary” security group rule stays open for weeks.&lt;br&gt;&lt;br&gt;
A manual change fixes a production incident but never makes it back to Terraform.&lt;br&gt;&lt;br&gt;
An EC2 instance gets a one-off flag “just for now” and quietly becomes the special case nobody wants to touch.&lt;/p&gt;

&lt;p&gt;Over time, these tiny deviations compound into outages, security gaps, and a lot of “who changed what, when?” energy. This article walks through how I designed and built a lightweight &lt;strong&gt;Config Drift Detector&lt;/strong&gt; for AWS that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes regular snapshots of your infrastructure.&lt;/li&gt;
&lt;li&gt;Compares them against a moving baseline.&lt;/li&gt;
&lt;li&gt;Surfaces drift events in a &lt;strong&gt;Next.js dashboard&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Sends Slack alerts for high/critical changes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  High-level architecture
&lt;/h2&gt;

&lt;p&gt;Here’s the architecture diagram used in this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6vgb1o83lqhb1t0t6zb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6vgb1o83lqhb1t0t6zb.png" alt="Config Drift Detector architecture" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At a glance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Services&lt;/strong&gt; (e.g., EC2, Security Groups) are sampled on a schedule.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Snapshot Lambda&lt;/strong&gt; writes raw JSON snapshots to S3 and Supabase/PostgreSQL.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Detect Lambda&lt;/strong&gt; compares the latest snapshot to the previous baseline to detect drift.&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;Alert Lambda&lt;/strong&gt; writes drift events, updates baselines, and optionally sends Slack alerts.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Next.js dashboard&lt;/strong&gt; polls a lightweight API backed by Supabase/PostgreSQL to show drifts and baselines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest of the article breaks this down from the perspective of an SRE/DevOps engineer who wants fast feedback, clear audit trails, and a UI that doesn’t feel like a side project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design goals and constraints
&lt;/h2&gt;

&lt;p&gt;When I scoped this project, I set a few explicit goals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Detect meaningful drift&lt;/strong&gt;, not every single field that changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the architecture boring and observable&lt;/strong&gt;: managed services over bespoke infra.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make the UI operator-friendly&lt;/strong&gt;: think SRE console, not toy dashboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be small enough to build solo&lt;/strong&gt;, but credible enough to show to senior engineers or hiring managers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From there, the architecture fell naturally into four pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Snapshot pipeline.&lt;/li&gt;
&lt;li&gt;Drift detection engine.&lt;/li&gt;
&lt;li&gt;Alerting and audit trail.&lt;/li&gt;
&lt;li&gt;Web dashboard.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Snapshot pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What gets snapshotted?
&lt;/h3&gt;

&lt;p&gt;To start, I focused on a narrow but high-impact slice of AWS resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EC2 instances&lt;/strong&gt;: lifecycle, instance type, tags.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security groups&lt;/strong&gt;: inbound/outbound rules and attached resources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are common sources of “quick fixes” and “just for debugging” changes that later turn into security and reliability problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  How snapshots flow through the system
&lt;/h3&gt;

&lt;p&gt;The snapshot pipeline revolves around a scheduled Lambda:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger&lt;/strong&gt;: EventBridge rule runs every 30 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot Lambda&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Calls AWS APIs to list EC2 instances and security groups.&lt;/li&gt;
&lt;li&gt;Normalizes the data into a stable JSON shape.&lt;/li&gt;
&lt;li&gt;Writes each snapshot to:

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;S3&lt;/strong&gt;: raw, timestamped JSON (e.g., &lt;code&gt;YYYY-MM-DD/HH-MM-SS.json&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Supabase/PostgreSQL&lt;/strong&gt;: summarized snapshot metadata for faster queries later.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
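&lt;p&gt;A stripped-down sketch of the normalization and key format described above (the input shape mirrors the EC2 describe-instances response; the real handler also writes the results to S3 and Supabase):&lt;/p&gt;

```python
from datetime import datetime, timezone

def normalize_instance(raw):
    # reduce the raw EC2 API shape to a stable, diff-friendly one
    return {
        "id": raw["InstanceId"],
        "type": raw["InstanceType"],
        "state": raw["State"]["Name"],
        "tags": {t["Key"]: t["Value"] for t in raw.get("Tags", [])},
    }

def snapshot_key(now):
    # timestamped S3 key: YYYY-MM-DD/HH-MM-SS.json
    return now.strftime("%Y-%m-%d/%H-%M-%S.json")

raw = {"InstanceId": "i-0abc", "InstanceType": "t3.micro",
       "State": {"Name": "running"}, "Tags": [{"Key": "env", "Value": "prod"}]}
now = datetime(2026, 1, 19, 4, 14, 1, tzinfo=timezone.utc)
print(normalize_instance(raw)["type"], snapshot_key(now))
# t3.micro 2026-01-19/04-14-01.json
```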

&lt;p&gt;This gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;cheap, append-only log&lt;/strong&gt; of the world as it looked at each point in time (S3).&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;queryable state&lt;/strong&gt; for dashboards and drift detection (Postgres).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Drift detection engine
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Baselines vs snapshots
&lt;/h3&gt;

&lt;p&gt;The system uses a simple mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;snapshot&lt;/strong&gt; is “what the world looks like now”.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;baseline&lt;/strong&gt; is “what we expect the world to look like”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every time a new snapshot arrives, the &lt;strong&gt;Detect Lambda&lt;/strong&gt; compares it to the current baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each resource (instance, security group, etc.):

&lt;ul&gt;
&lt;li&gt;Map by a stable identifier (e.g., instance ID).&lt;/li&gt;
&lt;li&gt;Compare relevant fields that matter for reliability/security.&lt;/li&gt;
&lt;li&gt;Ignore noisy, fast-changing fields (e.g., some timestamps).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The output is a set of &lt;strong&gt;drift events&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ADDED&lt;/code&gt;: resource exists in snapshot but not in baseline.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;REMOVED&lt;/code&gt;: resource exists in baseline but not in snapshot.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MODIFIED&lt;/code&gt;: resource exists in both, but relevant fields differ.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each drift event carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource metadata (ID, type, environment).&lt;/li&gt;
&lt;li&gt;Which fields changed (before vs after).&lt;/li&gt;
&lt;li&gt;A severity classification (more on that below).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once detection is done, the baseline is &lt;strong&gt;updated forward&lt;/strong&gt; so the system tracks drift incrementally rather than replaying from the beginning every time.&lt;/p&gt;
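&lt;p&gt;The comparison logic above fits in a short function. A hedged sketch (field selection is simplified; the real engine also attaches resource metadata and severity):&lt;/p&gt;

```python
# ADDED / REMOVED / MODIFIED, keyed by a stable resource ID.
def detect_drift(baseline, snapshot, fields):
    events = []
    for rid in snapshot:
        if rid not in baseline:
            events.append({"type": "ADDED", "id": rid})
    for rid in baseline:
        if rid not in snapshot:
            events.append({"type": "REMOVED", "id": rid})
            continue
        changed = {}
        for f in fields:  # compare only fields that matter; skip noisy ones
            if baseline[rid].get(f) != snapshot[rid].get(f):
                changed[f] = {"before": baseline[rid].get(f),
                              "after": snapshot[rid].get(f)}
        if changed:
            events.append({"type": "MODIFIED", "id": rid, "fields": changed})
    return events

baseline = {"i-1": {"instance_type": "t3.micro"}}
snapshot = {"i-1": {"instance_type": "t3.large"},
            "i-2": {"instance_type": "t3.micro"}}
events = detect_drift(baseline, snapshot, ["instance_type"])
print([e["type"] for e in events])  # ['ADDED', 'MODIFIED']
```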




&lt;h2&gt;
  
  
  3. Alerting and severity
&lt;/h2&gt;

&lt;p&gt;Not all drift is created equal. Changing a tag is not the same as opening SSH to the world.&lt;/p&gt;

&lt;p&gt;To make alerts meaningful, drift events are classified by severity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CRITICAL&lt;/strong&gt;: Security group changes that materially expand exposure (e.g., &lt;code&gt;0.0.0.0/0&lt;/code&gt; on sensitive ports).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HIGH&lt;/strong&gt;: EC2 changes that alter lifecycle or network placement in risky ways.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MEDIUM&lt;/strong&gt;: Configuration changes that might affect behavior but aren’t obviously dangerous.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LOW&lt;/strong&gt;: Tag-only changes and other low-risk metadata updates.&lt;/li&gt;
&lt;/ul&gt;
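&lt;p&gt;As a sketch, the classification can be a small rule function (the exact conditions and field names here are illustrative, not the project's real rule set):&lt;/p&gt;

```python
SENSITIVE_PORTS = {22, 3306, 5432}  # SSH and common database ports

def classify(event):
    changed = event.get("fields", {})
    if event["resource_type"] == "security_group":
        rule = changed.get("ingress", {}).get("after", {})
        if rule.get("cidr") == "0.0.0.0/0" and rule.get("port") in SENSITIVE_PORTS:
            return "CRITICAL"
    if event["resource_type"] == "ec2" and ("state" in changed or "subnet_id" in changed):
        return "HIGH"
    if set(changed) == {"tags"}:
        return "LOW"
    return "MEDIUM"

open_ssh = {"resource_type": "security_group",
            "fields": {"ingress": {"after": {"cidr": "0.0.0.0/0", "port": 22}}}}
print(classify(open_ssh))  # CRITICAL
```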

&lt;p&gt;The &lt;strong&gt;Alert Lambda&lt;/strong&gt; is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing drift events into Supabase/PostgreSQL for later querying.&lt;/li&gt;
&lt;li&gt;Sending Slack notifications for &lt;strong&gt;HIGH&lt;/strong&gt; and &lt;strong&gt;CRITICAL&lt;/strong&gt; drifts:

&lt;ul&gt;
&lt;li&gt;Channel: e.g., &lt;code&gt;#infra-alerts&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Message includes: resource, environment, severity, and a short description.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This keeps the Slack noise under control while still providing a tight feedback loop for changes that actually matter.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Next.js dashboard
&lt;/h2&gt;

&lt;p&gt;The dashboard is intentionally simple, but optimized for SRE/DevOps workflows rather than demos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key views
&lt;/h3&gt;

&lt;p&gt;The app exposes three main pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dashboard&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-level stats: number of active drifts, baselines, and monitored environments.&lt;/li&gt;
&lt;li&gt;Recent drifts, sorted by time and severity.&lt;/li&gt;
&lt;li&gt;Baseline overview (which environments are covered, which baselines are stale).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Drifts&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table of drift events with:

&lt;ul&gt;
&lt;li&gt;Severity chips.&lt;/li&gt;
&lt;li&gt;Resource and environment.&lt;/li&gt;
&lt;li&gt;Type of drift (&lt;code&gt;ADDED&lt;/code&gt;, &lt;code&gt;REMOVED&lt;/code&gt;, &lt;code&gt;MODIFIED&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Detected time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Filters for severity, status, and environment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Baselines&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;List of baselines with:

&lt;ul&gt;
&lt;li&gt;Name, environment.&lt;/li&gt;
&lt;li&gt;Status (Active / Stale / Archived).&lt;/li&gt;
&lt;li&gt;Last updated time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Links into the Drifts view filtered by baseline.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data flow
&lt;/h3&gt;

&lt;p&gt;The dashboard queries Supabase/PostgreSQL via a light API layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetches lists of drifts and baselines.&lt;/li&gt;
&lt;li&gt;Supports simple aggregation for dashboard metrics (e.g., a count of active drifts).&lt;/li&gt;
&lt;li&gt;Polls frequently enough to make the UI feel “live” without hammering the backend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The focus is on &lt;strong&gt;operational clarity&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It should be easy to answer:

&lt;ul&gt;
&lt;li&gt;“What changed recently?”&lt;/li&gt;
&lt;li&gt;“Is this environment drifting more than others?”&lt;/li&gt;
&lt;li&gt;“Which baselines are out of date?”&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why this architecture?
&lt;/h2&gt;

&lt;p&gt;This design deliberately avoids premature complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless for cadence-based work&lt;/strong&gt;: Lambdas plus an EventBridge scheduler are a natural fit for “run every N minutes and compare snapshots”.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3 + Postgres&lt;/strong&gt; gives both durability and queryability:

&lt;ul&gt;
&lt;li&gt;S3 for raw history.&lt;/li&gt;
&lt;li&gt;Postgres for fast reads and simple aggregations.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Next.js dashboard&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Easy to deploy.&lt;/li&gt;
&lt;li&gt;Easy to iterate on UX.&lt;/li&gt;
&lt;li&gt;Pairs well with Supabase as a backend.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;At the same time, it leaves room to grow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more resource types beyond EC2 and security groups.&lt;/li&gt;
&lt;li&gt;Introduce per-environment baselines and multi-account support.&lt;/li&gt;
&lt;li&gt;Expand the dashboard with timelines, diff views, and richer filters.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Future improvements
&lt;/h2&gt;

&lt;p&gt;There are several natural extensions to this architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Better diff views&lt;/strong&gt;: show structured diffs (field-level before/after) in the UI, not just “modified”.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alert policies&lt;/strong&gt;: configurable rules to decide which drifts should alert where (Slack, email, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-cloud support&lt;/strong&gt;: abstract snapshot/detect logic to handle other providers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift remediation hooks&lt;/strong&gt;: for certain classes of drift, trigger runbooks or automated remediation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current version focuses on the basics: detect, classify, alert, and visualize. That’s already enough to catch the most painful “someone changed prod” issues and to tell a coherent story in a portfolio or blog post.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Config Drift Detector started as a way to make configuration changes more visible, but it also became a nice exercise in &lt;strong&gt;small, focused architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One clear data flow from AWS → snapshots → drift detection → alerts → dashboard.&lt;/li&gt;
&lt;li&gt;Minimal moving parts, each doing one job well.&lt;/li&gt;
&lt;li&gt;A UI that reflects how operators actually investigate and respond to drift.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re interested in configuration management, SRE tooling, or just want a portfolio project that goes beyond CRUD, building something like this is a great way to explore the intersection of &lt;strong&gt;cloud architecture&lt;/strong&gt;, &lt;strong&gt;observability&lt;/strong&gt;, and &lt;strong&gt;developer experience&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>typescript</category>
      <category>sre</category>
    </item>
    <item>
      <title>Stop Drawing Stacks: Seeing Drupal on AWS as a Graph</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Sat, 17 Jan 2026 23:01:30 +0000</pubDate>
      <link>https://forem.com/iamarsh/stop-drawing-stacks-seeing-drupal-on-aws-as-a-graph-1j1b</link>
      <guid>https://forem.com/iamarsh/stop-drawing-stacks-seeing-drupal-on-aws-as-a-graph-1j1b</guid>
      <description>&lt;p&gt;Every architecture diagram you've ever drawn is a graph. The boxes are nodes; the arrows are edges. Yet most teams treat those diagrams as static documentation rather than a working model they can reason about mathematically. Once you start applying graph theory to how Drupal and AWS actually behave at runtime, hard problems—latency budgets, failure blast radius, refactoring priorities—become graph problems you already know how to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your platform is a graph
&lt;/h2&gt;

&lt;p&gt;A graph is simply a set of nodes (vertices) connected by edges. In a Drupal-on-AWS platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drupal nodes&lt;/strong&gt;: content types, configuration entities, services in the container, event subscribers, routes, caches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS nodes&lt;/strong&gt;: VPCs, subnets, security groups, ALBs, ECS tasks, Lambda functions, RDS clusters, S3 buckets, SQS queues, external SaaS (Auth0, Salesforce).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edges&lt;/strong&gt;: "calls API", "publishes to queue", "allowed by security group", "replicates to region", "feeds dashboard".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you accept this framing, you stop asking vague questions like "Is this architecture clean?" and start asking precise questions like "What's the shortest failure path from this AWS primitive to a broken SLO?"&lt;/p&gt;
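&lt;p&gt;That precise question is an ordinary shortest-path query. A minimal BFS sketch over an adjacency dict (the node names here are illustrative):&lt;/p&gt;

```python
from collections import deque

# Directed edges: "a failure in X can break Y".
edges = {
    "security-group": ["ec2"],
    "ec2": ["drupal-route"],
    "drupal-route": ["page-slo"],
}

def shortest_path(graph, start, goal):
    # breadth-first search returns the first (shortest) path found
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no failure path from start to goal

print(shortest_path(edges, "security-group", "page-slo"))
# ['security-group', 'ec2', 'drupal-route', 'page-slo']
```

&lt;p&gt;Swap the toy dict for your real infrastructure graph and the same query tells you the blast radius of any AWS primitive.&lt;/p&gt;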

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma3bx2jz0c0y74k0allh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma3bx2jz0c0y74k0allh.png" alt="AWS as an infrastructure graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Drupal as a dependency graph
&lt;/h2&gt;

&lt;p&gt;Drupal is usually described in terms of content types, views, and modules. Operationally, it behaves like several overlapping directed graphs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration graph
&lt;/h3&gt;

&lt;p&gt;Entity types, bundles, fields, field formatters, views, and access rules form a dependency graph. Change a field storage definition and you can cascade through displays, views, REST resources, and integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Runtime call graph
&lt;/h3&gt;

&lt;p&gt;The Symfony service container, event subscribers, and middleware stack define a call graph. Every HTTP request walks a specific path through this graph—touching routing, access checking, entity loading, rendering, and caching nodes in sequence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Permission graph
&lt;/h3&gt;

&lt;p&gt;Roles, permissions, and route access callbacks form yet another graph. Model "who can reach what" as directed edges and you can visualize privilege-escalation risks as unexpectedly short paths between low-privilege and high-privilege nodes.&lt;/p&gt;

&lt;p&gt;A single anonymous page view is actually a walk through all three subgraphs simultaneously. Understanding that walk is the first step to optimizing it.&lt;/p&gt;
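&lt;p&gt;The configuration cascade described above is easy to compute: record what each object depends on, invert the edges, and walk the dependents. All object names here are hypothetical:&lt;/p&gt;

```python
# Who breaks if a config object changes? Dependency edges point from an
# object to what it depends on; invert them to find transitive dependents.
depends_on = {
    "view.articles":       ["field.body", "field.tags"],
    "rest.articles":       ["view.articles"],
    "display.teaser":      ["field.body"],
    "integration.webhook": ["rest.articles"],
}

def dependents(depends_on, changed):
    """Everything that transitively depends on `changed`."""
    reverse = {}
    for obj, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(obj)
    affected, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for dependent in reverse.get(node, []):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected

print(sorted(dependents(depends_on, "field.body")))
# ['display.teaser', 'integration.webhook', 'rest.articles', 'view.articles']
```

&lt;p&gt;That output is the cascade: touch one field storage definition and two displays, a REST resource, and an integration are all in scope for regression testing.&lt;/p&gt;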

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvmqklt7aur23tse7e8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvmqklt7aur23tse7e8z.png" alt="Drupal as a dependency graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS as an infrastructure graph
&lt;/h2&gt;

&lt;p&gt;AWS architectures are already drawn as graphs; graph theory just makes the math explicit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network topology graph
&lt;/h3&gt;

&lt;p&gt;VPCs, subnets, route tables, security groups, and NACLs form a reachability graph. Two nodes can only communicate if there's a valid path through this graph—no path, no packets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data-flow graph
&lt;/h3&gt;

&lt;p&gt;S3, Kinesis, SQS, SNS, Lambda, and analytics services form directed acyclic graphs (DAGs) of data transformations. Your ETL pipeline, event-driven workflows, and observability stack are all DAGs whether you drew them that way or not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Service dependency graph
&lt;/h3&gt;

&lt;p&gt;ECS services, Lambda functions, RDS, ElastiCache, and external APIs form a runtime dependency graph. Traces and flow logs let you infer this graph from production traffic rather than relying on outdated documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four graph concepts that sharpen your thinking
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3emrmfpzn66zlyva2woa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3emrmfpzn66zlyva2woa.png" alt="Graph theory concepts in software architecture" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Paths and latency
&lt;/h3&gt;

&lt;p&gt;User-perceived latency is the weighted sum of edges along the shortest path from browser to data and back. CDN, WAF, ALB, PHP-FPM, Redis, RDS—each hop adds weight.&lt;/p&gt;

&lt;p&gt;Reducing latency means either removing nodes from the path (for example, serving from edge cache) or reducing edge weights (for example, connection pooling, read replicas). Frame every performance optimization as a path-shortening or weight-reduction exercise.&lt;/p&gt;
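&lt;p&gt;That framing is directly computable. A sketch using Dijkstra's algorithm, with made-up per-hop latencies in milliseconds:&lt;/p&gt;

```python
import heapq

# Edge weights are illustrative per-hop latencies in milliseconds.
edges = {
    "browser": {"cdn": 20, "alb": 60},
    "cdn":     {"alb": 15},
    "alb":     {"php": 5},
    "php":     {"redis": 2, "rds": 12},
    "redis":   {},
    "rds":     {},
}

def cheapest_path_ms(edges, src, dst):
    """Dijkstra: total weight of the lowest-latency path src -> dst."""
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nxt, w in edges[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None  # dst unreachable from src

print(cheapest_path_ms(edges, "browser", "rds"))  # 52
```

&lt;p&gt;Swap in your own p50 or p95 measurements as edge weights and the "which optimization helps most" debate becomes arithmetic.&lt;/p&gt;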

&lt;h3&gt;
  
  
  2. Minimum cuts and resilience
&lt;/h3&gt;

&lt;p&gt;A minimum cut is the smallest set of nodes or edges whose removal disconnects the graph. In infrastructure terms, it's your single points of failure: the lone RDS writer, the shared Redis cluster, the internal auth service everything depends on.&lt;/p&gt;

&lt;p&gt;High-availability design is the art of making minimum cuts large and expensive. Multi-AZ RDS, stateless Drupal behind multiple ALBs, and regional failover all increase the size of the cut an outage must hit.&lt;/p&gt;
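&lt;p&gt;For small graphs, a useful first approximation of minimum cuts is the set of articulation points: single nodes whose removal disconnects the (undirected) graph. A sketch over an illustrative topology:&lt;/p&gt;

```python
# Articulation points are the size-one minimum cuts of an undirected
# graph. The topology below is illustrative.
undirected = {
    "alb1":        ["drupal"],
    "alb2":        ["drupal"],
    "drupal":      ["alb1", "alb2", "rds_writer"],
    "rds_writer":  ["drupal", "rds_replica"],
    "rds_replica": ["rds_writer"],
}

def articulation_points(g):
    """Classic DFS (Tarjan-style) detection of cut vertices."""
    seen, low, depth, cut = set(), {}, {}, set()
    def dfs(node, parent, d):
        seen.add(node)
        depth[node] = low[node] = d
        children = 0
        for nxt in g[node]:
            if nxt == parent:
                continue
            if nxt in seen:
                low[node] = min(low[node], depth[nxt])
            else:
                children += 1
                dfs(nxt, node, d + 1)
                low[node] = min(low[node], low[nxt])
                if parent is not None and low[nxt] >= depth[node]:
                    cut.add(node)
        if parent is None and children > 1:
            cut.add(node)
    for node in g:
        if node not in seen:
            dfs(node, None, 0)
    return cut

print(sorted(articulation_points(undirected)))
```

&lt;p&gt;Two ALBs protect the edge, yet everything still funnels through one application node and one database writer: exactly the size-one cuts that multi-AZ and stateless designs exist to remove.&lt;/p&gt;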

&lt;h3&gt;
  
  
  3. Centrality and hotspots
&lt;/h3&gt;

&lt;p&gt;Betweenness centrality measures how often a node sits on the shortest path between other nodes. High-centrality nodes are chokepoints: an API gateway every request flows through, a monolithic "integration" module in Drupal, a single SQS queue feeding multiple consumers.&lt;/p&gt;

&lt;p&gt;Focus observability, rate limiting, and capacity planning on high-centrality nodes. Their failure has a disproportionate blast radius. If you can't eliminate centrality, at least instrument it.&lt;/p&gt;
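&lt;p&gt;On small graphs you can estimate betweenness by brute force: take one shortest path per node pair and count who sits in the middle. This is a rough sketch, not Brandes' algorithm, and the service names are invented:&lt;/p&gt;

```python
from itertools import permutations

# Directed service-call graph; names are illustrative.
g = {
    "web":     ["gateway"],
    "mobile":  ["gateway"],
    "gateway": ["orders", "users"],
    "orders":  ["db"],
    "users":   ["db"],
    "db":      [],
}

def shortest_path(g, src, dst):
    """One BFS shortest path src -> dst, or None if unreachable."""
    prev, queue, seen = {}, [src], {src}
    while queue:
        node = queue.pop(0)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt in g[node]:
            if nxt not in seen:
                seen.add(nxt)
                prev[nxt] = node
                queue.append(nxt)
    return None

def centrality(g):
    """Count how often each node appears inside other nodes' shortest paths."""
    score = {n: 0 for n in g}
    for s, t in permutations(g, 2):
        path = shortest_path(g, s, t)
        if path:
            for mid in path[1:-1]:
                score[mid] += 1
    return score

scores = centrality(g)
print(max(scores, key=scores.get))  # the chokepoint every request flows through
```

&lt;p&gt;Unsurprisingly, the gateway scores highest: it is the chokepoint, so it gets the instrumentation and the rate limits first.&lt;/p&gt;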

&lt;h3&gt;
  
  
  4. Strongly connected components and coupling
&lt;/h3&gt;

&lt;p&gt;A strongly connected component (SCC) is a subset of nodes where every node is reachable from every other. In practice, SCCs represent tightly coupled subsystems: Drupal plus a specific internal API, a queue, and a Lambda that all depend on each other.&lt;/p&gt;

&lt;p&gt;Changes to one node in an SCC risk breaking the others. Identify SCCs before refactoring; break them apart by introducing explicit, versioned contracts—APIs, schemas, events—rather than implicit runtime dependencies.&lt;/p&gt;
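&lt;p&gt;SCCs are cheap to compute. A sketch using Kosaraju's two-pass DFS over an invented dependency graph, where Drupal, an internal API, a queue, and a Lambda all depend on each other:&lt;/p&gt;

```python
# Strongly connected components via Kosaraju's two-pass DFS.
# Node names are illustrative.
g = {
    "drupal":  ["api"],
    "api":     ["queue"],
    "queue":   ["lambda"],
    "lambda":  ["drupal"],
    "reports": ["drupal"],  # depends on the cluster; nothing depends on it
}

def sccs(g):
    order, seen = [], set()
    def dfs1(node):                      # pass 1: record finish order
        seen.add(node)
        for nxt in g[node]:
            if nxt not in seen:
                dfs1(nxt)
        order.append(node)
    for node in g:
        if node not in seen:
            dfs1(node)
    reverse = {n: [] for n in g}         # transpose the graph
    for node, nbrs in g.items():
        for nxt in nbrs:
            reverse[nxt].append(node)
    comps, assigned = [], set()
    def dfs2(node, comp):                # pass 2: harvest components
        assigned.add(node)
        comp.add(node)
        for nxt in reverse[node]:
            if nxt not in assigned:
                dfs2(nxt, comp)
    for node in reversed(order):
        if node not in assigned:
            comp = set()
            dfs2(node, comp)
            comps.append(comp)
    return comps

print(max(sccs(g), key=len))
```

&lt;p&gt;The four-node component is the coupled subsystem; &lt;code&gt;reports&lt;/code&gt; merely depends on it and can be changed independently.&lt;/p&gt;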

&lt;h2&gt;
  
  
  Using graph thinking day to day
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Architecture reviews
&lt;/h3&gt;

&lt;p&gt;Instead of subjective "Is this clean?", ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the longest path in the critical user journey?&lt;/li&gt;
&lt;li&gt;What's the minimum cut between the user and the SLO?&lt;/li&gt;
&lt;li&gt;Which node has the highest centrality?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Incident analysis
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg67ypvf3pc23wf7s8us6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg67ypvf3pc23wf7s8us6.png" alt="Incident analysis as a graph walk" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reconstruct failures as graph walks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which edge broke?&lt;/li&gt;
&lt;li&gt;What alternative paths existed (or didn't)?&lt;/li&gt;
&lt;li&gt;Which node's centrality amplified the blast radius?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Modernization roadmaps
&lt;/h3&gt;

&lt;p&gt;Prioritize refactors by graph metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decompose the highest-centrality Drupal module first.&lt;/li&gt;
&lt;li&gt;Replace a single massive integration edge with a message-driven subgraph.&lt;/li&gt;
&lt;li&gt;Break apart the largest SCC into independent, deployable units.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;You don't need a graph database to benefit from graph thinking. Start with one exercise:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pick a critical user journey (for example, authenticated page load, checkout, or form submission).&lt;/li&gt;
&lt;li&gt;Sketch it as a directed graph: every service, cache, database, and external API is a node; every call or dependency is an edge.&lt;/li&gt;
&lt;li&gt;Label edges with latency (p50, p99) and availability (historical uptime).&lt;/li&gt;
&lt;li&gt;Identify the minimum cut and the highest-centrality node.&lt;/li&gt;
&lt;li&gt;Use those findings to prioritize your next performance or reliability improvement.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you see your Drupal-on-AWS platform as a graph, decisions get crisper. You stop guessing at complexity and start operating on the structure that's actually there.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>drupal</category>
      <category>graphtheory</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Automating Performance Engineering with Claude Code and New Relic MCP</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Sun, 11 Jan 2026 21:52:26 +0000</pubDate>
      <link>https://forem.com/iamarsh/automating-performance-engineering-with-claude-code-and-new-relic-mcp-3j12</link>
      <guid>https://forem.com/iamarsh/automating-performance-engineering-with-claude-code-and-new-relic-mcp-3j12</guid>
      <description>&lt;p&gt;For a long time, my “performance engineering workflow” as a Tech Lead looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log into New Relic&lt;/li&gt;
&lt;li&gt;Run a handful of NRQL queries&lt;/li&gt;
&lt;li&gt;Inspect slow transactions and error traces&lt;/li&gt;
&lt;li&gt;Map issues back to Drupal code&lt;/li&gt;
&lt;li&gt;Estimate effort and impact&lt;/li&gt;
&lt;li&gt;Create JIRA tickets with enough context&lt;/li&gt;
&lt;li&gt;Post updates in Teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s valuable work, but it’s also repetitive, mechanical, and easy to interrupt. It was quietly costing me 2–3 hours every week.&lt;/p&gt;

&lt;p&gt;So I automated it.&lt;/p&gt;

&lt;p&gt;This post walks through the workflow I built using &lt;strong&gt;Claude Code&lt;/strong&gt;, the &lt;strong&gt;New Relic MCP&lt;/strong&gt;, &lt;strong&gt;Jira&lt;/strong&gt;, and &lt;strong&gt;Microsoft Teams&lt;/strong&gt; to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously analyze performance and error data&lt;/li&gt;
&lt;li&gt;Generate structured root cause analysis with code references&lt;/li&gt;
&lt;li&gt;Create and prioritize Jira tickets&lt;/li&gt;
&lt;li&gt;Notify the team with severity‑specific alerts&lt;/li&gt;
&lt;li&gt;Optionally draft pull requests for straightforward fixes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Was Worth Automating
&lt;/h2&gt;

&lt;p&gt;As a Tech Lead on a large Drupal platform, my time is best spent on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture and design decisions&lt;/li&gt;
&lt;li&gt;Reviewing high‑impact changes&lt;/li&gt;
&lt;li&gt;Mentoring and unblocking engineers&lt;/li&gt;
&lt;li&gt;Shaping priorities with product and leadership&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But performance issues don’t care about calendars.&lt;/p&gt;

&lt;p&gt;Every incident or regression forced me back into the same manual loop: query New Relic, decipher traces, reverse‑engineer root causes, and turn them into actionable tickets. It was important, but it wasn’t leverage.&lt;/p&gt;

&lt;p&gt;The workflow in this post exists to do one thing: &lt;strong&gt;remove the mechanical part of performance engineering while keeping the judgment and risk decisions in human hands&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  High‑Level Architecture
&lt;/h2&gt;

&lt;p&gt;At a high level, the system looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;New Relic&lt;/strong&gt; collects APM metrics, traces, and error data from our Drupal application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; (via the New Relic MCP) pulls that data, analyzes it, and decides what’s worth acting on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jira&lt;/strong&gt; receives structured issues with metrics, root causes, effort estimates, and links back to New Relic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Teams&lt;/strong&gt; gets severity‑color‑coded notifications so the right people see the right issues at the right time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt; (optionally) receives draft pull requests for straightforward fixes the AI can safely propose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmj8j7mqy0b6saqco01jq.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmj8j7mqy0b6saqco01jq.webp" alt="Automating Performance Engineering with Claude Code and New Relic MCP" width="800" height="343"&gt;&lt;/a&gt; &lt;br&gt;
See full resolution image &lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mj8j7mqy0b6saqco01jq.webp" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This architecture keeps responsibilities clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New Relic is the source of truth.&lt;/li&gt;
&lt;li&gt;AI is used for interpretation and orchestration.&lt;/li&gt;
&lt;li&gt;Jira and Teams are where work and communication actually happen.&lt;/li&gt;
&lt;li&gt;Humans stay firmly in the decision loop.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Workflow End‑to‑End
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Data Collection from New Relic
&lt;/h3&gt;

&lt;p&gt;On demand (or via a scheduled script), the workflow starts with a single instruction, e.g.:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Execute New Relic performance analysis for production last 1 hour”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Behind the scenes, Claude Code uses the New Relic MCP to run a focused set of NRQL queries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slow transactions&lt;/strong&gt;: endpoints with response time above a threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error rates&lt;/strong&gt;: exception types, messages, and affected routes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database performance&lt;/strong&gt;: slow queries and N+1‑style patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open incidents and application health&lt;/strong&gt;: alerts, Apdex, throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal here is not to recreate the entire dashboard: it’s to pull just enough data for a useful decision.&lt;/p&gt;
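&lt;p&gt;For illustration, the query set can be expressed as plain NRQL strings. The thresholds, event names, and attributes below are generic examples to adapt, not the exact queries my setup runs:&lt;/p&gt;

```python
# Illustrative NRQL for the four data-collection buckets.
# Thresholds and attribute names are examples, not a canonical set.
NRQL_QUERIES = {
    "slow_transactions": (
        "SELECT average(duration), percentile(duration, 95) "
        "FROM Transaction WHERE duration > 2 "
        "SINCE 1 hour ago FACET name LIMIT 20"
    ),
    "error_rates": (
        "SELECT count(*) FROM TransactionError "
        "SINCE 1 hour ago FACET error.class, transactionName"
    ),
    "slow_queries": (
        "SELECT average(databaseDuration) FROM Transaction "
        "WHERE databaseDuration > 0.5 SINCE 1 hour ago FACET name"
    ),
    "app_health": (
        "SELECT apdex(duration, t: 0.5), rate(count(*), 1 minute) "
        "FROM Transaction SINCE 1 hour ago"
    ),
}

for name, nrql in NRQL_QUERIES.items():
    print(name, "->", nrql[:40], "...")
```

&lt;p&gt;Four focused queries, one shared time window: enough signal to decide whether anything deserves a ticket.&lt;/p&gt;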

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivq70njkpgvteal36mb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivq70njkpgvteal36mb.png" alt="Claude Code workflow performation automation" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 2: AI‑Powered Root Cause Analysis
&lt;/h3&gt;

&lt;p&gt;Claude Code then takes that raw telemetry and turns it into something my team can act on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Groups slow transactions into meaningful units (e.g., &lt;code&gt;/reports/latest&lt;/code&gt;, specific admin pages)&lt;/li&gt;
&lt;li&gt;Connects issues to Drupal modules, controllers, or custom code paths&lt;/li&gt;
&lt;li&gt;Distinguishes between:

&lt;ul&gt;
&lt;li&gt;one‑off spikes vs. consistent degradation,&lt;/li&gt;
&lt;li&gt;user‑facing vs. admin‑only issues,&lt;/li&gt;
&lt;li&gt;backend jobs vs. interactive requests&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Hypothesizes root causes:

&lt;ul&gt;
&lt;li&gt;N+1 queries&lt;/li&gt;
&lt;li&gt;missing cache tags/contexts&lt;/li&gt;
&lt;li&gt;heavy external API calls&lt;/li&gt;
&lt;li&gt;misconfigured database access patterns&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The important part: &lt;strong&gt;this analysis is always explainable&lt;/strong&gt;. If the suggestion is wrong, it is wrong in a way that is obvious when you read the ticket.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 3: Priority, Impact, and Story Points
&lt;/h3&gt;

&lt;p&gt;Performance work always competes with feature work, so the system needs to express impact in a language the team understands.&lt;/p&gt;

&lt;p&gt;The workflow classifies each issue as &lt;strong&gt;Critical&lt;/strong&gt;, &lt;strong&gt;High&lt;/strong&gt;, or &lt;strong&gt;Medium&lt;/strong&gt; based on thresholds like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response time ranges&lt;/li&gt;
&lt;li&gt;Error rate percentages&lt;/li&gt;
&lt;li&gt;Whether the issue is user‑visible&lt;/li&gt;
&lt;li&gt;Whether functionality is partially or fully degraded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From there, it estimates story points (1/2/3/5/8) using a simple heuristic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complexity (single query tweak vs. cross‑module change)&lt;/li&gt;
&lt;li&gt;Scope (one endpoint vs. a subsystem)&lt;/li&gt;
&lt;li&gt;Risk (low‑risk cache change vs. behavior‑changing refactor)&lt;/li&gt;
&lt;li&gt;Effort (hours vs. days)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not perfect, but they are consistent – which is often more useful than “perfect but ad hoc”.&lt;/p&gt;
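&lt;p&gt;The heuristic can be sketched as two small functions. The thresholds and weights are illustrative placeholders, not my production values:&lt;/p&gt;

```python
# Illustrative classification heuristic; real thresholds live in config.
def classify(avg_response_s, error_rate_pct, user_facing):
    """Map raw metrics to a severity bucket."""
    if error_rate_pct > 5 or (user_facing and avg_response_s > 5):
        return "Critical"
    if error_rate_pct > 1 or avg_response_s > 2:
        return "High"
    return "Medium"

def story_points(cross_module, endpoints_affected, risky):
    """Rough Fibonacci sizing from complexity, scope, and risk."""
    points = 1
    if cross_module:
        points += 2            # complexity: touches several modules
    if endpoints_affected > 1:
        points += 2            # scope: more than one endpoint
    if risky:
        points += 3            # risk: behaviour-changing refactor
    # snap up to the nearest Fibonacci bucket
    return min(p for p in (1, 2, 3, 5, 8) if p >= points)

print(classify(6.5, 0.2, user_facing=True), story_points(True, 3, False))
```

&lt;p&gt;The point is not the exact numbers: it is that every ticket is sized by the same rule, so the estimates are comparable sprint over sprint.&lt;/p&gt;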




&lt;h3&gt;
  
  
  Phase 4: Jira Ticket Generation
&lt;/h3&gt;

&lt;p&gt;For every actionable issue, the workflow calls the Jira API to create a ticket in the current active sprint.&lt;/p&gt;

&lt;p&gt;Each ticket includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A descriptive title (e.g. &lt;code&gt;[Performance] /reports/latest endpoint – 650% response time increase&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;A summary of the issue and affected environment&lt;/li&gt;
&lt;li&gt;Metrics:

&lt;ul&gt;
&lt;li&gt;average and p95 response time&lt;/li&gt;
&lt;li&gt;error rate&lt;/li&gt;
&lt;li&gt;timeframe and estimated impact&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Root cause analysis in plain language&lt;/li&gt;

&lt;li&gt;Affected code (modules, file paths, and line numbers where possible)&lt;/li&gt;

&lt;li&gt;Suggested fix (cache changes, query optimizations, config sync, etc.)&lt;/li&gt;

&lt;li&gt;Deep links back to the relevant New Relic views&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This is the difference between “we should look into that spike” and “here is an actionable story ready for a sprint board”.&lt;/p&gt;
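&lt;p&gt;A sketch of the corresponding Jira Cloud REST API v3 payload. The project key is hypothetical, and the ID of the story-points custom field varies per Jira instance:&lt;/p&gt;

```python
import json

# Illustrative issue payload for POST /rest/api/3/issue.
def build_ticket(endpoint, pct_increase, priority, points, analysis):
    return {
        "fields": {
            "project": {"key": "PERF"},       # hypothetical project key
            "issuetype": {"name": "Task"},
            "priority": {"name": priority},
            "summary": f"[Performance] {endpoint} – {pct_increase}% response time increase",
            "customfield_10016": points,      # story points; ID is instance-specific
            "description": {                  # API v3 expects Atlassian Document Format
                "type": "doc",
                "version": 1,
                "content": [{
                    "type": "paragraph",
                    "content": [{"type": "text", "text": analysis}],
                }],
            },
        }
    }

payload = build_ticket("/reports/latest", 650, "High", 5,
                       "N+1 query in the report loader; add entity caching.")
print(json.dumps(payload)[:60])
```

&lt;p&gt;Metrics, root cause, and the New Relic deep links go into the description body the same way: structured content, not a one-line "investigate this".&lt;/p&gt;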




&lt;h3&gt;
  
  
  Phase 5: Teams Notifications and Optional PRs
&lt;/h3&gt;

&lt;p&gt;Once tickets are created, the workflow posts an Adaptive Card to the right Teams channel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figokz1q4u9abh3gfygf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figokz1q4u9abh3gfygf5.png" alt="MS Teams notification adaptive card" width="663" height="1273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every card includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short description of the issue&lt;/li&gt;
&lt;li&gt;Key metrics (response time, error rate, environment)&lt;/li&gt;
&lt;li&gt;Link to the Jira ticket (and through that, to New Relic)&lt;/li&gt;
&lt;li&gt;Story points and priority&lt;/li&gt;
&lt;/ul&gt;
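&lt;p&gt;A sketch of the webhook payload carrying that Adaptive Card. The severity-to-color mapping, fact names, and URL are illustrative:&lt;/p&gt;

```python
# Illustrative Teams webhook payload wrapping an Adaptive Card.
SEVERITY_COLOR = {"Critical": "attention", "High": "warning", "Medium": "good"}

def build_card(title, severity, metrics, jira_url):
    return {
        "type": "message",
        "attachments": [{
            "contentType": "application/vnd.microsoft.card.adaptive",
            "content": {
                "type": "AdaptiveCard",
                "version": "1.4",
                "body": [
                    {"type": "TextBlock", "text": title, "weight": "bolder",
                     "color": SEVERITY_COLOR[severity]},
                    {"type": "FactSet", "facts": [
                        {"title": k, "value": v} for k, v in metrics.items()
                    ]},
                ],
                "actions": [
                    {"type": "Action.OpenUrl", "title": "Open Jira ticket",
                     "url": jira_url},
                ],
            },
        }],
    }

card = build_card("/reports/latest is degraded", "Critical",
                  {"p95": "4.2 s", "errors": "6.1%"},
                  "https://example.atlassian.net/browse/PERF-123")
print(card["attachments"][0]["content"]["body"][0]["color"])  # attention
```

&lt;p&gt;The color mapping is what makes the channel scannable: red cards get read now, yellow cards get read at standup.&lt;/p&gt;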

&lt;p&gt;For certain well‑scoped cases (like adding cache metadata or adjusting a specific query), there is also an &lt;strong&gt;optional&lt;/strong&gt; step:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Yes, attempt to fix this issue”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When I explicitly opt in, Claude Code will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read the relevant files&lt;/li&gt;
&lt;li&gt;Propose a code change&lt;/li&gt;
&lt;li&gt;Run local commands/tests where available&lt;/li&gt;
&lt;li&gt;Open a draft PR linked to the Jira ticket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing merges automatically. Manual review and CI are still required.&lt;/p&gt;




&lt;h2&gt;
  
  
  Guardrails and Risk Management
&lt;/h2&gt;

&lt;p&gt;The part that made this usable in a real production environment was not “more automation”; it was &lt;strong&gt;more guardrails&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Some of the key ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No confidential data in prompts or artifacts&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;No Jira IDs in public logs or documentation&lt;/li&gt;
&lt;li&gt;No customer identifiers or business metrics&lt;/li&gt;
&lt;li&gt;No secrets, URLs, or internal hostnames&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;New Relic as source of truth&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The AI analyzes existing metrics; it does not invent data&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Human‑in‑the‑loop by design&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Every ticket is reviewed before the team sees it&lt;/li&gt;
&lt;li&gt;PRs are suggestions, not actions&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Opinionated thresholds&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Hard lines for Critical / High / Medium to avoid alert fatigue&lt;/li&gt;
&lt;li&gt;The system prefers fewer, higher‑quality tickets to a noisy firehose&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These are boring details, but they are also the difference between “cool demo” and “thing we actually trust”.&lt;/p&gt;




&lt;h2&gt;
  
  
  ROI: What This Changed in Practice
&lt;/h2&gt;

&lt;p&gt;In terms of time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual performance checks and ticket creation went from ~2–3 hours/week down to about 15 minutes of review.&lt;/li&gt;
&lt;li&gt;Incident response is faster because there is less friction between “we saw something weird in New Relic” and “there is a ticket with a clear owner and plan”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In terms of quality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We miss fewer issues, especially slow degradations.&lt;/li&gt;
&lt;li&gt;Tickets come with better context and suggested fixes.&lt;/li&gt;
&lt;li&gt;Standups focus more on trade‑offs (“Do we take this now or next sprint?”) and less on “What exactly is going on?”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And in terms of team dynamics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers can pick up performance work without having to live in New Relic.&lt;/li&gt;
&lt;li&gt;The platform feels more “observed” without feeling more “surveilled”.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Implementation Notes
&lt;/h2&gt;

&lt;p&gt;At a high level, this stack uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Application&lt;/strong&gt;: Drupal on Acquia&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: New Relic APM, NRQL queries, incidents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI&lt;/strong&gt;: Claude Code with the &lt;strong&gt;New Relic MCP&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ticketing&lt;/strong&gt;: Jira Cloud (REST API v3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication&lt;/strong&gt;: Microsoft Teams (incoming webhooks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version Control&lt;/strong&gt;: GitHub (for optional PR automation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From a configuration perspective, the main work is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wiring the New Relic account and NRQL queries into the MCP&lt;/li&gt;
&lt;li&gt;Wiring Jira, Teams, and GitHub credentials through environment variables&lt;/li&gt;
&lt;li&gt;Defining thresholds and ticket templates that reflect your actual workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exact code in my setup is specific to our environment and not open‑sourced (yet), but the pattern is portable to any stack where you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured telemetry&lt;/li&gt;
&lt;li&gt;A programmable AI agent&lt;/li&gt;
&lt;li&gt;A ticketing system&lt;/li&gt;
&lt;li&gt;A chat/notification channel&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Credits
&lt;/h2&gt;

&lt;p&gt;This workflow only exists because of the people around me.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shannon Lal&lt;/strong&gt;, our CTO, pushed us to adopt Claude Code and gave me the runway to experiment with this in a real system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maria Parra Pino&lt;/strong&gt; and &lt;strong&gt;Ruslana Zagrai&lt;/strong&gt;, two of the developers on the team, were the ones stress‑testing the idea, calling out edge cases, and helping refine it into something that is actually usable day‑to‑day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And of course, thanks to &lt;strong&gt;New Relic&lt;/strong&gt; for building a platform and MCP integration that made it possible to treat APM data as something to automate &lt;em&gt;against&lt;/em&gt;, not just stare at.&lt;/p&gt;




&lt;h2&gt;
  
  
  When This Pattern Makes Sense
&lt;/h2&gt;

&lt;p&gt;In my experience, this kind of automation works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The workflow is repeatable and well‑understood.&lt;/li&gt;
&lt;li&gt;The inputs are observable and reliable (like APM telemetry).&lt;/li&gt;
&lt;li&gt;The outputs can be expressed as structured work (tickets, PRs, notifications).&lt;/li&gt;
&lt;li&gt;You are willing to keep humans in the loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re running New Relic and Jira already, the leap from “manual checks” to “AI‑assisted performance engineering” is more about design and guardrails than about exotic technology.&lt;/p&gt;

&lt;p&gt;If you end up building a variant of this, I’d genuinely love to hear what worked, what broke, and what you did differently.&lt;/p&gt;

</description>
      <category>newrelic</category>
      <category>mcp</category>
      <category>drupal</category>
      <category>claudecode</category>
    </item>
  </channel>
</rss>
