India's DPDP Act 2023 Explained — And How AI Handles Data Principal Requests at Scale

Swapnanil Saha — Thu, 21 May 2026 21:46:43 +0000

This post is for informational purposes only and does not constitute legal advice. The DPDP Act 2023 and its implementing Rules 2025 are relatively new — requirements may evolve through further notifications or guidance. Verify the current position with a qualified data protection lawyer before making compliance decisions.

Your company just received this email:

"I would like to know all personal data your organisation holds about me. This is a formal request under the DPDP Act."

It lands in a shared privacy@yourcompany.com inbox. Someone reads it. Forwards it to legal. Legal forwards it to engineering. Engineering says they need to check three databases. Nobody notes the date it arrived. Three weeks pass. When someone finally circles back, there are nine days left on the 30-day window the DPDP Rules require. Not enough time to locate the data, get legal sign-off, draft a response in the right language, and send it.

On day 31, you're in violation.

That scenario is the default for most Indian companies right now. Not because they're careless — but because nobody built the infrastructure for it.

I built DPDP Copilot to close that gap: a self-hosted operator tool that accepts public data requests, classifies them with Claude, drafts compliant multilingual replies, tracks every action as immutable evidence, and monitors SLA status in real time.

But before the tool, you need to understand what you're actually dealing with. Let's start with the law.

→ Full tool page and live demo

Part 1: What the DPDP Act 2023 Actually Requires

The Digital Personal Data Protection Act 2023 received presidential assent on 11 August 2023 and represents India's first comprehensive data protection legislation. Its structure borrows from GDPR while adapting to India's specific context — a 1.4 billion-person population, 22 scheduled languages, deep mobile penetration, and a digital public infrastructure layer (UPI, Aadhaar, DigiLocker) that most jurisdictions don't have.

The implementing Rules — the Digital Personal Data Protection Rules 2025 — were notified on 13 November 2025, giving the Act its operational teeth.

Here's what the law actually mandates, stripped of legalese, focusing on the parts most engineering and compliance teams get wrong.

The Four Rights Every Data Principal Has

The Act grants every "data principal" — the person whose data is being processed — four actionable rights. When someone exercises any of these, your organisation (as the "data fiduciary") has a legal obligation to respond.

Right of Access (Section 11)

Any person can ask you: what personal data do you hold about me, and for what purpose? You must provide a summary of the data being processed, the processing activities, and the identities of any other data fiduciaries or processors with whom their data has been shared. The Act doesn't specify a format, but silence is not sufficient.

Right of Correction and Completion (Section 12(a))

If a person believes data you hold is inaccurate, incomplete, or misleading, they can demand you correct or complete it. You must either act on the request or explain in writing why you're not.

Right of Erasure (Section 12(b))

A person can request deletion of their personal data from your systems. There are exceptions — data held for legal obligations, fraud prevention, pending litigation — but these exceptions have to be documented and justified, not just asserted.

Right to Grievance Redressal (Section 13)

Any person can file a grievance if they believe their rights under the Act have been violated. You must provide a mechanism to receive and respond to grievances. The Rules 2025 specify this mechanism must be genuinely accessible.

The Response Timelines

The DPDP Rules 2025 (notified November 2025) set specific mandatory windows for responding to data principal requests:

Access, Correction, and Erasure requests (Sections 11–12): Data fiduciaries must respond within 30 days.
Grievance Redressal (Section 13): Grievances must be resolved within a maximum of 90 days from receipt.

These are calendar days. For reference: GDPR (EU) also requires responses within one month for most data subject requests; California's CCPA gives 45 days. India's framework is broadly comparable to GDPR in its demands — but applies at the scale of 1.4 billion people, across 22 scheduled languages. That's where the operational challenge is categorically harder.

30 days sounds like a lot. For a company with no structured process, it evaporates fast. A request that lands in a shared inbox on a Friday, takes three days to be noticed, gets forwarded twice, waits a week for a legal review, and then requires manual drafting in the data principal's language — you're out of time before anyone writes the first sentence.

A note on the DPDP Copilot tool's SLA default: The tool's internal SLA clock defaults to 7 days, intentionally more conservative than the 30-day legal window. Most mature compliance programmes target internal deadlines that are significantly tighter than the regulatory maximum, so that normal delays (review cycles, approvals, language checks) don't push you to the edge. The 7-day value is currently hardcoded at request creation; the orgs.sla_days column exists to make it configurable but is not yet wired in. Once it is, your legal team reads the Rules, agrees a specific target, and you set it once in the database.

What "Evidence" Actually Means Under DPDP

The Act and the Rules create a documentation burden that most organisations underestimate. You need to be able to prove:

That the request was received on a specific date
That it was handled (classified and routed) in a timely manner
What response you gave and when
Whether the response fulfilled the request or why it couldn't

This is audit evidence. If the Data Protection Board investigates a complaint, you need to produce this trail. A forwarded email chain is not audit evidence. A Slack thread is not audit evidence. An append-only timestamped log — with the original message, the classification, the drafted response, and the send event — is audit evidence.

The Financial Exposure

The Schedule to the Act specifies penalties by category of failure. The two most operationally relevant:

₹250 crore (~$30M USD) — Failure to implement reasonable security safeguards to prevent personal data breaches (Section 8(5)). This is the preventive obligation — having security measures in place. The penalty applies even where a breach subsequently occurs and the fiduciary claims they didn't anticipate it.

₹200 crore — Failure to notify the Data Protection Board and affected data principals when a personal data breach does occur (Section 8(6)). The notification obligation is separate from the security obligation — you can get penalised for both.

Other penalty tiers: ₹200 crore for violations related to children's personal data (Section 9); ₹150 crore for Significant Data Fiduciary obligation failures; ₹50 crore for other provision breaches.

The Data Protection Board, once fully constituted, will have adjudicatory powers to investigate and levy these penalties. Failing to acknowledge or respond to a data principal request, if that person escalates to the Board, creates a documented paper trail of non-compliance before any investigation begins.

Part 2: Why Your Current Process Fails (And Why That's the Default)

Let me describe the most common setup I've seen when talking to Indian companies dealing with DPDP requests:

A privacy@ email address that gets checked sporadically
No clock tracking — the 30-day window doesn't appear anywhere visible until it's almost gone
No classification — the person who reads it decides manually whether it's an access request, deletion request, or complaint
Reply drafted manually, from scratch, in English, by whoever processes it that week
No audit trail beyond the email itself, which may be deleted if an inbox is cleaned

This isn't negligence. It's the logical outcome of a process designed before the Act existed. The process was "email us with your concern" — and it worked fine when data requests were rare. The DPDP Act changes the legal weight of those requests, but most companies haven't updated their infrastructure to match.

The Three Ways Manual Processes Break

1. The deadline blind spot

When a request lands in an email inbox, the 30-day clock doesn't appear anywhere. Nobody stamps the receipt date. Nobody sends an automatic acknowledgement. The request sits until someone opens the inbox. If that takes a week — completely normal for a low-traffic shared inbox — you've already used 23% of your response window without touching the request. Legal review, data location, and drafting will eat most of what's left.
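
To make that arithmetic concrete, here is a minimal sketch of how much of the window has been consumed at a given moment. The helper name and its parameters are illustrative assumptions, not part of DPDP Copilot; the 30-day default mirrors the window discussed above.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Hypothetical helper: fraction of the response window already used.
function windowConsumed(receivedAt, now, windowDays = 30) {
  const elapsedDays = (new Date(now) - new Date(receivedAt)) / MS_PER_DAY
  // Clamp to [0, 1] so early or overdue timestamps still read as a percentage.
  return Math.min(Math.max(elapsedDays / windowDays, 0), 1)
}

// A request received on 1 June and first opened on 8 June (7 days later).
const used = windowConsumed('2025-06-01T00:00:00Z', '2025-06-08T00:00:00Z')
console.log(`${Math.round(used * 100)}% of the window used`) // 23% of the window used
```

Surfacing this number next to each request is one cheap way to make the clock visible before it becomes a problem.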

2. Classification inconsistency

"Please delete my data" is an erasure request. "I never gave you permission to use my data" is a grievance. "I want to update my phone number" is a correction request. "Can you send me everything you have on me" is an access request. A trained compliance professional can distinguish these consistently. Your Monday-morning on-call engineer who reads the shared inbox probably cannot — especially for requests written in Hindi, Bengali, or Tamil.

When requests are misclassified, they get routed to the wrong person, get the wrong response template, and sometimes get the wrong legal treatment. An erasure request handled as a grievance will likely produce a response that doesn't fulfil the legal obligation under Section 12(b), even if it sounds polite.

3. Evidence that can't survive audit

An auditor asks: "On what date did you receive and process this erasure request?" If your answer is "let me check the email thread," you have a problem. Email is mutable, searchable by keyword but not by event type, and has no integrity guarantees. An auditor looking for "REQUEST_CREATED at timestamp T" followed by "REPLY_SENT at timestamp T+22 days" needs a structured log, not an inbox.

Part 3: The Role of AI in DPDP Compliance

When I was designing DPDP Copilot, the central question was: where does AI actually add value, and where does it introduce risk?

DPDP compliance has two types of tasks: tasks that require human judgment about legal gray areas, and tasks that require consistent application of known rules to varied inputs. AI is well-suited to the second category and badly suited to the first.

Deciding whether your company has a legal obligation to retain data for a pending investigation? That's human judgment. Classifying an incoming message as an Access request vs. an Erasure request? That's pattern recognition on natural language — exactly what a well-prompted LLM is built for.

Classification: Where LLMs Outperform Rules

Naive rule-based classification for DPDP requests fails quickly. "Please delete my account" is an erasure request. "I want my data removed from your marketing list" is also an erasure request but uses entirely different vocabulary. "Remove me" submitted in a support ticket might be an erasure request or might just be asking to be unsubscribed from emails — context determines which.

A rules-based system that catches "delete my data" literally will miss most real-world submissions. People write in fragments, in their native language, with emotional context, in ways that don't follow a template.

An LLM with a well-structured prompt classifies these correctly without needing exhaustive keyword lists. The DPDP Copilot classification prompt:

Classify this message into exactly one of: Grievance, Access, Rectification, Deletion.
Respond as {"type":"<classification>"}.

Message:
${text}

The system prompt establishes the legal framework — "You are a DPDP compliance assistant classifying data principal requests under India's DPDP Act 2023." The model maps the message to the correct legal category.

The output is constrained to a JSON object with a single key. The application validates that type is one of the four legal categories. If the model returns anything else, the result is rejected with an error and the request stays PENDING for an operator to review; the system never persists a classification it can't validate.
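
The validation step can be sketched in a few lines. This is an illustration under assumptions rather than the tool's actual code: parseClassification is a hypothetical name, and it takes the raw text returned by the model as its only input.

```javascript
// The four categories named in the classification prompt.
const VALID_TYPES = new Set(['Grievance', 'Access', 'Rectification', 'Deletion'])

// Hypothetical helper: parse the model's raw text and validate the type.
// Throws on malformed JSON or on any value outside the four categories.
function parseClassification(rawText) {
  let parsed
  try {
    parsed = JSON.parse(rawText)
  } catch {
    throw new Error('Classification response is not valid JSON')
  }
  const type = parsed?.type
  if (typeof type !== 'string' || !VALID_TYPES.has(type)) {
    throw new Error(`Unexpected classification type: ${JSON.stringify(type)}`)
  }
  return type
}

console.log(parseClassification('{"type":"Deletion"}')) // Deletion
```

Keeping the allowed values in one set means the prompt and the validator can be checked against each other in a single place.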

Multilingual Reply Drafting: Where AI Eliminates Weeks of Work

This is where AI creates the most leverage in the Indian compliance context.

India has 22 scheduled languages. The DPDP Act creates a right to grievance redressal — and for that mechanism to be genuinely accessible (which the Rules 2025 require), you need to respond in a language the person can understand.

Without AI, producing compliant response templates in Hindi, Bengali, Tamil, and Marathi means hiring translators, reviewing legal language, maintaining version parity across languages, and updating all templates whenever requirements change. That's a significant operational cost — one that most companies defer indefinitely, defaulting to English-only responses that disadvantage non-English speakers.

With a well-prompted LLM, drafting happens at response time. The model understands DPDP legal obligations and drafts a response that:

Acknowledges the specific request type (not a generic "thank you for reaching out")
Confirms receipt and logging with a reference number
States the applicable response timeline
Explains the next step the data principal should expect
Is written in the language they chose

The system prompt for drafting:

You are a DPDP compliance officer drafting replies to data principal requests under 
India's Digital Personal Data Protection Act 2023. Write professional, empathetic 
replies that: acknowledge the request type, confirm receipt and logging, state the 
applicable response timeline, and explain the next step. Keep the tone formal but accessible.

The user message to the model specifies the request type and target language:

Draft a DPDP-compliant reply in ${language} for a ${type} request.

Customer message:
${text}

These are suggested replies — an operator reviews them before sending. The human stays in the loop for all final communications.
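
The user-message template above can be filled by a small helper. This is a sketch, not the tool's implementation: buildDraftPrompt and SUPPORTED_LANGUAGES are hypothetical names, and the language list is the one offered on the public form.

```javascript
// Languages offered on the public request form.
const SUPPORTED_LANGUAGES = ['English', 'Hindi', 'Bengali', 'Tamil', 'Marathi']

// Hypothetical helper: fill the drafting template for one request.
function buildDraftPrompt({ language, type, text }) {
  // Reject anything the form does not offer, so free text never reaches the prompt as a "language".
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`Unsupported language: ${language}`)
  }
  return `Draft a DPDP-compliant reply in ${language} for a ${type} request.\n\nCustomer message:\n${text}`
}

console.log(buildDraftPrompt({ language: 'Hindi', type: 'Deletion', text: 'Please delete my data.' }))
```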

Prompt Caching: Making AI Cost-Efficient at Scale

The system prompts for both classification and drafting use cache_control: { type: 'ephemeral' } via the Anthropic SDK, enabling prompt caching.

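If you're processing dozens of data principal requests per day, the system prompt, which is identical for every request, can be cached by Anthropic's API after the first call. Subsequent calls that arrive while the cache entry is still alive are billed at a fraction of the normal input price for the cached prefix. Two caveats apply: the saving covers only the cached input tokens, not the per-request message or the output, and the API only caches prefixes above a model-specific minimum length, so a system prompt of a few sentences may not be cached at all. With a sufficiently long shared prefix, the reduction for the classification and drafting steps can reach the 50–80% range; check the cache token counts in the response usage data to confirm what you actually get.

This is a small architectural detail that costs slightly more on the first request (the cache write) and has a compounding positive effect on the hundredth. If you're building compliance tooling that processes high volumes, prompt caching can be the difference between a sustainable per-request cost and one that makes the tool impractical at production scale.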
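
For readers who have not used the feature, this is roughly what the request parameters look like when the system prompt is marked cacheable. It is a sketch under assumptions: buildClassificationParams is a hypothetical helper, the max_tokens value is a guess, and the model name is the one that appears in the evidence example elsewhere in this post. Only the parameter object is built here; no API call is made.

```javascript
const SYSTEM_PROMPT =
  "You are a DPDP compliance assistant classifying data principal requests under India's DPDP Act 2023."

// Hypothetical helper: build the parameters for a Messages API call.
// The system prompt is sent as a content block so cache_control can be attached to it.
function buildClassificationParams(text) {
  return {
    model: 'claude-sonnet-4-6', // model name as it appears in this post's evidence example
    max_tokens: 50,             // assumed budget; the reply is a one-key JSON object
    system: [
      { type: 'text', text: SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } },
    ],
    messages: [
      {
        role: 'user',
        content:
          'Classify this message into exactly one of: Grievance, Access, Rectification, Deletion.\n' +
          'Respond as {"type":"<classification>"}.\n\nMessage:\n' + text,
      },
    ],
  }
}

console.log(JSON.stringify(buildClassificationParams('Please delete my data.'), null, 2))
```

The object would be passed to client.messages.create(...) in the Anthropic SDK. The usage section of the response reports cache_creation_input_tokens and cache_read_input_tokens, which is the simplest way to see whether the prefix was cached.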

Retry Logic: Resilience Against Transient Failures

The LLM calls use exponential backoff retry logic:

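import Anthropic from '@anthropic-ai/sdk'

const MAX_RETRIES = 3 // three attempts in total

// Runs fn, retrying only on rate-limit and server errors with exponential backoff.
async function callWithRetry(fn) {
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      return await fn()
    } catch (err) {
      const isRetryable =
        err instanceof Anthropic.RateLimitError ||
        err instanceof Anthropic.InternalServerError
      // Client errors and the final failed attempt propagate to the caller.
      if (!isRetryable || attempt === MAX_RETRIES - 1) throw err
      // Wait 1 s after the first failure, 2 s after the second.
      await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000))
    }
  }
}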

Only rate limit errors and server errors trigger retries, not client errors (bad API key, invalid request format). The delay doubles with each attempt: 1 second after the first failure, then 2 seconds after the second. Three attempts total, so a third failure is thrown to the caller. A transient API hiccup doesn't fail the entire processing pipeline for a data principal's submission.
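
A quick way to sanity-check a backoff schedule is to list the waits it produces. The helper below is a hypothetical illustration, not part of the tool: it mirrors a loop that sleeps after every failed attempt except the last, so three attempts give two waits and a fourth attempt would add the 4-second one.

```javascript
// Delays (in ms) slept between attempts for a retry loop that waits
// after each failed attempt except the final one.
function backoffDelays(maxRetries, baseMs = 1000) {
  const delays = []
  for (let attempt = 0; attempt < maxRetries - 1; attempt++) {
    delays.push(Math.pow(2, attempt) * baseMs)
  }
  return delays
}

console.log(backoffDelays(3)) // [ 1000, 2000 ]
console.log(backoffDelays(4)) // [ 1000, 2000, 4000 ]
```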

Part 4: DPDP Copilot — The Tool in Detail

With the legal and AI context established, here's how DPDP Copilot works end to end.

The Public Request Form

The entry point for data principals is /grievance — no login required. Requiring a login to submit a data rights request is a barrier that conflicts with the spirit of the Act. If someone can't easily submit an erasure request, the mechanism isn't truly accessible.

The form collects:

The request message (free text — people write what they mean in their own words)
Preferred response language (English, Hindi, Bengali, Tamil, Marathi)

There's no account creation, no verification code, no CAPTCHA wall. Data principals submit and receive an acknowledgement. The contact information is embedded in the message body — a known limitation of the current implementation, and a deliberate choice for the initial version: forcing a structured contact field requires more UI complexity and doesn't add meaningful compliance value until outbound email delivery is implemented.

What Happens in the Background on Submission

When the form is submitted, a single API call to POST /api/public/requests triggers a multi-step synchronous pipeline:

Step 1: Request creation

The system creates a database record with:

A UUID as the request ID
The raw message text
The chosen language
type: 'PENDING' — not yet classified
sla_due_at: now() + 7 days. The internal SLA clock starts at submission. The 7 days are currently hardcoded (orgs.sla_days is not yet wired to request creation) and intentionally conservative relative to the 30-day legal window.
org_id from the active organisation configuration

Step 2: Evidence logging — REQUEST_CREATED

An evidence_events record is written immediately after creation:

{
  "event_type": "REQUEST_CREATED",
  "event_data": { "source": "public_form", "language": "Hindi" },
  "created_at": "2025-05-25T10:00:00.000Z"
}

This is the legal timestamp of receipt. The moment the request hits the database, it's on record. The evidence log is append-only at the application level — there are no delete or update operations on evidence_events.

Step 3: AI classification

The message text goes to Claude for classification. The model returns a JSON object. The application parses it and validates that type is one of { Grievance, Access, Rectification, Deletion }. Any other value throws an error. The request record is updated with the validated type.

Step 4: Evidence logging — REQUEST_CLASSIFIED

{
  "event_type": "REQUEST_CLASSIFIED",
  "event_data": { "type": "Deletion" },
  "created_at": "2025-05-25T10:00:01.342Z"
}

The classification result and timestamp are immutable facts in the evidence record from this point forward.

Step 5: AI reply drafting

Claude drafts a response in the data principal's chosen language, using the classified request type and the original message as context.

Step 6: Evidence logging — REPLY_SUGGESTED

{
  "event_type": "REPLY_SUGGESTED",
  "event_data": { "language": "Hindi", "model": "claude-sonnet-4-6" },
  "created_at": "2025-05-25T10:00:02.891Z"
}

The entire pipeline — creation, classification, drafting — runs in under 5 seconds for a typical request. By the time an operator opens the inbox, the request is already classified, a draft reply exists, and the SLA clock has been running since submission.
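
The six steps above can be sketched as one function. This is a simplified illustration, not the tool's code: the evidence log is an in-memory stand-in for the evidence_events table, handleSubmission and createEvidenceLog are hypothetical names, and the classify and draft functions are injected so the sketch runs without an API key.

```javascript
const VALID_TYPES = ['Grievance', 'Access', 'Rectification', 'Deletion']

// In-memory stand-in for the evidence_events table: append and read, nothing else.
function createEvidenceLog() {
  const events = []
  return {
    append(eventType, eventData) {
      events.push(Object.freeze({ eventType, eventData, createdAt: new Date().toISOString() }))
    },
    all: () => events.slice(), // return a copy so callers cannot mutate the log
  }
}

// Steps 1 to 6: create, log, classify, log, draft, log.
async function handleSubmission({ text, language }, { classify, draft }, log) {
  const request = { text, language, type: 'PENDING', suggestedReply: null }
  log.append('REQUEST_CREATED', { source: 'public_form', language })

  const type = await classify(text)
  if (!VALID_TYPES.includes(type)) throw new Error(`Invalid type: ${type}`)
  request.type = type
  log.append('REQUEST_CLASSIFIED', { type })

  request.suggestedReply = await draft({ text, language, type })
  log.append('REPLY_SUGGESTED', { language })
  return request
}
```

Note that REQUEST_CREATED is logged before any model call, so a classification failure still leaves the receipt on record.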

The Operator Inbox

The inbox at / is protected by authentication. It shows all requests for the active organisation, each with:

Request type (Grievance, Access, Rectification, Deletion, or PENDING if classification failed)
Message preview
Live SLA status (Within SLA / Due Soon / Overdue)
Creation timestamp

The SLA status is computed at read time — not stored as a cached value. The computeSlaStatus function runs on every page load:

export function computeSlaStatus(slaDueAt) {
  const now = new Date()
  const due = new Date(slaDueAt)
  const diffHours = (due - now) / (1000 * 60 * 60)

  if (diffHours < 0) return 'OVERDUE'
  if (diffHours < 24) return 'DUE_SOON'
  return 'WITHIN_SLA'
}

The status shown in the inbox reflects the current moment — not the status at the last time the record was updated. A request that was WITHIN_SLA yesterday is automatically DUE_SOON or OVERDUE today without any scheduled job or background worker.

The inbox is sorted by SLA urgency by default, so operators see the most at-risk requests first.
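
One simple way to get that ordering is to sort by deadline, earliest first. The comparator below is an assumption about how such a sort could work, not a description of the tool's query; bySlaUrgency and the slaDueAt property are illustrative names.

```javascript
// Hypothetical comparator: earliest deadline first, so overdue requests lead the list.
function bySlaUrgency(a, b) {
  return new Date(a.slaDueAt) - new Date(b.slaDueAt)
}

const inbox = [
  { id: 'a', slaDueAt: '2025-06-03T10:00:00Z' },
  { id: 'b', slaDueAt: '2025-05-30T10:00:00Z' },
  { id: 'c', slaDueAt: '2025-06-01T10:00:00Z' },
]

// Copy before sorting so the original array is left untouched.
const sorted = [...inbox].sort(bySlaUrgency)
console.log(sorted.map(r => r.id)) // [ 'b', 'c', 'a' ]
```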

The Request Detail Page

Clicking into any request shows everything an operator needs to review, respond, and close:

The original message — verbatim, exactly as submitted. No interpretation layer between the operator and what the data principal actually wrote.

The AI-drafted reply — pre-populated with DPDP-compliant language in the data principal's chosen language. The operator can read it, edit it in the text area, and send it. The draft is a starting point, not a cage.

The resolution checklist — structured prompts for the operator to work through before closing the request:

Has the relevant data been located?
Has the requested action (access/correction/deletion) been taken?
Has the data principal been notified?

The evidence timeline — every event in chronological order with timestamps, event types, and metadata.

The export controls — one click to download the full evidence trail as PDF or CSV.

Marking a Reply as Sent

When an operator sends the response (currently: manually via email or another channel, then clicks "Mark as Sent" in the tool), the system:

Updates the request status to CLOSED
Logs REPLY_SENT to the evidence table:

   {
     "event_type": "REPLY_SENT",
     "event_data": { "operator": "admin", "channel": "manual" },
     "created_at": "2025-05-27T14:22:00.000Z"
   }

The gap between REQUEST_CREATED and REPLY_SENT timestamps is the documented response time. If an auditor asks "how long did you take to respond to this erasure request?" — the answer is computable from the evidence log to the second.
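
As a minimal sketch of that computation, assuming events shaped like the JSON records shown in this post (responseTimeMs is a hypothetical helper name):

```javascript
// Hypothetical helper: elapsed time between receipt and reply, from evidence events.
function responseTimeMs(events) {
  const find = type => events.find(e => e.event_type === type)
  const created = find('REQUEST_CREATED')
  const sent = find('REPLY_SENT')
  if (!created || !sent) return null // not yet answered, or log incomplete
  return new Date(sent.created_at) - new Date(created.created_at)
}

const events = [
  { event_type: 'REQUEST_CREATED', created_at: '2025-05-25T10:00:00.000Z' },
  { event_type: 'REPLY_SENT', created_at: '2025-05-27T14:22:00.000Z' },
]
const hours = responseTimeMs(events) / (1000 * 60 * 60)
console.log(hours.toFixed(1)) // 52.4
```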

Part 5: The Evidence Architecture

The evidence design is the most important part of DPDP Copilot from a compliance standpoint. Everything else is workflow tooling. The evidence log is what you use when the Data Protection Board comes calling.

Append-Only by Design

The evidence_events table has no update or delete paths in the application. Once an event is written, it stays. There's no "edit evidence" API, no admin panel for removing events, no soft-delete flag.

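Audit evidence that can be modified isn't evidence; it's a story you're telling. An append-only log where every event has a database-generated timestamp (not an application-provided one) is a solid baseline for a PostgreSQL-backed application. The guarantee here is enforced by the application rather than the database, so anyone with direct write access to the table could still alter rows; revoking UPDATE and DELETE privileges on evidence_events for the application role is a sensible hardening step.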

The schema:

CREATE TABLE IF NOT EXISTS evidence_events (
  id          uuid PRIMARY KEY,
  request_id  uuid REFERENCES requests(id),
  event_type  text NOT NULL,
  event_data  jsonb,
  created_at  timestamptz DEFAULT now() NOT NULL,
  org_id      uuid NOT NULL REFERENCES orgs(id)
);

The created_at field uses DEFAULT now() — the database server's timestamp, not the application's Date.now(). Database server clocks in a managed PostgreSQL instance are NTP-synchronized and authoritative. Application clocks can drift.

The Four Event Types

REQUEST_CREATED — logged at the moment of database insertion, before any processing. This is the legal timestamp of receipt.

REQUEST_CLASSIFIED — logged immediately after the AI classification succeeds and the type is validated. Contains the classified type in event_data. If classification fails and retries are exhausted, this event is not logged — the absence of this event tells you classification failed.

REPLY_SUGGESTED — logged when the AI draft is written to the request record. Contains the language and model used.

REPLY_SENT — logged when an operator marks the reply as sent. Contains the operator identity and channel. This closes the request lifecycle in the evidence log.

The presence of all four events, in order, within the applicable window means the request was handled correctly from intake to response. An auditor reviewing the CSV export can verify this in seconds.
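
That check is mechanical enough to automate. The function below is a sketch under assumptions: isCompleteLifecycle is a hypothetical name, events are shaped like the JSON records in this post, and the 30-day default stands in for whichever window applies.

```javascript
const LIFECYCLE = ['REQUEST_CREATED', 'REQUEST_CLASSIFIED', 'REPLY_SUGGESTED', 'REPLY_SENT']

// Hypothetical check: all four events present, in lifecycle order, and the reply
// sent no later than windowDays after receipt.
function isCompleteLifecycle(events, windowDays = 30) {
  const sorted = [...events].sort((a, b) => new Date(a.created_at) - new Date(b.created_at))
  const types = sorted.map(e => e.event_type)
  if (types.length !== LIFECYCLE.length) return false
  if (!LIFECYCLE.every((type, i) => types[i] === type)) return false
  const elapsedMs = new Date(sorted[3].created_at) - new Date(sorted[0].created_at)
  return elapsedMs <= windowDays * 24 * 60 * 60 * 1000
}

const trail = [
  { event_type: 'REQUEST_CREATED', created_at: '2025-05-25T10:00:00.000Z' },
  { event_type: 'REQUEST_CLASSIFIED', created_at: '2025-05-25T10:00:01.342Z' },
  { event_type: 'REPLY_SUGGESTED', created_at: '2025-05-25T10:00:02.891Z' },
  { event_type: 'REPLY_SENT', created_at: '2025-05-27T14:22:00.000Z' },
]
console.log(isCompleteLifecycle(trail)) // true
```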

Organisation Scoping

Every evidence event carries an org_id. Every query on evidence_events is scoped to the active organisation. A single deployment can serve multiple organisations, and their evidence trails are strictly isolated.

The org_id in evidence events is written by the application using the resolved organisation context — not passed in by the caller. A data principal submitting a request cannot specify or forge the organisation context; it's resolved server-side from the environment configuration.

What the Export Looks Like

CSV export for a complete request:

event_type,created_at
REQUEST_CREATED,2025-05-25T10:00:00.000Z
REQUEST_CLASSIFIED,2025-05-25T10:00:01.342Z
REPLY_SUGGESTED,2025-05-25T10:00:02.891Z
REPLY_SENT,2025-05-27T14:22:00.000Z

Four rows. Auditor reads it: request received Sunday 10:00 AM, responded Tuesday 2:22 PM — 52 hours, well within any reasonable response window.
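
Producing that two-column file takes very little code. The sketch below is an illustration, not the tool's exporter: toCsv and csvField are hypothetical names, and the quoting rule follows the common RFC 4180 convention.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line break.
function csvField(value) {
  const s = String(value)
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

// Hypothetical exporter producing the event_type,created_at layout shown above.
function toCsv(events) {
  const rows = events.map(e => [e.event_type, e.created_at].map(csvField).join(','))
  return ['event_type,created_at', ...rows].join('\n')
}

console.log(toCsv([
  { event_type: 'REQUEST_CREATED', created_at: '2025-05-25T10:00:00.000Z' },
  { event_type: 'REPLY_SENT', created_at: '2025-05-27T14:22:00.000Z' },
]))
```

The quoting helper matters more once free-text columns such as the original message are added to the export.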

PDF export includes:

Organisation name and request ID
Request type and creation timestamp
Original message (verbatim)
Suggested reply (the draft that was reviewed and sent)
Full evidence timeline

The PDF is generated server-side using Puppeteer with Chromium. The HTML template is a known XSS risk in the current implementation (user-provided message text is interpolated directly into HTML) — the fix is explicit HTML escaping before interpolation, which is on the roadmap.
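
The escaping fix is small. Here is a minimal sketch of what it could look like; escapeHtml and renderMessage are hypothetical names, and the template fragment is invented for illustration rather than taken from the tool.

```javascript
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }

// Replace the five characters that can break out of HTML text or attribute context.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch])
}

// Hypothetical template fragment: the message is escaped before interpolation.
function renderMessage(message) {
  return `<pre class="message">${escapeHtml(message)}</pre>`
}

console.log(renderMessage('<script>alert(1)</script>'))
// <pre class="message">&lt;script&gt;alert(1)&lt;/script&gt;</pre>
```

Every user-provided value that reaches the template needs the same treatment, including the suggested reply if an operator can edit it.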

Part 6: SLA Architecture — The Compliance Clock

SLA management is where most compliance tools fail. They either track SLA status as a static database field (which becomes stale the moment the clock ticks past the deadline) or they rely on background jobs (which can fail silently and leave the status indicator wrong).

DPDP Copilot takes a third approach: compute SLA status at read time, every time.

How the Internal SLA Clock Works

The sla_due_at timestamp is written once, at request creation: now() + sla_days. The default is 7 days — more conservative than the 30-day legal window, so normal review and approval cycles don't consume the entire legal budget. That's the only mutation to this field — it never changes after the request is created.

On every inbox load, every request detail page load, the computeSlaStatus(slaDueAt) function runs in the API layer:

function computeSlaStatus(slaDueAt) {
  const diffHours = (new Date(slaDueAt) - new Date()) / (1000 * 60 * 60)

  if (diffHours < 0)  return 'OVERDUE'
  if (diffHours < 24) return 'DUE_SOON'
  return 'WITHIN_SLA'
}

No database update. No background worker. No scheduled job. The status shown to the operator is always accurate as of the current server time.

DUE_SOON triggers at 24 hours remaining — a one-day warning before the internal deadline. This gives operators a meaningful heads-up without creating false urgency days in advance.

Setting the Right Internal SLA for Your Organisation

The DPDP Rules 2025 set a 30-day legal maximum for access/correction/erasure responses. How you set your internal target depends on your process:

A startup where one person handles requests end-to-end: 7–10 days is achievable and leaves buffer
A mid-size company where requests go through legal review and data lookup across multiple systems: 14–21 days as the internal target, with the legal 30-day window as the backstop
A large enterprise with formal approval workflows: set the SLA to match your internal SLA policy; use the evidence log to track compliance with your own commitments

The configurable orgs.sla_days field in the database — not yet wired to request creation in the current version, but in the roadmap — will let each organisation set its own target without changing code.
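
Once that wiring exists, deriving the deadline could look something like the sketch below. This is an assumption about a future change, not current behaviour: computeSlaDueAt is a hypothetical helper, and the fallback to 7 days mirrors today's hardcoded default.

```javascript
// Hypothetical helper: derive the deadline from the org's configured sla_days,
// falling back to the 7-day default when the value is missing or invalid.
function computeSlaDueAt(createdAt, org, defaultDays = 7) {
  const days = Number.isInteger(org?.sla_days) && org.sla_days > 0 ? org.sla_days : defaultDays
  const due = new Date(createdAt)
  due.setUTCDate(due.getUTCDate() + days) // setUTCDate rolls over month boundaries
  return due.toISOString()
}

console.log(computeSlaDueAt('2025-05-25T10:00:00.000Z', { sla_days: 14 })) // 2025-06-08T10:00:00.000Z
console.log(computeSlaDueAt('2025-05-25T10:00:00.000Z', {}))               // 2025-06-01T10:00:00.000Z
```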

The Status vs. SLA Distinction

Early versions of DPDP Copilot conflated two concepts in a single field: the workflow status of the request (open, closed?) and the computed SLA urgency (within deadline?). The second database migration separates these:

-- migration 002_split_status_from_sla.sql
ALTER TABLE requests ADD COLUMN IF NOT EXISTS status text DEFAULT 'OPEN' NOT NULL;

UPDATE requests
SET status = CASE
  WHEN sla_status = 'CLOSED' THEN 'CLOSED'
  ELSE 'OPEN'
END;

After this migration:

status is the workflow state: OPEN or CLOSED. Closed means a reply was sent and the request is resolved.
The live SLA urgency is always computed by computeSlaStatus at read time.

This matters for reporting. You want to answer: "Of all requests that were open during the last month, what percentage were responded to within the internal SLA?" That question requires separating workflow state from deadline state.
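
A report along those lines could be computed as follows. This is a sketch under assumptions: slaComplianceRate is a hypothetical function, and each item is assumed to carry its workflow status, its slaDueAt deadline, and a replySentAt value taken from the REPLY_SENT evidence event.

```javascript
// Hypothetical report: share of closed requests answered on or before slaDueAt.
function slaComplianceRate(requests) {
  const closed = requests.filter(r => r.status === 'CLOSED' && r.replySentAt)
  if (closed.length === 0) return null // nothing to measure yet
  const onTime = closed.filter(r => new Date(r.replySentAt) <= new Date(r.slaDueAt))
  return onTime.length / closed.length
}

const lastMonth = [
  { status: 'CLOSED', slaDueAt: '2025-06-01T10:00:00Z', replySentAt: '2025-05-27T14:22:00Z' },
  { status: 'CLOSED', slaDueAt: '2025-06-03T09:00:00Z', replySentAt: '2025-06-05T09:00:00Z' },
  { status: 'OPEN', slaDueAt: '2025-06-10T09:00:00Z', replySentAt: null },
]
console.log(slaComplianceRate(lastMonth)) // 0.5
```

Open requests are left out of the denominator here; whether overdue open requests should count as misses is a reporting decision for your compliance team.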

Part 7: Multilingual Compliance at Scale

The multilingual capability deserves more attention than it typically gets in discussions of DPDP tooling.

Why Language Matters for DPDP

India's 2011 census (the most recent with detailed language data) recorded 19,569 raw mother tongue entries from respondents — often cited as "over 19,500 languages spoken in some capacity" — which consolidate into 121 languages with more than 10,000 speakers each. The DPDP Act and Rules 2025 require that grievance mechanisms be accessible, which practically means: if your users write to you in Hindi, a response mechanism that only understands English is not accessible.

DPDP Copilot supports five languages in the current version:

English — the default, always available
Hindi — 528 million speakers (2011 census)
Bengali — 97 million speakers
Tamil — 69 million speakers
Marathi — 83 million speakers

The public request form shows these as radio button options. The selection flows into the API request, through the drafting call, and into the AI prompt.

What the AI Draft Looks Like in Practice

Here's the same erasure request processed in two languages.

Input (English):

"I never gave consent for you to sell my data. Please delete everything you have about me immediately."

Suggested reply in English:

Dear Data Principal,

Thank you for your request submitted on 25 May 2025. We have received and logged your request for erasure of personal data under Section 12(b) of the Digital Personal Data Protection Act, 2023.

Your request has been assigned reference number [REF-ID]. Our compliance team will review your request, locate the relevant data, and initiate the erasure process in accordance with applicable legal requirements. We will respond with the outcome within the timeframe required under the DPDP Act.

Please retain this acknowledgement for your records.

Suggested reply in Hindi:

प्रिय डेटा प्रिंसिपल,

25 मई 2025 को प्रस्तुत आपके अनुरोध के लिए धन्यवाद। हमने डिजिटल व्यक्तिगत डेटा संरक्षण अधिनियम, 2023 की धारा 12(ख) के अंतर्गत आपके व्यक्तिगत डेटा के विलोपन के अनुरोध को प्राप्त कर दर्ज किया है।

आपके अनुरोध को संदर्भ संख्या [REF-ID] दी गई है। हमारी अनुपालन टीम आपके अनुरोध की समीक्षा करेगी, संबंधित डेटा का पता लगाएगी और लागू कानूनी आवश्यकताओं के अनुसार विलोपन प्रक्रिया शुरू करेगी। हम डीपीडीपी अधिनियम के तहत निर्धारित समय-सीमा के भीतर आपको परिणाम की सूचना देंगे।

कृपया इस पावती को अपने रिकॉर्ड के लिए सुरक्षित रखें।

The structure is identical. The legal references are consistent. The tone is professional but accessible. An operator who reviews the Hindi draft can run it through a translation tool to verify quality before sending — the AI draft is a starting point, not a blindly trusted final output.

Why Separate Prompts Per Language Matter

A naive approach would translate a fixed English template into other languages once, then serve those static translations. This works for simple acknowledgements but fails for personalised responses that need to reference the specific request content.

Because DPDP Copilot drafts replies by passing the original message to the model, the suggested reply can acknowledge specific details the data principal mentioned — not just their request type. If someone writes "I asked you to stop sending me SMS messages three months ago and you're still doing it," a good response acknowledges that history. A static template can't.

The LLM approach generates a response that's contextually appropriate in the data principal's language — which is a qualitatively different outcome from translation.

Part 8: The Data Architecture

Schema Design for Compliance

The database schema is designed around compliance requirements first, application convenience second.

-- Three tables, three responsibilities

CREATE TABLE orgs (
  id         uuid PRIMARY KEY,
  name       text NOT NULL,
  created_at timestamptz DEFAULT now(),
  sla_days   integer DEFAULT 7 NOT NULL
);

CREATE TABLE requests (
  id              uuid PRIMARY KEY,
  message         text NOT NULL,
  type            text NOT NULL,
  status          text DEFAULT 'OPEN' NOT NULL,
  suggested_reply text,
  sla_due_at      timestamptz,
  org_id          uuid NOT NULL REFERENCES orgs(id),
  created_at      timestamptz DEFAULT now() NOT NULL
);

CREATE TABLE evidence_events (
  id          uuid PRIMARY KEY,
  request_id  uuid REFERENCES requests(id),
  event_type  text NOT NULL,
  event_data  jsonb,
  created_at  timestamptz DEFAULT now() NOT NULL,
  org_id      uuid NOT NULL REFERENCES orgs(id)
);

The orgs.sla_days field exists and is populated but not yet wired to request creation — the 7-day hardcode is the current implementation. When that field is connected, different organisations can run different internal SLA targets. The schema is ready for that; the application code isn't yet.
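As a rough sketch of what wiring orgs.sla_days into request creation could look like, the Python below derives sla_due_at from created_at. The names compute_sla_due_at and DEFAULT_SLA_DAYS are illustrative and not taken from the codebase; the only facts carried over from the text are the 7-day fallback and the per-org override. Python is used purely for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Mirrors the 7-day value the text describes as currently hardcoded.
DEFAULT_SLA_DAYS = 7


def compute_sla_due_at(created_at: datetime, org_sla_days: Optional[int] = None) -> datetime:
    """Return the internal SLA deadline for a request.

    org_sla_days stands in for orgs.sla_days; None means "not configured",
    in which case the 7-day default applies.
    """
    days = DEFAULT_SLA_DAYS if org_sla_days is None else org_sla_days
    if days <= 0:
        raise ValueError("sla_days must be a positive integer")
    return created_at + timedelta(days=days)


created = datetime(2025, 5, 25, 10, 0, tzinfo=timezone.utc)
print(compute_sla_due_at(created))     # default 7-day target
print(compute_sla_due_at(created, 3))  # an org with a tighter internal target
```

Keeping the fallback inside one function means the switch from the hardcoded value to the per-org value touches a single call site.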

The evidence_events.event_data field is jsonb — flexible enough to store different metadata per event type without schema changes. As the tool evolves (new event types, operator attribution, channel tracking), existing rows aren't invalidated.
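To illustrate why a free-form event_data column helps, here is a small in-memory sketch of an append-only evidence log. The EvidenceEvent and EvidenceLog classes, and the keys inside each event_data payload, are assumptions made for the example; only the column names and event type names come from the schema and timeline in this post.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class EvidenceEvent:
    request_id: str
    org_id: str
    event_type: str
    event_data: dict[str, Any]  # shape varies per event type, like a jsonb column
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def event_data_json(self) -> str:
        """Serialise the payload the way it might be stored in a jsonb column."""
        return json.dumps(self.event_data, sort_keys=True)


class EvidenceLog:
    """Append-only: events can be added and read, never updated or removed."""

    def __init__(self) -> None:
        self._events: list[EvidenceEvent] = []

    def append(self, event: EvidenceEvent) -> None:
        self._events.append(event)

    def for_request(self, request_id: str, org_id: str) -> tuple[EvidenceEvent, ...]:
        # Always filter by org as well as request, matching the org-scoped queries.
        return tuple(
            e for e in self._events
            if e.request_id == request_id and e.org_id == org_id
        )


log = EvidenceLog()
# Hypothetical payloads: each event type carries different metadata.
log.append(EvidenceEvent("req-1", "org-1", "REQUEST_CREATED", {"channel": "public_form"}))
log.append(EvidenceEvent("req-1", "org-1", "REQUEST_CLASSIFIED", {"type": "Grievance"}))

for event in log.for_request("req-1", "org-1"):
    print(event.event_type, event.event_data_json())
```

A new event type only needs a new payload shape; existing rows and the table definition stay as they are.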

Index Strategy

Two composite indexes:

CREATE INDEX requests_org_created_idx
  ON requests (org_id, created_at DESC);

CREATE INDEX evidence_events_request_org_created_idx
  ON evidence_events (request_id, org_id, created_at);

The first index supports the inbox query: "give me all requests for this org, sorted by most recent." The second supports the request detail query: "give me all evidence events for this request in this org, in chronological order."

Both indexes include org_id because every query in the application is org-scoped. The inbox index leads with org_id, so the org scope eliminates most of the table before the planner looks at created_at. The evidence index leads with request_id, the more selective column for the detail query, and carries org_id as its second column so the org filter is still answered from the index.
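One way to check that each query can use its index is to ask the database for its query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, with trimmed-down copies of the two tables; the production database is Postgres, whose planner and EXPLAIN output differ, so treat this as an illustration of composite-index matching rather than a statement about Postgres behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (
  id TEXT PRIMARY KEY, org_id TEXT NOT NULL, created_at TEXT NOT NULL
);
CREATE TABLE evidence_events (
  id TEXT PRIMARY KEY, request_id TEXT, org_id TEXT NOT NULL, created_at TEXT NOT NULL
);
CREATE INDEX requests_org_created_idx
  ON requests (org_id, created_at DESC);
CREATE INDEX evidence_events_request_org_created_idx
  ON evidence_events (request_id, org_id, created_at);
""")


def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    params = ("x",) * sql.count("?")  # dummy bindings, values are irrelevant here
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Inbox query: all requests for one org, newest first.
inbox_plan = plan(
    "SELECT id FROM requests WHERE org_id = ? ORDER BY created_at DESC"
)
# Detail query: evidence events for one request in one org, oldest first.
timeline_plan = plan(
    "SELECT id FROM evidence_events "
    "WHERE request_id = ? AND org_id = ? ORDER BY created_at"
)

print(inbox_plan)
print(timeline_plan)
```

In Postgres the equivalent check is EXPLAIN on the same two queries against a realistically sized table.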

Part 9: Deployment Architecture

Self-Hosted by Design

DPDP Copilot is self-hosted. That's a deliberate product decision, not an oversight.

DPDP requests often contain sensitive personal data: names, contact details, account information, and sometimes sensitive categories such as health information or financial details. The organisation processing these requests is the data fiduciary. Routing that data through a third-party SaaS for classification and storage creates its own compliance risk: you're a data fiduciary handling data principal requests by passing them to an outside data processor, with all the consent and data transfer implications that entails.

Running the tool in your own infrastructure — whether on-premises or in a cloud account you control — keeps the data principal's message in your trust boundary. The only data that leaves your environment is the message text sent to Anthropic's API for classification and drafting. That's a single, scoped, auditable data transfer that you control.

Docker Compose Quickstart

# Clone and configure
git clone https://github.com/swapnanil/dpdp-copilot
cd dpdp-copilot
cp .env.example .env

# .env minimum required:
# ANTHROPIC_API_KEY=sk-ant-...
# DATABASE_URL=postgresql://user:pass@db:5432/dpdp
# ADMIN_USER=compliance_admin
# ADMIN_PASS=your_secure_password
# DEFAULT_ORG_ID=          # fill after running seed
# ADMIN_SESSION_SECRET=    # openssl rand -hex 32

# Start database
docker compose up db -d

# Run migrations
docker compose run --rm migrate

# Seed initial org (note the UUID it prints)
docker compose run --rm seed

# Start the application
docker compose up app

Open http://localhost:3000 for the operator inbox.
Open http://localhost:3000/grievance for the public form.

Environment Configuration Reference

Variable	Required	Description
`ANTHROPIC_API_KEY`	Yes	Your Anthropic API key for Claude
`DATABASE_URL`	Yes	PostgreSQL connection string
`ADMIN_USER`	Yes	Operator login username
`ADMIN_PASS`	Yes	Operator login password
`DEFAULT_ORG_ID`	Yes	UUID of the active organisation (from seed)
`ADMIN_SESSION_SECRET`	Production	Signs session cookies — `openssl rand -hex 32`
`MODEL`	No	Claude model (default: `claude-sonnet-4-6`)
`MAX_TOKENS`	No	Reply draft length (default: 1024)
`PUPPETEER_EXECUTABLE_PATH`	Docker	Chromium path — set automatically in Docker
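A startup check along the following lines can catch a half-filled .env before the first request arrives. This is a sketch, not the tool's own code: check_env and with_defaults are hypothetical helpers, and the variable names and defaults are copied from the table above.

```python
from typing import Mapping

REQUIRED = (
    "ANTHROPIC_API_KEY",
    "DATABASE_URL",
    "ADMIN_USER",
    "ADMIN_PASS",
    "DEFAULT_ORG_ID",
)
REQUIRED_IN_PRODUCTION = ("ADMIN_SESSION_SECRET",)
DEFAULTS = {"MODEL": "claude-sonnet-4-6", "MAX_TOKENS": "1024"}


def check_env(env: Mapping[str, str], production: bool = False) -> list[str]:
    """Return the names of required variables that are missing or blank."""
    names = REQUIRED + (REQUIRED_IN_PRODUCTION if production else ())
    return [name for name in names if not env.get(name, "").strip()]


def with_defaults(env: Mapping[str, str]) -> dict[str, str]:
    """Fill optional variables with the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update({key: value for key, value in env.items() if value})
    return merged


# Example: a .env where the seed UUID has not been filled in yet.
sample = {
    "ANTHROPIC_API_KEY": "sk-ant-...",
    "DATABASE_URL": "postgresql://user:pass@db:5432/dpdp",
    "ADMIN_USER": "compliance_admin",
    "ADMIN_PASS": "your_secure_password",
    "DEFAULT_ORG_ID": "",
}
print(check_env(sample))                   # development mode
print(check_env(sample, production=True))  # production also needs the session secret
print(with_defaults(sample)["MODEL"])
```

In a real process you would pass os.environ instead of the sample dictionary.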

Production Considerations

Session signing: Generate ADMIN_SESSION_SECRET with openssl rand -hex 32. In development you can skip this; in production the session cookie must be signed or it's trivially forgeable.
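To show what signing buys you, here is a generic HMAC-SHA256 cookie signing sketch in Python. The tool's actual cookie format is not described in this post, so the value.signature layout and the sign and verify helpers are assumptions; the secret would be the value of ADMIN_SESSION_SECRET.

```python
import hashlib
import hmac
from typing import Optional


def _mac(value: str, secret: str) -> str:
    return hmac.new(secret.encode(), value.encode(), hashlib.sha256).hexdigest()


def sign(value: str, secret: str) -> str:
    """Append an HMAC so the server can detect tampering."""
    return f"{value}.{_mac(value, secret)}"


def verify(cookie: str, secret: str) -> Optional[str]:
    """Return the original value if the signature matches, otherwise None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = _mac(value, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return value
    return None


secret = "replace-with-output-of-openssl-rand-hex-32"
cookie = sign("compliance_admin", secret)

print(verify(cookie, secret))                  # the signed value round-trips
forged = "root." + cookie.rpartition(".")[2]   # attacker swaps the username
print(verify(forged, secret))                  # rejected
```

Without the signature, anyone who can edit their own cookie can claim to be the operator, which is why the secret is mandatory in production.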

Database: The Docker Compose setup runs Postgres in a container. For production, use a managed database (AWS RDS, Google Cloud SQL, Supabase) with automated backups. The evidence table is your legal record — you want it on infrastructure with point-in-time recovery.

HTTPS: Run behind a reverse proxy (nginx, Caddy) that terminates TLS. Session cookies should have Secure and SameSite=Strict — these aren't set in the current implementation but are straightforward to add in a production nginx config.

Rate limiting: The public /grievance form has no rate limiting in the current version. A reverse proxy rate limit on the public intake endpoint prevents abuse without touching the application code.

Part 10: Known Limitations and What's Next

Honesty about limitations is part of useful tooling documentation. Here's what DPDP Copilot currently doesn't do and what the roadmap looks like.

Current Limitations

No outbound delivery: The "send reply" workflow doesn't actually send anything. It marks the reply as sent in the evidence log and sets the request to CLOSED. The operator is responsible for actually sending the drafted reply via their existing channel (email, portal, etc.). This is a limitation of the MVP, not the design — real outbound email delivery is the obvious next step.

Single-admin authentication: The current auth model is a single username/password pair from environment variables. There's no user table, no role model, no per-operator audit trail. Multiple operators can't be tracked individually. This is fine for a team of one; it's a problem for a compliance team of five.

Static org configuration: The active organisation is selected via DEFAULT_ORG_ID in the environment. There's no UI for switching organisations or a multi-tenant router. The database schema supports multiple orgs; the application routing doesn't.

No structured contact data: Contact information is embedded in the free-form message. There's no contact_email or contact_phone field. This means there's no reliable way to programmatically address the data principal in the reply or route the response to them.

PDF XSS risk: The PDF template interpolates user-provided text directly into HTML without escaping. A malicious actor could potentially inject HTML into the generated PDF. This is a known issue and is the highest-priority security fix.
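The usual fix for this class of bug is to escape user-provided text before it is interpolated into the HTML that feeds the PDF renderer. The Python sketch below shows the idea with the standard library's html.escape; the TEMPLATE string and render_report function are made up for the example and do not reflect the tool's real template.

```python
from html import escape

# Hypothetical fragment of a report template.
TEMPLATE = '<h1>Request {request_id}</h1>\n<p class="message">{message}</p>'


def render_report(request_id: str, message: str) -> str:
    """Escape every user-controlled value before it reaches the HTML."""
    return TEMPLATE.format(
        request_id=escape(request_id),
        message=escape(message),
    )


malicious = "<script>alert(1)</script> please delete my data"
print(render_report("REF-1", malicious))
```

The angle brackets come out as &lt; and &gt;, so the browser engine that renders the PDF treats the payload as text rather than markup.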

No notifications: Operators have no way to be alerted when a new request comes in or when a request is approaching its internal SLA deadline. Checking the inbox manually is the only current mechanism.

The Roadmap

Outbound reply delivery: Send the drafted reply via email (SendGrid, AWS SES, or SMTP) directly from the tool. Logs the delivery event to the evidence table. The operator reviews the draft, edits if needed, and clicks Send — not "Copy this and email it manually."

SLA alerts: Email or Slack notification when a request enters DUE_SOON status. Optional daily digest of all open requests with their current SLA status.

Multi-operator support: A users table, per-operator login, and role assignment (reviewer vs. approver). Evidence events attributed to specific operators. Audit trail for who touched what.

Structured contact fields: Separate contact_email from the message body at intake. Validate format. Apply retention controls — contact data should be deletable when the request is closed without deleting the evidence trail.

Configurable SLA: Wire orgs.sla_days to request creation. Different organisations have different internal SLA commitments — the schema already supports this.

Approval workflow: A draft reply requires supervisor approval before it can be sent. The evidence log records who approved and when. This is an operational pattern for organisations where a junior compliance analyst drafts but a senior officer approves.

Analytics dashboard: How many requests per week? What types? Average response time? What percentage are within the internal SLA? This is a reporting requirement for any compliance programme worth its name.

Part 11: How DPDP Copilot Fits Into a Broader Compliance Programme

DPDP Copilot handles the data principal rights workflow. That's one piece of a complete DPDP compliance programme. Here's how it fits:

What DPDP Copilot Covers

Receiving data principal requests (Access, Rectification, Deletion, Grievance)
Classifying them correctly and consistently
Drafting multilingual responses
Tracking internal SLA deadlines
Generating the audit evidence trail
Exporting evidence for regulatory review

What It Doesn't Cover

Data discovery: Finding where a person's data actually lives across your systems. DPDP Copilot receives and tracks the request but doesn't automate the underlying data lookup. That's a data catalogue problem.
Consent management: Recording and tracking what data was collected under what consent. That's a separate consent registry.
Privacy notices: Generating or maintaining the notice required under Section 5 of the Act. That's a legal document workflow.
Data breach notification: Section 8(6) requires the data fiduciary to notify the Data Protection Board and each affected data principal of a personal data breach. That's a separate incident response workflow.
Cross-border transfer compliance: The Act restricts transfers of personal data to certain countries. That's a data governance and infrastructure question.

A full DPDP compliance programme needs all of these. DPDP Copilot handles the rights management piece — the part that creates the most immediate operational urgency because it has a hard deadline on individual transactions and a direct escalation path to the Data Protection Board.

The Risk Reduction Calculation

Before DPDP Copilot:

Time to acknowledge a request: hours to days (depends on inbox monitoring)
Time to classify a request: manual, inconsistent, language-dependent
Time to draft a response: hours (finding a template, adapting it, translating it)
Deadline tracking: none — someone has to remember
Evidence: none — email threads that can be deleted

After DPDP Copilot:

Time to acknowledge: seconds (the evidence log records receipt immediately on submission)
Time to classify: 1–2 seconds (LLM call)
Time to draft a response: 2–3 seconds (LLM call)
Deadline tracking: automatic, live-computed, visible in the operator inbox
Evidence: append-only database log, exportable as PDF or CSV in one click

The reduction in time-to-first-action is the most important improvement. The legal clock starts when the request is submitted — not when someone reads it. DPDP Copilot ensures that classification and drafting are done before any human even opens the inbox. The operator's job is review and send, not receive-classify-draft-send.

Part 12: Who Should Use This

Compliance and legal teams at Indian companies processing personal data of Indian residents under the DPDP Act. If you're a data fiduciary — collecting or processing personal data — you have obligations under this Act. If you don't have a structured process for handling data principal requests, you need one.

Engineering teams building privacy infrastructure who need a reference implementation of DPDP request handling. The codebase is open-source. The data model, the API structure, the evidence logging pattern, the SLA computation logic — all of it is readable, runnable, and adaptable.

Startups at the early compliance stage who don't yet have a dedicated compliance team. The tool runs on a single machine. Configuration is a .env file. The public form can be linked from your privacy policy. You don't need a compliance department to run it — you need someone who checks the inbox.

Organisations handling multilingual Indian user bases where an English-only inbox isn't accessible to all the people it's supposed to serve. If your users write to you in Hindi and Tamil, they deserve responses in Hindi and Tamil — and the time cost of manual translation has historically made that impractical. It isn't anymore.

A Complete Example Walkthrough

Let me walk through a real scenario end-to-end, using the tool as a data principal and then as an operator.

As the Data Principal

You purchased something from a company. You're now getting SMS marketing messages you didn't opt in to. You want to file an erasure request and a grievance.

You go to https://yourcompany.com/grievance.

You write:

"I never gave you permission to send me SMS promotions. I want you to delete my phone number and all data you hold about me. I also want to formally complain about this."

You select Hindi as your preferred language and submit.

You receive an acknowledgement: "Your request has been received and logged. Reference: [UUID]. Our compliance team will be in touch with the outcome."

As the Compliance Operator

You open the operator inbox the next morning. You see a new request, classified as Grievance (the model detected the formal complaint language alongside the deletion request), with WITHIN_SLA status.

You click into the request. You read the original message. The suggested reply in Hindi is already drafted. You read it — it acknowledges the complaint, confirms the erasure request has been noted, and explains next steps in Hindi.

You make a small edit to reference your company's specific erasure process. The current version has no outbound delivery, so "Send Reply" means you copy the draft, send it via your own email system, and then mark the reply as sent in the tool.

The evidence timeline now shows:

REQUEST_CREATED     2025-05-25 10:00:00
REQUEST_CLASSIFIED  2025-05-25 10:00:01  (Grievance)
REPLY_SUGGESTED     2025-05-25 10:00:02
REPLY_SENT          2025-05-26 09:15:00

Total response time: 23 hours 15 minutes. Well within any reasonable SLA window. The CSV export documents this. If the data principal escalates to the Data Protection Board, you have a timestamped, exportable record of the complete interaction.
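The response time can be derived directly from the evidence timeline. As a small sketch, the Python below parses the four timestamps shown above and measures the gap between REQUEST_CREATED and REPLY_SENT; the elapsed helper and the list-of-tuples layout are illustrative, and the 7-day comparison uses the internal SLA default described earlier.

```python
from datetime import datetime, timedelta

timeline = [
    ("REQUEST_CREATED", "2025-05-25 10:00:00"),
    ("REQUEST_CLASSIFIED", "2025-05-25 10:00:01"),
    ("REPLY_SUGGESTED", "2025-05-25 10:00:02"),
    ("REPLY_SENT", "2025-05-26 09:15:00"),
]


def elapsed(events: list[tuple[str, str]], start: str, end: str) -> timedelta:
    """Time between two named events in a timeline."""
    times = {name: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for name, ts in events}
    return times[end] - times[start]


response_time = elapsed(timeline, "REQUEST_CREATED", "REPLY_SENT")
drafting_time = elapsed(timeline, "REQUEST_CREATED", "REPLY_SUGGESTED")
within_internal_sla = response_time <= timedelta(days=7)

print(response_time)        # 23:15:00
print(drafting_time)        # 0:00:02
print(within_internal_sla)  # True
```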

Quick Reference

Public form: GET /grievance

API Endpoints:

Method	Path	Auth	Description
`POST`	`/api/public/requests`	None	Submit a data principal request
`POST`	`/api/login`	None	Operator login
`POST`	`/api/logout`	None	Operator logout
`GET`	`/api/requests`	Operator	List all requests with live SLA status
`GET`	`/api/requests/:id`	Operator	Request detail + evidence timeline
`POST`	`/api/requests/:id/send-reply`	Operator	Mark reply sent, close request
`GET`	`/api/requests/:id/export/pdf`	Operator	Download PDF evidence report
`GET`	`/api/requests/:id/export/csv`	Operator	Download CSV evidence export

Request lifecycle:

Public form submission
  → Internal SLA clock starts (currently fixed at 7 days; orgs.sla_days not yet wired in)
  → REQUEST_CREATED logged
  → AI classification (Grievance / Access / Rectification / Deletion)
  → REQUEST_CLASSIFIED logged
  → AI reply drafted in chosen language
  → REPLY_SUGGESTED logged
  → Operator reviews in inbox
  → Operator marks reply sent
  → REPLY_SENT logged
  → Request status: CLOSED

Legal response windows under DPDP Rules 2025:

Request type	Section	Legal window
Access	Section 11	30 days
Correction / Erasure	Section 12	30 days
Grievance	Section 13	90 days
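The table translates into a simple deadline calculation. The sketch below is an illustration under two assumptions: the window lengths are copied from the table above, with the classifier labels Rectification and Deletion mapped to the Correction / Erasure row, and the window is counted in calendar days from the date of receipt. The function names are hypothetical.

```python
from datetime import date, timedelta

# Window lengths as listed in the table above.
LEGAL_WINDOW_DAYS = {
    "Access": 30,
    "Rectification": 30,
    "Deletion": 30,
    "Grievance": 90,
}


def legal_due_date(received: date, request_type: str) -> date:
    """Last day of the legal response window for a request."""
    return received + timedelta(days=LEGAL_WINDOW_DAYS[request_type])


def days_remaining(received: date, request_type: str, today: date) -> int:
    """Days left in the legal window; negative means the window has passed."""
    return (legal_due_date(received, request_type) - today).days


# The opening scenario: an access request sits untouched for three weeks.
received = date(2025, 5, 1)
three_weeks_later = received + timedelta(weeks=3)
print(days_remaining(received, "Access", three_weeks_later))             # 9
print(days_remaining(received, "Access", received + timedelta(days=32)))  # -2
```

The internal 7-day SLA sits well inside these windows, which is what gives the operator room to recover from a slow start.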

Final Thought

The DPDP Act's data principal rights framework isn't complicated. Four rights, two response windows, one evidence requirement. The complexity is operational — handling a high-variance stream of natural language requests, in multiple languages, against a hard time constraint, with an audit trail that has to survive regulatory scrutiny.

Manual processes fail under those conditions not because of negligence but because the requirements are genuinely hard to satisfy with shared inboxes and email chains.

DPDP Copilot automates the classification and drafting — the two tasks that are the most time-consuming and the most error-prone. It makes the internal SLA clock visible before it expires. It generates the audit evidence as a byproduct of normal operation, not as a separate reporting task.

The tool is open-source, self-hosted, and runs on a single machine with Docker Compose. If you're an Indian company with DPDP obligations and no structured data rights workflow, this is where to start.

→ View the full tool page, docs, live demo, and GitHub repo

Built by Swapnanil Saha — swapnanilsaha.com

How to Stop Evaluating LLM Outputs by Gut Feel

Swapnanil Saha — Thu, 21 May 2026 05:25:31 +0000

The standard workflow for evaluating LLM output quality goes something like this: someone reads Response A, reads Response B, and says "I think A is better." Everyone nods. The prompt ships.

This is a problem for three reasons:

It doesn't scale. You can't manually review 500 eval pairs after every prompt change.
It's inconsistent. The same person evaluating the same pair on different days produces different results.
It doesn't tell you why. "Response A is better" doesn't tell you what to fix when Response B becomes the baseline.

I built LLM Eval Suite to replace gut feel with structured, evidence-backed scoring — for any task type, with CI integration.

→ Full tool page

The Core Insight: Evidence, Not Opinion

Every score in LLM Eval Suite is accompanied by a verbatim quote from the response being evaluated. Not "this response has poor faithfulness" — but:

Faithfulness: 1.0/10
Quote: "30-day return policy means no questions asked"
Reasoning: "Source document specifies 14 days. This is a clear hallucination, not an interpretation."

This changes what you can do with the output. You can show it to a stakeholder. You can track it over time. You can build a regression test from it. You can tell the model what specifically went wrong.

Six Evaluation Capabilities

Multi-Dimensional Scoring

Ten task presets — QA, summarisation, RAG, code generation, creative writing, classification, translation, and more. Each preset activates the dimensions that matter for that task:

Task Type	Key Dimensions
`qa`	Faithfulness, Completeness, Conciseness, Relevance
`summarisation`	Coverage, Compression, Accuracy, Readability
`rag`	Faithfulness, Answer Relevancy, Context Precision, Context Recall
`code`	Correctness, Efficiency, Readability, Security

Every dimension score comes with verbatim evidence from the response text.

docker-compose run cli eval \
  --file examples/eval_qa.json \
  --mode compare \
  --format markdown

Regression Testing

Save any eval report as a named baseline:

docker-compose run cli regression save results.json --id prod-baseline

Run future evals against it:

docker-compose run cli regression run results.json --id prod-baseline --format markdown

Per-dimension deltas are compared against configurable thresholds. Exit code 1 when scores drop below your floor. This is the feature that makes the tool useful in CI.
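The gating logic can be pictured as a per-dimension comparison followed by an exit code. The sketch below is a simplified illustration, not the tool's implementation: regression_failures, the max_drop threshold format, and the example scores are all assumptions.

```python
from typing import Mapping, Optional


def regression_failures(
    baseline: Mapping[str, float],
    current: Mapping[str, float],
    max_drop: Optional[Mapping[str, float]] = None,
    default_max_drop: float = 0.5,
) -> dict[str, float]:
    """Return {dimension: delta} for every dimension that dropped too far.

    A dimension missing from the current report counts as a failure with a
    delta equal to the full loss of the baseline score.
    """
    max_drop = max_drop or {}
    failures: dict[str, float] = {}
    for dim, base in baseline.items():
        delta = current.get(dim, 0.0) - base
        allowed = max_drop.get(dim, default_max_drop)
        if delta < -allowed:
            failures[dim] = round(delta, 2)
    return failures


baseline = {"faithfulness": 9.0, "completeness": 8.0}
current = {"faithfulness": 7.5, "completeness": 7.8}

failures = regression_failures(baseline, current)
exit_code = 1 if failures else 0
print(failures, exit_code)
```

A CI wrapper would end with sys.exit(exit_code), so any dimension outside its threshold fails the job.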

GitHub Actions Integration

- name: Run LLM eval
  run: |
    docker-compose run cli eval \
      --file evals/suite.json \
      --mode rank \
      --format junit \
      --output results.xml

- uses: mikepenz/action-junit-report@v3
  if: always()  # publish the report even when the eval step fails
  with:
    report_paths: results.xml

- name: Regression check
  run: |
    # results.json is a JSON eval report; the step that writes it is not shown here
    docker-compose run cli regression run \
      results.json --id prod-baseline
    # exits 1 if any dimension drops beyond threshold

This gates model upgrades, prompt changes, and fine-tune releases automatically. The JUnit XML output integrates with any CI system that understands test reports.

Hallucination Detection

Claim-level analysis against a source document. Each claim in the response is classified as supported or unsupported — binary, not "mostly faithful."

docker-compose run cli hallucination \
  --response output.txt \
  --source source.txt \
  --format markdown

Risk levels: none / low / moderate / high / critical, with a safe_to_use boolean for downstream gating. This is what you run before using LLM output in a production pipeline where accuracy matters.

Example output:

hallucination_risk: high
safe_to_use: false

Claim: "30-day return policy"
  status: unsupported
  evidence: "Source specifies 14 days"
  severity: critical

Claim: "no questions asked"
  status: unsupported
  evidence: "Source makes no mention of return conditions"
  severity: high
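Downstream gating on safe_to_use could look like the sketch below. It assumes the report is available as a dictionary shaped like the example output above; that shape, the gate function, and the UnsafeOutputError exception are illustrative assumptions rather than the tool's documented API.

```python
from typing import Any


class UnsafeOutputError(Exception):
    """Raised when a response fails the hallucination gate."""


def unsupported_claims(report: dict[str, Any]) -> list[str]:
    return [c["claim"] for c in report["claims"] if c["status"] == "unsupported"]


def gate(response_text: str, report: dict[str, Any]) -> str:
    """Pass the response through only if the report marks it safe to use."""
    if not report.get("safe_to_use", False):
        bad = ", ".join(unsupported_claims(report))
        raise UnsafeOutputError(
            f"blocked (risk={report.get('hallucination_risk')}): {bad}"
        )
    return response_text


# Dictionary form of the example output shown above.
report = {
    "hallucination_risk": "high",
    "safe_to_use": False,
    "claims": [
        {"claim": "30-day return policy", "status": "unsupported", "severity": "critical"},
        {"claim": "no questions asked", "status": "unsupported", "severity": "high"},
    ],
}

try:
    gate("Our 30-day return policy means no questions asked.", report)
except UnsafeOutputError as err:
    print(err)
```

Treating a missing safe_to_use field as unsafe keeps the gate closed when a report is malformed.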

Prompt Sensitivity Analysis

Test 2–5 prompt variants against a fixed response. Per-dimension variance tells you which dimensions are fragile across phrasings and which are stable.

docker-compose run cli sensitivity \
  --file examples/prompt_variants.json \
  --format markdown

Know which prompt phrasings shift your scores before you deploy. High-variance dimensions across prompts signal that your evaluation isn't measuring the response — it's measuring the prompt wording.

Panel Evaluation

Run N independent judge passes on the same evaluation. Mean and variance per dimension expose where judges agree and where they disagree.

docker-compose run cli panel \
  --file examples/eval_qa.json \
  --judges 5 \
  --format markdown

High-variance dimensions are flagged for human review automatically. The panel mode is the right choice when you're evaluating subjective tasks like creative writing where a single judge's opinion is insufficient signal.
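The aggregation behind a panel run is small enough to sketch. The Python below computes a mean and population variance per dimension and flags anything above a threshold; summarise_panel, the threshold of 1.0, and the sample scores (including the creativity dimension) are assumptions for illustration, not the tool's actual defaults.

```python
from statistics import mean, pvariance


def summarise_panel(
    passes: list[dict[str, float]],
    variance_threshold: float = 1.0,
) -> dict[str, dict[str, object]]:
    """Aggregate N judge passes into mean, variance, and a review flag per dimension."""
    summary: dict[str, dict[str, object]] = {}
    for dim in passes[0]:
        scores = [judge[dim] for judge in passes]
        variance = pvariance(scores)  # population variance across the judges
        summary[dim] = {
            "mean": mean(scores),
            "variance": variance,
            "needs_human_review": variance > variance_threshold,
        }
    return summary


# Three hypothetical judge passes over the same response.
passes = [
    {"faithfulness": 9.0, "creativity": 4.0},
    {"faithfulness": 9.0, "creativity": 8.0},
    {"faithfulness": 9.0, "creativity": 6.0},
]

for dim, stats in summarise_panel(passes).items():
    print(dim, stats)
```

Here the judges agree on faithfulness and split on creativity, so only the second dimension is routed to a human.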

RAGAS-Compatible RAG Preset

The rag task type maps the four RAGAS metrics — faithfulness, answer relevancy, context precision, context recall — as first-class evaluation dimensions with equal weighting. The output is compatible with RAGAS reporting conventions, so you can integrate this into existing RAGAS workflows or use it as a drop-in alternative.

Example: Two Responses In, Clear Winner Out

Input:

{
  "task_type": "qa",
  "eval_mode": "compare",
  "source": "Refunds are accepted within 14 days if the item is unused.",
  "responses": [
    {
      "label": "Response A",
      "text": "You can get a refund within 14 days if the item hasn't been used."
    },
    {
      "label": "Response B",
      "text": "Our 30-day return policy means no questions asked."
    }
  ]
}

Output:

winner: Response A
margin: clear

Response B — Faithfulness
  score: 1.0/10
  quote: "30-day return policy means no questions asked"
  reasoning: "Source specifies 14 days. 'No questions asked' is not in the source.
              Two distinct hallucinations in one sentence."

Response A — Faithfulness
  score: 9.5/10
  quote: "within 14 days if the item hasn't been used"
  reasoning: "Accurately paraphrases the source with no additions."

Why This Matters in Production

LLM evaluation is usually treated as a one-time concern — you evaluate before you ship. But models change, prompts drift, data distributions shift, and retrieval quality fluctuates. A system that was 90% faithful in January may be 75% faithful in April because the upstream data changed.

The regression testing and CI integration in LLM Eval Suite are designed for this reality. You run evals continuously, not just at release time. The baseline is the floor — if you drop below it, the pipeline stops.

→ View the full tool page, docs, and GitHub repo

Stop Getting 'It Depends' Answers About RAG Architecture

Swapnanil Saha — Thu, 21 May 2026 05:09:30 +0000

Ask five AI engineers which vector database to use for your RAG system. You'll get five different answers, and they'll all start with "it depends."

It depends on your data volume. It depends on your query patterns. It depends on whether you need GDPR compliance. It depends on your team's infra maturity. It depends on your budget. It depends on whether you're doing hybrid search.

The "it depends" answer is technically correct and operationally useless. It turns an architecture decision into an unbounded research project.

I built RAG Readiness to make one specific recommendation per component — and explain why.

→ Full tool page

The Design Principle: Opinions, Not Options

Most RAG tooling and documentation presents you with a comparison table. Pinecone vs. Weaviate vs. Qdrant vs. Chroma. BM25 vs. dense vs. hybrid. ada-002 vs. text-embedding-3-large.

Comparison tables are useful if you already know which dimensions matter for your use case. They're paralyzing if you don't.

RAG Readiness is opinionated by design. You describe your use case, your data, your constraints. The tool returns one choice per component — with full reasoning.

If GDPR applies, managed cloud vector databases are eliminated from consideration before the LLM is even called. That's a rule, not an LLM judgment. The recommendation you receive is already constraint-filtered.
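A rule of this kind is a plain filter that runs before any model call. The sketch below is a guess at the shape of such a filter, not the tool's code: the VectorDB class, the CATALOGUE entries and their self_hostable flags, and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VectorDB:
    name: str
    self_hostable: bool  # False means managed cloud only, for this sketch


# Illustrative catalogue; a real one would carry more attributes.
CATALOGUE = (
    VectorDB("Pinecone", self_hostable=False),
    VectorDB("Qdrant", self_hostable=True),
    VectorDB("Weaviate", self_hostable=True),
    VectorDB("Chroma", self_hostable=True),
)


def eligible(catalogue: tuple[VectorDB, ...], gdpr_applies: bool) -> list[VectorDB]:
    """Apply the hard constraint first; the LLM only sees what survives."""
    if not gdpr_applies:
        return list(catalogue)
    return [db for db in catalogue if db.self_hostable]


def conflict(preferred: Optional[str], candidates: list[VectorDB]) -> Optional[str]:
    """Flag a stated preference that the constraints have already ruled out."""
    if preferred and preferred not in {db.name for db in candidates}:
        return f"'{preferred}' conflicts with the compliance constraints"
    return None


candidates = eligible(CATALOGUE, gdpr_applies=True)
print([db.name for db in candidates])
print(conflict("Pinecone", candidates))
```

Because the filter is deterministic, the same inputs always produce the same candidate list, whatever the model later says.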

Six Modes, One Tool

Architecture Recommendation

The core mode. Answer a structured set of questions about your use case — document types, query patterns, scale, compliance requirements, team capabilities. Get back:

Vector database: one specific choice with rationale
Embedding model: one specific choice
Chunking strategy: one specific approach with parameters
Retrieval method: dense / BM25 / hybrid — one answer
Reranker: whether you need one and which

python main.py audit --interactive
# or from file:
python main.py audit --file examples/usecase_legal_contracts.json --with-cost

Architecture Diagnosis

You already have a RAG system. It's not working. This mode takes your existing architecture and the problems you're seeing, and returns a root-cause analysis per component with severity levels and one specific fix.

Not "improve your chunking" — "switch from fixed 512-token chunks to parent-child hierarchical chunking with 512-token child nodes. Your documents have multi-clause structure that fixed chunks split mid-sentence."

python main.py diagnose --file examples/diagnosis_pinecone_fixed.json

Example output:

overall_severity: critical

chunking_strategy — critical
  "Fixed 512-token chunks split mid-clause in long legal documents"
  Fix: Parent-child hierarchical chunking, 512-token child nodes

retrieval_method — high
  "Dense-only misses exact terms like dollar amounts and clause references"
  Fix: Hybrid BM25 + dense with RRF fusion

quick_fix: Enable 10% token overlap today. Takes 20 minutes, reduces
           the worst failures while you implement the full fix.
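The RRF fusion named in the fix is reciprocal rank fusion: each document scores the sum of 1 / (k + rank) over every ranking it appears in. Here is a minimal Python sketch of that formula, assuming the commonly used constant k = 60 and made-up document IDs; rrf_fuse is an illustrative name, not a function from the tool.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion.

    Each list is ordered best-first. Ties are broken alphabetically so the
    output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query about a contract clause.
bm25_hits = ["clause_7", "clause_2", "clause_9"]   # exact-term matches
dense_hits = ["clause_2", "clause_4", "clause_7"]  # semantic matches

print(rrf_fuse([bm25_hits, dense_hits]))
```

Documents found by both retrievers rise to the top, which is how hybrid search recovers exact terms such as dollar amounts without giving up semantic recall.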

Multi-Use-Case Session

Run up to 5 parallel audits in a single request — useful when you're scoping a RAG platform that needs to serve multiple internal teams.

The output includes cross-cutting insights: which components can be shared across use cases, where requirements conflict (the legal team needs GDPR-compliant storage; the sales team wants managed cloud), and which use case to build first for the highest return on the shared infrastructure investment.

Implementation Bundle

Once you have an architecture you trust, generate a complete implementation starter kit:

python main.py bundle <session-id>

Output: a requirements.txt, docker-compose.yml, .env.example, and migration guide tailored to the recommended architecture. If you have an existing stack, you get ordered migration steps with rollback notes.

Cost Estimation

Rule-based monthly cost breakdown per component — no LLM call. Lookup tables for vector DB pricing tiers, embedding API costs, reranker inference, and LLM costs at your estimated query volume.

python main.py cost <session-id>

Returns a line-item breakdown, optimization tips (e.g., "switching to a self-hosted embedding model saves ~$800/month at this query volume"), and a hosting model classification (managed vs. self-hosted trade-off at your scale).

RAGAS Eval Dataset Generation

Generate evaluation questions grounded in your actual use case and query patterns — not generic retrieval questions.

python main.py eval-dataset <session-id> --num-questions 20

Output includes easy/medium/hard distribution, RAGAS metric mapping (which questions test faithfulness vs. answer relevancy vs. context precision), an annotation guide, and a time estimate for human review.

Session Persistence and Refinement

Every audit persists to SQLite. You can refine against new constraints:

python main.py refine <session-id> --feedback "Qdrant was too heavy for our infra team"

The tool re-runs with the feedback as an additional constraint. Refinement history is tracked — you can see how the recommendation evolved across iterations.

A Complete Quickstart

git clone https://github.com/swapnanil/rag-readiness
cd rag-readiness
cp .env.example .env  # add your ANTHROPIC_API_KEY
docker-compose up -d api  # detached, so the shell stays free for the commands below

# New architecture audit (interactive)
python main.py audit --interactive

# Diagnose a broken stack
python main.py diagnose --interactive

# Multi-use-case session
python main.py multi-audit examples/multi_usecase_lexvault.json

# List sessions and refine
python main.py sessions
python main.py refine <session-id> --feedback "need self-hosted only"

# Cost breakdown and eval dataset
python main.py cost <session-id>
python main.py eval-dataset <session-id> --num-questions 20

The Pre-Scoring Layer

Before any LLM call, a rule-based pre-scorer computes a complexity score (1–10) from the use case inputs. This has two effects:

It calibrates the LLM prompt — a complexity-1 use case gets a simpler, more direct recommendation; a complexity-9 use case gets a recommendation with more explicit trade-off reasoning.
It runs conflict detection — if your inputs contain contradictory constraints (e.g., "GDPR compliant" + "use Pinecone"), the conflict is flagged before the LLM is called, not discovered in the output.

Who This Is For

AI engineers starting a new RAG project who want a structured starting point rather than a blank page
Engineering leads who need to scope a RAG system for a business use case and justify the architecture choices to non-technical stakeholders
Teams with an existing RAG system that isn't performing as expected and need a systematic diagnosis, not a hunch

The tool is open-source, runs locally, and persists everything to SQLite. Your use case details don't leave your environment beyond the single LLM API call per audit.

→ View the full tool page, docs, and GitHub repo

Building Distributed Systems, Backend Infrastructure & AI Platforms — My Engineering Journey

Swapnanil Saha — Tue, 19 May 2026 09:34:58 +0000

Hey everyone 👋

I’m Swapnanil Saha, a backend and distributed systems engineer from Mumbai, India with 9+ years of experience building high-performance infrastructure systems, backend platforms, optimization pipelines, and AI-driven architectures.

🌐 Website: swapnanilsaha.com

💻 GitHub: github.com/swapnanil

🔗 LinkedIn: linkedin.com/in/swapnanil