<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Karan Singh Chandel</title>
    <description>The latest articles on Forem by Karan Singh Chandel (@karan_devrel_0207).</description>
    <link>https://forem.com/karan_devrel_0207</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3720910%2F4cc6d5a9-b278-41f7-a6a9-1d2fceeb1b9c.png</url>
      <title>Forem: Karan Singh Chandel</title>
      <link>https://forem.com/karan_devrel_0207</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/karan_devrel_0207"/>
    <language>en</language>
    <item>
      <title>Claude Code Under the Hood: How It Actually Works</title>
      <dc:creator>Karan Singh Chandel</dc:creator>
      <pubDate>Mon, 06 Apr 2026 06:49:27 +0000</pubDate>
      <link>https://forem.com/karan_devrel_0207/claude-code-under-the-hood-how-it-actually-works-2kb8</link>
      <guid>https://forem.com/karan_devrel_0207/claude-code-under-the-hood-how-it-actually-works-2kb8</guid>
      <description>&lt;p&gt;In March 2026, Anthropic accidentally published a source map in an npm package.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7yd4ra19hf5r8imifq9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7yd4ra19hf5r8imifq9.png" alt="image of claude running" width="772" height="675"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That map pointed to a zip on R2. That zip contained unobfuscated TypeScript for Claude Code. Someone posted it on X. And suddenly, a lot of us had front-row seats to how a modern coding agent is actually built.&lt;/p&gt;

&lt;p&gt;I spent some time digging through the code, docs, and community notes. This post is the version I wish I had on day one: practical, opinionated.&lt;/p&gt;

&lt;p&gt;No hype. Just architecture, tradeoffs, and what this means if you build AI tooling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick take
&lt;/h2&gt;

&lt;p&gt;Claude Code is not magic. It is a very disciplined system around a very capable model.&lt;/p&gt;

&lt;p&gt;The model is the brain. The product quality comes from the nervous system: loop orchestration, permissions, context compaction, tool contracts, caching, retries, and UI responsiveness.&lt;/p&gt;

&lt;p&gt;You can clone the outer loop in a weekend. You cannot clone the reliability story in a weekend.&lt;/p&gt;




&lt;h2&gt;
  
  
  The moment it clicked for me
&lt;/h2&gt;

&lt;p&gt;I had that "oh" moment while tracing one simple task: &lt;strong&gt;"fix failing tests."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On paper, that's one sentence. In production, it's a chain of fragile decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Decide to run tests first without being told.&lt;/li&gt;
&lt;li&gt;Parse noisy failures and isolate relevant files.&lt;/li&gt;
&lt;li&gt;Search a big repo without getting lost.&lt;/li&gt;
&lt;li&gt;Make a minimal edit instead of a destructive rewrite.&lt;/li&gt;
&lt;li&gt;Re-run tests and interpret second-order failures.&lt;/li&gt;
&lt;li&gt;Recover when the first fix was wrong.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you've ever built an internal "code agent" prototype, you know where this usually dies. Not on step one. On step four or five, when context is bloated, terminal output is huge, and one flaky command derails the run.&lt;/p&gt;

&lt;p&gt;That is where Claude Code is strongest: &lt;strong&gt;not in the first answer, but in the 17th decision.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture in simple diagram
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4epdh6ptq1261zyiyni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4epdh6ptq1261zyiyni.png" alt="agentic loop Image" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you remember one thing from this article, remember this: the loop is the skeleton, but the guardrails and context strategy are the organs.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Start: the first 200 ms are engineered, not lucky
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;claude&lt;/code&gt;, startup does something I love from a DevRel perspective: it optimizes what users &lt;em&gt;feel&lt;/em&gt;, not just what engineers measure.&lt;/p&gt;

&lt;p&gt;Before heavy modules fully finish loading, it can kick off parallel work like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;policy lookups (MDM/enterprise settings)&lt;/li&gt;
&lt;li&gt;secure token retrieval&lt;/li&gt;
&lt;li&gt;warming a connection to API infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means by the time your first prompt lands, expensive setup has already started.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Funny Analogy:&lt;/strong&gt; this is the restaurant that starts warming your plate while your order is still being typed. You call it a "fast kitchen," but really it's choreography.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Bootstrap: boring systems that prevent expensive pain
&lt;/h2&gt;

&lt;p&gt;Initialization has all the things seasoned platform teams eventually add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema-validated config with migrations&lt;/li&gt;
&lt;li&gt;auth and token lifecycle handling&lt;/li&gt;
&lt;li&gt;telemetry and feature flags&lt;/li&gt;
&lt;li&gt;environment-specific policy resolution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is flashy. All of this is what keeps large org rollouts from becoming support tickets.&lt;/p&gt;

&lt;p&gt;One detail I found especially smart is &lt;strong&gt;build-time elimination for disabled features&lt;/strong&gt;. If a feature flag is off, code can be stripped from shipped artifacts rather than merely gated at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;smaller runtime surface&lt;/li&gt;
&lt;li&gt;fewer hidden interactions&lt;/li&gt;
&lt;li&gt;safer gradual rollout&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Let's see QueryEngine
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4qk13nx2kbqxjbn8mef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4qk13nx2kbqxjbn8mef.png" alt="Query Engine Image" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;People say "it's just an agent loop." Sure.&lt;/p&gt;

&lt;p&gt;So is saying "a database is just read and write."&lt;/p&gt;

&lt;p&gt;The core loop is conceptually small:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;append user input&lt;/li&gt;
&lt;li&gt;build system and user context&lt;/li&gt;
&lt;li&gt;stream model output&lt;/li&gt;
&lt;li&gt;execute tool calls if present&lt;/li&gt;
&lt;li&gt;feed tool output back and continue&lt;/li&gt;
&lt;li&gt;stop when final response is produced&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The hard part is everything around those six steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streaming that survives partial network failures&lt;/li&gt;
&lt;li&gt;retry behavior that does not duplicate dangerous actions&lt;/li&gt;
&lt;li&gt;concurrency control for tools that can safely run in parallel&lt;/li&gt;
&lt;li&gt;token and cost accounting&lt;/li&gt;
&lt;li&gt;context compaction under pressure&lt;/li&gt;
&lt;li&gt;UX that stays responsive during long-running commands&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Think of QueryEngine as an air traffic control tower.&lt;/strong&gt; Planes landing is the easy part. Preventing collisions, handling weather, and rerouting under stress is the job.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why context engineering is the hidden moat
&lt;/h2&gt;

&lt;p&gt;Most people underestimate this.&lt;/p&gt;

&lt;p&gt;Context windows are finite and agent sessions are greedy. Every file read, terminal dump, and tool result consumes budget.&lt;/p&gt;

&lt;p&gt;Claude Code's split is simple and very effective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stable session metadata goes where caching benefits most&lt;/li&gt;
&lt;li&gt;rapidly changing memory and turn-specific data stay separate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That design improves cache hit rates and reduces reprocessing overhead.&lt;/p&gt;

&lt;p&gt;When pressure increases, &lt;strong&gt;compaction strategy&lt;/strong&gt; kicks in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;trim stale tool output first&lt;/li&gt;
&lt;li&gt;summarize older history when needed&lt;/li&gt;
&lt;li&gt;preserve recent user intent and high-signal artifacts longest&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; this is not deleting notes randomly; it's compressing your meeting transcript while pinning the action items.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Tools: one contract, many capabilities
&lt;/h2&gt;

&lt;p&gt;This was one of the cleanest design choices I saw.&lt;/p&gt;

&lt;p&gt;Whether a tool is built-in, remote, or plugin-provided, it follows a common shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validated input schema&lt;/li&gt;
&lt;li&gt;permission semantics&lt;/li&gt;
&lt;li&gt;execution implementation&lt;/li&gt;
&lt;li&gt;rendering behavior&lt;/li&gt;
&lt;li&gt;concurrency declaration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That single contract enables a lot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minimal special-casing in the loop&lt;/li&gt;
&lt;li&gt;easier extension via MCP&lt;/li&gt;
&lt;li&gt;safer composition at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedcsyxwpx01ozrhh80o6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedcsyxwpx01ozrhh80o6.png" alt="request loop image" width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've ever maintained a platform where every plugin needed custom branching logic, you know how valuable this is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Permissions: trust the model, verify the action
&lt;/h2&gt;

&lt;p&gt;Giving an AI shell access without controls is how you get a postmortem.&lt;/p&gt;

&lt;p&gt;The effective pattern is layered, not binary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;organization and project policy&lt;/li&gt;
&lt;li&gt;tool-level risk semantics&lt;/li&gt;
&lt;li&gt;current operation mode&lt;/li&gt;
&lt;li&gt;safety classification pass&lt;/li&gt;
&lt;li&gt;user confirmation when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns "agent autonomy" into &lt;strong&gt;managed autonomy&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; modern CI/CD lets engineers ship fast, but only through protected branches, checks, and approvals. Same philosophy here.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Subagents: practical answer to context bloat
&lt;/h2&gt;

&lt;p&gt;Subagents are not just a shiny feature. They are a &lt;strong&gt;memory-management strategy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A subagent gets a fresh context, solves a bounded task, and returns a summary. The primary thread keeps signal without inheriting every intermediate breadcrumb.&lt;/p&gt;

&lt;p&gt;In real teams, this feels like delegation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;give a teammate a narrow problem&lt;/li&gt;
&lt;li&gt;let them research independently&lt;/li&gt;
&lt;li&gt;ask for a concise brief, not raw logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the shape of healthy agent orchestration.&lt;/p&gt;




&lt;h2&gt;
  
  
  "React in a terminal" sounds weird, but it makes sense
&lt;/h2&gt;

&lt;p&gt;Yes, terminal UI implemented with React primitives can sound like overkill.&lt;/p&gt;

&lt;p&gt;But after reading the architecture, I get it.&lt;/p&gt;

&lt;p&gt;For a state-heavy, component-driven, asynchronous interface, the React mental model is useful even outside the browser:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic render updates&lt;/li&gt;
&lt;li&gt;composable UI states&lt;/li&gt;
&lt;li&gt;reusable hooks for permissions, sessions, and notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is less about "React everywhere" and more about &lt;strong&gt;"use a model your team can reason about under pressure."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The part many posts miss: this is a product bet, not a feature bet
&lt;/h2&gt;

&lt;p&gt;There are two philosophies in AI devtools right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach A:&lt;/strong&gt; heavily script workflows, tightly constrain decisions, make behavior predictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach B:&lt;/strong&gt; give the model broader agency and build hard safety systems around it.&lt;/p&gt;

&lt;p&gt;Claude Code leans toward B.&lt;/p&gt;

&lt;p&gt;That only works if infrastructure is serious: permissions, checkpoints, compaction, retries, cost visibility, extension boundaries. Without that, "agentic" quickly becomes "chaotic."&lt;/p&gt;

&lt;p&gt;With that, model improvements flow through the product with less product-level rewiring.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;You can absolutely build a capable coding loop in a weekend.&lt;/p&gt;

&lt;p&gt;What takes real engineering maturity is making that loop feel trustworthy on Tuesday at 6:40 PM, when tests are flaky, output is noisy, context is tight, and a teammate is waiting on your fix.&lt;/p&gt;

&lt;p&gt;That is what I came away respecting most. Not that it can answer. &lt;strong&gt;That it can keep operating.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building in this space, study the orchestration and safety architecture as much as the prompts. The prompts get demos. The architecture gets adoption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's keep building!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>devtools</category>
      <category>architecture</category>
    </item>
    <item>
      <title>I Built a CLI Tool to Fix dbt Governance Problem Using DataHub</title>
      <dc:creator>Karan Singh Chandel</dc:creator>
      <pubDate>Wed, 21 Jan 2026 09:58:41 +0000</pubDate>
      <link>https://forem.com/karan_devrel_0207/i-built-a-cli-tool-to-fix-dbts-governance-problem-using-datahub-2e6</link>
      <guid>https://forem.com/karan_devrel_0207/i-built-a-cli-tool-to-fix-dbts-governance-problem-using-datahub-2e6</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I use dbt to write transformations, manage dependencies, run tests, and keep everything in version control. For building data pipelines, dbt is excellent.&lt;/p&gt;

&lt;p&gt;But when it comes to governance, dbt falls short.&lt;/p&gt;

&lt;p&gt;I wanted to know who owns each model. I wanted to enforce documentation standards. I wanted to catch when someone depends on deprecated data. I wanted these checks to run during development, in my terminal, in CI, not after deployment.&lt;/p&gt;

&lt;p&gt;dbt doesn't do this. It's a transformation tool, not a governance platform.&lt;/p&gt;

&lt;p&gt;So I built a CLI tool that brings governance into the dbt workflow by connecting it to DataHub.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pain Points with dbt (When It Comes to Governance)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  No Ownership Tracking
&lt;/h3&gt;

&lt;p&gt;dbt has no concept of ownership. When &lt;code&gt;dim_customer_metrics&lt;/code&gt; breaks at 2 AM, who do you contact? The original author might have left the company months ago. There's no ownership field in dbt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Documentation is Optional
&lt;/h3&gt;

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; add descriptions in dbt. But nothing enforces it. Half my models have no description. The other half say "customer table" which tells you nothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  No Awareness of Deprecated Data
&lt;/h3&gt;

&lt;p&gt;dbt doesn't know if an upstream source is deprecated. You can depend on a table scheduled for removal and dbt won't warn you. You find out when production breaks.&lt;/p&gt;

&lt;h3&gt;
  
  
  No Data Classification
&lt;/h3&gt;

&lt;p&gt;Which models contain PII? Which are safe to share externally? dbt has no built-in tagging for sensitivity or compliance. Teams track this in spreadsheets that go stale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Governance Happens After the Fact
&lt;/h3&gt;

&lt;p&gt;Even if you try to enforce standards, it's manual. Someone reviews a PR, maybe checks ownership, maybe doesn't. It's inconsistent and depends on who's reviewing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: DataHub as the Governance Platform
&lt;/h2&gt;

&lt;p&gt;DataHub solves exactly what dbt lacks. It's a metadata platform that tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ownership&lt;/strong&gt;: who's responsible for each dataset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domains&lt;/strong&gt;: which business area owns the data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deprecation status&lt;/strong&gt;: what's scheduled for removal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tags&lt;/strong&gt;: PII classification, sensitivity levels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Descriptions&lt;/strong&gt;: rich documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lineage&lt;/strong&gt;: upstream and downstream dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DataHub has the governance metadata. The question was: how do I use it during dbt development?&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;dbt-datahub-cli&lt;/strong&gt;, a command-line tool that validates your dbt models against governance rules stored in DataHub.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You have dbt models&lt;/li&gt;
&lt;li&gt;You have governance metadata in DataHub (ownership, tags, etc.)&lt;/li&gt;
&lt;li&gt;This CLI checks if your models meet the governance standards&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;When you run the CLI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reads your dbt manifest&lt;/strong&gt; - gets all your models from &lt;code&gt;manifest.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Looks up each model in DataHub&lt;/strong&gt; - fetches ownership, domain, tags, deprecation status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checks against your rules&lt;/strong&gt; - does this model have an owner? a description? is it using deprecated data?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reports what passed and failed&lt;/strong&gt; - with clear messages about what to fix&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  CLI Commands
&lt;/h3&gt;

&lt;p&gt;The main command is &lt;code&gt;validate&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dbt-datahub-cli validate &lt;span class="nt"&gt;--manifest&lt;/span&gt; target/manifest.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other useful commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;init&lt;/code&gt; - creates a starter config file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list-rules&lt;/code&gt; - shows all available rules&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test-connection&lt;/code&gt; - checks if DataHub is reachable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  No DataHub Yet? Use Dry Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dbt-datahub-cli validate &lt;span class="nt"&gt;--manifest&lt;/span&gt; target/manifest.json &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs validation without connecting to DataHub. Good for trying out the tool first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlfv1qjov00dldbjjig5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlfv1qjov00dldbjjig5.png" alt="Flow chart of how system connect" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rules
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;What It Enforces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;require_owner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Every model must have an owner in DataHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;require_description&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Models must have meaningful descriptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;no_deprecated_upstream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Can't depend on deprecated datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;require_domain&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Models must belong to a business domain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;require_tags&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Required tags (like PII) must be present&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;upstream_must_have_owner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dependencies must have owners too&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;naming_convention&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model names follow your patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each rule can be set as &lt;code&gt;error&lt;/code&gt; or &lt;code&gt;warning&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why DataHub?
&lt;/h2&gt;

&lt;p&gt;I chose DataHub because it already tracks everything I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single source of truth&lt;/strong&gt; - ownership, domains, tags in one place&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean API&lt;/strong&gt; - easy to fetch metadata with Python SDK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lineage aware&lt;/strong&gt; - knows what depends on what&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible&lt;/strong&gt; - custom metadata works too&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data was already there. I just needed to use it at the right time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Clone and Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
git clone https://github.com/karan0207/dbt-datahub-cli.git
&lt;span class="nb"&gt;cd &lt;/span&gt;dbt-datahub-cli
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[all]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Create Config
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dbt-datahub-cli init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;code&gt;governance.yml&lt;/code&gt; where you enable the rules you want.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Connect to DataHub
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATAHUB_GMS_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://your-datahub:8080"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Run Validation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dbt-datahub-cli validate &lt;span class="nt"&gt;--manifest&lt;/span&gt; examples/sample_manifest.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sample Output
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiev8rst0ytto4yug0ma2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiev8rst0ytto4yug0ma2.png" alt="Validation Output in CLI" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Web Dashboard
&lt;/h2&gt;

&lt;p&gt;Prefer a GUI? I also built a Streamlit dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dbt-datahub-cli dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flh4vrbfeoebq4chg04dl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flh4vrbfeoebq4chg04dl.png" alt="Streamlit Dashboard" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same validation engine, visual interface.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before &amp;amp; After
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ownership unknown&lt;/td&gt;
&lt;td&gt;Ownership enforced via DataHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Descriptions are "nice to have"&lt;/td&gt;
&lt;td&gt;Descriptions required to merge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deprecated deps break prod&lt;/td&gt;
&lt;td&gt;Deprecated deps blocked in CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance is manual&lt;/td&gt;
&lt;td&gt;Governance runs automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Try it out:&lt;/strong&gt; &lt;a href="https://github.com/karan0207/dbt-datahub-cli" rel="noopener noreferrer"&gt;https://github.com/karan0207/dbt-datahub-cli&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Questions or ideas for new rules? Drop a comment or open an issue.&lt;/p&gt;

</description>
      <category>datahub</category>
      <category>dbt</category>
      <category>datagovernance</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
