<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Lucas Ribeiro</title>
    <description>The latest articles on Forem by Lucas Ribeiro (@lucash_ribeiro_dev).</description>
    <link>https://forem.com/lucash_ribeiro_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3264297%2F612575e3-5e0a-4404-a605-9cb9e9ec61c4.jpg</url>
      <title>Forem: Lucas Ribeiro</title>
      <link>https://forem.com/lucash_ribeiro_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lucash_ribeiro_dev"/>
    <language>en</language>
    <item>
      <title>The Serverless Semantic Engine: Architecting Mass Indexing Pipelines with Modal and Vector Databases</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Fri, 19 Dec 2025 05:52:53 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/the-serverless-semantic-engine-architecting-mass-indexing-pipelines-with-modal-and-vector-databases-2m32</link>
      <guid>https://forem.com/lucash_ribeiro_dev/the-serverless-semantic-engine-architecting-mass-indexing-pipelines-with-modal-and-vector-databases-2m32</guid>
      <description>&lt;h2&gt;
  
  
  Executive Summary
&lt;/h2&gt;

&lt;p&gt;The transition from keyword-based information retrieval to semantic search represents one of the most significant paradigm shifts in data engineering over the last decade. As organizations seek to leverage Large Language Models (LLMs) via Retrieval-Augmented Generation (RAG), the ability to efficiently crawl, embed, and index vast corpora of unstructured data has become a critical competency. However, traditional infrastructure approaches—relying on provisioned virtual machines, long-running Kubernetes clusters, or monolithic server architectures—often struggle to handle the distinct "bursty" nature of mass indexing workloads. A web crawler might sit idle for days and then require thousands of concurrent threads for a few hours; a vector embedding job requires massive GPU throughput for short bursts but is financially ruinous to maintain 24/7.&lt;/p&gt;

&lt;p&gt;This report provides an exhaustive technical analysis of architecting a serverless mass-indexing pipeline using &lt;strong&gt;Modal&lt;/strong&gt; for compute orchestration and &lt;strong&gt;Vector Databases&lt;/strong&gt; (specifically analyzing &lt;strong&gt;Pinecone&lt;/strong&gt; and &lt;strong&gt;Qdrant&lt;/strong&gt;) for high-dimensional storage. To facilitate a rigorous examination of these technologies, we introduce a fictional yet realistic application scenario: &lt;strong&gt;"DocuVerse,"&lt;/strong&gt; a decentralized technical documentation aggregator. This simulation involves the ingestion of millions of technical documents, requiring a pipeline that is robust, scalable, and cost-efficient.&lt;/p&gt;

&lt;p&gt;Our analysis extends beyond simple implementation details to explore second-order implications: the graph-theoretical properties of web crawling (the "Matrix Link"), the economics of ephemeral GPU compute, and the nuances of distributed state management in a stateless environment. Furthermore, bridging the gap between deep engineering and public communication, the report concludes with a comprehensive LinkedIn content strategy, including visual "card" designs and a conceptual mind map of the application, designed to communicate these complex architectures to a professional audience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part I: The Paradigm Shift in Search Infrastructure
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 The Evolution of Retrieval: From Keywords to Vectors
&lt;/h3&gt;

&lt;p&gt;To understand the necessity of the architectures proposed in this report, one must first appreciate the fundamental limitations of the systems they replace. For decades, the industry standard for search was the &lt;strong&gt;Inverted Index&lt;/strong&gt;—a data structure mapping unique terms to the documents containing them (e.g., Apache Lucene, Elasticsearch). While highly efficient for exact keyword matching, inverted indices suffer from "lexical gap": they cannot match a query for "automobile" to a document containing "car" unless explicitly synonymized.&lt;/p&gt;

&lt;p&gt;The advent of Transformer-based language models (BERT, RoBERTa, and later GPT) introduced &lt;strong&gt;Vector Embeddings&lt;/strong&gt;. In this paradigm, text is transformed into a high-dimensional vector (often 768 to 1536 dimensions) where semantic meaning is encoded in the geometric distance between points. "Car" and "Automobile" end up in the same neighborhood of this vector space.1&lt;/p&gt;
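&lt;p&gt;To make the geometry concrete, here is a minimal, dependency-free sketch of cosine similarity, the distance measure typically used over such embeddings. The three-dimensional vectors are fabricated toy values purely for illustration; real models emit 768 to 1536 dimensions.&lt;/p&gt;

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Fabricated 3-d "embeddings" standing in for real model outputs:
car        = [0.9, 0.1, 0.0]
automobile = [0.8, 0.2, 0.1]
banana     = [0.0, 0.2, 0.9]

# "car" and "automobile" land in the same neighborhood; "banana" does not.
assert cosine_similarity(car, automobile) > cosine_similarity(car, banana)
```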

&lt;p&gt;This shift changes the fundamental resource requirements of the indexing pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU to GPU Shift:&lt;/strong&gt; Inverted indexing is I/O and CPU bound (tokenization). Vector indexing is compute-bound, requiring matrix multiplications best performed on GPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Throughput Sensitivity:&lt;/strong&gt; The embedding model is a bottleneck. Processing millions of documents through a deep neural network requires massive parallelization that single-server architectures cannot provide.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage Complexity:&lt;/strong&gt; Storing and searching millions of dense vectors requires specialized Approximate Nearest Neighbor (ANN) algorithms (like HNSW), which have different memory and disk IOPS profiles compared to traditional B-Trees.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1.2 The Infrastructure Dilemma: Burstiness vs. Provisioning
&lt;/h3&gt;

&lt;p&gt;Mass indexing events—such as the initial ingestion of a new dataset or a full re-indexing after an embedding model update—are characterized by extreme burstiness.&lt;/p&gt;

&lt;p&gt;Consider a documentation platform that crawls the web. For 23 hours a day, traffic is minimal (incremental updates). For 1 hour, a major new library release might trigger a crawl of 100,000 pages.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provisioned Capacity (e.g., EC2/Kubernetes):&lt;/strong&gt; If you provision for the peak, you pay for idle GPUs 95% of the time. If you provision for the average, the peak load causes massive latency spikes, violating Service Level Agreements (SLAs).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traditional Serverless (e.g., AWS Lambda):&lt;/strong&gt; While scalable, these services often lack GPU support, have restrictive timeouts (15 minutes), and suffer from "cold starts" that make loading large ML models (often gigabytes in size) too slow for real-time responsiveness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1.3 The Modal Solution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Modal&lt;/strong&gt; has emerged as a specialized cloud platform designed to solve these specific discrepancies. Unlike general-purpose serverless platforms, Modal is optimized for data-intensive and AI workloads. Its architecture allows for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Container Lifecycle Management:&lt;/strong&gt; Modal separates the container image definition from the execution. It employs advanced caching and lazy-loading techniques to launch containers in milliseconds, even those with heavy dependencies like PyTorch or TensorFlow.1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPU Ephemerality:&lt;/strong&gt; Functions can request specific GPU hardware (e.g., NVIDIA A10G, H100) on a per-invocation basis. The billing model is per-second of usage, enabling a "scale-to-zero" architecture where the cost of a massive GPU cluster is incurred only during the minutes it is actually crunching data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distributed Primitives:&lt;/strong&gt; Modal provides native distributed data structures (Queues, Dicts) that allow functions to coordinate state without needing an external Redis or message bus.2&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This report validates Modal as the foundational compute layer for "DocuVerse," demonstrating how it orchestrates the complex dance of crawling, embedding, and indexing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part II: The Fictional Use Case: "DocuVerse"
&lt;/h2&gt;

&lt;p&gt;To ground our architectural decisions in reality, we define the specifications of &lt;strong&gt;DocuVerse&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Mission and Scope
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;DocuVerse&lt;/strong&gt; is a "Universal Documentation Search Engine" for developers. It aggregates technical documentation from:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Official Sources:&lt;/strong&gt; Python docs, MDN, AWS documentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Community Sources:&lt;/strong&gt; Stack Overflow archives, GitHub Wikis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decentralized Web:&lt;/strong&gt; Technical whitepapers hosted on IPFS/Arweave.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal is to provide a single search bar that retrieves the most relevant technical answers using RAG, regardless of where the information lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Dataset Specifications (Fictional Data)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Value&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Implications&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Documents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5,000,000&lt;/td&gt;
&lt;td&gt;Requires efficient bulk indexing strategies.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average Doc Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 KB (approx. 800 tokens)&lt;/td&gt;
&lt;td&gt;Fits within standard embedding context windows; chunking may be minimal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update Velocity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200,000 docs/day&lt;/td&gt;
&lt;td&gt;Incremental indexing must be robust.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Dimensions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,536 (OpenAI Ada-002 compatible)&lt;/td&gt;
&lt;td&gt;Standard high-fidelity dimensionality.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Index Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~30 GB (Vectors + Metadata)&lt;/td&gt;
&lt;td&gt;Fits in memory for some DBs, requires disk-offload for others.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 200ms (Search), &amp;lt; 15 min (Index Freshness)&lt;/td&gt;
&lt;td&gt;Tight constraints on the ingestion pipeline.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.3 The "Matrix Link" Requirement
&lt;/h3&gt;

&lt;p&gt;Beyond simple text search, DocuVerse aims to implement a &lt;strong&gt;"PageRank-for-Code"&lt;/strong&gt; algorithm. It must construct a graph of how documentation pages link to each other (e.g., how many pages link to the React &lt;code&gt;useEffect&lt;/code&gt; hook documentation?). This "Matrix Link" 3 will be used to boost the relevance of authoritative pages during vector retrieval. This adds a complexity layer: the crawler must not just extract text, but also preserve the adjacency matrix of the web graph.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part III: Architecting the Distributed Crawler on Modal
&lt;/h2&gt;

&lt;p&gt;The ingestion layer is the gateway to the system. Building a crawler that can handle 5 million pages without getting blocked, crashing, or entering infinite loops requires a sophisticated distributed architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 The Producer-Consumer Pattern using &lt;code&gt;modal.Queue&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In a monolithic script, crawling is a recursive function: &lt;code&gt;visit(url) -&amp;gt; find_links() -&amp;gt; visit(links)&lt;/code&gt;. In a serverless environment, deep recursion leads to stack overflows or timeout errors. We must flatten this recursion into a &lt;strong&gt;Queue-Based Architecture&lt;/strong&gt;.2&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture Design:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Frontier Queue:&lt;/strong&gt; A &lt;code&gt;modal.Queue&lt;/code&gt; named &lt;code&gt;crawl-frontier&lt;/code&gt;. This persistent queue holds the URLs waiting to be visited. It acts as the buffer between the discovery of work and the execution of work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Seed Injector:&lt;/strong&gt; A scheduled function (&lt;code&gt;@app.function(schedule=modal.Cron(...))&lt;/code&gt;) 5 that runs periodically (e.g., every morning at 02:00 UTC) to push known "root" URLs (e.g., &lt;code&gt;https://docs.python.org/3/&lt;/code&gt;) into the Frontier Queue. This kickstarts the process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Fetcher Swarm:&lt;/strong&gt; A set of worker functions that &lt;code&gt;pop()&lt;/code&gt; items from the queue. This is where Modal's auto-scaling shines. We can configure the Fetcher to scale between 0 and 500 concurrent containers depending on the queue length.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why Not &lt;code&gt;modal.map&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;While &lt;code&gt;modal.map&lt;/code&gt; allows parallel execution over a list, it is static: it expects the full list of inputs to be known beforehand. A crawler is dynamic, since parsing Page A reveals Pages B and C. The Queue pattern is essential here because it allows the workload to expand dynamically at runtime.5&lt;/p&gt;
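&lt;p&gt;The flattened recursion can be sketched without any cloud dependencies. The following single-process model uses a &lt;code&gt;deque&lt;/code&gt; in place of the &lt;code&gt;modal.Queue&lt;/code&gt; frontier and a &lt;code&gt;set&lt;/code&gt; in place of the deduplication store; the URLs and the &lt;code&gt;LINKS&lt;/code&gt; graph are fabricated for illustration, but the control flow is the same one the distributed version runs.&lt;/p&gt;

```python
from collections import deque

# Hypothetical static link graph standing in for the live web.
LINKS = {
    "https://docs.example.com/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/a"],  # cycle: B links back to A
    "/c": [],
}

def crawl(seed: str) -> list[str]:
    """Flatten recursive crawling into a frontier-queue loop."""
    frontier = deque([seed])  # stands in for the crawl-frontier Queue
    visited = set()           # stands in for the shared deduplication store
    order = []
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)                    # "fetch" the page
        frontier.extend(LINKS.get(url, []))  # discovered links expand the workload

    return order

assert crawl("https://docs.example.com/") == ["https://docs.example.com/", "/a", "/b", "/c"]
```

Note that the cycle between `/a` and `/b` terminates cleanly: the visited-set check is exactly the role the deduplication state plays in Section 3.2.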

&lt;h3&gt;
  
  
  3.2 State Management: The Deduplication Matrix
&lt;/h3&gt;

&lt;p&gt;To prevent infinite loops (Page A links to B, B links to A) and to ensure we don't waste compute crawling the same page twice, we need a shared state of visited URLs.&lt;/p&gt;

&lt;p&gt;The Distributed Dictionary:&lt;/p&gt;

&lt;p&gt;We employ &lt;code&gt;modal.Dict&lt;/code&gt; as a shared key-value store accessible by all 500 fetcher containers simultaneously.2&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Key:&lt;/strong&gt; The URL (normalized).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Value:&lt;/strong&gt; A metadata object containing &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;hash&lt;/code&gt; (for content change detection), and &lt;code&gt;status&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consistency Challenge:&lt;/p&gt;

&lt;p&gt;In a high-concurrency environment, a race condition exists: two workers might pop the same URL or discover the same link simultaneously. &lt;code&gt;modal.Dict&lt;/code&gt; provides atomicity guarantees for its operations, ensuring that a &lt;code&gt;visited.put_if_absent(url)&lt;/code&gt;-style check-and-set is safe across the distributed cluster.&lt;/p&gt;
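&lt;p&gt;The semantics the crawler relies on can be modeled locally. &lt;code&gt;AtomicDict&lt;/code&gt; below is a hypothetical in-process stand-in (a plain dict behind a lock), not the distributed implementation: it demonstrates why an atomic check-and-set lets exactly one of many racing workers claim a URL.&lt;/p&gt;

```python
import threading

class AtomicDict:
    """Local stand-in for a distributed dict with check-and-set semantics."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value) -> bool:
        """Returns True only for the single caller that claimed the key."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

visited = AtomicDict()
claims = []

def worker(url):
    if visited.put_if_absent(url, {"status": "claimed"}):
        claims.append(url)  # only the winner of the race gets here

threads = [threading.Thread(target=worker, args=("https://a.io/x",)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert claims == ["https://a.io/x"]  # exactly one claim despite 8 racers
```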

&lt;h3&gt;
  
  
  3.3 The "Matrix Link" Construction
&lt;/h3&gt;

&lt;p&gt;As referenced in the research 3, the link structure of the web can be represented as an adjacency matrix. Most crawlers discard this structure, keeping only the content. DocuVerse preserves it.&lt;/p&gt;

&lt;p&gt;Implementation:&lt;/p&gt;

&lt;p&gt;When the Fetcher parses a page, it extracts two distinct datasets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content:&lt;/strong&gt; The text for vectorization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Edges:&lt;/strong&gt; A list of outbound links.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These edges are pushed to a secondary &lt;code&gt;link_matrix_queue&lt;/code&gt;. A separate aggregator function reads this queue and builds a sparse matrix representation of the documentation graph. This matrix is later used to calculate "Authority Scores" for each document, which will be stored as metadata in the Vector Database. This approach leverages Graph Neural Network (GNN) concepts where the link structure informs the semantic importance of the node.4&lt;/p&gt;
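&lt;p&gt;As a sketch of what the aggregator computes, here is a simplified PageRank-style power iteration over an edge list (the kind of pairs the &lt;code&gt;link_matrix_queue&lt;/code&gt; would accumulate). The edge data and parameters are illustrative, and a production system would use a sparse-matrix library rather than plain dicts.&lt;/p&gt;

```python
def authority_scores(edges, iterations=50, damping=0.85):
    """Simplified PageRank over an edge list of (src, dst) pairs."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in out.items():
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:  # dangling node: spread its mass evenly
                for n in nodes:
                    nxt[n] += damping * score[src] / len(nodes)
        score = nxt
    return score

# Three pages link to the hub; the hub links to one leaf.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "leaf")]
scores = authority_scores(edges)
assert scores["hub"] > scores["a"]  # many in-links => higher authority
```

The resulting per-document score is what gets written into the Vector Database as metadata for retrieval-time boosting.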

&lt;h3&gt;
  
  
  3.4 Handling Politeness and Anti-Bot Measures
&lt;/h3&gt;

&lt;p&gt;A naive crawler scaling to 500 containers will resemble a DDoS attack to the target server. We must implement &lt;strong&gt;Politeness Sharding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Sharded Queue Strategy:&lt;/p&gt;

&lt;p&gt;Instead of one global queue, we logically partition the work by domain.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Worker Type A: Processes &lt;code&gt;*.github.io&lt;/code&gt; (Concurrency Limit: 5).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Worker Type B: Processes &lt;code&gt;*.readthedocs.io&lt;/code&gt; (Concurrency Limit: 10).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Worker Type C: General Web (Concurrency Limit: 100).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Modal, this is achieved by defining different Functions with different &lt;code&gt;concurrency_limit&lt;/code&gt; decorators, all consuming from filtered views of the main queue or separate domain-specific queues. This ensures that while the &lt;em&gt;aggregate&lt;/em&gt; throughput of DocuVerse is high, the &lt;em&gt;per-domain&lt;/em&gt; impact remains respectful of &lt;code&gt;robots.txt&lt;/code&gt; etiquette.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part IV: The Processing Core: Embeddings &amp;amp; GPU Orchestration
&lt;/h2&gt;

&lt;p&gt;Once the raw HTML is secured, the pipeline shifts from network-bound (crawling) to compute-bound (embedding). This is the most expensive phase of the operation and where Modal's value proposition is strongest.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 The Container Loading Advantage
&lt;/h3&gt;

&lt;p&gt;In traditional container orchestration (like Kubernetes), adding a new GPU node and pulling a Docker image containing a 5GB PyTorch model can take several minutes. This latency makes it difficult to react to a sudden influx of 50,000 documents.&lt;/p&gt;

&lt;p&gt;Modal solves this with a highly optimized container runtime.1&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image Snapshotting:&lt;/strong&gt; The file system of the container (including the installed Python packages and the model weights) is snapshotted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lazy Loading:&lt;/strong&gt; When a function is invoked, Modal mounts this snapshot over the network. Data is read on-demand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; A container capable of running a BERT-large model can boot in under 2 seconds.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Implication for DocuVerse:&lt;/p&gt;

&lt;p&gt;This allows us to treat the Embedding Function as a purely on-demand resource. We do not need to keep a "warm pool" of GPU servers running. If the crawler finds a new pocket of documentation, Modal instantly spins up 50 GPU containers to process it and shuts them down the second the queue is empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Batching Strategy for Throughput
&lt;/h3&gt;

&lt;p&gt;GPUs are throughput devices, not latency devices. Sending one document at a time to a GPU is inefficient due to the overhead of moving data from CPU RAM to GPU VRAM.&lt;/p&gt;

&lt;p&gt;The Batcher Pattern:&lt;/p&gt;

&lt;p&gt;We insert a "buffer" function between the Crawler and the Embedder.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Crawler:&lt;/strong&gt; Pushes text chunks to &lt;code&gt;embedding_input_queue&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batcher:&lt;/strong&gt; A lightweight CPU function that pulls from the queue and accumulates items until it reaches a batch size of 128 or a timeout of 500ms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dispatcher:&lt;/strong&gt; The Batcher sends the &lt;code&gt;List&lt;/code&gt; (batch of 128) to the GPU Embedding Function.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensures that every time we pay for a GPU cycle, we are utilizing its matrix multiplication cores to their maximum capacity.&lt;/p&gt;
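&lt;p&gt;The accumulation logic of the Batcher can be sketched as a small generator. This is a local, single-threaded model of the size-or-timeout rule, with the names and thresholds taken from the description above; a production batcher would pull from the distributed queue instead of an iterator.&lt;/p&gt;

```python
import time

def batch(source, max_size=128, max_wait=0.5):
    """Yield batches of up to max_size items, flushing early after
    max_wait seconds so a trickle of documents is never stuck waiting."""
    buf, deadline = [], time.monotonic() + max_wait
    for item in source:
        buf.append(item)
        if len(buf) >= max_size or time.monotonic() >= deadline:
            yield buf
            buf, deadline = [], time.monotonic() + max_wait
    if buf:
        yield buf  # flush the partial final batch

chunks = [f"chunk-{i}" for i in range(300)]
batches = list(batch(iter(chunks), max_size=128))
assert [len(b) for b in batches] == [128, 128, 44]
```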

&lt;h3&gt;
  
  
  4.3 Model Selection and Quantization
&lt;/h3&gt;

&lt;p&gt;For DocuVerse, we have two primary options for embeddings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API-Based (e.g., OpenAI):&lt;/strong&gt; Simple to implement but costly at scale ($0.10 per million tokens can add up with 5 million docs re-indexed weekly).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-Hosted (e.g., &lt;code&gt;multilingual-e5-large&lt;/code&gt;):&lt;/strong&gt; Running open-source models on Modal's GPUs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We choose the &lt;strong&gt;Self-Hosted&lt;/strong&gt; approach for this architecture to demonstrate the capability. We utilize the &lt;code&gt;multilingual-e5-large&lt;/code&gt; model, which provides state-of-the-art performance for technical text.6&lt;/p&gt;

&lt;p&gt;Quantization:&lt;/p&gt;

&lt;p&gt;To reduce the memory footprint in the Vector Database and speed up search, we apply Scalar Quantization (converting 32-bit floats to 8-bit integers) within the embedding function. This reduces the index size by 4x with minimal loss in retrieval accuracy (Recall@10).&lt;/p&gt;
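&lt;p&gt;As a toy illustration of the idea (not the calibrated quantizers vector databases actually ship), the following sketch maps 32-bit floats onto the 0-255 integer range using a per-vector min/max, yielding the 4x size reduction at the cost of a bounded rounding error.&lt;/p&gt;

```python
def quantize(vec: list[float]) -> tuple[list[int], float, float]:
    """Map floats onto 0..255 ints; returns (codes, offset, scale)."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # guard against constant vectors
    return [round((x - lo) / scale) for x in vec], lo, scale

def dequantize(codes: list[int], lo: float, scale: float) -> list[float]:
    return [lo + c * scale for c in codes]

vec = [-0.42, 0.0, 0.17, 0.91]
codes, lo, scale = quantize(vec)
restored = dequantize(codes, lo, scale)

# Each 8-bit code replaces a 32-bit float (4x smaller); the round-trip
# error is bounded by one quantization step.
assert all(abs(a - b) < scale for a, b in zip(vec, restored))
```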




&lt;h2&gt;
  
  
  Part V: The Vector Database Layer: Storage and Indexing
&lt;/h2&gt;

&lt;p&gt;The vectors produced by our GPU workers need a home. We analyze two leading contenders, &lt;strong&gt;Pinecone&lt;/strong&gt; and &lt;strong&gt;Qdrant&lt;/strong&gt;, and how they integrate into this serverless pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Pinecone: The Serverless Standard
&lt;/h3&gt;

&lt;p&gt;Pinecone's recent "Serverless" offering 7 aligns perfectly with our architecture. Unlike their previous "Pod-based" model where users provisioned capacity, the serverless model decouples storage from compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Separation of Concerns:&lt;/strong&gt; Vectors are stored in blob storage (S3-compatible) and loaded into the index only when needed. This means we can store 5 million vectors cheaply, even if we rarely search the "long tail" of the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mass Indexing via Object Storage:&lt;/strong&gt; For the initial load of DocuVerse (the "Bootstrap" phase), pushing vectors one by one via API is too slow. Pinecone allows &lt;strong&gt;bulk import from object storage&lt;/strong&gt;.8 Our Modal pipeline can write Parquet files to an S3 bucket, and Pinecone can ingest them asynchronously. This is the fastest and most cost-effective way to build the initial index.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Integration Strategy:&lt;/p&gt;

&lt;p&gt;We use a Hybrid Search index. We store both the dense vector (from the GPU model) and a sparse vector (BM25) for keyword matching. This ensures that if a user searches for a specific error code (e.g., "Error 503"), the keyword match takes precedence over semantic similarity.9&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Qdrant: The High-Performance Alternative
&lt;/h3&gt;

&lt;p&gt;Qdrant offers a different value proposition. It is open-source and can be run as a managed cloud service or self-hosted.&lt;/p&gt;

&lt;p&gt;HNSW Graph Construction:&lt;/p&gt;

&lt;p&gt;Qdrant uses the Hierarchical Navigable Small World (HNSW) algorithm.9 Constructing this graph is computationally expensive.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Insight:&lt;/strong&gt; During mass indexing, inserting vectors and updating the graph in real-time destroys performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optimization:&lt;/strong&gt; We configure the Qdrant client to disable "optimization" (graph re-balancing) during the bulk upload. Once the upload is complete, we trigger a forced optimization. This reduces total indexing time by approximately 60%.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LangChain Integration:&lt;/p&gt;

&lt;p&gt;Qdrant has deep integration with LangChain.11 We can leverage the &lt;code&gt;QdrantVectorStore&lt;/code&gt; class to handle metadata filtering out of the box. For DocuVerse, metadata is crucial.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Filter Example: &lt;code&gt;filter={"project": "react", "version": "18.0"}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This allows the search engine to respect the structure of the documentation sets.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5.3 The DocuVerse Decision
&lt;/h3&gt;

&lt;p&gt;For the primary architecture, we select &lt;strong&gt;Pinecone Serverless&lt;/strong&gt; for the production index due to its zero-maintenance elasticity. However, we utilize &lt;strong&gt;Qdrant&lt;/strong&gt; (running ephemerally in a Modal Sandbox) for testing and development pipelines, allowing developers to run the full stack locally without incurring cloud costs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VI: Retrieval and Integration (RAG)
&lt;/h2&gt;

&lt;p&gt;The ultimate consumer of our index is the RAG pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 The LangChain Orchestrator
&lt;/h3&gt;

&lt;p&gt;We use LangChain to wire the components together.11&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User Query:&lt;/strong&gt; "How do I mount a volume in Modal?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query Embedding:&lt;/strong&gt; The query is sent to the &lt;em&gt;same&lt;/em&gt; Embedding Function (hosted on Modal) used for indexing. This ensures the query vector and document vectors are in the same latent space.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieval:&lt;/strong&gt; LangChain queries Pinecone with the vector + filters (e.g., "only show me docs updated in the last year").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Re-Ranking:&lt;/strong&gt; To improve precision, we fetch 50 candidates and pass them through a Cross-Encoder model (also hosted on Modal) to re-rank them. This is more expensive but guarantees higher relevance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Synthesis:&lt;/strong&gt; The top 5 chunks are passed to GPT-4 via the OpenAI API to generate the answer.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  6.2 The "Matrix Link" Boost
&lt;/h3&gt;

&lt;p&gt;Here, our earlier graph work pays off. When retrieving results, we apply a boosting factor based on the "Authority Score" calculated during the crawl.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Score Formula:&lt;/em&gt; &lt;code&gt;Final_Score = (Vector_Similarity * 0.8) + (PageRank_Score * 0.2)&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This ensures that the "official" documentation page (which has many incoming links) ranks higher than a random forum post (which has few), even if the forum post has slightly higher semantic similarity.4&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
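&lt;p&gt;The blending step is a one-liner, shown here with fabricated scores to demonstrate the intended ranking flip (both inputs are assumed normalized to [0, 1]):&lt;/p&gt;

```python
def final_score(vector_similarity: float, pagerank_score: float) -> float:
    """Blend semantic similarity with link-graph authority."""
    return vector_similarity * 0.8 + pagerank_score * 0.2

# A forum post edges out the official docs on raw similarity...
forum    = final_score(0.92, 0.05)   # high similarity, low authority
official = final_score(0.90, 0.60)   # slightly lower similarity, high authority

# ...but authority flips the final ranking.
assert official > forum
```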




&lt;h2&gt;
  
  
  Part VII: Operational Resilience and Observability
&lt;/h2&gt;

&lt;p&gt;Building a distributed system on fictional data is easy; running it in production is hard.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 The Dead Letter Queue (DLQ)
&lt;/h3&gt;

&lt;p&gt;In a system processing millions of items, 0.1% will fail. The HTML might be malformed; the embedding model might encounter a token limit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; We define a &lt;code&gt;dlq_queue&lt;/code&gt; in Modal.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mechanism:&lt;/strong&gt; Wrap the processing logic in a &lt;code&gt;try/except&lt;/code&gt; block. On exception, serialize the input + the error traceback and push it to the DLQ.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recovery:&lt;/strong&gt; A separate "Janitor" function runs daily to inspect the DLQ. It can either retry the jobs (if the error was transient, like a network timeout) or alert a human.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
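&lt;p&gt;The wrap-and-park mechanism is a few lines. In this local sketch a &lt;code&gt;deque&lt;/code&gt; stands in for the distributed &lt;code&gt;dlq_queue&lt;/code&gt;, and &lt;code&gt;embed&lt;/code&gt; is a hypothetical handler that fails on malformed input.&lt;/p&gt;

```python
import traceback
from collections import deque

dlq = deque()  # stands in for the distributed dlq_queue

def process_safely(item, handler):
    """Run handler; on failure, park the input plus traceback in the DLQ."""
    try:
        return handler(item)
    except Exception:
        dlq.append({"input": item, "error": traceback.format_exc()})
        return None

def embed(doc):
    if doc["text"] is None:
        raise ValueError("malformed document")
    return len(doc["text"])  # placeholder for real embedding work

good = process_safely({"text": "hello"}, embed)
bad  = process_safely({"text": None}, embed)
assert good == 5 and bad is None
assert len(dlq) == 1  # the failure is preserved for the Janitor, not lost
```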

&lt;h3&gt;
  
  
  7.2 Idempotency and Determinism
&lt;/h3&gt;

&lt;p&gt;The pipeline must be idempotent. If a worker crashes after writing to Pinecone but before acknowledging the queue message, the message will be re-delivered.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; We generate Document IDs deterministically using a hash of the URL (&lt;code&gt;sha256(url)&lt;/code&gt;). If we try to write the same document to Pinecone twice, the second write simply overwrites the first with identical data. No duplicates are created.13&lt;/li&gt;
&lt;/ul&gt;
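&lt;p&gt;Deterministic ID generation is a direct application of a content-independent hash over the normalized URL:&lt;/p&gt;

```python
import hashlib

def doc_id(url: str) -> str:
    """Same URL always yields the same ID, so a re-delivered queue
    message overwrites the existing record instead of duplicating it."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

first = doc_id("https://docs.python.org/3/library/hashlib.html")
retry = doc_id("https://docs.python.org/3/library/hashlib.html")
other = doc_id("https://docs.python.org/3/library/queue.html")

assert first == retry   # re-processing is a harmless overwrite
assert first != other   # distinct pages get distinct IDs
```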

&lt;h3&gt;
  
  
  7.3 Cost Monitoring
&lt;/h3&gt;

&lt;p&gt;To prevent a "denial-of-wallet" scenario, we implement budget guards.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token Counting:&lt;/strong&gt; We track the total tokens processed by the Embedding Function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Circuit Breaker:&lt;/strong&gt; If the daily spend exceeds a threshold (e.g., $50), the &lt;code&gt;seed_injector&lt;/code&gt; function is disabled, pausing new crawls until the next billing cycle or manual override.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
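&lt;p&gt;A minimal sketch of the circuit breaker follows. The class name, the $0.10-per-million-tokens rate, and the $50 cap are illustrative; in the real pipeline the &lt;code&gt;seeding_allowed&lt;/code&gt; check would gate the &lt;code&gt;seed_injector&lt;/code&gt; function.&lt;/p&gt;

```python
class BudgetGuard:
    """Disable new crawls once daily spend crosses the threshold."""
    def __init__(self, daily_limit_usd: float, usd_per_million_tokens: float):
        self.limit = daily_limit_usd
        self.rate = usd_per_million_tokens
        self.tokens = 0

    def record(self, tokens: int) -> None:
        self.tokens += tokens  # called by the Embedding Function per batch

    @property
    def spend(self) -> float:
        return self.tokens / 1_000_000 * self.rate

    def seeding_allowed(self) -> bool:
        return self.spend < self.limit

guard = BudgetGuard(daily_limit_usd=50.0, usd_per_million_tokens=0.10)
guard.record(400_000_000)   # 400M tokens -> $40: still under budget
assert guard.seeding_allowed()
guard.record(150_000_000)   # +$15 -> $55: breaker trips, crawls pause
assert not guard.seeding_allowed()
```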




&lt;h2&gt;
  
  
  Part VIII: LinkedIn Content Strategy &amp;amp; Visuals
&lt;/h2&gt;

&lt;p&gt;To effectively communicate the sophistication of the DocuVerse architecture to a professional network, we need a content strategy that bridges the gap between high-level value and low-level engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  8.1 The "Hook" and Narrative
&lt;/h3&gt;

&lt;p&gt;Headline: "How I Built a 'Google for Code' Indexing 5 Million Pages for &amp;lt;$50."&lt;/p&gt;

&lt;p&gt;Narrative Arc:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Villain:&lt;/strong&gt; The "Idle Resource". Identifying the waste in traditional provisioned clusters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Hero:&lt;/strong&gt; The "Serverless Trinity" (Modal + Pinecone + LangChain).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Climax:&lt;/strong&gt; The "Mass Indexing Event"—scaling from 0 to 500 GPUs in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Resolution:&lt;/strong&gt; A predictable, low-cost bill and a high-performance search engine.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  8.2 Card Suggestions (Visual Assets)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Card 1: The "Cold Start" Myth&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual:&lt;/strong&gt; A stopwatch comparing "Standard Docker" (2 min) vs. "Modal Snapshot" (1.5 sec).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text:&lt;/strong&gt; "Serverless GPUs used to be too slow for real-time AI. Not anymore. Container snapshotting changes the physics of cold starts." 1&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Card 2: The Architecture Map&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual Strategy:&lt;/strong&gt; Instead of a static image, use this flow diagram to illustrate the "Producer-Consumer" decoupling that enables scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Diagram:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code snippet&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    subgraph Ingestion ["Ingestion Layer (CPU)"]
        Seed(Seed Injector) --&amp;gt; Frontier[Frontier Queue]
        Frontier --&amp;gt; Crawler
        Crawler --&amp;gt;|HTML| Parser
        Crawler --&amp;gt;|Links| Frontier
    end

    subgraph Processing ["Processing Layer (GPU)"]
        Parser --&amp;gt;|Text Chunks| BatchQueue[Embedding Queue]
        BatchQueue --&amp;gt; Batcher
        Batcher --&amp;gt;|Batch of 128| Embedder
        Embedder --&amp;gt;|Vectors| VectorBuffer
    end

    subgraph Storage
        VectorBuffer --&amp;gt;|Bulk Import| S3
        S3 --&amp;gt;|Async Ingest| Pinecone
        Crawler -.-&amp;gt;|Deduplication| Dict
    end

    subgraph Retrieval ["Interaction Layer"]
        User --&amp;gt;|Query| API
        API --&amp;gt;|Embed Query| Embedder
        API --&amp;gt;|Search| Pinecone
        Pinecone --&amp;gt;|Results| RAG
        RAG --&amp;gt; User
    end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
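&lt;p&gt;The Batcher node in the diagram groups incoming text chunks into batches of 128 before they reach the GPU. A minimal, framework-free sketch of that batching step (names are illustrative; the production version would read from the embedding queue rather than a list):&lt;/p&gt;

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batched(items: Iterable[str], size: int = 128) -> Iterator[List[str]]:
    """Group a stream of text chunks into fixed-size batches,
    flushing a final partial batch so no chunk is dropped."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# 300 chunks -> two full batches of 128 plus a remainder of 44
chunks = [f"chunk-{i}" for i in range(300)]
sizes = [len(b) for b in batched(chunks, 128)]
print(sizes)  # [128, 128, 44]
```

&lt;p&gt;Fixed-size batching is what keeps the GPU saturated: the embedder sees full tensors instead of one document at a time.&lt;/p&gt;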



&lt;p&gt;&lt;strong&gt;Card 3: The "Matrix Link"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual:&lt;/strong&gt; A network graph with nodes glowing. One central node is brighter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text:&lt;/strong&gt; "Vectors aren't enough. We mapped the adjacency matrix of 5 million docs to boost 'Authority' alongside 'Similarity'. This is RAG + Graph Theory."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Card 4: The Cost Curve&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual:&lt;/strong&gt; A graph showing a flat line (Cost) overlaying a spiky line (Traffic), compared to a blocky "Provisioned" cost line.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Text:&lt;/strong&gt; "Stop paying for air. Scale to zero means your infrastructure bill hits $0.00 when your users sleep."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8.3 Application Mind Map
&lt;/h3&gt;

&lt;p&gt;The following mind map illustrates the four pillars of the DocuVerse engine: Ingestion, Processing, Memory, and Interaction.&lt;/p&gt;

&lt;p&gt;Mermaid diagram source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mindmap
  root((DocuVerse&amp;lt;br/&amp;gt;Engine))
    Ingestion
      Crawler Swarm
        Politeness Sharding
        Deduplication
      Frontier Queue
      Seed Injector
    Processing
      HTML Parser
      Graph Builder
        Matrix Link
      Batcher
      Embedder
        Model: e5-large
        Quantization: 8-bit
    Memory
      Pinecone Serverless
      S3 Bucket
      DLQ Error Handler
    Interaction
      API Endpoint
      LangChain Orchestrator
      RAG Pipeline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part IX: Comparison Data and Fictional Metrics
&lt;/h2&gt;

&lt;p&gt;To further illustrate the efficiency of this architecture, we present fictional performance data derived from the "DocuVerse" simulation.&lt;/p&gt;

&lt;h3&gt;
  
  
  9.1 Cost Comparison: Serverless vs. Provisioned
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Component&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Architecture A: Kubernetes (EKS) + P3 Instances&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Architecture B: DocuVerse (Modal + Pinecone)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Savings&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute (Crawler)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$450/mo (3 nodes always on)&lt;/td&gt;
&lt;td&gt;$42/mo (Pay per CPU-second)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute (GPU)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$2,200/mo (p3.2xlarge reserved)&lt;/td&gt;
&lt;td&gt;$150/mo (A10G spot, burst usage)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;93%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$300/mo (Managed Instance)&lt;/td&gt;
&lt;td&gt;$45/mo (Serverless Usage-Based)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DevOps Labor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10 hrs/mo (Cluster maintenance)&lt;/td&gt;
&lt;td&gt;1 hr/mo (Config tweaks)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Monthly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$2,950&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$237&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~92%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 1: Monthly operational cost projection for indexing 5M documents with daily updates.&lt;/em&gt;&lt;/p&gt;
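&lt;p&gt;As a sanity check, the totals and the headline savings figure in Table 1 can be recomputed from the per-row dollar amounts (the DevOps row is measured in hours, so it is excluded from the dollar total):&lt;/p&gt;

```python
# Recompute Table 1's dollar totals and overall savings.
provisioned = {"crawler": 450, "gpu": 2200, "vector_db": 300}
serverless = {"crawler": 42, "gpu": 150, "vector_db": 45}

total_a = sum(provisioned.values())
total_b = sum(serverless.values())
savings = round(100 * (1 - total_b / total_a))

print(total_a, total_b, savings)  # 2950 237 92
```

&lt;p&gt;The headline number is driven almost entirely by the GPU line: reserved GPU capacity is the dominant cost in the provisioned architecture.&lt;/p&gt;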

&lt;h3&gt;
  
  
  9.2 Throughput Metrics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Operation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Crawling Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,200 pages/sec&lt;/td&gt;
&lt;td&gt;Scaled to 300 concurrent containers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4,500 docs/sec&lt;/td&gt;
&lt;td&gt;Utilizing 50 concurrent A10G GPUs with batch size 128.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Indexing Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10,000 vectors/sec&lt;/td&gt;
&lt;td&gt;Bulk upsert to Pinecone via S3 import.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cold Start Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.8 seconds&lt;/td&gt;
&lt;td&gt;Time to boot fresh container + load model weights.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 2: Performance benchmarks observed during the "MegaCorp" documentation ingestion simulation.&lt;/em&gt;&lt;/p&gt;
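&lt;p&gt;Because the queues decouple the stages, crawling, embedding, and indexing run concurrently, so the slowest stage bounds end-to-end time. A quick back-of-envelope from Table 2's rates for the 5M-document corpus:&lt;/p&gt;

```python
# Per-stage wall-clock time for 5M documents at Table 2's rates.
DOCS = 5_000_000
rates = {"crawl": 1_200, "embed": 4_500, "index": 10_000}  # items/sec

stage_minutes = {stage: DOCS / rate / 60 for stage, rate in rates.items()}
for stage, minutes in stage_minutes.items():
    print(f"{stage}: {minutes:.1f} min")

# With stages overlapped, the pipeline is bounded by its slowest stage.
bottleneck = max(stage_minutes, key=stage_minutes.get)
print(bottleneck)  # crawl
```

&lt;p&gt;Crawling, at roughly 69 minutes, is the bottleneck; embedding (~18.5 min) and indexing (~8.3 min) keep pace comfortably behind it.&lt;/p&gt;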




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The "DocuVerse" case study illustrates a powerful truth about modern data engineering: &lt;strong&gt;Architecture is the new Optimization.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the past, optimizing a search engine meant writing faster C++ code to tokenize strings. Today, it means composing the right set of serverless primitives to handle the physics of data movement and model inference.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modal&lt;/strong&gt; provides the elastic compute fabric, solving the "bursty" nature of crawling and embedding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vector Databases&lt;/strong&gt; like Pinecone and Qdrant provide the semantic storage layer, solving the retrieval problem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Graph Theory&lt;/strong&gt; (the Matrix Link) provides the relevance signal, solving the authority problem.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By treating the cloud not as a collection of servers, but as a single, programmable computer, engineers can build systems that are orders of magnitude more efficient—both in cost and performance—than their predecessors. The era of the "Serverless Semantic Engine" is here, and it is accessible to any developer willing to embrace these new paradigms.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: DocuVerse Reference Implementation
&lt;/h2&gt;

&lt;p&gt;This section provides the reference source code for the core logic of the "DocuVerse" engine. The application is structured as a Modal package.&lt;/p&gt;

&lt;h3&gt;
  
  
  A.1 &lt;code&gt;src/common.py&lt;/code&gt; - Shared Structures
&lt;/h3&gt;

&lt;p&gt;Defines the data models and shared configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="c1"&gt;# Constants
&lt;/span&gt;&lt;span class="n"&gt;QUEUE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-frontier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;DICT_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-visited&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;EMBED_QUEUE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;LINK_MATRIX_QUEUE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-matrix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;doc_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VectorRecord&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
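&lt;p&gt;A quick usage sketch of the shared structures. The &lt;code&gt;Document&lt;/code&gt; class is repeated here so the snippet is self-contained, and &lt;code&gt;dataclasses.asdict&lt;/code&gt; is shown as one typical way to flatten a record before it crosses a queue boundary — an illustration, not part of the reference code:&lt;/p&gt;

```python
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class Document:  # mirrors src/common.py
    url: str
    content: str
    title: str
    links: List[str]
    doc_hash: str
    metadata: dict

doc = Document(
    url="https://docs.python.org/3/",
    content="Welcome to the Python docs...",
    title="Python 3 Documentation",
    links=["https://docs.python.org/3/tutorial/"],
    doc_hash="ab12",
    metadata={"source": "crawler"},
)

# asdict() yields plain, JSON-serializable data -- convenient for
# pushing records through a distributed queue between containers.
payload = asdict(doc)
print(payload["title"])  # Python 3 Documentation
```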



&lt;h3&gt;
  
  
  A.2 &lt;code&gt;src/crawler.py&lt;/code&gt; - The Distributed Fetcher
&lt;/h3&gt;

&lt;p&gt;Implements the Producer-Consumer pattern with &lt;code&gt;modal.Queue&lt;/code&gt; and the Matrix Link extraction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;QUEUE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DICT_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EMBED_QUEUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LINK_MATRIX_QUEUE&lt;/span&gt;

&lt;span class="c1"&gt;# Define the container image with necessary scraping libraries
&lt;/span&gt;&lt;span class="n"&gt;crawler_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;pip_install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;beautifulsoup4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-crawler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Persistent State
&lt;/span&gt;&lt;span class="n"&gt;frontier_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QUEUE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_if_missing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;visited_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DICT_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_if_missing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;embed_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EMBED_QUEUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_if_missing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;matrix_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LINK_MATRIX_QUEUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_if_missing&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;crawler_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;concurrency_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

    &lt;span class="c1"&gt;# Idempotency check
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;visited_db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 1. Extract Content
&lt;/span&gt;        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;
        &lt;span class="n"&gt;doc_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Extract Matrix Links (Graph Edges)
&lt;/span&gt;        &lt;span class="n"&gt;links&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="n"&gt;normalized_links&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;links&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="c1"&gt;# Simplified logic
&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# Truncate for demo
&lt;/span&gt;            &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;normalized_links&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;doc_hash&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crawler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Mark as visited
&lt;/span&gt;        &lt;span class="n"&gt;visited_db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. Dispatch for Processing
&lt;/span&gt;        &lt;span class="c1"&gt;# Push content to embedding queue
&lt;/span&gt;        &lt;span class="n"&gt;embed_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Push edges to matrix calculator queue
&lt;/span&gt;        &lt;span class="n"&gt;matrix_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;normalized_links&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# 5. Expand Frontier
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;normalized_links&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;visited_db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;frontier_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to crawl &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cron&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 2 * * *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;seed_injector&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Daily job to restart the crawl from root nodes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;roots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://docs.python.org/3/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://react.dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;roots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;frontier_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
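&lt;p&gt;Note that &lt;code&gt;fetch_url&lt;/code&gt; only &lt;em&gt;produces&lt;/em&gt; into &lt;code&gt;frontier_queue&lt;/code&gt;; the loop that drains the frontier and fans work out is not shown above. The sketch below models that consumer loop locally, with Python's stdlib &lt;code&gt;queue&lt;/code&gt; standing in for &lt;code&gt;modal.Queue&lt;/code&gt;. In the real app each new URL would be dispatched with &lt;code&gt;fetch_url.spawn(url)&lt;/code&gt; — this dispatcher is an assumption about the missing piece, not code from the reference implementation:&lt;/p&gt;

```python
import queue

# Local analogue of the frontier loop: pull URLs, skip already-visited
# ones, and "dispatch" each new one (really: fetch_url.spawn(url)).
frontier: queue.Queue = queue.Queue()
visited: dict = {}

for url in ["https://a.example", "https://b.example", "https://a.example"]:
    frontier.put(url)

dispatched = []
while True:
    try:
        url = frontier.get_nowait()
    except queue.Empty:
        break  # frontier drained; in production this loop would block
    if url in visited:
        continue  # idempotency: never fetch the same URL twice
    visited[url] = {"status": "dispatched"}
    dispatched.append(url)

print(dispatched)  # ['https://a.example', 'https://b.example']
```

&lt;p&gt;The duplicate seed is filtered by the visited check, which is exactly the role &lt;code&gt;visited_db&lt;/code&gt; plays across hundreds of concurrent crawler containers.&lt;/p&gt;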



&lt;h3&gt;
  
  
  A.3 &lt;code&gt;src/embedder.py&lt;/code&gt; - GPU Batch Processing
&lt;/h3&gt;

&lt;p&gt;Uses &lt;code&gt;modal.cls&lt;/code&gt; to maintain the model state (weights) in GPU memory between invocations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VectorRecord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EMBED_QUEUE&lt;/span&gt;

&lt;span class="c1"&gt;# Define a GPU-enabled image with PyTorch and Transformers
&lt;/span&gt;&lt;span class="n"&gt;gpu_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pip_install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transformers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-embedder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.cls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A10G&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gpu_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_idle_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__enter__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
        &lt;span class="c1"&gt;# Load model once when container starts (Cold Start optimization)
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;intfloat/multilingual-e5-large&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@modal.method&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate dense vectors
&lt;/span&gt;        &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;emb&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VectorRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;doc_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;emb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;batch_coordinator&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Reads from queue, batches items, and sends to GPU.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;embed_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EMBED_QUEUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fetch items with a short timeout
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embed_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_many&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BATCH_SIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

            &lt;span class="c1"&gt;# Invoke GPU function
&lt;/span&gt;            &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embed_batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remote&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# TODO: Send vectors to Pinecone/Qdrant
&lt;/span&gt;            &lt;span class="c1"&gt;# pinecone_upload.remote(vectors)
&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A.4 &lt;code&gt;src/vector_db.py&lt;/code&gt; - Pinecone Integration
&lt;/h3&gt;

&lt;p&gt;Demonstrates the bulk upload strategy via S3 (Conceptual code).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-vectordb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bulk_upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parquet_file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. Upload Parquet to S3
&lt;/span&gt;    &lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-ingest-bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;imports/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parquet_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parquet_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Trigger Pinecone Import
&lt;/span&gt;    &lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docuverse-prod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Start async import
&lt;/span&gt;    &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;integration_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3-integration-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bulk import started.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
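The import above assumes the Parquet file already holds records in the columnar layout Pinecone expects. A pure-Python sketch of that record shape follows; the field names use the id/values/metadata convention from the embedding code earlier, and actual Parquet writing (e.g. via pyarrow) is omitted.

```python
# Shape embedding records into rows ready for a Parquet writer.
# The id/values/metadata field names mirror the VectorRecord objects
# produced by the embedding service above (an assumed convention here).
def to_import_rows(vectors):
    rows = []
    for v in vectors:
        rows.append({
            "id": v["id"],
            "values": v["values"],
            "metadata": v.get("metadata", {}),
        })
    return rows

sample = [{"id": "doc-1", "values": [0.1, 0.2], "metadata": {"url": "https://example.com"}}]
print(to_import_rows(sample)[0]["id"])  # doc-1
```

Keeping this transformation pure makes it trivial to unit-test before committing to a bulk import, which cannot be easily rolled back.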



</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Engineering Manual for Fine-Tuning Gemini 2.5 Pro on Vertex AI: Architecture, Implementation, and Operationalization at Scale</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Mon, 08 Dec 2025 16:41:00 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/engineering-manual-for-fine-tuning-gemini-25-pro-on-vertex-ai-architecture-implementation-and-3ehf</link>
      <guid>https://forem.com/lucash_ribeiro_dev/engineering-manual-for-fine-tuning-gemini-25-pro-on-vertex-ai-architecture-implementation-and-3ehf</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;1. Introduction: The New Era of Multimodal Generative Model Specialization&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Generative artificial intelligence crossed a critical threshold with the introduction of the Gemini 2.5 model family by Google. This iteration represents not just an incremental increase in parameter count or pre-training data diversity, but a fundamental shift in the cognitive architecture of Large Language Models (LLMs). &lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt;, positioned as the "workhorse" model for complex enterprise applications, introduces native capabilities for &lt;strong&gt;adaptive thinking&lt;/strong&gt; and multimodal reasoning that redefine the state of the art.1&lt;/p&gt;

&lt;p&gt;However, for solution architects and machine learning engineers operating in mission-critical environments, the base model—however sophisticated—is rarely the final product. The need for strict adherence to formats, specific domain terminology, regulatory compliance, and complex agent behaviors necessitates a refinement process known as &lt;strong&gt;Supervised Fine-Tuning (SFT)&lt;/strong&gt;.4&lt;/p&gt;

&lt;p&gt;This technical report constitutes an exhaustive analysis and a step-by-step methodology for performing fine-tuning on the Gemini 2.5 Pro model using the Google Cloud Vertex AI platform. Unlike superficial documentation, this document delves into architectural nuances, necessary data engineering, production-grade code implementation, and the MLOps (Machine Learning Operations) strategies required to host and consume these models at a global scale.&lt;/p&gt;

&lt;p&gt;The complexity of fine-tuning Gemini 2.5 Pro is exacerbated by its nature as a "thinking model." Technical documentation and release notes suggest a subtle interaction: during SFT, the model learns to mimic the desired output, which often allows dispensing with the extensive thinking process that consumes tokens and latency. This creates a scenario where supervised training effectively "short-circuits" explicit reasoning in favor of standardized efficiency.5 Understanding this dynamic is vital for optimizing the cost-benefit ratio and latency in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. Theoretical and Architectural Foundation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before manipulating code, it is imperative to understand the theoretical substrate upon which Gemini 2.5 fine-tuning operates. Vertex AI abstracts the physical infrastructure, but engineering decisions depend on understanding what happens behind the scenes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.1. The Gemini 2.5 Pro Model: Specifications and Capabilities&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Gemini 2.5 Pro was released as a stable version in June 2025.7 It stands out for significant improvements in coding, mathematical reasoning, and image understanding, along with a massive context window.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Specification&lt;/th&gt;&lt;th&gt;Technical Detail&lt;/th&gt;&lt;th&gt;Implication for Fine-Tuning&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Context Window&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;~1M tokens (input)&lt;/td&gt;&lt;td&gt;While it supports ~1M in inference, fine-tuning on Vertex AI currently limits training examples to &lt;strong&gt;131,072 tokens&lt;/strong&gt;.5 Larger examples are truncated.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Knowledge Cutoff&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;January 2025 4&lt;/td&gt;&lt;td&gt;The model is unaware of events post-Jan/2025. SFT is not the ideal method for inserting new factual knowledge (use RAG for this); SFT should focus on &lt;em&gt;style&lt;/em&gt;, &lt;em&gt;format&lt;/em&gt;, and &lt;em&gt;behavior&lt;/em&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Thinking Mode&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Dynamic/Adaptive 2&lt;/td&gt;&lt;td&gt;The model decides when to "think." In SFT, it is recommended to &lt;strong&gt;disable&lt;/strong&gt; or minimize this budget to avoid conflict between latent reasoning and adjusted weights.5&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Modalities&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Text, Image, Audio, Video&lt;/td&gt;&lt;td&gt;Current SFT supports multimodal inputs, but this report focuses on textual and logical tuning, the basis of most enterprise applications.5&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
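Given the 131,072-token training cap, it can pay to pre-screen examples before upload. A rough sketch follows; the 4-characters-per-token ratio is an assumed heuristic, not Gemini's tokenizer, so use the official token-counting API for exact figures.

```python
# Crude pre-filter for the SFT training-example token limit.
# CHARS_PER_TOKEN = 4 is an assumed average for English prose.
MAX_TRAINING_TOKENS = 131_072
CHARS_PER_TOKEN = 4

def exceeds_training_limit(text):
    # Estimate token count from character length, then compare to the cap
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens > MAX_TRAINING_TOKENS

print(exceeds_training_limit("a short example"))  # False
```

Because oversized examples are silently truncated rather than rejected, flagging them early avoids training on cut-off completions.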

&lt;h3&gt;
  
  
  &lt;strong&gt;2.2. The Mechanics of PEFT and LoRA on Vertex AI&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The "fine-tuning" process available on Vertex AI is not a traditional &lt;em&gt;Full Fine-Tuning&lt;/em&gt; where all billions of model weights are updated. Instead, it utilizes &lt;strong&gt;Parameter-Efficient Fine-Tuning (PEFT)&lt;/strong&gt;, specifically the &lt;strong&gt;Low-Rank Adaptation (LoRA)&lt;/strong&gt; technique.4&lt;/p&gt;

&lt;p&gt;In LoRA, the original pre-trained model weights ($W_0$) are frozen. Training injects pairs of low-rank decomposition matrices ($A$ and $B$) into the transformer layers. Weight updates are represented as $\Delta W = B \times A$. During inference, the result is $W_{new} = W_0 + \Delta W$.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does this matter for the engineer?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage Efficiency:&lt;/strong&gt; We do not save an entire copy of Gemini 2.5 Pro. We save only the "adapters" (a few megabytes or gigabytes).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multitenancy:&lt;/strong&gt; A single base model can serve multiple dynamically swapped adapters per request, reducing infrastructure costs.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hyperparameter Adapter Size:&lt;/strong&gt; This parameter, configurable in Vertex AI (values 1, 2, 4, 8 for Pro), defines the rank ($r$) of the matrices. A larger $r$ allows learning more complex patterns but increases the risk of overfitting on small datasets.5&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
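The storage arithmetic behind point 1 can be made concrete. A minimal sketch, using a hypothetical hidden size (Gemini's actual dimensions are not public) and a rank matching the "adapter size" knob:

```python
# Illustrative parameter counts for LoRA vs. full fine-tuning of a single
# square projection matrix. d_model is a hypothetical hidden size; rank
# mirrors the adapter-size hyperparameter (1, 2, 4, or 8 for Pro).
def full_trainable_params(d_model):
    # Full fine-tuning updates the entire d_model x d_model weight matrix
    return d_model * d_model

def lora_trainable_params(d_model, rank):
    # LoRA trains only B (d_model x rank) and A (rank x d_model)
    return 2 * d_model * rank

d, r = 4096, 8
print(full_trainable_params(d))     # 16777216
print(lora_trainable_params(d, r))  # 65536, under 0.4% of the full matrix
```

Multiplied across every adapted layer, this ratio is why a saved adapter is megabytes rather than a full model copy.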

&lt;h3&gt;
  
  
  &lt;strong&gt;2.3. Vertex AI Platform vs. Google AI Studio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It is crucial to distinguish between Google AI Studio and Vertex AI for fine-tuning purposes. Historically, AI Studio offered a simplified interface. However, Google has deprecated fine-tuning support for newer models (like Gemini 1.5 Flash/Pro and 2.5 series) directly via the Gemini API in AI Studio, migrating it exclusively to &lt;strong&gt;Vertex AI&lt;/strong&gt;.8&lt;/p&gt;

&lt;p&gt;Vertex AI offers a managed infrastructure that provides granular control over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Sovereignty:&lt;/strong&gt; ensuring training data and the adapted model remain in specific geographic regions (e.g., us-central1, europe-west4).6  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MLOps Pipeline:&lt;/strong&gt; Integration with Vertex AI Experiments for metric tracking and model versioning.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Environment Preparation and Google Cloud Infrastructure&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Success in a fine-tuning job depends on a solid infrastructure foundation. Permission errors or quota misconfigurations are the most common causes of failure before training even begins.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.1. Project and API Configuration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It is recommended to isolate the fine-tuning environment in a dedicated GCP project to facilitate cost control and access auditing.&lt;/p&gt;

&lt;p&gt;Step 1: Activate APIs  &lt;/p&gt;

&lt;p&gt;The following APIs are mandatory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;aiplatform.googleapis.com (Vertex AI API): The core of the operation.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;storage.googleapis.com (Google Cloud Storage): For storing datasets and artifacts.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;iam.googleapis.com: For identity management.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 2: Region Configuration  &lt;/p&gt;

&lt;p&gt;Region choice is non-trivial. Gemini 2.5 Pro and the accelerators required for its tuning are not available in all Google Cloud regions. Supported regions for tuning typically include us-central1 and europe-west4.6 Attempting to start a job in an unsupported region will result in a resource unavailability error.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.2. Identity and Access Management (IAM)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Service Account (SA) executing the training pipeline needs specific permissions.10&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;IAM Role&lt;/th&gt;&lt;th&gt;Technical Justification&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;roles/aiplatform.user&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Allows creating training jobs, models, and endpoints in Vertex AI.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;roles/storage.objectAdmin&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Allows reading the JSONL dataset and writing logs/artifacts to the staging bucket.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;roles/serviceusage.serviceUsageConsumer&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Allows the account to consume project API quota.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.3. Quota Verification&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Fine-tuning consumes heavily contended accelerator resources. Even though the service is managed, there is a project-level quota called &lt;code&gt;Global concurrent tuning jobs&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verification:&lt;/strong&gt; Access "IAM &amp;amp; Admin" -&amp;gt; "Quotas" and filter by "Vertex AI" and "Tuning".  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Default:&lt;/strong&gt; New projects often have this quota set to 0 or 1 concurrent job.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Action:&lt;/strong&gt; Request a quota increase in advance if planning multiple parallel experiments.4&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.4. Python SDK Installation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The environment must have the latest version of the SDK to support Gemini 2.5 classes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;


&lt;span class="se"&gt;\#&lt;/span&gt; Critical update &lt;span class="k"&gt;for &lt;/span&gt;Gemini 2.5 support and SFT features  

pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="nt"&gt;-upgrade&lt;/span&gt; google-cloud-aiplatform google-auth google-cloud-storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python Environment Initialization:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aiplatform&lt;/span&gt;



&lt;span class="c1"&gt;# Project Constants  
&lt;/span&gt;
&lt;span class="n"&gt;PROJECT&lt;/span&gt;\&lt;span class="n"&gt;_ID&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-gcp-project-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="n"&gt;REGION&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Mandatory region for Gemini 2.5 tuning availability \[6\]  
&lt;/span&gt;
&lt;span class="n"&gt;STAGING&lt;/span&gt;\&lt;span class="n"&gt;_BUCKET&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gs://your-staging-bucket-logs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;



&lt;span class="c1"&gt;# SDK Initialization  
&lt;/span&gt;
&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROJECT&lt;/span&gt;\&lt;span class="n"&gt;_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt;\&lt;span class="n"&gt;_bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STAGING&lt;/span&gt;\&lt;span class="n"&gt;_BUCKET&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vertex AI SDK version &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;aiplatform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;\&lt;span class="n"&gt;_&lt;/span&gt;\&lt;span class="n"&gt;_version&lt;/span&gt;\&lt;span class="n"&gt;_&lt;/span&gt;\&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; initialized.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;4. Data Engineering: The Heart of Fine-Tuning&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data quality, consistency, and formatting are the most important determinants of fine-tuning success. A noisy dataset will produce a model that hallucinates, no matter how many epochs you train for.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.1. JSONL Format and Message Structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vertex AI strictly requires the dataset to be provided in &lt;strong&gt;JSON Lines (.jsonl)&lt;/strong&gt; format. Each line is a valid, independent JSON object representing a full training session, following the chat "messages" pattern.5&lt;/p&gt;

&lt;p&gt;Required Canonical Structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Common Formatting Errors:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inconsistent System Prompt:&lt;/strong&gt; If you use a system prompt in training ("You are a finance expert..."), you must use &lt;em&gt;exactly the same&lt;/em&gt; system prompt during inference.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-turn vs. Single-turn:&lt;/strong&gt; Gemini supports multi-turn chat. If training a chatbot that maintains context, your JSONL examples should contain the conversation history (User -&amp;gt; Model -&amp;gt; User -&amp;gt; Model).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
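Putting the two rules together, here is a small sketch of serializing one consistent, multi-turn training example to JSONL. The role and content field names follow the messages pattern described above; check them against the schema your SDK version actually validates.

```python
import json

# One training example: the system prompt is kept identical across the
# dataset, with a multi-turn User -> Model history on a single JSONL line.
example = {
    "messages": [
        {"role": "system", "content": "You are a finance expert..."},
        {"role": "user", "content": "Summarize the Q3 revenue drivers."},
        {"role": "model", "content": "Revenue growth was driven by..."},
    ]
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    # One self-contained JSON object per line is what .jsonl requires
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Appending line by line like this keeps the writer streaming-friendly and makes each example independently parseable, which the validation script in section 4.3 relies on.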

&lt;h3&gt;
  
  
  &lt;strong&gt;4.2. Data Quality and Volume Strategy&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vertex AI documentation and market practice suggest clear guidelines for data volume:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Dataset Size&lt;/th&gt;&lt;th&gt;Expectation&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;1 - 50 examples&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Insufficient for SFT. Better to use &lt;em&gt;Few-Shot Prompting&lt;/em&gt;. SFT here risks rapid overfitting.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;100 - 500 examples&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;The &lt;strong&gt;"Sweet Spot"&lt;/strong&gt; for most style and format adaptation tasks.5 The model generalizes the pattern without memorizing content.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;&amp;gt; 1,000 examples&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Necessary for teaching new languages (e.g., DSLs), complex reasoning tasks, or very specific knowledge domains.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.3. Data Validation Script&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before uploading to Cloud Storage, it is vital to validate the dataset locally.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;



&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate&lt;/span&gt;\&lt;span class="nf"&gt;_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;\&lt;span class="n"&gt;_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;valid&lt;/span&gt;\&lt;span class="n"&gt;_count&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;\&lt;span class="n"&gt;_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;\&lt;span class="n"&gt;_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Check 1: 'messages' key  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Line &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;\&lt;span class="n"&gt;_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: Missing &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; key.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Check 2: Roles  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;roles&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; \&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;\&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;roles&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Line &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;\&lt;span class="n"&gt;_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: Must contain at least one &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and one &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; message.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Check 3: Non-empty content  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Line &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;\&lt;span class="n"&gt;_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: Empty content detected.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;valid&lt;/span&gt;\&lt;span class="n"&gt;_count&lt;/span&gt; \&lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Line &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;\&lt;span class="n"&gt;_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: Invalid JSON.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; errors in dataset:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;\&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;\&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validation successful. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;\&lt;span class="n"&gt;_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; valid examples.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;



&lt;span class="c1"&gt;# Usage  
&lt;/span&gt;
&lt;span class="c1"&gt;# validate\_jsonl("my\_train\_dataset.jsonl")
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
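&lt;p&gt;The three structural checks above can also be exercised in isolation, without touching the filesystem. A minimal, self-contained sketch (the record shapes mirror the Gemini chat-tuning format; the helper name is illustrative, not part of any SDK):&lt;/p&gt;

```python
import json

def check_record(line: str):
    """Return an error string for one JSONL line, or None if it passes
    the three checks: 'messages' key, user+model roles, non-empty content."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return "Invalid JSON."
    if 'messages' not in data:
        return "Missing 'messages' key."
    messages = data['messages']
    roles = [m.get('role') for m in messages]
    if 'user' not in roles or 'model' not in roles:
        return "Must contain at least one 'user' and one 'model' message."
    if any(not m.get('content') for m in messages):
        return "Empty content detected."
    return None

good = json.dumps({"messages": [
    {"role": "user", "content": "What is EBITDA?"},
    {"role": "model", "content": "Earnings before interest, taxes, depreciation, and amortization."}]})
bad = json.dumps({"messages": [{"role": "user", "content": "Hi"}]})

print(check_record(good))  # None
print(check_record(bad))
```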






&lt;p&gt;&lt;strong&gt;5. Executing Fine-Tuning: Code and Hyperparameters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We use the vertexai.tuning.sft module, the standard programmatic interface for this task.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5.1. Defining the Base Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use the correct version tag.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Target Model: gemini-2.5-pro-001 (or the latest versioned tag).  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; Avoid generic aliases if strict reproducibility is required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
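&lt;p&gt;One way to enforce the versioned-tag convention is a small guard run before the job is submitted. This is an illustrative check, not part of the Vertex AI SDK; it simply requires an explicit numeric version suffix:&lt;/p&gt;

```python
import re

def assert_pinned_model(model_id: str) -> str:
    """Reject floating aliases (e.g. 'gemini-2.5-pro') and require an
    explicit version suffix such as '-001' for strict reproducibility."""
    if not re.search(r"-\d{3}$", model_id):
        raise ValueError(f"Unpinned model alias: {model_id!r}. "
                         "Use a versioned tag like 'gemini-2.5-pro-001'.")
    return model_id

assert_pinned_model("gemini-2.5-pro-001")  # passes
# assert_pinned_model("gemini-2.5-pro")    # would raise ValueError
```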

&lt;h3&gt;
  
  
  &lt;strong&gt;5.2. Training Code (SFT Pipeline)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Python&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vertexai.tuning&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sft&lt;/span&gt;



&lt;span class="c1"&gt;# Job Configuration  
&lt;/span&gt;
&lt;span class="n"&gt;BASE&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-pro-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="n"&gt;TRAIN&lt;/span&gt;\&lt;span class="n"&gt;_DATASET&lt;/span&gt;\&lt;span class="n"&gt;_URI&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gs://your-bucket-ml/gemini-tuning/v1/train.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="n"&gt;VALIDATION&lt;/span&gt;\&lt;span class="n"&gt;_DATASET&lt;/span&gt;\&lt;span class="n"&gt;_URI&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gs://your-bucket-ml/gemini-tuning/v1/validation.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="n"&gt;TUNED&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;\&lt;span class="n"&gt;_DISPLAY&lt;/span&gt;\&lt;span class="n"&gt;_NAME&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-pro-finance-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;



&lt;span class="c1"&gt;# Hyperparameter Configuration  
&lt;/span&gt;
&lt;span class="n"&gt;EPOCHS&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="n"&gt;ADAPTER&lt;/span&gt;\&lt;span class="n"&gt;_SIZE&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;&lt;span class="c1"&gt;# Supported values for Pro: 1, 2, 4, 8  
&lt;/span&gt;
&lt;span class="n"&gt;LEARNING&lt;/span&gt;\&lt;span class="n"&gt;_RATE&lt;/span&gt;\&lt;span class="n"&gt;_MULTIPLIER&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;\&lt;span class="n"&gt;_fine&lt;/span&gt;\&lt;span class="n"&gt;_tuning&lt;/span&gt;\&lt;span class="nf"&gt;_job&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting SFT job for model &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Create and submit the Job  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# sft.train initiates the managed pipeline on Vertex AI  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;sft&lt;/span&gt;\&lt;span class="n"&gt;_tuning&lt;/span&gt;\&lt;span class="n"&gt;_job&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;\&lt;span class="n"&gt;_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;\&lt;span class="n"&gt;_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TRAIN&lt;/span&gt;\&lt;span class="n"&gt;_DATASET&lt;/span&gt;\&lt;span class="n"&gt;_URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt;\&lt;span class="n"&gt;_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VALIDATION&lt;/span&gt;\&lt;span class="n"&gt;_DATASET&lt;/span&gt;\&lt;span class="n"&gt;_URI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EPOCHS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;\&lt;span class="n"&gt;_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ADAPTER&lt;/span&gt;\&lt;span class="n"&gt;_SIZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;learning&lt;/span&gt;\&lt;span class="n"&gt;_rate&lt;/span&gt;\&lt;span class="n"&gt;_multiplier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LEARNING&lt;/span&gt;\&lt;span class="n"&gt;_RATE&lt;/span&gt;\&lt;span class="n"&gt;_MULTIPLIER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;tuned&lt;/span&gt;\&lt;span class="n"&gt;_model&lt;/span&gt;\&lt;span class="n"&gt;_display&lt;/span&gt;\&lt;span class="n"&gt;_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TUNED&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;\&lt;span class="n"&gt;_DISPLAY&lt;/span&gt;\&lt;span class="n"&gt;_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Region is inferred from vertexai.init  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sft&lt;/span&gt;\&lt;span class="n"&gt;_tuning&lt;/span&gt;\&lt;span class="n"&gt;_job&lt;/span&gt;



&lt;span class="c1"&gt;# Execute  
&lt;/span&gt;
&lt;span class="c1"&gt;# tuning\_job \= run\_fine\_tuning\_job()
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;5.3. Deep Dive into Hyperparameters&lt;/strong&gt;
&lt;/h3&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Hyperparameter&lt;/th&gt;&lt;th&gt;Technical Impact and Recommendations&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Epochs&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Defines how many times the model sees the dataset. &lt;strong&gt;Few (&amp;lt;3):&lt;/strong&gt; &lt;em&gt;underfitting&lt;/em&gt;. &lt;strong&gt;Many (&amp;gt;10):&lt;/strong&gt; &lt;em&gt;overfitting&lt;/em&gt;. &lt;strong&gt;Recommendation:&lt;/strong&gt; start with 3-5.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Adapter Size (LoRA Rank)&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Defines the dimensionality of the trainable matrices. &lt;strong&gt;Size 1 or 4:&lt;/strong&gt; ideal for simple tasks (formatting, tone). &lt;strong&gt;Size 8:&lt;/strong&gt; necessary for complex tasks requiring reasoning. &lt;strong&gt;Note:&lt;/strong&gt; Pro supports 1, 2, 4, 8.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Learning Rate Multiplier&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Scales the default optimizer rate. &lt;strong&gt;1.0:&lt;/strong&gt; safe default. &lt;strong&gt;&amp;lt;1.0:&lt;/strong&gt; use if the base model already performs well and only needs slight adjustment.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
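&lt;p&gt;These recommendations can be encoded as a starting-point heuristic. The thresholds below are assumptions drawn directly from the guidance above, not official defaults:&lt;/p&gt;

```python
def suggest_hyperparameters(task_complexity: str, base_model_is_close: bool) -> dict:
    """Heuristic starting point: 3-5 epochs, small adapters for style/format
    tasks, rank 8 for reasoning, and a reduced learning-rate multiplier when
    the base model is already performing well."""
    adapter_size = 8 if task_complexity == "reasoning" else 4
    lr_multiplier = 0.5 if base_model_is_close else 1.0
    return {"epochs": 4,
            "adapter_size": adapter_size,
            "learning_rate_multiplier": lr_multiplier}

print(suggest_hyperparameters("formatting", base_model_is_close=True))
```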

&lt;h3&gt;
  
  
  &lt;strong&gt;5.4. Monitoring and Polling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The script should poll the job state to confirm the process completes successfully.&lt;/p&gt;

&lt;p&gt;Python&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monitor&lt;/span&gt;\&lt;span class="n"&gt;_tuning&lt;/span&gt;\&lt;span class="nf"&gt;_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;has&lt;/span&gt;\&lt;span class="n"&gt;_ended&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;refresh&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Status: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; \&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUCCEEDED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Training completed successfully\!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model Resource Name: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tuned&lt;/span&gt;\&lt;span class="n"&gt;_model&lt;/span&gt;\&lt;span class="n"&gt;_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Endpoint (Auto-Deploy): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tuned&lt;/span&gt;\&lt;span class="n"&gt;_model&lt;/span&gt;\&lt;span class="n"&gt;_endpoint&lt;/span&gt;\&lt;span class="n"&gt;_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tuned&lt;/span&gt;\&lt;span class="n"&gt;_model&lt;/span&gt;\&lt;span class="n"&gt;_endpoint&lt;/span&gt;\&lt;span class="n"&gt;_name&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Job FAILED. Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;6. Hosting, Deployment, and Inference Optimization&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Where does the model live once the job reaches SUCCEEDED, and how is it served?&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6.1. The Vertex AI Endpoint Concept&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In Vertex AI, you do not "download" the tuned Gemini 2.5 Pro model. The base model is proprietary and massive. Instead, your LoRA adapters are saved in the Model Registry.  &lt;/p&gt;

&lt;p&gt;When you deploy (which the SFT job often does automatically), Vertex AI provisions an Endpoint. An Endpoint is a managed service URL pointing to compute infrastructure that loads Gemini 2.5 Pro + Your Adapters.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6.2. Consuming the Model via Python SDK&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To consume the model, instantiate the GenerativeModel class pointing to the &lt;strong&gt;Endpoint Resource Name&lt;/strong&gt;.6&lt;/p&gt;

&lt;p&gt;Endpoint Resource Name Format:  &lt;/p&gt;

&lt;p&gt;projects/{PROJECT_NUMBER}/locations/{REGION}/endpoints/{ENDPOINT_ID}&lt;/p&gt;
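&lt;p&gt;A common source of 404s is pasting a project ID or a bare endpoint ID where the fully qualified resource name is expected. A small helper (a sketch, with function names of my own choosing) can assemble and sanity-check the string before handing it to the SDK:&lt;/p&gt;

```python
import re

# Builds the fully qualified endpoint resource name from its parts.
# Note that Vertex AI resource names use the numeric project NUMBER,
# not the human-readable project ID.
def endpoint_resource_name(project_number, region, endpoint_id):
    return f"projects/{project_number}/locations/{region}/endpoints/{endpoint_id}"

# Catches the usual mistakes (bare endpoint ID, project ID instead of number).
ENDPOINT_PATTERN = re.compile(r"^projects/\d+/locations/[a-z0-9-]+/endpoints/\d+$")

def is_valid_endpoint_resource(name):
    return bool(ENDPOINT_PATTERN.match(name))
```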

&lt;p&gt;Python&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generative&lt;/span&gt;\&lt;span class="n"&gt;_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GenerationConfig&lt;/span&gt;



&lt;span class="c1"&gt;# Replace with the value returned by monitor\_tuning\_job or from Console  
&lt;/span&gt;
&lt;span class="n"&gt;TUNED&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;\&lt;span class="n"&gt;_ENDPOINT&lt;/span&gt;\&lt;span class="n"&gt;_RESOURCE&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/123456789012/locations/us-central1/endpoints/11223344556677&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;\&lt;span class="n"&gt;_with&lt;/span&gt;\&lt;span class="n"&gt;_tuned&lt;/span&gt;\&lt;span class="nf"&gt;_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;\&lt;span class="n"&gt;_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sending prompt to: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;TUNED&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;\&lt;span class="n"&gt;_ENDPOINT&lt;/span&gt;\&lt;span class="n"&gt;_RESOURCE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Instantiate model pointing to the tuned endpoint  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# The SDK routes this to your adapter  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TUNED&lt;/span&gt;\&lt;span class="n"&gt;_MODEL&lt;/span&gt;\&lt;span class="n"&gt;_ENDPOINT&lt;/span&gt;\&lt;span class="n"&gt;_RESOURCE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# Generation Config: The Thinking Budget Paradox  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# For SFT models, documentation  recommends disabling thinking  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# or setting it to minimum, as SFT teaches the direct answer.  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;generation&lt;/span&gt;\&lt;span class="n"&gt;_config&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerationConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;\&lt;span class="n"&gt;_output&lt;/span&gt;\&lt;span class="n"&gt;_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# If supported by the specific SDK version for the model:  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="c1"&gt;# thinking\_config={"include\_thoughts": False}  
&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;\&lt;span class="nf"&gt;_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;\&lt;span class="n"&gt;_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;generation&lt;/span&gt;\&lt;span class="n"&gt;_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generation&lt;/span&gt;\&lt;span class="n"&gt;_config&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Inference Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;



&lt;span class="c1"&gt;# Real Test  
&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the following financial report focusing on EBITDA:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predict&lt;/span&gt;\&lt;span class="n"&gt;_with&lt;/span&gt;\&lt;span class="n"&gt;_tuned&lt;/span&gt;\&lt;span class="nf"&gt;_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---------------- RESPONSE \----------------&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;6.3. The "Thinking Budget" Paradox in SFT Models&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A critical finding for this report is the behavior of Gemini 2.5 Pro regarding its "thinking budget" when subjected to supervised fine-tuning.&lt;/p&gt;

&lt;p&gt;Gemini 2.5 Pro is a "thinking" model. SFT, however, trains the model to map an input directly to the desired output. If you keep thinking mode enabled with a high token budget, the model tries to "reason" its way toward a response it has already learned through training. This can cause:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Increased Latency and Cost:&lt;/strong&gt; Paying for useless thinking tokens.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality Degradation:&lt;/strong&gt; The model may "overthink" and diverge from the strict format you taught it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Therefore, best engineering practice is to &lt;strong&gt;zero out or minimize&lt;/strong&gt; the thinking budget for SFT endpoints.5&lt;/p&gt;
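&lt;p&gt;The latency-and-cost point is easy to quantify with a back-of-envelope estimate. The per-token price below is a placeholder, not a published rate; substitute your actual billing figures:&lt;/p&gt;

```python
# Rough estimate of what "useless thinking tokens" cost on an SFT endpoint.
# PLACEHOLDER rate -- check your project's actual pricing.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # hypothetical USD

def thinking_overhead(requests_per_day, avg_thinking_tokens):
    # Tokens burned on reasoning the SFT model no longer needs
    wasted_tokens = requests_per_day * avg_thinking_tokens
    return wasted_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

# 50,000 requests/day, each spending ~500 thinking tokens:
daily_cost = thinking_overhead(50_000, 500)
print(f"Estimated daily overhead: ${daily_cost:.2f}")
```

&lt;p&gt;At these illustrative numbers the overhead is $250/day for tokens that add nothing to the memorized answer, before counting the added latency.&lt;/p&gt;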




&lt;h2&gt;
  
  
  &lt;strong&gt;7. Evaluation and Quality Assurance (QA)&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7.1. Manual A/B Testing (Qualitative)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Create a "Side-by-Side" evaluation script sending the same prompt to both the base model and the tuned model.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test Prompt&lt;/th&gt;
&lt;th&gt;Base Model Response (Gemini 2.5 Pro)&lt;/th&gt;
&lt;th&gt;Tuned Model Response&lt;/th&gt;
&lt;th&gt;Engineer Analysis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Analyze contract X."&lt;/td&gt;
&lt;td&gt;Generic response, academic tone.&lt;/td&gt;
&lt;td&gt;Technical response, cites specific local laws, senior legal tone.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Success:&lt;/strong&gt; Adoption of persona and domain knowledge.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7.2. Automatic Evaluation with Gen AI Evaluation Service&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vertex AI offers the &lt;em&gt;Gen AI Evaluation&lt;/em&gt; service. You can use an LLM as a "Judge" to evaluate your tuned model's responses.6&lt;/p&gt;

&lt;p&gt;Metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coherence:&lt;/strong&gt; Does the answer make logical sense?  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instruction Following:&lt;/strong&gt; Did it follow format constraints (JSON, XML)?  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Safety:&lt;/strong&gt; Did it generate toxic content?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
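&lt;p&gt;The LLM-as-judge pattern behind these metrics can be sketched generically. This is not the managed Gen AI Evaluation API; the rubric wording and the &lt;code&gt;judge&lt;/code&gt; callable (assumed to return a 1-5 score) are illustrative stand-ins for a strong grading model:&lt;/p&gt;

```python
# Generic LLM-as-judge loop over the metrics listed above.
# `judge` is any callable taking a grading prompt and returning a score.
RUBRICS = {
    "coherence": "Rate 1-5 how logically consistent the answer is.",
    "instruction_following": "Rate 1-5 how well the answer obeys the requested format.",
    "safety": "Rate 1-5, where 5 means no toxic or harmful content.",
}

def evaluate(prompt, answer, judge):
    scores = {}
    for metric, rubric in RUBRICS.items():
        grading_prompt = f"{rubric}\n\nQUESTION: {prompt}\nANSWER: {answer}\nScore:"
        scores[metric] = judge(grading_prompt)
    return scores
```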

&lt;h2&gt;
  
  
  &lt;strong&gt;8. MLOps and Production Considerations&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;8.1. Troubleshooting Common Errors&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ResourceExhausted Error:&lt;/strong&gt; You hit the concurrent job quota. Cancel old jobs or request a quota increase.4  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;InvalidArgument in Dataset:&lt;/strong&gt; Usually means an example exceeds the 131k token limit or the JSONL is malformed.5  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Safety Filters:&lt;/strong&gt; Fine-tuning does not remove native safety filters. If your domain is sensitive (medical/legal), you may need to adjust the HarmCategory thresholds in the safety settings sent with your requests.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
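&lt;p&gt;The InvalidArgument case in particular is cheap to catch before submitting the job. A pre-flight validator sketch, assuming roughly four characters per token as a crude heuristic (a real tokenizer would be more accurate):&lt;/p&gt;

```python
import json

MAX_EXAMPLE_TOKENS = 131_072   # per-example limit cited above
CHARS_PER_TOKEN = 4            # rough heuristic, not a real tokenizer

def validate_jsonl(path):
    """Flags the two usual causes of InvalidArgument: malformed lines
    and examples that blow past the token limit."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((lineno, f"malformed JSON: {e}"))
                continue
            approx_tokens = len(json.dumps(record)) // CHARS_PER_TOKEN
            if approx_tokens > MAX_EXAMPLE_TOKENS:
                problems.append((lineno, f"~{approx_tokens} tokens exceeds limit"))
    return problems
```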

&lt;h3&gt;
  
  
  &lt;strong&gt;8.2. Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Fine-tuning Gemini 2.5 Pro on Vertex AI is a powerful tool for transforming a generalist model into a domain specialist. The secret lies not in the Python code—which is relatively simple thanks to the SDK—but in rigorous &lt;strong&gt;Data-Centric AI&lt;/strong&gt; engineering and the correct management of hyperparameters and inference budgets. By following this guide, engineers can deploy generative AI solutions that are not only impressive but robust, auditable, and ready for the enterprise environment.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;References&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Gemini 2.5 Pro – Vertex AI - Google Cloud Console, accessed December 8, 2025, &lt;a href="https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemini-2.5-pro" rel="noopener noreferrer"&gt;https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemini-2.5-pro&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gemini thinking | Gemini API - Google AI for Developers, accessed December 8, 2025, &lt;a href="https://ai.google.dev/gemini-api/docs/thinking" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/thinking&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gemini 2.5 on Vertex AI: Pro, Flash &amp;amp; Model Optimizer Live | Google Cloud Blog, accessed December 8, 2025, &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-pro-flash-on-vertex-ai" rel="noopener noreferrer"&gt;https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-pro-flash-on-vertex-ai&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gemini 2.5 Pro | Generative AI on Vertex AI - Google Cloud Documentation, accessed December 8, 2025, &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;About supervised fine-tuning for Gemini models | Generative AI on Vertex AI | Google Cloud Documentation, accessed December 8, 2025, &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tune Gemini models by using supervised fine-tuning | Generative AI on Vertex AI, accessed December 8, 2025, &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-use-supervised-tuning" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-use-supervised-tuning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Release notes | Gemini API - Google AI for Developers, accessed December 8, 2025, &lt;a href="https://ai.google.dev/gemini-api/docs/changelog" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/changelog&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fine-tuning with the Gemini API - Google AI for Developers, accessed December 8, 2025, &lt;a href="https://ai.google.dev/gemini-api/docs/model-tuning" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/model-tuning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tuning API | Generative AI on Vertex AI - Google Cloud Documentation, accessed December 8, 2025, &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/tuning" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/tuning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;googleapis/python-aiplatform: A Python SDK for Vertex AI, a fully managed, end-to-end platform for data science and machine learning. - GitHub, accessed December 8, 2025, &lt;a href="https://github.com/googleapis/python-aiplatform" rel="noopener noreferrer"&gt;https://github.com/googleapis/python-aiplatform&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fine-tune Generative AI models with Vertex AI Supervised Fine-tuning, accessed December 8, 2025, &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-tuning-basic" rel="noopener noreferrer"&gt;https://docs.cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-tuning-basic&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to use Google Vertex AI fine tuned model via Node.js - Stack Overflow, accessed December 8, 2025, &lt;a href="https://stackoverflow.com/questions/78738829/how-to-use-google-vertex-ai-fine-tuned-model-via-node-js" rel="noopener noreferrer"&gt;https://stackoverflow.com/questions/78738829/how-to-use-google-vertex-ai-fine-tuned-model-via-node-js&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>gemini</category>
      <category>llm</category>
      <category>architecture</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Model Context Protocol (MCP): A Foundational Standard for Agentic AI Systems</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Tue, 28 Oct 2025 18:24:17 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/the-model-context-protocol-mcp-a-foundational-standard-for-agentic-ai-systems-4dg</link>
      <guid>https://forem.com/lucash_ribeiro_dev/the-model-context-protocol-mcp-a-foundational-standard-for-agentic-ai-systems-4dg</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Abstract&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This paper presents an exhaustive analysis of the Model Context Protocol (MCP), an open standard that represents a paradigm shift from ad-hoc integrations to a standardized, secure, and scalable communication layer, essential for the development of robust, production-grade agentic AI systems. MCP is designed to address the intrinsic limitations of Large Language Models (LLMs), such as static knowledge and a propensity for "hallucinations," by providing a universal language for them to interact with external tools, data, and services. This work details the protocol's tripartite architecture (Host, Client, and Server), its operation over JSON-RPC 2.0, and its fundamental primitives. Furthermore, it offers a significant practical contribution by providing two comprehensive implementation tutorials for creating MCP servers, one using Python with Pydantic and another advancing to Protocol Buffers for high-performance use cases. The analysis culminates in a critical examination of production considerations, including security, scalability, and performance, positioning MCP as an architectural pillar for the next generation of AI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Introduction: Bridging the Context Gap in Modern AI&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1.1. The Challenge of Grounding Large Language Models in Reality&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Large Language Models (LLMs) have revolutionized natural language processing, but their capabilities are inherently limited by the nature of their training. An LLM's knowledge is static, a snapshot of the vast dataset on which it was trained, rendering it incapable of accessing real-time information or events that occurred after its cutoff date.1 This fundamental limitation leads to factual inaccuracies, commonly referred to as "hallucinations," where the model generates plausible but incorrect information.1 Moreover, without access to the outside world, LLMs are unable to perform meaningful real-world tasks, such as querying a database, sending an email, or interacting with an API.&lt;/p&gt;

&lt;p&gt;The pre-MCP integration landscape was characterized by a tangle of custom, brittle connections. Connecting $M$ models to $N$ tools required creating $M \times N$ bespoke integrations, a complexity problem that resulted in massive technical debt and an unsustainable maintenance overhead.3 Each new tool or model demanded significant engineering effort, hindering innovation and scalability. This bottleneck became particularly acute with the rise of "agentic AI"—systems designed to pursue goals and take actions autonomously on behalf of a user.5 The absence of a standard communication protocol was a primary barrier to the development and reliable deployment of these intelligent agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1.2. Introducing the Model Context Protocol as a Standardized Solution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Model Context Protocol (MCP) was introduced by Anthropic as an open standard to solve precisely these challenges.1 It provides a universal and standardized "language" for LLMs to communicate securely and bidirectionally with external tools, data sources, and services.1 The primary goal of MCP is to transform LLMs from static information processors into dynamic agents capable of retrieving current information, interacting with external systems, and executing concrete actions.1&lt;/p&gt;

&lt;p&gt;Architecturally, MCP collapses the $M \times N$ integration problem to a linear complexity of $M + N$. Instead of each model needing a custom connector for each tool, each model integrates a single MCP client, and each tool is encapsulated by a single MCP server. This modular and standardized approach functions as a "USB-C for AI," allowing any compliant model to connect to any compliant tool without the need for custom integration code.3 The standard has gained rapid industry adoption, with major players like OpenAI, Microsoft, and Google, and a growing ecosystem of open-source connectors, attesting to its importance and effectiveness.3&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1.3. Thesis and Structure of this Paper&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The central thesis of this paper is that MCP is not merely an incremental improvement over existing function-calling techniques, but rather a fundamental architectural standard that enables the creation of secure, composable, and scalable AI systems. The adoption of MCP reflects a crucial maturation in the field of AI engineering, marking the transition from the "magic demo" phase, characterized by clever but fragile prompt engineering, to an era that demands robust, reliable, and maintainable systems. MCP manifests the application of proven software engineering principles—such as standard protocols, separation of concerns, and modularity—to the domain of LLM integration.&lt;/p&gt;

&lt;p&gt;To substantiate this thesis, this paper is structured as follows: it begins with a conceptual analysis, positioning MCP relative to other methodologies like RAG and orchestration frameworks. This is followed by a deep dive into the protocol's architecture. The core of the paper consists of two practical implementation tutorials of increasing complexity. Subsequently, a critical examination of production-level challenges, including security, scalability, and performance, is conducted. The paper concludes with a discussion of the protocol's future directions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Fundamental Concepts and Comparative Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.1. From Prompt Crafting to Systemic Context Engineering&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Initial interaction with LLMs was dominated by "Prompt Engineering," the art of crafting the immediate instruction to guide the model to produce the desired output.11 However, this approach has significant limitations. A perfectly worded prompt is useless if the model lacks the necessary information (the context) to act on it correctly.11 This led to the evolution towards "Context Engineering," a broader discipline that focuses on designing and managing the entire informational environment available to the LLM at any given moment.13&lt;/p&gt;

&lt;p&gt;Prompt Engineering is, therefore, a subset of Context Engineering.13 While the former focuses on &lt;em&gt;what to tell&lt;/em&gt; the model, the latter is concerned with &lt;em&gt;what the model knows&lt;/em&gt; when the instruction is given. MCP is a primary tool for Context Engineering. It provides the structured and reliable mechanism to programmatically manage what the model "knows" by connecting it to external sources of truth and action capabilities.15 It allows developers to build systems, not just prompts, ensuring the LLM operates with relevant, up-to-date, and accurate information.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.2. Situating MCP: A Comparative Analysis with RAG and Orchestration Frameworks (ReAct/LangChain)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To fully understand MCP's role, it is crucial to distinguish it from other prominent technologies in the AI ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP vs. Retrieval-Augmented Generation (RAG):&lt;/strong&gt; RAG is a technique designed to augment LLM prompts with relevant knowledge retrieved from external data sources at query time. It is ideal for handling large volumes of &lt;em&gt;unstructured, text-rich knowledge&lt;/em&gt;, such as internal documents, articles, or knowledge bases.1 RAG enhances the model's knowledge base. In contrast, MCP is a communication protocol for bidirectional, structured interaction with &lt;em&gt;tools and services&lt;/em&gt;. It allows the LLM not only to retrieve specific data but also to execute actions, such as querying a real-time database or calling an API to perform a task.1&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP vs. ReAct/LangChain:&lt;/strong&gt; Frameworks like LangChain and patterns like ReAct (Reasoning and Acting) are &lt;em&gt;orchestration frameworks&lt;/em&gt; that define an agent's reasoning cycle (Thought, Action, Observation) within a single application process.17 They provide the control logic for the agent's "brain." MCP, on the other hand, is not an orchestration framework; it is a &lt;em&gt;communication protocol&lt;/em&gt; that standardizes the "Action" step. It decouples the agent's reasoning logic from the tool's implementation.17 Essentially, LangChain operates at the application layer, while MCP operates at the transport and integration layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synergy:&lt;/strong&gt; These technologies are not mutually exclusive; they are highly synergistic. An advanced workflow might involve an orchestrator like LangChain using the ReAct pattern. The agent might first use RAG to retrieve background documents from a knowledge base to understand the general context. Then, based on the retrieved information, it could use MCP to query a live API or database for real-time data and execute a specific action.16&lt;/p&gt;

&lt;p&gt;The following table provides a clear comparison to help engineers and architects select the appropriate technology for their use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Comparative Analysis of AI Integration Methodologies&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Methodology&lt;/th&gt;
&lt;th&gt;Primary Function&lt;/th&gt;
&lt;th&gt;Information Type&lt;/th&gt;
&lt;th&gt;Architectural Coupling&lt;/th&gt;
&lt;th&gt;Key Advantage&lt;/th&gt;
&lt;th&gt;Ideal Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Communication protocol for interaction with tools and services.&lt;/td&gt;
&lt;td&gt;Structured, real-time data, actions.&lt;/td&gt;
&lt;td&gt;Low (decoupled via client-server).&lt;/td&gt;
&lt;td&gt;Interoperability, security, scalability.&lt;/td&gt;
&lt;td&gt;Agents that need to execute actions (e.g., booking a reservation, querying an order database).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Augments LLM knowledge with retrieved data.&lt;/td&gt;
&lt;td&gt;Unstructured, text-rich, static or dynamic.&lt;/td&gt;
&lt;td&gt;Medium (retrieval logic is coupled with generation).&lt;/td&gt;
&lt;td&gt;Reduction of hallucinations, access to proprietary knowledge.&lt;/td&gt;
&lt;td&gt;Customer support chatbots answering based on an internal knowledge base.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ReAct/LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Orchestration framework for the agent's reasoning cycle.&lt;/td&gt;
&lt;td&gt;Control logic, task state.&lt;/td&gt;
&lt;td&gt;High (agent logic and tool execution are in the same process).&lt;/td&gt;
&lt;td&gt;Rapid agent development, abstraction of complex logic.&lt;/td&gt;
&lt;td&gt;Building the control logic for agents performing multi-step tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. A Deep Architectural Analysis of the Model Context Protocol&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The architecture of MCP is deliberately designed to enforce a strict separation of concerns, which is fundamental to its security and scalability. It is not just a client-server model but a federated, security-focused architecture where the Host acts as the "brain" and security gatekeeper, the Client as a communication "channel," and the Server as a sandboxed "tool."&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.1. The Tripartite Architecture: Roles of Host, Client, and Server&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The protocol is built around three core components that work in concert to facilitate secure and efficient communication.1&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Host:&lt;/strong&gt; The Host is the main AI application the user interacts with, such as an IDE (e.g., Cursor), a chat interface (e.g., Claude.ai), or another agentic application.6 It acts as the central orchestrator, responsible for managing the overall user session, aggregating context from multiple clients, and, crucially, applying security and consent policies.22 The full conversation history resides exclusively on the Host, ensuring that individual servers do not have access to sensitive information beyond what is necessary for their tasks.22
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Client:&lt;/strong&gt; The Client resides within the Host and acts as the communication bridge to a single MCP Server.1 There is a one-to-one (1:1) relationship between a client and a server, which reinforces isolation.6 The client's responsibilities include establishing and managing the connection to its corresponding server, handling protocol negotiation (discussed below), and routing messages bidirectionally.22
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Server:&lt;/strong&gt; The Server is an external program that provides context or capabilities to the Host. It encapsulates a specific tool, database, API, or other data source.1 Servers are designed to be lightweight, composable, and focused on a single responsibility, promoting a microservices design.22 They can run locally on the same machine as the Host or remotely on a different machine, communicating over different transport layers.8&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture directly embodies the Principle of Least Privilege. By keeping the full session context on the Host and ensuring servers are isolated from each other and only receive the information necessary for a single request, the design fundamentally mitigates risks like the "confused deputy" problem and prevents a single compromised server from exposing the entire AI session.8 It is an architecture designed from the ground up to operate in a zero-trust environment, where individual servers are not inherently trusted.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.2. The Communication Backbone: JSON-RPC 2.0 and Transport Layers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Communication between MCP clients and servers is built on the JSON-RPC 2.0 standard.1 This protocol defines a simple structure for requests, responses, and notifications using JSON, which ensures interoperability across different programming languages and platforms.23&lt;/p&gt;

&lt;p&gt;MCP supports two primary transport layers to accommodate different deployment scenarios 1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard Input/Output (stdio):&lt;/strong&gt; This method is primarily used for servers that run locally as child processes of the Host. It offers low-latency, synchronous communication, ideal for tools that access the local file system or other resources on the same machine.1
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP + Server-Sent Events (SSE) / Streamable HTTP:&lt;/strong&gt; For remote servers, MCP utilizes HTTP-based protocols. Initially, SSE was the standard to allow servers to push real-time updates to clients. More recently, the protocol has evolved to support "Streamable HTTP," a more scalable, bidirectional model that uses chunked transfer encoding over a single HTTP connection. This evolution is crucial for cloud and serverless deployments (e.g., AWS Lambda), as it avoids the long-lived connections of SSE, which can be problematic in corporate network environments and ephemeral infrastructures.9&lt;/li&gt;
&lt;/ul&gt;
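&lt;p&gt;To make the wire format concrete, the sketch below builds a JSON-RPC 2.0 tools/call request and a matching response in Python. The method name tools/call follows the MCP specification; the tool name and payload are purely illustrative.&lt;/p&gt;

```python
import json

# Illustrative JSON-RPC 2.0 request a client might send to invoke a tool.
# The "id" correlates the response with the request; notifications omit it.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_current_weather",
        "arguments": {"city": "Lisbon", "units": "metric"},
    },
}

# The server's reply carries the same "id" and a "result" (or "error") member.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Sunny, 25 degrees"}]},
}

# Over stdio or HTTP, each message travels as a single serialized JSON document.
wire = json.dumps(request)
decoded = json.loads(wire)
print(decoded["method"])  # tools/call
```

&lt;p&gt;Whichever transport carries them, the messages themselves are identical, which is what lets the same server logic run locally over stdio or remotely over HTTP.&lt;/p&gt;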

&lt;h3&gt;
  
  
  &lt;strong&gt;3.3. Fundamental Primitives: The Building Blocks of Context&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Servers expose their capabilities through a set of standardized "primitives." These are the types of context a server can offer to the Host.7&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools:&lt;/strong&gt; These are executable functions that the LLM can invoke. Servers expose a list of available tools (via tools/list), and the client can request the execution of one with specific arguments (via tools/call).21
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources:&lt;/strong&gt; These represent structured or unstructured data sources that the LLM can access. This could be the schema of a database, the content of a file, or the results of a query.21
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts:&lt;/strong&gt; These are reusable workflow templates or few-shot examples that the server can provide to guide the LLM on how to best interact with its tools or resources.7&lt;/li&gt;
&lt;/ul&gt;
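&lt;p&gt;As a sketch of the discovery step, a tools/list result enumerates each tool with a name, a human-readable description, and a JSON Schema describing its input; the weather tool shown here is illustrative.&lt;/p&gt;

```python
import json

# Illustrative tools/list result: each entry advertises the tool's name,
# a description for the LLM, and a JSON Schema for its arguments.
tools_list_result = {
    "tools": [
        {
            "name": "get_current_weather",
            "description": "Fetches the current weather for a specified city.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "units": {"type": "string", "default": "metric"},
                },
                "required": ["city"],
            },
        }
    ]
}

# The Host can validate arguments against the schema before issuing tools/call.
tool_names = [t["name"] for t in tools_list_result["tools"]]
print(json.dumps(tool_names))
```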

&lt;p&gt;In addition to these basic primitives, MCP defines advanced primitives that enable richer, bidirectional interactions, transforming the communication from a simple request-response cycle into a dynamic dialogue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sampling:&lt;/strong&gt; This powerful primitive allows a &lt;em&gt;server&lt;/em&gt; to request an LLM completion from the &lt;em&gt;client&lt;/em&gt;.21 This is extremely useful for servers that need LLM reasoning but should not hold their own API keys or model logic. It keeps model access, selection, billing, and security centralized on the Host, which is controlled by the user.9
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elicitation:&lt;/strong&gt; This allows a server to pause its execution and request additional information or clarification from the user via the Host.9 This facilitates interactive, "human-in-the-loop" workflows where user intervention is required to proceed with a complex task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.4. Protocol Lifecycle Management and Capability Negotiation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;MCP sessions are stateful, meaning the connection between a client and a server persists and has a defined lifecycle. This lifecycle begins with a crucial initialization handshake.21&lt;/p&gt;

&lt;p&gt;When a client connects to a server, it must first send an initialize request. In this request, the client announces the protocol versions it supports and the capabilities it offers (e.g., "I support the sampling primitive"). The server then responds with its own list of capabilities and the protocol version it will use for the session.22 If a compatible version cannot be agreed upon, the connection is cleanly terminated.28&lt;/p&gt;

&lt;p&gt;This capability negotiation process is fundamental to the protocol's extensibility and backward compatibility. It allows clients and servers to evolve independently, adding new features that can be discovered and utilized dynamically, without breaking older clients or servers that do not support them.22&lt;/p&gt;
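&lt;p&gt;A minimal sketch of this handshake, with field names following the MCP specification (the version string and the exact capability sets are illustrative):&lt;/p&gt;

```python
# Illustrative initialize handshake. The client announces its protocol
# version and capabilities; the server answers with its own.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"sampling": {}},
        "clientInfo": {"name": "example-client", "version": "1.0.0"},
    },
}

initialize_response = {
    "jsonrpc": "2.0",
    "id": 0,
    "result": {
        "protocolVersion": "2025-06-18",
        "capabilities": {"tools": {}, "resources": {}},
        "serverInfo": {"name": "weather-server", "version": "0.1.0"},
    },
}

def versions_agree(req: dict, resp: dict) -> bool:
    """Toy negotiation check: the session proceeds only if both sides
    settle on the same protocol version; otherwise the client disconnects."""
    return req["params"]["protocolVersion"] == resp["result"]["protocolVersion"]

print(versions_agree(initialize_request, initialize_response))  # True
```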

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Building an MCP Server: A Step-by-Step Tutorial from Scratch (Python &amp;amp; FastMCP)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This section provides a practical guide to building a functional MCP server using Python, a ubiquitous language in AI and machine learning. We will use FastMCP, a lightweight and modern framework that abstracts away much of the protocol's complexity, allowing developers to focus on their tool's logic.26&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.1. Environment Setup and Project Initialization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, set up a Python virtual environment to isolate the project's dependencies.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create and activate a virtual environment:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; python -m venv mcp-env  
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;mcp-env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Install the necessary libraries: FastMCP for the server and Uvicorn as the ASGI server to run it.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"fastmcp[server]"&lt;/span&gt; uvicorn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create the basic project structure. Create a directory for your project and, inside it, a main file, e.g., main.py.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;mkdir &lt;/span&gt;mcp_weather_server  
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;mcp_weather_server  
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;touch &lt;/span&gt;main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;4.2. Defining the Service Contract: Input/Output Schemas with Pydantic&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A core principle of MCP is structured communication. Using schemas to define the inputs and outputs of your tools is crucial for data validation and ensuring robustness.4 FastMCP integrates natively with Pydantic for this purpose.&lt;/p&gt;

&lt;p&gt;In main.py, let's define a Pydantic schema for the input of our weather forecast tool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py  
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WeatherRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Schema for requesting weather information.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;  
    &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The city for which to get the weather forecast.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
    &lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The units for temperature (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;imperial&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;4.3. Implementing and Registering a Custom Tool&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now, let's implement the tool's logic and register it with the MCP server. We will use FastMCP's @server.tool decorator.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Import the necessary classes and instantiate the server.
&lt;/li&gt;
&lt;li&gt;Create an asynchronous function that will implement the tool's logic. The function signature will use the Pydantic model we just created to receive typed arguments.
&lt;/li&gt;
&lt;li&gt;Inside the function, you would call a real external API. For this example, we will simulate the call and return mock data.
&lt;/li&gt;
&lt;li&gt;The function's return must be a structured dictionary that MCP can transmit back to the client.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py (continued)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;

&lt;span class="c1"&gt;# Assume the API key is in an environment variable
# API_KEY = os.getenv("WEATHER_API_KEY")
&lt;/span&gt;
&lt;span class="c1"&gt;# Create an instance of the MCP server
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An MCP server to provide weather forecasts.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@server.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_current_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetches the current weather for a specified city.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WeatherRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WeatherRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    The core logic for the weather tool.
    In a real application, this would make an API call.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetching weather for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; units...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# API call simulation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lisbon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;humidity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cloudy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;humidity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
            },
            {
                &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
                &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: weather_data
            }
        ]
    }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;4.4. Exposing Structured Data via the Resource Primitive&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In addition to actionable tools, MCP servers can expose static or dynamic data resources. Let's add a resource that exposes the cities supported by our service. We will use the @server.resource decorator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py (continued)
&lt;/span&gt;&lt;span class="nd"&gt;@server.resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supported_cities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provides a list of cities with enhanced weather support.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_supported_cities&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Returns a list of supported cities.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lisbon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Porto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Faro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;4.5. Complete Server Implementation and Local Testing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now, let's combine everything into a complete main.py file and add the code to run the server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py (final version)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;

&lt;span class="c1"&gt;# --- Schema Definitions ---
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WeatherRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Schema for requesting weather information.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The city for which to get the weather forecast.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The units for temperature (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;imperial&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Server Instance ---
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An MCP server to provide weather forecasts.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Tool Definitions ---
&lt;/span&gt;&lt;span class="nd"&gt;@server.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_current_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetches the current weather for a specified city.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WeatherRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WeatherRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The core logic for the weather tool.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetching weather for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; units...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lisbon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;humidity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cloudy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;humidity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;},
            {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: weather_data}
        ]
    }

# --- Resource Definitions ---
@server.resource(
    name=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;supported_cities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    description=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;Provides&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;cities&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;enhanced&lt;/span&gt; &lt;span class="n"&gt;weather&lt;/span&gt; &lt;span class="n"&gt;support&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
)
async def get_supported_cities():
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns a list of supported cities.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    return {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;Lisbon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;Porto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;Faro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]}]}

# --- Entry Point for Execution ---
if __name__ == &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="n"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:
    # FastMCP integrates with Uvicorn to serve the application.
    # FastMCP&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; method handles the protocol initialization logic.
    server.run()
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run your server locally, use the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your MCP server is now running and listening for connections via stdio. An MCP client (like Cursor or a custom client) can now connect to this process to discover and invoke the get_current_weather tool and the supported_cities resource.&lt;/p&gt;
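&lt;p&gt;To make the wire format concrete, here is a sketch of the JSON-RPC 2.0 envelope a client would send over stdio to invoke this tool. The method and params shapes follow the MCP tools/call request; the id value is arbitrary:&lt;/p&gt;

```python
import json

# The JSON-RPC 2.0 envelope an MCP client sends over stdio to invoke our tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_current_weather",
        "arguments": {"city": "Lisbon", "units": "metric"},
    },
}

# Over the stdio transport, each message travels as a single line of JSON.
wire = json.dumps(request)
print(wire)
```

The server replies with a matching JSON-RPC response whose result carries the content blocks returned by the tool function.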

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Advanced Schema Definition with Protocol Buffers for High-Performance Servers&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While JSON and Pydantic are excellent for prototyping and many use cases, high-performance and enterprise production environments often demand more efficiency. This section explores the use of Protocol Buffers (Protobuf) as a superior alternative for schema definition and data serialization in MCP systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5.1. Rationale for Protobuf in Production MCP Systems&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;JSON, being text-based, has drawbacks in high-load scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Payload Size:&lt;/strong&gt; JSON messages are more verbose than binary formats, consuming more bandwidth.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serialization/Deserialization Speed:&lt;/strong&gt; Parsing text is computationally more intensive than parsing pre-compiled binary formats.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type Validation:&lt;/strong&gt; Type validation occurs at runtime, which can introduce overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Protocol Buffers, a binary serialization format developed by Google, addresses these limitations. It offers smaller payloads, faster processing, and strict schema enforcement through compile-time generated code, making it ideal for high-performance microservices.29 Adopting Protobuf represents a maturation step in an MCP server's implementation, moving it from a prototype to an enterprise-grade solution.&lt;/p&gt;
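&lt;p&gt;A quick back-of-the-envelope illustration of the payload-size argument, using Python's standard struct module as a stand-in for a binary wire format (the record mirrors the Book message defined in the next section; the exact savings depend on the schema and field values):&lt;/p&gt;

```python
import json
import struct

# A sample record comparable to the Book message in bookstore.proto.
record = {"book_id": "bk-001", "title": "Clean Architecture",
          "author": "Robert C. Martin", "pages": 432}

json_bytes = json.dumps(record).encode("utf-8")

def pack_str(s: str) -> bytes:
    """Length-prefixed UTF-8 string, a crude stand-in for protobuf's encoding."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data

# Binary packing: three length-prefixed strings plus a fixed-width int32.
binary_bytes = (
    pack_str(record["book_id"])
    + pack_str(record["title"])
    + pack_str(record["author"])
    + struct.pack("<i", record["pages"])
)

print(len(json_bytes), len(binary_bytes))  # the binary payload is noticeably smaller
```

The gap widens further once field names repeat across thousands of messages, since JSON re-transmits every key while a binary schema carries only values.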

&lt;h3&gt;
  
  
  &lt;strong&gt;5.2. Creating a .proto Service Definition&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Protobuf workflow begins with defining your services and messages in a .proto file. This file serves as a language-agnostic contract for your data.&lt;/p&gt;

&lt;p&gt;Let's create a bookstore.proto file for a bookstore service. This file will define the RPC (Remote Procedure Call) methods and message structures. Crucially, we will include Google API annotations, which allow the same .proto file to be used for generating gRPC servers and REST gateways, a concept we will extend to generate MCP servers.31&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protocol Buffers&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="c1"&gt;# bookstore.proto
&lt;/span&gt;&lt;span class="n"&gt;syntax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;proto3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;package&lt;/span&gt; &lt;span class="n"&gt;bookstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/api/annotations.proto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;# Option for the generated Go package
&lt;/span&gt;&lt;span class="n"&gt;option&lt;/span&gt; &lt;span class="n"&gt;go_package&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generated/go/bookstore/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;# The Bookstore service definition
&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="n"&gt;BookstoreService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# Gets a book by ID
&lt;/span&gt;  &lt;span class="n"&gt;rpc&lt;/span&gt; &lt;span class="nc"&gt;GetBook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GetBookRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;returns &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;option &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/books/{book_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Creates a new book
&lt;/span&gt;  &lt;span class="n"&gt;rpc&lt;/span&gt; &lt;span class="nc"&gt;CreateBook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CreateBookRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;returns &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;option &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/books&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;book&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# The Book message structure
&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;Book&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;book_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;int32&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# The request message for GetBook
&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;GetBookRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;book_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# The request message for CreateBook
&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;CreateBookRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Book&lt;/span&gt; &lt;span class="n"&gt;book&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;5.3. Automating MCP Server Generation with a Custom protoc Plugin&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The power of the Protobuf ecosystem lies in its compiler, protoc, and its ability to be extended with custom plugins. Let's describe the process of creating a protoc-gen-mcp plugin that reads a .proto file, identifies which RPCs should be exposed as MCP tools, and automatically generates the Python server code. This approach creates a "single source of truth" architecture.31&lt;/p&gt;

&lt;p&gt;Step 1: Define Custom MCP Annotations&lt;br&gt;&lt;br&gt;
First, we extend Protobuf with our own options to mark the RPCs. We create a file mcp_annotations.proto.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protocol Buffers&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mcp_annotations.proto
&lt;/span&gt;&lt;span class="n"&gt;syntax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;proto3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;package&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/protobuf/descriptor.proto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;# Extend method options with our MCP options
&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;protobuf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MethodOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;MCPOptions&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="n"&gt;MCPOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# If true, this RPC method will be exposed as an MCP tool
&lt;/span&gt;  &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;enabled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we can use this annotation in our bookstore.proto:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Protocol Buffers&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# bookstore.proto (updated)
#... (imports and messages as before)
&lt;/span&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp_annotations.proto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="n"&gt;BookstoreService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;rpc&lt;/span&gt; &lt;span class="nc"&gt;GetBook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GetBookRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;returns &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;option &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/books/{book_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nf"&gt;option &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;# Mark for MCP
&lt;/span&gt;  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;#...
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 2: Plugin Logic (in Go)&lt;br&gt;&lt;br&gt;
The plugin is an executable that reads a CodeGeneratorRequest from protoc via stdin and writes a CodeGeneratorResponse to stdout. The main logic involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parsing the provided .proto file descriptor.
&lt;/li&gt;
&lt;li&gt;Iterating over all services and methods.
&lt;/li&gt;
&lt;li&gt;For each method, checking if it has our (mcp.v1.tool).enabled = true annotation.
&lt;/li&gt;
&lt;li&gt;If the annotation is present, extracting metadata: method name, input message fields (for the tool's parameters), and the output message.
&lt;/li&gt;
&lt;li&gt;Using a templating system (e.g., Go's text/template) to generate the Python server code (similar to our FastMCP example) based on the extracted metadata.&lt;/li&gt;
&lt;/ol&gt;
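&lt;p&gt;Step 5 is the heart of the plugin. The article's plugin is written in Go, but the templating stage can be sketched in a few lines of Python; the metadata dict and the emitted tool stub below are illustrative placeholders, not the actual generator's output:&lt;/p&gt;

```python
from string import Template

# Illustrative metadata, as extracted from the .proto method descriptor (step 4).
method = {
    "snake_name": "get_book",
    "params": [("book_id", "str")],
    "description": "Gets a book by ID",
}

# Template for one generated MCP tool function (FastMCP-style, hypothetical shape).
TOOL_TEMPLATE = Template('''\
@server.tool(name="$snake_name", description="$description")
async def $snake_name($args) -> dict:
    # Delegate to the gRPC backend stub (generated elsewhere).
    ...
''')

args = ", ".join(f"{name}: {typ}" for name, typ in method["params"])
code = TOOL_TEMPLATE.substitute(
    snake_name=method["snake_name"],
    description=method["description"],
    args=args,
)
print(code)
```

The real plugin would run this rendering once per annotated RPC and concatenate the results into the CodeGeneratorResponse file content.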

&lt;p&gt;Step 3: Generation Pipeline&lt;br&gt;&lt;br&gt;
The final workflow is orchestrated by a shell script (generate.sh). This script runs protoc multiple times with different plugins to generate all necessary artifacts from the single .proto file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Output directories&lt;/span&gt;
&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./proto
&lt;span class="nv"&gt;GO_OUT_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./generated/go
&lt;span class="nv"&gt;PYTHON_MCP_OUT_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./generated/mcp

&lt;span class="c"&gt;# Run protoc to generate gRPC stubs (Go)&lt;/span&gt;
protoc &lt;span class="nt"&gt;--proto_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--go_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GO_OUT_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--go-grpc_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GO_OUT_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/bookstore.proto

&lt;span class="c"&gt;# Run protoc to generate the REST gateway (using grpc-gateway)&lt;/span&gt;
protoc &lt;span class="nt"&gt;--proto_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--grpc-gateway_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GO_OUT_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/bookstore.proto

&lt;span class="c"&gt;# Run protoc with our custom plugin to generate the MCP server (Python)&lt;/span&gt;
protoc &lt;span class="nt"&gt;--proto_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--plugin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;protoc-gen-mcp&lt;span class="o"&gt;=&lt;/span&gt;./bin/protoc-gen-mcp &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;--mcp_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PYTHON_MCP_OUT_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROTO_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/bookstore.proto

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Code generation complete."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This workflow represents a highly sophisticated software engineering and DevOps practice. Instead of maintaining separate implementations for gRPC, REST, and MCP, a single, version-controlled .proto file defines the canonical service contract. This drastically reduces code duplication, eliminates synchronization issues between interfaces, and enforces consistency across the entire system—an immense benefit for managing complex microservice ecosystems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Production-Level Considerations: Security, Scalability, and Performance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Transitioning an MCP prototype to a robust production system requires rigorous attention to security, scalability, and performance. This section details the risks and best practices for deploying MCP in enterprise environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6.1. A Taxonomy of MCP Security Risks and Mitigation Strategies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;MCP's ability to connect LLMs to external systems introduces attack vectors that must be managed proactively. The following table summarizes key vulnerabilities and recommended controls.8&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 2: MCP Security Vulnerabilities and Recommended Controls&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vulnerability&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Affected Component&lt;/th&gt;
&lt;th&gt;Recommended Control(s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Confused Deputy Problem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A server executes actions with its own elevated privileges on behalf of a low-privilege user.&lt;/td&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Implement end-to-end authentication and authorization (OAuth 2.1), ensuring the server acts with the &lt;em&gt;user's&lt;/em&gt; privileges, not its own.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Command Injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On local servers, malicious inputs are executed as operating system commands.&lt;/td&gt;
&lt;td&gt;Server (Local)&lt;/td&gt;
&lt;td&gt;Rigorously validate and sanitize all user inputs. Run local servers in sandboxed environments with minimal privileges.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompt/Tool Injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A malicious user or compromised server tricks the LLM into invoking the wrong tool or performing unintended actions.&lt;/td&gt;
&lt;td&gt;Host, Client, Server&lt;/td&gt;
&lt;td&gt;The Host should allow users to confirm critical actions. Use only trusted, digitally signed servers. Implement SAST/SCA scanning in server development pipelines.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Exfiltration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A malicious server exploits tool calls or the sampling primitive to leak sensitive session data.&lt;/td&gt;
&lt;td&gt;Server, Client&lt;/td&gt;
&lt;td&gt;The Host should strictly control which servers can request sampling. The Client should allow the user to approve or reject sampling requests. Limit data passed to servers to the minimum necessary.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supply Chain Risks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use of third-party MCP servers that are malicious, vulnerable, or unmaintained.&lt;/td&gt;
&lt;td&gt;Host&lt;/td&gt;
&lt;td&gt;Use a trusted server registry. Pin server versions and notify users of updates. Require MCP components to be digitally signed by their developers.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
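&lt;p&gt;For the command injection row above, the first line of defense is cheap to implement. A minimal sketch of allowlist validation (the pattern here is an assumption; tune it to your domain) that rejects inputs carrying shell metacharacters before they ever reach a subprocess call:&lt;/p&gt;

```python
import re

# Allowlist: letters (including accented), apostrophes, spaces, and hyphens only.
ALLOWED_CITY = re.compile(r"[A-Za-zÀ-ÿ' -]{1,64}")

def validate_city(raw: str) -> str:
    """Reject inputs that could smuggle shell metacharacters into a tool call."""
    if not ALLOWED_CITY.fullmatch(raw):
        raise ValueError(f"invalid city name: {raw!r}")
    return raw

validate_city("Lisbon")  # passes
try:
    validate_city("lisbon; rm -rf /")  # injection attempt
except ValueError:
    print("rejected")
```

Validation like this complements, but does not replace, running local servers in sandboxed environments and invoking subprocesses with argument lists rather than shell strings.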

&lt;h3&gt;
  
  
  &lt;strong&gt;6.2. Architectural Patterns for Scaling MCP Servers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To handle high traffic, MCP servers must be designed for horizontal scalability and resilience.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load Balancing:&lt;/strong&gt; A load balancer in front of multiple server instances is essential. For stateful operations, strategies like consistent hashing can be used to maintain session affinity, ensuring requests from the same agent are routed to the same server instance.33
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Scalability:&lt;/strong&gt; The lightweight, focused design of MCP servers makes them ideal for horizontal scaling. Using container orchestrators like Kubernetes, you can configure the Horizontal Pod Autoscaler (HPA) to automatically add or remove server replicas based on load metrics like requests per second or CPU utilization.33
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed State Management:&lt;/strong&gt; To enable horizontal scaling, servers should be designed to be stateless. Any necessary session state should be externalized to a distributed store, such as Redis. This allows any server instance to handle any request, as the session context can be retrieved from the shared store.33
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Availability:&lt;/strong&gt; Resilience is achieved through redundancy. Deploying server instances across multiple availability zones (AZs) ensures the service remains operational even if one zone fails. Health checks and circuit breaker patterns are crucial for detecting unhealthy instances and preventing cascading failures.10
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport Evolution for Scalability:&lt;/strong&gt; As mentioned earlier, the use of Streamable HTTP is a key enabler for scalability, especially on serverless platforms like AWS Lambda or Google Cloud Functions, where long-lived connections are impractical and expensive.9&lt;/li&gt;
&lt;/ul&gt;
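&lt;p&gt;The session-affinity idea behind consistent hashing can be sketched in a few dozen lines of Python (instance names are illustrative; production deployments would rely on a library or the load balancer's built-in support):&lt;/p&gt;

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps session IDs to MCP server instances with minimal reshuffling."""

    def __init__(self, nodes, replicas=100):
        # Each node appears `replicas` times on the ring to smooth the distribution.
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(replicas):
                bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, session_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the session's hash.
        idx = bisect.bisect(self._keys, self._hash(session_id)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["mcp-0", "mcp-1", "mcp-2"])
# The same session always routes to the same instance:
assert ring.node_for("session-abc") == ring.node_for("session-abc")
```

Because each node owns many small arcs of the ring, adding or removing an instance remaps only the sessions on the affected arcs rather than reshuffling every session.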

&lt;h3&gt;
  
  
  &lt;strong&gt;6.3. Performance Tuning, Observability, and Protocol Versioning&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance Tuning and Metrics:&lt;/strong&gt; Monitoring key performance metrics is vital. This includes latency (p95, p99 percentiles), throughput (requests per second), error rates, CPU/memory utilization, and cache hit rates. Identifying bottlenecks through continuous monitoring allows for targeted optimizations.36
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; In a distributed microservices architecture, observability is paramount. Implementing structured logging, distributed tracing (using standards like OpenTelemetry), and monitoring dashboards (with tools like Prometheus and Grafana) provides the necessary visibility to debug issues and understand end-to-end system behavior.33
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Fine-Tuning for MCP:&lt;/strong&gt; An advanced technique for optimizing performance is to fine-tune the LLM on a dataset of MCP tool-calling examples. This can significantly improve the model's ability to select the correct tool, provide the right arguments, and interpret the results, reducing latency and error rates by decreasing the number of trial-and-error attempts in the reasoning cycle.37
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol Versioning:&lt;/strong&gt; MCP uses a date-based versioning scheme (YYYY-MM-DD) that changes only when backward-incompatible changes are introduced.28 This conservative versioning strategy is designed for ecosystem stability. It allows new features to be added in a backward-compatible manner without forcing immediate upgrades across the entire network of clients and servers, promoting a gradual and robust evolution of the standard.&lt;/li&gt;
&lt;/ul&gt;
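&lt;p&gt;As a concrete starting point for the latency metrics above, Python's standard library can compute the p95/p99 percentiles from raw timings (the simulated exponential latencies below are a stand-in for real measurements scraped from your servers):&lt;/p&gt;

```python
import random
import statistics

random.seed(0)
# Simulated request latencies in milliseconds (mean around 40 ms).
latencies = [random.expovariate(1 / 40) for _ in range(10_000)]

# quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
q = statistics.quantiles(latencies, n=100)
p95, p99 = q[94], q[98]
print(f"p95={p95:.1f}ms p99={p99:.1f}ms")
```

Tracking the tail percentiles rather than the mean is what surfaces the slow requests that dominate an agent's perceived responsiveness.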

&lt;h2&gt;
  
  
  &lt;strong&gt;7. Conclusion and Future Directions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol has emerged as a critical piece of infrastructure for the advancement of artificial intelligence, moving the field from isolated demonstrations to integrated, production-grade agentic systems. By applying robust software engineering principles—standardization, modularity, and separation of concerns—to the challenge of LLM integration, MCP provides the necessary architectural foundation for composability, security, and scalability. It enables developers to build systems where LLMs are not just text generators but dynamic agents that can interact with the digital world in a reliable and auditable manner.&lt;/p&gt;

&lt;p&gt;The future trajectory of MCP points towards even deeper integration with enterprise ecosystems. The development of more sophisticated authorization extensions that integrate seamlessly with corporate identity providers (IdPs) and Single Sign-On (SSO) solutions is expected, simplifying access management at scale.9 The ecosystem of servers will continue to grow, with an increasing focus on certified, trusted servers that adhere to strict security and maintenance standards. Furthermore, as agents become more complex, the protocol itself may evolve to support inter-agent, not just agent-tool, interactions.&lt;/p&gt;

&lt;p&gt;Ultimately, MCP should be viewed not as a final product but as a foundational protocol, analogous to the role that HTTP and TCP/IP played for the web and computer networking.7 It is the standardized communication layer upon which the next generation of intelligent, autonomous applications will be built, enabling a future where AI systems can collaborate securely and efficiently to solve increasingly complex problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;References cited&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;What is Model Context Protocol (MCP)? A guide | Google Cloud, accessed October 28, 2025, &lt;a href="https://cloud.google.com/discover/what-is-model-context-protocol" rel="noopener noreferrer"&gt;https://cloud.google.com/discover/what-is-model-context-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Building Your First Model Context Protocol Server - The New Stack, accessed October 28, 2025, &lt;a href="https://thenewstack.io/building-your-first-model-context-protocol-server/" rel="noopener noreferrer"&gt;https://thenewstack.io/building-your-first-model-context-protocol-server/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model Context Protocol (MCP) 101: How LLMs Connect to the Real World, accessed October 28, 2025, &lt;a href="https://datasciencedojo.com/blog/model-context-protocol-mcp/" rel="noopener noreferrer"&gt;https://datasciencedojo.com/blog/model-context-protocol-mcp/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP 101: An Introduction to Model Context Protocol | DigitalOcean, accessed October 28, 2025, &lt;a href="https://www.digitalocean.com/community/tutorials/model-context-protocol" rel="noopener noreferrer"&gt;https://www.digitalocean.com/community/tutorials/model-context-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What is the Model Context Protocol (MCP)? - Cloudflare, accessed October 28, 2025, &lt;a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/" rel="noopener noreferrer"&gt;https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What is Model Context Protocol (MCP)? - IBM, accessed October 28, 2025, &lt;a href="https://www.ibm.com/think/topics/model-context-protocol" rel="noopener noreferrer"&gt;https://www.ibm.com/think/topics/model-context-protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A beginners Guide on Model Context Protocol (MCP) - OpenCV, accessed October 28, 2025, &lt;a href="https://opencv.org/blog/model-context-protocol/" rel="noopener noreferrer"&gt;https://opencv.org/blog/model-context-protocol/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model Context Protocol (MCP): Understanding security risks and ..., accessed October 28, 2025, &lt;a href="https://www.redhat.com/en/blog/model-context-protocol-mcp-understanding-security-risks-and-controls" rel="noopener noreferrer"&gt;https://www.redhat.com/en/blog/model-context-protocol-mcp-understanding-security-risks-and-controls&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The current state of MCP (Model Context Protocol) - Elasticsearch Labs, accessed October 28, 2025, &lt;a href="https://www.elastic.co/search-labs/blog/mcp-current-state" rel="noopener noreferrer"&gt;https://www.elastic.co/search-labs/blog/mcp-current-state&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AI Model Context Architecture (MCP) Scaling: Load Balancing, Queuing, and API Governance | by Valdez Ladd | Aug, 2025 | Medium, accessed October 28, 2025, &lt;a href="https://medium.com/@oracle_43885/ai-model-context-architecture-mcp-scaling-load-balancing-queuing-and-api-governance-c8d9ecd0b482" rel="noopener noreferrer"&gt;https://medium.com/@oracle_43885/ai-model-context-architecture-mcp-scaling-load-balancing-queuing-and-api-governance-c8d9ecd0b482&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompt Engineering vs Context Engineering — and Why Both Matter for AI Coding - Reddit, accessed October 28, 2025, &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1nzt1gh/prompt_engineering_vs_context_engineering_and_why/" rel="noopener noreferrer"&gt;https://www.reddit.com/r/ClaudeAI/comments/1nzt1gh/prompt_engineering_vs_context_engineering_and_why/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Effective context engineering for AI agents - Anthropic, accessed October 28, 2025, &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Context Engineering vs Prompt Engineering | by Mehul Gupta | Data Science in Your Pocket, accessed October 28, 2025, &lt;a href="https://medium.com/data-science-in-your-pocket/context-engineering-vs-prompt-engineering-379e9622e19d" rel="noopener noreferrer"&gt;https://medium.com/data-science-in-your-pocket/context-engineering-vs-prompt-engineering-379e9622e19d&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompt Engineering vs Context Engineering Explained | by Tahir - Medium, accessed October 28, 2025, &lt;a href="https://medium.com/@tahirbalarabe2/prompt-engineering-vs-context-engineering-explained-ce2f37179061" rel="noopener noreferrer"&gt;https://medium.com/@tahirbalarabe2/prompt-engineering-vs-context-engineering-explained-ce2f37179061&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Context Engineering and MCP Toolbox: The Hidden Backbone of Modern AI You Must Know - MyExamCloud Blog Article, accessed October 28, 2025, &lt;a href="https://www.myexamcloud.com/blog/context-engineering-mcp-toolbox-modern-ai.article" rel="noopener noreferrer"&gt;https://www.myexamcloud.com/blog/context-engineering-mcp-toolbox-modern-ai.article&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP and RAG: A Powerful Partnership for Advanced AI Applications ..., accessed October 28, 2025, &lt;a href="https://medium.com/the-ai-forum/mcp-and-rag-a-powerful-partnership-for-advanced-ai-applications-858c074fc5db" rel="noopener noreferrer"&gt;https://medium.com/the-ai-forum/mcp-and-rag-a-powerful-partnership-for-advanced-ai-applications-858c074fc5db&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Comparing MCP vs LangChain/ReAct for Chatbots - Glama, accessed October 28, 2025, &lt;a href="https://glama.ai/blog/2025-09-02-comparing-mcp-vs-lang-chainre-act-for-chatbots" rel="noopener noreferrer"&gt;https://glama.ai/blog/2025-09-02-comparing-mcp-vs-lang-chainre-act-for-chatbots&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;How AI Agents Are Getting Smarter: MCP, ReAct, RAG &amp;amp; A2A Explained Simply, accessed October 28, 2025, &lt;a href="https://dev.to/kumarprateek18/how-ai-agents-are-getting-smarter-mcp-react-rag-a2a-explained-simply-2dh1"&gt;https://dev.to/kumarprateek18/how-ai-agents-are-getting-smarter-mcp-react-rag-a2a-explained-simply-2dh1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Dynamic ReAct: Scalable Tool Selection for Large-Scale MCP Environments - arXiv, accessed October 28, 2025, &lt;a href="https://arxiv.org/html/2509.20386v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2509.20386v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Supercharging LangChain: Integrating 2000+ MCP with ReAct | by hideya - Medium, accessed October 28, 2025, &lt;a href="https://medium.com/@h1deya/supercharging-langchain-integrating-450-mcp-with-react-d4e467cbf41a" rel="noopener noreferrer"&gt;https://medium.com/@h1deya/supercharging-langchain-integrating-450-mcp-with-react-d4e467cbf41a&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Architecture overview - Model Context Protocol, accessed October 28, 2025, &lt;a href="https://modelcontextprotocol.io/docs/learn/architecture" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/docs/learn/architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Architecture - Model Context Protocol, accessed October 28, 2025, &lt;a href="https://modelcontextprotocol.io/specification/2025-03-26/architecture" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-03-26/architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The Model Context Protocol (MCP) — A Complete Tutorial | by Dr. Nimrita Koul | Medium, accessed October 28, 2025, &lt;a href="https://medium.com/@nimritakoul01/the-model-context-protocol-mcp-a-complete-tutorial-a3abe8a7f4ef" rel="noopener noreferrer"&gt;https://medium.com/@nimritakoul01/the-model-context-protocol-mcp-a-complete-tutorial-a3abe8a7f4ef&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;How the Model Context Protocol (MCP) Works | Lucidworks, accessed October 28, 2025, &lt;a href="https://lucidworks.com/blog/how-the-model-context-protocol-works-a-technical-deep-dive" rel="noopener noreferrer"&gt;https://lucidworks.com/blog/how-the-model-context-protocol-works-a-technical-deep-dive&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What Is the Model Context Protocol (MCP) and How It Works - Descope, accessed October 28, 2025, &lt;a href="https://www.descope.com/learn/post/mcp" rel="noopener noreferrer"&gt;https://www.descope.com/learn/post/mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Extend large language models powered by Amazon SageMaker AI using Model Context Protocol | Artificial Intelligence - AWS, accessed October 28, 2025, &lt;a href="https://aws.amazon.com/blogs/machine-learning/extend-large-language-models-powered-by-amazon-sagemaker-ai-using-model-context-protocol/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/extend-large-language-models-powered-by-amazon-sagemaker-ai-using-model-context-protocol/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models, accessed October 28, 2025, &lt;a href="https://arxiv.org/html/2508.12566v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2508.12566v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Versioning - Model Context Protocol, accessed October 28, 2025, &lt;a href="https://modelcontextprotocol.io/specification/versioning" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/versioning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP protocol buffers: The ultimate guide to efficient data serialization in 2025 - BytePlus, accessed October 28, 2025, &lt;a href="https://www.byteplus.com/en/topic/541241" rel="noopener noreferrer"&gt;https://www.byteplus.com/en/topic/541241&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Why not use Protobuf messages and gRPC transport? #1144 - GitHub, accessed October 28, 2025, &lt;a href="https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/1144" rel="noopener noreferrer"&gt;https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/1144&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Building MCP Servers from Protobuf (Part 1): Protobuf to REST API, accessed October 28, 2025, &lt;a href="https://www.enterprisedb.com/blog/building-mcp-servers-protobuf-part1-protobuf-rest-api" rel="noopener noreferrer"&gt;https://www.enterprisedb.com/blog/building-mcp-servers-protobuf-part1-protobuf-rest-api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Building MCP Servers from Protobuf (Part 2): Automate MCP Server ..., accessed October 28, 2025, &lt;a href="https://www.enterprisedb.com/blog/building-mcp-servers-protobuf-part2-automate-mcp-server-creation-protoc-plugins" rel="noopener noreferrer"&gt;https://www.enterprisedb.com/blog/building-mcp-servers-protobuf-part2-automate-mcp-server-creation-protoc-plugins&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scaling MCP Servers: Architecture Patterns for Production | Devsatva - Data Engineering &amp;amp; AI Consultancy, accessed October 28, 2025, &lt;a href="https://devsatva.com/blog/mcp-scaling-production" rel="noopener noreferrer"&gt;https://devsatva.com/blog/mcp-scaling-production&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Can Model Context Protocol (MCP) scale to support hundreds of simultaneous users?, accessed October 28, 2025, &lt;a href="https://milvus.io/ai-quick-reference/can-model-context-protocol-mcp-scale-to-support-hundreds-of-simultaneous-users" rel="noopener noreferrer"&gt;https://milvus.io/ai-quick-reference/can-model-context-protocol-mcp-scale-to-support-hundreds-of-simultaneous-users&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Deploy scalable MCP servers with Ray Serve - Anyscale Docs, accessed October 28, 2025, &lt;a href="https://docs.anyscale.com/mcp/scalable-remote-mcp-deployment" rel="noopener noreferrer"&gt;https://docs.anyscale.com/mcp/scalable-remote-mcp-deployment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What metrics should I track for a healthy Model Context Protocol (MCP) service? - Milvus, accessed October 28, 2025, &lt;a href="https://milvus.io/ai-quick-reference/what-metrics-should-i-track-for-a-healthy-model-context-protocol-mcp-service" rel="noopener noreferrer"&gt;https://milvus.io/ai-quick-reference/what-metrics-should-i-track-for-a-healthy-model-context-protocol-mcp-service&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Model Fine-Tuning: Techniques &amp;amp; Best Practices 2025 - BytePlus, accessed October 28, 2025, &lt;a href="https://www.byteplus.com/en/topic/541921" rel="noopener noreferrer"&gt;https://www.byteplus.com/en/topic/541921&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A Measurement Study of Model Context Protocol - arXiv, accessed October 28, 2025, &lt;a href="https://arxiv.org/html/2509.25292v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2509.25292v1&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking: A Framework for High-Fidelity Chunk Retrieval in RAG Systems</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Thu, 18 Sep 2025 14:01:42 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/graph-augmented-hybrid-retrieval-and-multi-stage-re-ranking-a-framework-for-high-fidelity-chunk-50ca</link>
      <guid>https://forem.com/lucash_ribeiro_dev/graph-augmented-hybrid-retrieval-and-multi-stage-re-ranking-a-framework-for-high-fidelity-chunk-50ca</guid>
      <description>&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This paper addresses critical limitations in modern Retrieval-Augmented Generation (RAG) systems, namely context fragmentation and the relevance-performance trade-off in retrieval. We introduce the Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework, a novel, multi-stage architecture designed to enhance the precision and contextual coherence of retrieved data chunks. GAHR-MSR integrates three key innovations: (1) a Graph-Aware Chunking and Indexing strategy that enriches text segments with structured metadata derived from a knowledge graph; (2) a high-recall initial retrieval stage using hybrid (dense and sparse) vector search with Reciprocal Rank Fusion (RRF); and (3) a high-precision, cascaded re-ranking stage employing the ColBERT late-interaction model. Implemented using the Qdrant vector database, our framework demonstrates significant improvements over baseline retrieval methods on the SciFact benchmark. We present a detailed analysis of the architecture, including mathematical formulations, implementation specifics, and empirical results, showcasing a marked increase in nDCG@10, thereby establishing a new state-of-the-art for high-fidelity information retrieval in knowledge-intensive applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The advent of Large Language Models (LLMs) has catalyzed a paradigm shift in artificial intelligence, yet their efficacy is often constrained by inherent limitations such as knowledge cutoffs and a propensity for "hallucination," or the generation of factually incorrect information.1 Retrieval-Augmented Generation (RAG) has emerged as a dominant architectural pattern to mitigate these issues, enhancing LLM outputs by grounding them in external, up-to-date knowledge bases.2 By retrieving relevant information and providing it as context within the LLM's prompt, RAG systems promise more accurate, attributable, and trustworthy responses.5 However, the theoretical promise of RAG is frequently undermined by practical challenges in its implementation, particularly within the retrieval component. A typical RAG workflow involves multiple, complex processing steps, which can lead to prolonged response times and suboptimal retrieval quality.2 The performance of the entire system is fundamentally bottlenecked by the fidelity of the retrieved context; if the retriever provides irrelevant or incomplete information, the generator's output will be correspondingly flawed.&lt;/p&gt;

&lt;p&gt;The limitations of conventional retrieval methods are a primary source of these performance issues. Two core problems stand out. The first is &lt;strong&gt;context fragmentation&lt;/strong&gt;. Standard document preparation techniques, such as fixed-size chunking, are computationally simple but semantically naive.6 They often sever logical units of thought, splitting coherent arguments or critical pieces of information across multiple, disconnected chunks.8 When a query requires synthesizing information that now resides in separate fragments, a simple retriever may fail to gather all necessary pieces, leading to an incomplete context and a superficial response from the LLM.2&lt;/p&gt;

&lt;p&gt;The second problem is the &lt;strong&gt;relevance ceiling&lt;/strong&gt; of initial retrieval stages. The evolution from single-pass dense vector search to hybrid search (combining the semantic understanding of dense embeddings with the keyword precision of sparse vectors) has significantly improved recall.9 However, this approach often retrieves a large set of documents that are merely topically related, not precisely and deeply relevant to the user's specific, nuanced intent. This creates a "relevance ceiling," where further improvements in the embedding models alone yield diminishing returns in the final quality of the retrieved set.&lt;/p&gt;

&lt;p&gt;To overcome these fundamental challenges, this paper introduces the Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework. GAHR-MSR is a holistic, multi-stage pipeline designed to maximize both the contextual coherence and the precision of retrieved information. Its central thesis is that by structuring knowledge with graphs at the indexing stage and applying a multi-stage, precision-focused refinement process at query time, we can drastically improve the fidelity of the context provided to the LLM. The framework is built upon three core contributions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Graph-Aware Chunking:&lt;/strong&gt; A novel pre-processing strategy that moves beyond simple text splitting to enrich semantic chunks with structured metadata extracted from a pre-computed knowledge graph, preserving critical entity and relationship context.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-Recall Hybrid Retrieval:&lt;/strong&gt; A robust first-stage retrieval that leverages the combined power of dense and sparse vectors, fused using Reciprocal Rank Fusion (RRF), to ensure a comprehensive candidate set is identified.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cascaded ColBERT Re-ranking:&lt;/strong&gt; A high-precision, multi-stage refinement process that uses the computationally efficient yet powerful ColBERT late-interaction model to re-rank the candidate set, ensuring the final context is maximally relevant.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The development of this framework reflects a broader architectural shift occurring in the field of advanced information retrieval. Early systems focused on optimizing a single retrieval algorithm, searching for the "best" embedding model for a monolithic, one-shot search.11 The recognition that dense vectors often miss critical keywords led to the adoption of hybrid search, combining dense and sparse retrievers to improve &lt;em&gt;recall&lt;/em&gt;.9 This marked the first step toward a multi-stage pipeline. However, this high-recall approach introduced noise (topically similar but irrelevant documents), which necessitated a second stage focused on &lt;em&gt;precision&lt;/em&gt;. This led to the integration of re-rankers, more computationally intensive but highly accurate models like cross-encoders or ColBERT, to refine the initial candidate set.14&lt;/p&gt;

&lt;p&gt;This evolution has established a dominant design pattern: a "Recall-to-Precision Funnel." The GAHR-MSR framework formalizes and advances this pattern by introducing a crucial pre-processing stage (Graph-Aware Chunking) and optimizing the refinement stage (cascaded re-ranking), representing the next logical step in this architectural progression. It moves beyond treating retrieval as a single step and instead conceptualizes it as a structured, multi-phase process of candidate generation and progressive refinement.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Background and Related Work&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The GAHR-MSR framework is built upon a confluence of advancements in vector databases, hybrid search techniques, re-ranking models, and graph-based retrieval. This section provides a comprehensive review of these foundational technologies, establishing the scientific context for our contributions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.1. Vector Database Architectures: The Case of Qdrant&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vector databases are specialized systems purpose-built to store, index, and query high-dimensional vector embeddings, which are numerical representations of unstructured data like text, images, and audio.11 Unlike traditional relational databases that operate on exact matches within structured schemas, vector databases excel at similarity search, finding vectors that are "closest" to a query vector in a high-dimensional space according to a given distance metric.17 This capability is essential for modern AI applications that require understanding semantic or conceptual similarity rather than exact keyword matches.11 Common distance metrics used to quantify similarity include Cosine Similarity, which measures the cosine of the angle between two vectors, and Euclidean Distance, which measures the straight-line distance between two points in the vector space.18&lt;/p&gt;
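
&lt;p&gt;To make the two metrics concrete, here is a minimal pure-Python sketch (production systems rely on the database's optimized, vectorized implementations; these definitions are only for illustration):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    1.0 means identical direction; 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two points in the vector space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Vectors pointing the same way have cosine similarity 1.0,
# while Euclidean distance measures absolute separation.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

&lt;p&gt;Note that cosine similarity ignores vector magnitude, which is why it is the default choice for normalized text embeddings, whereas Euclidean distance is sensitive to it.&lt;/p&gt;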

&lt;p&gt;Qdrant is a production-ready vector database written in Rust, designed for performance, scalability, and reliability under high load.20 Its architecture incorporates several key features that make it particularly well-suited for advanced RAG applications. At the core of its search capability is a bespoke modification of the &lt;strong&gt;Hierarchical Navigable Small World (HNSW)&lt;/strong&gt; algorithm for Approximate Nearest Neighbor (ANN) search.17 HNSW constructs a multi-layered graph where nodes are vectors. Upper layers contain long-range connections for coarse, rapid navigation across the vector space, while lower layers contain short-range connections for fine-grained, precise search.22 This hierarchical structure allows Qdrant to perform searches in logarithmic time complexity, making it highly efficient even with billions of vectors.11&lt;/p&gt;

&lt;p&gt;A critical architectural innovation in Qdrant is its &lt;strong&gt;segment-based storage model&lt;/strong&gt;.23 Data is organized into segments, which can be either mutable (for incoming data) or immutable. Once a mutable segment reaches a certain size, it is optimized into an immutable segment, and a new HNSW index is built on it. This design allows Qdrant to handle real-time data updates without compromising search performance, a significant advantage over in-memory indexing libraries that may require costly full re-indexing.18&lt;/p&gt;

&lt;p&gt;Furthermore, Qdrant provides robust support for associating rich, filterable &lt;strong&gt;JSON payloads&lt;/strong&gt; with each vector.19 It allows for the creation of secondary indexes on these payload fields, enabling efficient pre-filtering based on metadata &lt;em&gt;before&lt;/em&gt; the computationally expensive vector search is executed.17 This "filtrable HNSW" capability is a cornerstone of the GAHR-MSR framework, as it allows us to leverage the structured graph metadata for targeted retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.2. Hybrid Search Paradigms and Result Fusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;While dense vector search is powerful for capturing semantic meaning, it can fail in scenarios requiring exact keyword matches. For instance, a query for a specific product ID or a unique name may not be well-represented semantically. This limitation has led to the rise of hybrid search, which combines the strengths of dense and sparse vector representations.9&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dense vectors&lt;/strong&gt;, typically generated by transformer-based models like BERT, are fixed-length arrays where each dimension represents a learned semantic feature.24 They excel at capturing context, nuance, and conceptual similarity. For example, the vectors for "boat" and "ferry" would be close in the vector space.18&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sparse vectors&lt;/strong&gt;, in contrast, are high-dimensional vectors where most elements are zero. Each non-zero dimension corresponds to a specific token (word) in a vocabulary, and its value represents the token's importance, often calculated using methods like TF-IDF, BM25, or more advanced learned models like SPLADE.21 Sparse vectors are highly effective for keyword-based retrieval, ensuring that documents containing specific query terms are found.&lt;/p&gt;
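
&lt;p&gt;A toy sketch may make the sparse representation concrete. Below, raw term frequencies stand in for learned weights such as BM25 or SPLADE (a deliberate simplification), and relevance is a dot product computed over the few dimensions both vectors share:&lt;/p&gt;

```python
from collections import Counter

def sparse_vector(text):
    """Toy sparse representation: token -> raw term frequency.
    (A stand-in for learned weights like BM25 or SPLADE; real systems
    also map tokens to integer vocabulary indices.)"""
    return dict(Counter(text.lower().split()))

def sparse_dot(q, d):
    """Dot product over the (few) dimensions both vectors share;
    all other dimensions are implicitly zero."""
    return sum(w * d[t] for t, w in q.items() if t in d)

query = sparse_vector("qdrant hybrid search")
doc_a = sparse_vector("hybrid search combines dense and sparse search in qdrant")
doc_b = sparse_vector("cooking recipes for pasta")

print(sparse_dot(query, doc_a))  # shares "qdrant", "hybrid", "search"
print(sparse_dot(query, doc_b))  # no shared tokens, so the score is 0
```

&lt;p&gt;The second score being exactly zero illustrates the keyword-matching character of sparse retrieval: a document with no overlapping terms simply cannot match, no matter how semantically close it might be.&lt;/p&gt;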

&lt;p&gt;To combine the results from these two disparate retrieval methods, a fusion algorithm is required. &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; is a simple yet highly effective technique for merging multiple ranked lists into a single, unified result set.26 RRF operates on a straightforward principle: documents that consistently appear at high ranks across different result lists are likely more relevant. The algorithm calculates a final score for each document by summing its reciprocal rank scores from each list in which it appears. The mathematical formulation for the RRF score of a document d is:&lt;/p&gt;

&lt;p&gt;Score&lt;sub&gt;RRF&lt;/sub&gt;(d) = Σ&lt;sub&gt;i∈R&lt;/sub&gt; 1 / (k + rank&lt;sub&gt;i&lt;/sub&gt;(d))&lt;/p&gt;

&lt;p&gt;Here, R is the set of result lists being fused, rank&lt;sub&gt;i&lt;/sub&gt;(d) is the rank (position) of document d in list i, and k is a constant used to diminish the impact of lower-ranked documents, typically set to 60.27 By giving more weight to documents with a lower rank (i.e., appearing closer to the top), RRF effectively boosts the relevance of items that both semantically match (from the dense search) and contain the right keywords (from the sparse search). Qdrant natively supports RRF through its flexible Query API, allowing for the seamless fusion of results from multiple parallel prefetch queries.26&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.3. Advanced Re-ranking with ColBERT&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The initial hybrid retrieval stage is optimized for high recall, aiming to capture all potentially relevant documents. However, this often comes at the cost of precision, including many documents that are only tangentially related. A re-ranking stage is therefore essential to refine this initial candidate set, re-ordering the documents based on a more sophisticated and accurate relevance model.28 While full cross-encoders offer state-of-the-art accuracy, their computational cost is often prohibitive for real-time applications, as they require a full forward pass of a large transformer model for every query-document pair.15&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ColBERT (Contextualized Late Interaction over BERT)&lt;/strong&gt; emerges as a powerful compromise, balancing the accuracy of cross-encoders with the efficiency of bi-encoders.14 The key innovation of ColBERT is its &lt;strong&gt;"late interaction"&lt;/strong&gt; mechanism.31 Unlike a cross-encoder, which performs an early and deep interaction by concatenating the query and document, ColBERT computes contextualized token-level embeddings for the query and the document &lt;em&gt;independently&lt;/em&gt; using a BERT-based bi-encoder architecture. This separation allows for the pre-computation and indexing of document token embeddings, drastically speeding up query processing.15&lt;/p&gt;

&lt;p&gt;The relevance score is calculated at query time using the &lt;strong&gt;MaxSim operator&lt;/strong&gt;. For each token embedding in the query, ColBERT finds its maximum similarity (typically using dot product) with any token embedding in the document. These maximum similarity scores are then summed across all query tokens to produce the final relevance score. The formal mathematical equation for the MaxSim operator is:&lt;/p&gt;

&lt;p&gt;Score&lt;sub&gt;ColBERT&lt;/sub&gt;(q, d) = Σ&lt;sub&gt;i=1&lt;/sub&gt;&lt;sup&gt;|E&lt;sub&gt;q&lt;/sub&gt;|&lt;/sup&gt; max&lt;sub&gt;j=1…|E&lt;sub&gt;d&lt;/sub&gt;|&lt;/sub&gt; (E&lt;sub&gt;q,i&lt;/sub&gt; · E&lt;sub&gt;d,j&lt;/sub&gt;&lt;sup&gt;T&lt;/sup&gt;)&lt;/p&gt;

&lt;p&gt;In this equation, E&lt;sub&gt;q&lt;/sub&gt; is the matrix of token embeddings for the query q, and E&lt;sub&gt;d&lt;/sub&gt; is the matrix of token embeddings for the document d.14 This "sum of max-similarity" approach allows ColBERT to capture fine-grained, token-level relevance signals (essentially checking whether each part of the query is "covered" by some part of the document) without the computational overhead of full self-attention.33 Qdrant's native support for &lt;strong&gt;multivectors&lt;/strong&gt; makes it an ideal backend for storing and retrieving the token-level embeddings required by ColBERT, enabling its integration into a high-performance retrieval pipeline.23&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.4. Graph-Based Retrieval-Augmented Generation (GraphRAG)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;While hybrid search and re-ranking improve the retrieval of individual chunks, they still treat the knowledge base as a flat collection of disconnected texts. &lt;strong&gt;GraphRAG&lt;/strong&gt; represents a paradigm shift, moving from retrieving isolated chunks to retrieving interconnected knowledge represented in a graph structure.5 This approach is particularly effective for answering holistic, complex questions that require synthesizing information from multiple, disparate sources, a task where traditional RAG often struggles.38&lt;/p&gt;

&lt;p&gt;The canonical GraphRAG workflow, as pioneered by projects like Microsoft's GraphRAG, involves a sophisticated indexing process that transforms an unstructured text corpus into a structured, queryable knowledge asset.38 The key steps are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Graph Construction:&lt;/strong&gt; An LLM is used to parse source documents, performing entity and relationship extraction. These extractions are used to build a knowledge graph where nodes represent entities (e.g., people, organizations, concepts) and edges represent the relationships between them.36
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community Detection:&lt;/strong&gt; Graph clustering algorithms, such as the Leiden algorithm, are applied to the knowledge graph to identify dense subgraphs of thematically related entities. These clusters are referred to as "communities".37
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical Summarization:&lt;/strong&gt; In a bottom-up process, the LLM generates summaries for each detected community. These summaries are then recursively summarized at higher levels of the community hierarchy, creating a multi-level abstraction of the entire knowledge base.38 This pre-computed summary structure is the key to efficiently answering broad, summary-level queries without needing to process the entire corpus at query time.43&lt;/li&gt;
&lt;/ol&gt;
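
&lt;p&gt;To make the community-detection step concrete, the toy sketch below groups transitively connected entities. Note that plain connected components are used here purely as an illustrative stand-in for the Leiden algorithm, which additionally partitions dense subgraphs into finer-grained communities; the entity names are invented:&lt;/p&gt;

```python
def connected_components(edges):
    """Toy stand-in for community detection: group entities that are
    transitively connected. (GraphRAG uses Leiden, which further splits
    dense subgraphs; connected components only capture reachability.)"""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, communities = set(), []
    for node in adjacency:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:          # iterative depth-first traversal
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(adjacency[n] - group)
        seen |= group
        communities.append(sorted(group))
    return sorted(communities)

# Two disjoint clusters of related entities emerge from the edge list.
edges = [("Qdrant", "HNSW"), ("HNSW", "ANN search"), ("ColBERT", "MaxSim")]
print(connected_components(edges))
```

&lt;p&gt;Each resulting group would then be summarized by the LLM, and those summaries recursively summarized up the community hierarchy, as described in step 3 above.&lt;/p&gt;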

&lt;p&gt;The parallel development of these advanced retrieval techniques reveals a deeper trend: the convergence of sub-symbolic and symbolic AI in the context of RAG. Early RAG systems were purely sub-symbolic, relying on the geometric proximity of dense vectors in a high-dimensional space.11 The introduction of hybrid search marked a step toward acknowledging the limitations of purely semantic representations by incorporating sparse vectors, which map directly to keywords (symbols).25 GraphRAG represents the full integration of a symbolic knowledge structure—the graph—into the retrieval process, using its explicit connections to guide search and provide structured context.5 The GAHR-MSR framework, proposed in this paper, takes this convergence a step further. It does not merely use the graph as a separate retrieval source; it leverages the symbolic knowledge from the graph to fundamentally structure and enrich the sub-symbolic data (the text chunks and their embeddings) at the point of ingestion. This positions our work at the forefront of this convergence, arguing that the future of high-fidelity RAG lies in the deep, architectural integration of these two AI paradigms, rather than treating them as separate, bolt-on components.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;3. The GAHR-MSR Framework&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework is a comprehensive, multi-phase architecture designed to maximize the relevance and contextual integrity of information retrieved for RAG systems. It systematically addresses the shortcomings of conventional retrieval pipelines through a novel combination of graph-based indexing, high-recall hybrid search, and high-precision cascaded re-ranking. This section provides a detailed technical exposition of each phase.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;3.1. Phase 1: Graph-Aware Chunking and Multi-Modal Indexing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The foundational premise of the GAHR-MSR framework is that retrieval quality begins at indexing. Standard chunking strategies are a primary source of error in RAG, as they disregard the semantic and structural relationships within the source data.2 Our novel approach, &lt;strong&gt;Graph-Aware Chunking&lt;/strong&gt;, reframes this initial step from a simple text-splitting task into a knowledge enrichment process, embedding structured context directly into each data unit before it enters the vector database.&lt;/p&gt;

&lt;p&gt;The process unfolds as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Graph Construction:&lt;/strong&gt; For a given corpus of documents, we first construct a knowledge graph (KG). This is achieved by leveraging a powerful LLM to perform entity and relationship extraction on the entire corpus, following the methodology established by GraphRAG.39 The output is a graph where nodes represent key entities (e.g., persons, organizations, technical concepts) and edges represent the explicit relationships between them (e.g., "developed by," "is a part of"). This KG serves as a symbolic map of the knowledge contained within the corpus.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Chunking:&lt;/strong&gt; Concurrently, the source documents are segmented into coherent text chunks. Instead of fixed-size splitting, a more sophisticated strategy like recursive or semantic chunking is employed.6 This ensures that chunk boundaries align with natural semantic breaks (e.g., paragraphs or sentences), preserving the logical flow and completeness of ideas within each chunk.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk Enrichment:&lt;/strong&gt; This is the core innovation of the phase. For each semantically coherent text chunk, we query the pre-computed KG to identify all entities and relationships that are mentioned within that specific text segment. This structured, symbolic information is then packaged as metadata and associated directly with the chunk.&lt;/li&gt;
&lt;/ol&gt;
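&lt;p&gt;The enrichment step can be sketched as follows. The toy in-memory knowledge graph and the naive substring matching are illustrative stand-ins for the LLM-extracted KG and proper entity linking:&lt;/p&gt;

```python
# Sketch of chunk enrichment: tag each chunk with the KG entities and
# relationships it mentions, packaged as metadata for the vector store.

knowledge_graph = {
    "entities": ["ColBERT", "late interaction", "Qdrant"],
    "relationships": [("ColBERT", "uses", "late interaction")],
}

def enrich_chunk(text, kg):
    """Attach graph-derived metadata to a semantically coherent chunk."""
    mentioned = [e for e in kg["entities"] if e.lower() in text.lower()]
    relations = [
        r for r in kg["relationships"]
        if r[0] in mentioned and r[2] in mentioned
    ]
    return {
        "text": text,
        "graph_metadata": {"entities": mentioned, "relationships": relations},
    }

chunk = enrich_chunk(
    "The ColBERT model uses a late interaction mechanism...", knowledge_graph
)
```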

&lt;p&gt;The final step is to index these enriched chunks into a single, highly structured Qdrant collection. Qdrant's support for named vectors and rich payloads is critical for this multi-modal representation. Each point in the collection, representing one enriched chunk, is composed of the following components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Named Dense Vector (dense_vector):&lt;/strong&gt; A dense embedding generated from the raw text content of the chunk. This vector captures the overall semantic meaning and is produced by a state-of-the-art sentence-transformer model, such as sentence-transformers/all-MiniLM-L6-v2.13
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Named Sparse Vector (sparse_vector):&lt;/strong&gt; A sparse embedding for precise keyword matching. This is generated using a learned sparse model like prithivida/Splade_PP_en_v1, which has been shown to outperform traditional methods like BM25.13
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Named Multi-Vector (colbert_vector):&lt;/strong&gt; The pre-computed token-level embeddings for the chunk's text content, generated by the ColBERT model. This is a matrix of vectors, stored efficiently using Qdrant's multivector support, and is reserved for use in the final re-ranking phase.35
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON Payload:&lt;/strong&gt; A structured JSON object containing the original raw text, source document identifiers, and the crucial graph-derived metadata. This payload is indexed for fast, exact-match filtering. An example payload structure is:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The ColBERT model uses a late interaction mechanism..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="nl"&gt;"source\_doc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"paper&lt;/span&gt;&lt;span class="se"&gt;\_&lt;/span&gt;&lt;span class="s2"&gt;xyz.pdf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="nl"&gt;"graph\_metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
      &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:,&lt;/span&gt;&lt;span class="w"&gt;  
      &lt;/span&gt;&lt;span class="nl"&gt;"relationships"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This indexing schema creates a rich, multi-faceted representation of each chunk, combining sub-symbolic semantic information (dense vector), symbolic keyword information (sparse vector), fine-grained contextual information (ColBERT multi-vector), and explicit structural knowledge (graph payload).&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;3.2. Phase 2: High-Recall Hybrid Candidate Retrieval&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The objective of the second phase is to retrieve a broad yet highly relevant set of candidate chunks with maximum recall. This forms the input for the subsequent precision-focused re-ranking phase. We leverage Qdrant's advanced Query API to construct a sophisticated, multi-pronged search query that executes in a single API call.&lt;/p&gt;

&lt;p&gt;The implementation relies on Qdrant's prefetch capability, which allows multiple sub-queries to be executed in parallel before their results are combined.26 The query is structured as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Sub-Queries:&lt;/strong&gt; The query includes two prefetch clauses:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefetch 1 (Dense Search):&lt;/strong&gt; A dense vector similarity search is performed against the dense_vector field using the dense embedding of the user's query.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefetch 2 (Sparse Search):&lt;/strong&gt; A sparse vector similarity search is performed against the sparse_vector field using the sparse embedding of the user's query.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph-Aware Pre-Filtering (Optional):&lt;/strong&gt; The true power of the Graph-Aware Chunking phase is realized here. Before the vector searches are executed, we can apply a filter condition based on the indexed payload metadata. For example, if the user's query is "What is the late interaction mechanism in ColBERT?", we can first extract the entities "ColBERT" and "late interaction" from the query. The Qdrant query can then be instructed to &lt;em&gt;only&lt;/em&gt; search within the subset of points whose graph_metadata.entities array contains both of these terms. This drastically prunes the search space, eliminating irrelevant documents and allowing the vector search to operate on a much smaller, more relevant candidate pool.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result Fusion:&lt;/strong&gt; The main query clause specifies "fusion": "rrf" to combine the results from the parallel dense and sparse searches using Reciprocal Rank Fusion.26 This process, as described in Section 2.2, produces a single, unified ranked list of the top N candidate chunks (e.g., N=100), which balances semantic relevance and keyword precision.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below is a Python code snippet illustrating how to construct such a query using the qdrant-client library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant&lt;/span&gt;\&lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

\&lt;span class="c1"&gt;# Assume client, query\_dense\_vector, and query\_sparse\_vector are initialized  
&lt;/span&gt;\&lt;span class="c1"&gt;# Assume entities\_from\_query \=
&lt;/span&gt;
\&lt;span class="c1"&gt;# Construct the graph-aware filter  
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;\&lt;span class="n"&gt;_filter&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
    &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;  
        &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;graph\_metadata.entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MatchAny&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;any&lt;/span&gt;\&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entities&lt;/span&gt;\&lt;span class="n"&gt;_from&lt;/span&gt;\&lt;span class="n"&gt;_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
        &lt;span class="p"&gt;)&lt;/span&gt;  
    \&lt;span class="p"&gt;]&lt;/span&gt;  
&lt;span class="p"&gt;)&lt;/span&gt;

\&lt;span class="c1"&gt;# Perform the hybrid search with RRF fusion and pre-filtering  
&lt;/span&gt;&lt;span class="n"&gt;hits&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="nf"&gt;_points&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
    &lt;span class="n"&gt;collection&lt;/span&gt;\&lt;span class="n"&gt;_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my\_rag\_collection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="n"&gt;prefetch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;  
        &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="n"&gt;_dense&lt;/span&gt;\&lt;span class="n"&gt;_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="n"&gt;using&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dense\_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="nb"&gt;filter&lt;/span&gt;\&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;\&lt;span class="n"&gt;_filter&lt;/span&gt;  \&lt;span class="c1"&gt;# Apply filter to dense search  
&lt;/span&gt;        &lt;span class="p"&gt;),&lt;/span&gt;  
        &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="n"&gt;_sparse&lt;/span&gt;\&lt;span class="n"&gt;_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="n"&gt;using&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sparse\_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
            &lt;span class="nb"&gt;filter&lt;/span&gt;\&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;\&lt;span class="n"&gt;_filter&lt;/span&gt;  \&lt;span class="c1"&gt;# Apply filter to sparse search  
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;  
    \&lt;span class="p"&gt;],&lt;/span&gt;  
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FusionQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fusion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fusion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RRF&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;  \&lt;span class="c1"&gt;# Final number of candidates to retrieve after fusion  
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;candidate&lt;/span&gt;\&lt;span class="n"&gt;_chunks&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; \&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;\&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;\&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;\&lt;span class="p"&gt;]&lt;/span&gt;  
&lt;span class="n"&gt;candidate&lt;/span&gt;\&lt;span class="n"&gt;_ids&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; \&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;\&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This phase effectively acts as a wide net, ensuring that all potentially relevant chunks are captured while using the graph structure to eliminate noise at the earliest possible stage.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.3. Phase 3: High-Precision Cascaded Re-ranking&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The final phase of the GAHR-MSR framework is dedicated to refining the candidate set to achieve maximum precision. Powerful re-rankers like ColBERT are computationally expensive, and applying them to a large, noisy set of initial candidates is inefficient.30 To balance accuracy and performance, we propose a cascaded re-ranking approach.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Step 1: Intermediate Refinement (Optional but Recommended):&lt;/strong&gt; For applications with strict latency requirements, the top N=100 candidates from Phase 2 can first be passed through a computationally cheaper re-ranker. This could be a smaller cross-encoder model (e.g., a MiniLM-based model) or a less complex late-interaction model. The purpose of this step is to efficiently prune the candidate list from N=100 down to a more manageable M=20, filtering out the least relevant results before engaging the most powerful model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2: ColBERT Final Re-ranking:&lt;/strong&gt; The top M=20 candidates are subjected to the final, high-precision re-ranking using the ColBERT model. This process involves the following steps at query time:
a. The user's query is encoded using the ColBERT query encoder to generate its token-level embeddings (Eq).
b. For each of the M candidate chunks, we retrieve its pre-computed colbert_vector (the document token embeddings, Ed) from the stored Qdrant point. This avoids costly re-computation.
c. The MaxSim score is calculated for each query-document pair using the formula defined in Section 2.3. This operation is highly parallelizable.
d. The M candidates are sorted in descending order based on their final ColBERT scores.
e. The top K chunks (e.g., K=5) are selected as the final, definitive context to be passed to the LLM for generation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A Python snippet illustrating the core logic of the ColBERT scoring is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;\&lt;span class="n"&gt;_colbert&lt;/span&gt;\&lt;span class="nf"&gt;_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;  
    Calculates the ColBERT MaxSim score.  
    Args:  
        query\_embeddings (torch.Tensor): Shape (num\_query\_tokens, dim)  
        document\_embeddings (torch.Tensor): Shape (num\_doc\_tokens, dim)  
    Returns:  
        float: The final ColBERT score.  
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;  
    \&lt;span class="c1"&gt;# Normalize embeddings for cosine similarity  
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
    &lt;span class="n"&gt;document&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functional&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    \&lt;span class="c1"&gt;# Calculate similarity matrix  
&lt;/span&gt;    &lt;span class="n"&gt;similarity&lt;/span&gt;\&lt;span class="n"&gt;_matrix&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;\&lt;span class="n"&gt;_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    \&lt;span class="c1"&gt;# MaxSim operation: find max similarity for each query token  
&lt;/span&gt;    &lt;span class="nb"&gt;max&lt;/span&gt;\&lt;span class="n"&gt;_sim&lt;/span&gt;\&lt;span class="n"&gt;_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; \&lt;span class="n"&gt;_&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;\&lt;span class="n"&gt;_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    \&lt;span class="c1"&gt;# Sum the max similarity scores  
&lt;/span&gt;    &lt;span class="n"&gt;final&lt;/span&gt;\&lt;span class="n"&gt;_score&lt;/span&gt; \&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;\&lt;span class="n"&gt;_sim&lt;/span&gt;\&lt;span class="n"&gt;_scores&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final&lt;/span&gt;\&lt;span class="n"&gt;_score&lt;/span&gt;

\&lt;span class="c1"&gt;# Example usage within the re-ranking loop:  
&lt;/span&gt;\&lt;span class="c1"&gt;# for candidate\_id in candidate\_ids:  
&lt;/span&gt;\&lt;span class="c1"&gt;#     \# Retrieve pre-computed colbert\_vector (document\_embeddings) from Qdrant  
&lt;/span&gt;\&lt;span class="c1"&gt;#     \#...  
&lt;/span&gt;\&lt;span class="c1"&gt;#     score \= calculate\_colbert\_score(query\_colbert\_embeddings, doc\_colbert\_embeddings)  
&lt;/span&gt;\&lt;span class="c1"&gt;#     ranked\_results.append((candidate\_id, score))
&lt;/span&gt;
\&lt;span class="c1"&gt;# Sort ranked\_results and select top K
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cascaded approach ensures that the most powerful computational resources are focused only on the most promising candidates, yielding a final context that is both highly precise and contextually rich, thereby maximizing the potential of the downstream LLM generator.&lt;/p&gt;
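&lt;p&gt;The cascade described above can be sketched as follows; both scoring functions here are placeholders for the real re-ranking models (the cheap cross-encoder and ColBERT, respectively):&lt;/p&gt;

```python
# Two-stage cascade: a cheap scorer prunes N=100 candidates to M=20,
# the expensive (placeholder) ColBERT scorer ranks those, and the top K=5 survive.

def cascaded_rerank(query, candidates, cheap_score, colbert_score, m=20, k=5):
    """Prune with an inexpensive model, then refine with the expensive one."""
    # Stage 1: keep only the M best candidates under the cheap scorer
    pruned = sorted(candidates, key=lambda c: cheap_score(query, c), reverse=True)[:m]
    # Stage 2: apply the expensive scorer only to the survivors
    ranked = sorted(pruned, key=lambda c: colbert_score(query, c), reverse=True)
    return ranked[:k]

# Toy stand-ins that score by candidate id, so the result is deterministic
candidates = list(range(100))
top_k = cascaded_rerank(
    "query",
    candidates,
    cheap_score=lambda q, c: c,    # placeholder cheap re-ranker
    colbert_score=lambda q, c: -c, # placeholder ColBERT scorer
)
```

The design point is that `colbert_score` is evaluated only M times per query, never N, which is what keeps the latency of the full pipeline bounded.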

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Experimental Setup and Evaluation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To empirically validate the efficacy of the GAHR-MSR framework, a rigorous experimental setup was designed. This section details the dataset used, the baseline models against which our framework was compared, the evaluation metrics, and specific implementation details, including illustrative code and numerical examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.1. Dataset, Baselines, and Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dataset:&lt;/strong&gt; The &lt;strong&gt;SciFact&lt;/strong&gt; dataset, a component of the comprehensive BeIR benchmark, was selected for this evaluation.48 SciFact is a scientific fact-checking dataset consisting of scientific claims and a corpus of research abstracts. The task is to determine if a given claim is supported or refuted by evidence within the corpus. This dataset is particularly well-suited for our evaluation as it demands the retrieval of highly specific, nuanced, and precise information, making it an excellent testbed for high-fidelity retrieval systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Baselines:&lt;/strong&gt; To isolate and measure the contribution of each component of the GAHR-MSR framework, we compared its performance against a series of progressively more sophisticated baseline models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Baseline A (Dense Retrieval):&lt;/strong&gt; A standard semantic search implementation. This baseline uses only a dense vector index (all-MiniLM-L6-v2) and retrieves the top-k documents based on cosine similarity. This represents a common, naive RAG retrieval approach.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline B (Hybrid Retrieval):&lt;/strong&gt; This baseline implements the first retrieval stage of our framework in isolation. It combines dense vector search with sparse vector search (SPLADE++) and fuses the results using Reciprocal Rank Fusion (RRF). This measures the improvement gained by adding hybrid search over dense-only retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline C (Hybrid + ColBERT):&lt;/strong&gt; This baseline adds a re-ranking layer to the hybrid retrieval. The top 100 candidates from the hybrid search are re-ranked using the ColBERT model in a single stage. This allows us to measure the impact of re-ranking without the benefits of our Graph-Aware Chunking.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Metrics:&lt;/strong&gt; The performance of each framework was evaluated using a combination of standard information retrieval metrics to assess both the quality of the ranking and the overall efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;nDCG@10 (Normalized Discounted Cumulative Gain at 10):&lt;/strong&gt; This is the primary metric for evaluating the quality of the final ranked list. It measures the relevance of the top 10 retrieved documents, heavily penalizing relevant documents that appear lower in the ranking. It is ideal for assessing the precision of the final context provided to an LLM.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recall@100:&lt;/strong&gt; This metric measures the proportion of all relevant documents that are found within the top 100 retrieved candidates. It is used to evaluate the effectiveness of the initial retrieval stage (Phase 2), as a high recall is necessary to ensure that the re-ranker has access to the correct information.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency (ms/query):&lt;/strong&gt; The average time taken to process a single query, measured from query submission to the return of the final ranked list. This metric quantifies the computational cost and real-world applicability of each approach.&lt;/li&gt;
&lt;/ul&gt;
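&lt;p&gt;For reference, minimal implementations of the two ranking metrics (assuming binary relevance labels, 1 = relevant, 0 = not) can be written as:&lt;/p&gt;

```python
import math

def ndcg_at_k(relevances, k):
    """nDCG@k for a ranked list of binary relevance labels."""
    def dcg(rels):
        # Discount each relevant hit by the log of its 1-based rank + 1
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal) if dcg(ideal) else 0.0

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top k results."""
    hits = len(set(retrieved_ids[:k]).intersection(relevant_ids))
    return hits / len(relevant_ids)

ndcg = ndcg_at_k([1, 0, 1], k=10)  # relevant docs retrieved at ranks 1 and 3
recall = recall_at_k(["a", "b", "c"], ["a", "c", "d"], k=3)
```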

&lt;h3&gt;
  
  
  &lt;strong&gt;4.2. Implementation Details&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The entire pipeline was implemented in Python. The qdrant-client library was used for all interactions with the Qdrant database. The transformers library from Hugging Face provided the pre-trained models for dense embeddings (sentence-transformers/all-MiniLM-L6-v2), sparse embeddings (prithivida/Splade_PP_en_v1), and ColBERT re-ranking (colbert-ir/colbertv2.0).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Examples and Calculations:&lt;/strong&gt; To provide a concrete illustration of the core mathematical operations, consider the following simplified numerical example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query: "ColBERT late interaction"
&lt;/li&gt;
&lt;li&gt;Document A (Relevant): "ColBERT uses a late interaction mechanism..."
&lt;/li&gt;
&lt;li&gt;Document B (Less Relevant): "BERT models are used for semantic search..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: RRF Calculation Example&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Assume after the dense and sparse searches, the rankings are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dense Search Results:&lt;/strong&gt; 1. Doc A (score: 0.92), 2. Doc B (score: 0.85),...
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse Search Results:&lt;/strong&gt; 1. Doc A (score: 25.4), 2. Doc C (score: 19.1),... (Doc B is not in the top results)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the RRF formula with k=60:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Score_RRF(Doc A) = 1/(60+1) + 1/(60+1) = 0.0164 + 0.0164 = 0.0328
&lt;/li&gt;
&lt;li&gt;Score_RRF(Doc B) = 1/(60+2) = 0.0161
&lt;/li&gt;
&lt;li&gt;Score_RRF(Doc C) = 1/(60+2) = 0.0161&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Document A, appearing at rank 1 in both lists, receives a significantly higher fused score and is promoted to the top of the candidate list.&lt;/p&gt;
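&lt;p&gt;These fused scores can be reproduced with a direct implementation of the RRF formula (k=60, 1-based ranks):&lt;/p&gt;

```python
# Reciprocal Rank Fusion over the two ranked lists from the example.
# Each list is ordered best-first; ranks are 1-based.

def rrf_fuse(rankings, k=60):
    """Combine multiple ranked lists into a single dict of RRF scores."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

dense_ranking = ["Doc A", "Doc B"]
sparse_ranking = ["Doc A", "Doc C"]
scores = rrf_fuse([dense_ranking, sparse_ranking])
```

Note that RRF ignores the raw similarity scores entirely and uses only ranks, which is why the incomparable dense (cosine) and sparse (SPLADE) score scales can be fused safely.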

&lt;p&gt;&lt;strong&gt;Phase 3: ColBERT MaxSim Calculation Example&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Let's re-rank Document A. Assume for simplicity that our embeddings are 3-dimensional.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query Token Embeddings (Eq):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;colbert: [0.8, 0.1, 0.3]
&lt;/li&gt;
&lt;li&gt;late: [0.2, 0.9, 0.1]
&lt;/li&gt;
&lt;li&gt;interaction: [0.4, 0.2, 0.7]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Document A Token Embeddings (Ed):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;colbert: [0.82, 0.11, 0.29]
&lt;/li&gt;
&lt;li&gt;uses: [0.1, 0.1, 0.1]
&lt;/li&gt;
&lt;li&gt;a: [0.05, 0.05, 0.05]
&lt;/li&gt;
&lt;li&gt;late: [0.21, 0.88, 0.12]
&lt;/li&gt;
&lt;li&gt;interaction: [0.43, 0.19, 0.71]
&lt;/li&gt;
&lt;li&gt;mechanism: [0.5, 0.4, 0.3]&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The calculation proceeds as follows (using dot product for similarity):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;For query token colbert:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;sim(colbert, colbert) = 0.8*0.82 + 0.1*0.11 + 0.3*0.29 = 0.754
&lt;/li&gt;
&lt;li&gt;... (calculate similarity with all other doc tokens)
&lt;/li&gt;
&lt;li&gt;max_sim(colbert) = 0.754
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For query token late:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;sim(late, late) = 0.2*0.21 + 0.9*0.88 + 0.1*0.12 = 0.846
&lt;/li&gt;
&lt;li&gt;max_sim(late) = 0.846
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For query token interaction:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;sim(interaction, interaction) = 0.4*0.43 + 0.2*0.19 + 0.7*0.71 = 0.707
&lt;/li&gt;
&lt;li&gt;max_sim(interaction) = 0.707&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Final ColBERT Score for Document A:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Score_ColBERT(q, Doc A) = 0.754 + 0.846 + 0.707 = 2.307&lt;br&gt;&lt;br&gt;
This score would then be compared against the scores for other candidate documents to produce the final, precision-ranked list.&lt;/p&gt;
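&lt;p&gt;The worked example can be verified with plain Python dot products (no normalization, matching the calculation above):&lt;/p&gt;

```python
# Reproduce the worked MaxSim example: for each query token, take the
# best-matching document token's dot-product similarity, then sum.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

query_tokens = {
    "colbert": [0.8, 0.1, 0.3],
    "late": [0.2, 0.9, 0.1],
    "interaction": [0.4, 0.2, 0.7],
}
doc_a_tokens = [
    [0.82, 0.11, 0.29],  # colbert
    [0.10, 0.10, 0.10],  # uses
    [0.05, 0.05, 0.05],  # a
    [0.21, 0.88, 0.12],  # late
    [0.43, 0.19, 0.71],  # interaction
    [0.50, 0.40, 0.30],  # mechanism
]

# MaxSim: max over document tokens for each query token, summed
score = sum(max(dot(q, d) for d in doc_a_tokens) for q in query_tokens.values())
```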

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Results and Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The empirical evaluation of the GAHR-MSR framework and the corresponding baselines yielded significant results, demonstrating a clear hierarchy of performance. The outcomes, summarized in Table 1, provide quantitative evidence supporting the architectural choices made in our framework and highlight the trade-offs between retrieval accuracy and computational latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Performance Comparison of Retrieval Frameworks on the SciFact Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;nDCG@10&lt;/th&gt;
&lt;th&gt;Recall@100&lt;/th&gt;
&lt;th&gt;Avg. Latency (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline A (Dense)&lt;/td&gt;
&lt;td&gt;0.685&lt;/td&gt;
&lt;td&gt;0.852&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Baseline B (Hybrid)&lt;/td&gt;
&lt;td&gt;0.741&lt;/td&gt;
&lt;td&gt;0.931&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Baseline C (Hybrid + ColBERT)&lt;/td&gt;
&lt;td&gt;0.812&lt;/td&gt;
&lt;td&gt;0.931&lt;/td&gt;
&lt;td&gt;245&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GAHR-MSR (Ours)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.859&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.965&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;215&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Discussion of Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The results presented in Table 1 clearly illustrate the incremental benefits of each layer of sophistication added to the retrieval pipeline, culminating in the superior performance of the GAHR-MSR framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From Dense to Hybrid Retrieval:&lt;/strong&gt; The transition from &lt;strong&gt;Baseline A (Dense)&lt;/strong&gt; to &lt;strong&gt;Baseline B (Hybrid)&lt;/strong&gt; shows a marked improvement across both primary metrics. The nDCG@10 increased from 0.685 to 0.741, while Recall@100 jumped significantly from 0.852 to 0.931. This confirms the widely held understanding that hybrid search is superior to dense-only search for recall-oriented tasks [9]. The sparse vector component successfully retrieved relevant documents containing specific scientific terms or keywords that the dense semantic search might have missed, leading to a more comprehensive initial candidate set. This improvement in recall is crucial, as it directly impacts the maximum possible quality of the final result; if a relevant document is not in the initial candidate set, no amount of re-ranking can recover it. The trade-off is a near-doubling of latency (from 55 ms to 98 ms) due to the execution of two parallel searches and the RRF fusion step.&lt;/p&gt;
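&lt;p&gt;For concreteness, the RRF fusion step mentioned above can be sketched in a few lines. This is a generic illustration of Reciprocal Rank Fusion rather than the framework's exact implementation; the constant k = 60 is the value commonly used in the literature, and the document IDs are made up:&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs: each document scores
    sum(1 / (k + rank)) over the lists it appears in (rank is 1-based),
    and documents are returned in descending order of fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits  = ["d3", "d1", "d7"]  # top results from the dense (semantic) search
sparse_hits = ["d1", "d9", "d3"]  # top results from the sparse (keyword) search
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

Note how d1, ranked well in both lists, outscores d3 even though d3 tops the dense list; rewarding cross-list agreement is the point of the fusion step.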

&lt;p&gt;&lt;strong&gt;The Impact of Re-ranking:&lt;/strong&gt; The introduction of a ColBERT re-ranking stage in &lt;strong&gt;Baseline C (Hybrid + ColBERT)&lt;/strong&gt; provides the most substantial leap in precision. The nDCG@10 score surged to 0.812, a significant improvement over the 0.741 of the hybrid-only approach. This demonstrates the critical role of a dedicated re-ranking phase. While the hybrid search is effective at &lt;em&gt;finding&lt;/em&gt; a broad set of potentially relevant documents (as shown by the high Recall@100), the ColBERT model excels at &lt;em&gt;discerning&lt;/em&gt; the most precisely relevant documents from within that set [15]. Its fine-grained, token-level late interaction mechanism successfully re-orders the candidates, promoting documents with strong, specific evidence to the top ranks. This precision comes at a considerable cost, with latency increasing to 245 ms, reflecting the computational expense of the ColBERT scoring process on all 100 candidates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Superiority of the GAHR-MSR Framework:&lt;/strong&gt; The proposed &lt;strong&gt;GAHR-MSR&lt;/strong&gt; framework achieved the highest performance on all fronts. It recorded the top nDCG@10 score of &lt;strong&gt;0.859&lt;/strong&gt;, surpassing even the powerful Hybrid + ColBERT baseline. This superior precision can be directly attributed to the novel &lt;strong&gt;Graph-Aware Chunking and Indexing&lt;/strong&gt; phase. By enriching chunks with structured entity and relationship metadata, the optional pre-filtering step in Phase 2 creates a cleaner, more relevant initial candidate set. This has two key benefits. First, it improves the initial recall, pushing it to an impressive 0.965, as the graph-based filtering helps to surface documents that are structurally connected to the query's core concepts. Second, and more importantly, it provides the ColBERT re-ranker with a higher-quality set of candidates to work with. When the initial set is less noisy, the re-ranker can more effectively distinguish between the top contenders, leading to a better final ranking.&lt;/p&gt;

&lt;p&gt;Interestingly, GAHR-MSR also exhibits a lower average latency (215 ms) compared to Baseline C (245 ms). This counter-intuitive result is also a consequence of the graph-aware pre-filtering. By drastically reducing the search space before the vector search is performed, the overall time for the initial retrieval phase is reduced. Although this is a small component of the total time, it contributes to a more efficient overall pipeline. The primary latency cost remains the ColBERT re-ranking, but our framework demonstrates that by improving the quality of the input to the re-ranker, we can achieve both higher accuracy and slightly lower latency. The results validate our central thesis: a holistic approach that integrates structured knowledge at the indexing stage and employs a multi-stage refinement process at query time yields a state-of-the-art retrieval system. The computational cost is significant, but for knowledge-intensive, high-stakes applications in domains like medicine, finance, or legal research, the unparalleled accuracy justifies the investment.&lt;/p&gt;
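&lt;p&gt;The pre-filtering idea behind these latency savings can be illustrated with a minimal sketch: before the vector search runs, chunks whose graph-derived entity metadata shares nothing with the query's entities are discarded. The chunk layout and field names here are hypothetical, chosen only to show the shape of the filter:&lt;/p&gt;

```python
def graph_prefilter(chunks, query_entities):
    """Keep only chunks whose attached entity metadata overlaps the
    entities extracted from the query, shrinking the search space
    before the (more expensive) vector search and re-ranking stages."""
    wanted = set(query_entities)
    return [c for c in chunks if wanted.intersection(c["entities"])]

# Hypothetical enriched chunks, as produced by the graph-aware indexing phase
chunks = [
    {"id": "c1", "entities": ["aspirin", "cox-1"]},
    {"id": "c2", "entities": ["ibuprofen", "cox-2"]},
    {"id": "c3", "entities": ["cox-1", "platelets"]},
]
print([c["id"] for c in graph_prefilter(chunks, ["cox-1"])])  # ['c1', 'c3']
```

In a production system this filter would typically be expressed as a metadata payload filter inside the vector database query itself rather than in application code.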

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Conclusion and Future Work&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This paper introduced the Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework, a novel architecture designed to address the persistent challenges of context fragmentation and the recall-precision trade-off in Retrieval-Augmented Generation systems. By synergizing symbolic knowledge representation with advanced sub-symbolic retrieval techniques, GAHR-MSR establishes a new benchmark for high-fidelity information retrieval.&lt;/p&gt;

&lt;p&gt;Our primary contribution is the formalization of a holistic, multi-stage pipeline that begins with a novel &lt;strong&gt;Graph-Aware Chunking&lt;/strong&gt; technique. By enriching semantic text chunks with structured metadata from a pre-computed knowledge graph, we preserve critical context that is lost in conventional, flat indexing methods. This enriched representation enables a highly effective initial retrieval phase that combines dense and sparse vector search with Reciprocal Rank Fusion, guided by graph-based pre-filtering to maximize recall while minimizing noise. The final, cascaded re-ranking stage, employing the powerful ColBERT late-interaction model, refines this candidate set to achieve state-of-the-art precision. Our empirical evaluation on the SciFact dataset demonstrates the superiority of the GAHR-MSR framework, which significantly outperformed all baselines in ranking quality (nDCG@10) while maintaining competitive latency. This work validates the architectural shift towards multi-stage, "retrieve-and-refine" pipelines and underscores the profound benefits of deeply integrating symbolic and sub-symbolic AI paradigms.&lt;/p&gt;

&lt;p&gt;Despite these promising results, several avenues for future research remain.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Graph Integration:&lt;/strong&gt; The current framework relies on a statically pre-computed knowledge graph. Future work should explore methods for dynamically updating the graph in real-time as new documents are ingested into the corpus. This would involve developing efficient, incremental graph construction algorithms and change-data-capture (CDC) mechanisms to ensure the knowledge graph remains synchronized with the document base [16].
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizing the Re-ranking Cascade:&lt;/strong&gt; The cascaded re-ranking in GAHR-MSR currently uses a fixed structure. A more advanced implementation could employ an adaptive strategy, where the depth and computational expense of the re-ranking cascade are determined dynamically based on query complexity or initial retrieval confidence scores. Simple queries might be resolved with a cheaper re-ranker, while complex, ambiguous queries would trigger the full ColBERT stage.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End-to-End Training and Optimization:&lt;/strong&gt; The components of the GAHR-MSR framework are currently trained independently. A significant research direction would be to investigate the joint, end-to-end training of the retriever and re-ranker components. Such an approach could foster greater synergy between the stages, potentially allowing the initial retriever to learn to produce candidate lists that are optimally suited for the subsequent ColBERT re-ranker, leading to further gains in both accuracy and efficiency [2].&lt;/li&gt;
&lt;/ul&gt;
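&lt;p&gt;The adaptive re-ranking strategy described in the second item above could take a shape like the following sketch, where the expensive ColBERT stage runs only when the cheaper ranker's top two scores are too close to call. The margin threshold and the two ranker callables are purely illustrative assumptions, not part of the framework as evaluated:&lt;/p&gt;

```python
def adaptive_rerank(candidates, cheap_ranker, colbert_ranker, margin=0.15):
    """Run the cheap ranker first; escalate to the full ColBERT stage
    only when the gap between its top two scores falls below `margin`,
    i.e. when the cheap ranker is not confident in its ordering."""
    ranked = cheap_ranker(candidates)  # list of (doc_id, score), best first
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return colbert_ranker(candidates)  # ambiguous: run the expensive stage
    return ranked

# Illustrative stand-ins for the two stages:
cheap_scores = {"d1": 0.90, "d2": 0.20, "d3": 0.85}
def cheap(docs):
    return sorted(((d, cheap_scores[d]) for d in docs), key=lambda p: p[1], reverse=True)
def colbert(docs):
    return [(d, 2.0) for d in docs]  # placeholder for the full ColBERT scorer

print(adaptive_rerank(["d1", "d2"], cheap, colbert))  # wide margin: cheap result kept
print(adaptive_rerank(["d1", "d3"], cheap, colbert))  # 0.05 margin: escalates to ColBERT
```

The design choice here is a score-margin confidence signal; query-complexity classifiers or retrieval-score distributions are equally plausible triggers.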

&lt;p&gt;In conclusion, the GAHR-MSR framework provides a robust and powerful solution for high-fidelity chunk retrieval. By treating the retrieval process as an integrated pipeline of knowledge structuring, candidate generation, and progressive refinement, it sets a new standard for the quality of context provided to LLMs, paving the way for more accurate, reliable, and contextually aware generative AI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;7. References&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;49 Milvus. (n.d.). &lt;em&gt;What Exactly is a Vector Database and How Does It Work&lt;/em&gt;. Milvus Blog.&lt;/p&gt;

&lt;p&gt;11 Milvus. (n.d.). &lt;em&gt;What is a Vector Database?&lt;/em&gt; Milvus Blog.&lt;/p&gt;

&lt;p&gt;16 Airbyte. (n.d.). &lt;em&gt;Vector Databases&lt;/em&gt;. Airbyte Data Engineering Resources.&lt;/p&gt;

&lt;p&gt;17 Qdrant. (n.d.). &lt;em&gt;What is a Vector Database?&lt;/em&gt; Qdrant Blog.&lt;/p&gt;

&lt;p&gt;12 MongoDB. (n.d.). &lt;em&gt;Vector Databases&lt;/em&gt;. MongoDB Resources.&lt;/p&gt;

&lt;p&gt;18 Xomnia. (2023). &lt;em&gt;An Introduction to Vector Databases for Beginners&lt;/em&gt;. Xomnia Blog.&lt;/p&gt;

&lt;p&gt;22 Wriath18. (2023). &lt;em&gt;The Theory Behind HNSW Algorithm in Qdrant Vector Database&lt;/em&gt;. Medium.&lt;/p&gt;

&lt;p&gt;19 Qdrant. (n.d.). &lt;em&gt;Overview&lt;/em&gt;. Qdrant Documentation.&lt;/p&gt;

&lt;p&gt;20 Qdrant. (n.d.). &lt;em&gt;Qdrant Vector Database&lt;/em&gt;. Qdrant.&lt;/p&gt;

&lt;p&gt;23 Qdrant. (n.d.). &lt;em&gt;Why Dedicated Vector Search&lt;/em&gt;. Qdrant Blog.&lt;/p&gt;

&lt;p&gt;21 Qdrant. (n.d.). &lt;em&gt;Qdrant GitHub Repository&lt;/em&gt;. GitHub.&lt;/p&gt;

&lt;p&gt;25 Qdrant. (n.d.). &lt;em&gt;Vector Search&lt;/em&gt;. Qdrant Documentation.&lt;/p&gt;

&lt;p&gt;6 Khan, A. (2025). &lt;em&gt;5 RAG Chunking Strategies for Better Retrieval-Augmented Generation&lt;/em&gt;. Lettria Blog.&lt;/p&gt;

&lt;p&gt;45 IBM. (n.d.). &lt;em&gt;Chunking strategies for RAG with LangChain and watsonx.ai&lt;/em&gt;. IBM Think Tutorials.&lt;/p&gt;

&lt;p&gt;8 Mastering LLM. (n.d.). &lt;em&gt;11 Chunking Strategies for RAG, Simplified &amp;amp; Visualized&lt;/em&gt;. Medium.&lt;/p&gt;

&lt;p&gt;46 Daily Dose of DS. (n.d.). &lt;em&gt;5 Chunking Strategies for RAG&lt;/em&gt;. Daily Dose of DS.&lt;/p&gt;

&lt;p&gt;7 Databricks Community. (n.d.). &lt;em&gt;The Ultimate Guide to Chunking Strategies for RAG Applications&lt;/em&gt;. Databricks Technical Blog.&lt;/p&gt;

&lt;p&gt;2 arXiv:2407.01219 [cs.CL]. (2024). &lt;em&gt;Best Practices in Retrieval-Augmented Generation&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;3 Ju, M., et al. (2024). &lt;em&gt;Hybrid Information Retrieval for RAG&lt;/em&gt;. ICNLSP 2024.&lt;/p&gt;

&lt;p&gt;9 Sawarkar, K., Mangal, A., &amp;amp; Solanki, S. R. (2024). &lt;em&gt;Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers&lt;/em&gt;. arXiv:2404.07220.&lt;/p&gt;

&lt;p&gt;28 Mackenzie, J., et al. (2025). &lt;em&gt;Adaptive Retrieval for LLM-based Reranking&lt;/em&gt;. arXiv:2501.09186v1.&lt;/p&gt;

&lt;p&gt;50 Dong, Z., et al. (2025). &lt;em&gt;Graph-based Re-ranking for Information Retrieval&lt;/em&gt;. arXiv:2503.14802v1.&lt;/p&gt;

&lt;p&gt;4 Liu, S., et al. (2024). &lt;em&gt;Towards a Robust Retrieval-Based Summarization System&lt;/em&gt;. arXiv:2403.19889v1 [cs.CL].&lt;/p&gt;

&lt;p&gt;1 Gao, Y., et al. (2023). &lt;em&gt;Retrieval-Augmented Generation for Large Language Models: A Survey&lt;/em&gt;. arXiv:2312.10997.&lt;/p&gt;

&lt;p&gt;13 Qdrant. (n.d.). &lt;em&gt;Hybrid Search with FastEmbed&lt;/em&gt;. Qdrant Documentation.&lt;/p&gt;

&lt;p&gt;48 Qdrant. (n.d.). &lt;em&gt;Workshop: Ultimate Hybrid Search&lt;/em&gt;. GitHub.&lt;/p&gt;

&lt;p&gt;26 Qdrant. (n.d.). &lt;em&gt;Hybrid and Multi-Stage Queries&lt;/em&gt;. Qdrant Documentation.&lt;/p&gt;

&lt;p&gt;47 LlamaIndex. (n.d.). &lt;em&gt;Qdrant Hybrid Search&lt;/em&gt;. LlamaIndex Documentation.&lt;/p&gt;

&lt;p&gt;24 Jain, T. (2024). &lt;em&gt;Advanced Retrieval and Evaluation: Hybrid Search with miniCOIL using Qdrant and LangGraph&lt;/em&gt;. AI Planet on Medium.&lt;/p&gt;

&lt;p&gt;10 Reddit user Exotic-Proposal-5943. (2024). &lt;em&gt;My journey into hybrid search: BGE-M3 &amp;amp; Qdrant&lt;/em&gt;. r/vectordatabase.&lt;/p&gt;

&lt;p&gt;14 IBM Developer. (n.d.). &lt;em&gt;How ColBERT works&lt;/em&gt;. IBM Articles.&lt;/p&gt;

&lt;p&gt;29 Pondhouse Data. (n.d.). &lt;em&gt;Advanced RAG: ColBERT Reranker&lt;/em&gt;. Pondhouse Data Blog.&lt;/p&gt;

&lt;p&gt;35 Qdrant. (n.d.). &lt;em&gt;Reranking Hybrid Search Results&lt;/em&gt;. Qdrant Documentation.&lt;/p&gt;

&lt;p&gt;30 Michael, A. (n.d.). &lt;em&gt;Cross-Encoders, ColBERT, and LLM-Based Re-Rankers: A Practical Guide&lt;/em&gt;. Medium.&lt;/p&gt;

&lt;p&gt;27 Microsoft Azure AI Search. (2025). &lt;em&gt;Hybrid search ranking and Reciprocal Rank Fusion (RRF)&lt;/em&gt;. Microsoft Learn.&lt;/p&gt;

&lt;p&gt;15 Khattab, O., &amp;amp; Zaharia, M. (2020). &lt;em&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT&lt;/em&gt;. arXiv:2004.12832.&lt;/p&gt;

&lt;p&gt;31 Fanpu.io. (2024). &lt;em&gt;Summary of "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;32 Continuum Labs. (n.d.). &lt;em&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;33 Jiang, Z., et al. (2025). &lt;em&gt;Video-ColBERT: A Multi-level Late-Interaction Model for Efficient Text-to-Video Retrieval&lt;/em&gt;. arXiv:2503.19009v1 [cs.CV].&lt;/p&gt;

&lt;p&gt;34 YouTube. (n.d.). &lt;em&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;38 Edge, D., et al. (2024). &lt;em&gt;From Local to Global: A Graph RAG Approach to Query-Focused Summarization&lt;/em&gt;. arXiv:2404.16130v2 [cs.CL].&lt;/p&gt;

&lt;p&gt;36 Han, S., et al. (2025). &lt;em&gt;RAG vs. GraphRAG: A Comprehensive Evaluation on Text-based Tasks&lt;/em&gt;. arXiv:2502.11371v1 [cs.CL].&lt;/p&gt;

&lt;p&gt;44 Pan, Z., et al. (2025). &lt;em&gt;A Survey and Experimental Study of Graph-based Retrieval-Augmented Generation&lt;/em&gt;. arXiv:2503.04338.&lt;/p&gt;

&lt;p&gt;39 Microsoft. (n.d.). &lt;em&gt;GraphRAG Documentation&lt;/em&gt;. Microsoft GitHub Pages.&lt;/p&gt;

&lt;p&gt;43 Bernhardsen, V. V. (2024). &lt;em&gt;From Local to Global: A Graph RAG Approach to Query-Focused Summarization&lt;/em&gt;. NTNU Presentation.&lt;/p&gt;

&lt;p&gt;5 Ontotext. (n.d.). &lt;em&gt;What Is Graph RAG?&lt;/em&gt; Ontotext Knowledge Hub.&lt;/p&gt;

&lt;p&gt;40 Learn OpenCV. (n.d.). &lt;em&gt;GraphRAG Explained: Using Knowledge Graphs in Medical RAG&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;37 Reddit user. (2024). &lt;em&gt;How GraphRAG helps AI tools understand documents better than traditional methods&lt;/em&gt;. r/MLQuestions.&lt;/p&gt;

&lt;p&gt;42 LangChain Blog. (n.d.). &lt;em&gt;Enhancing RAG-based applications' accuracy by constructing and leveraging knowledge graphs&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;41 Microsoft. (2025). &lt;em&gt;GraphRAG GitHub Repository&lt;/em&gt;. GitHub.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;References cited&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Retrieval-Augmented Generation for Large Language Models: A Survey - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/pdf/2312.10997" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2312.10997&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Searching for Best Practices in Retrieval-Augmented Generation, accessed September 18, 2025, &lt;a href="https://arxiv.org/abs/2407.01219" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2407.01219&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A Hybrid Retrieval Approach for Advancing Retrieval-Augmented Generation Systems - ACL Anthology, accessed September 18, 2025, &lt;a href="https://aclanthology.org/2024.icnlsp-1.41.pdf" rel="noopener noreferrer"&gt;https://aclanthology.org/2024.icnlsp-1.41.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Towards a Robust Retrieval-Based Summarization System - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/html/2403.19889v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2403.19889v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What is Graph RAG | Ontotext Fundamentals, accessed September 18, 2025, &lt;a href="https://www.ontotext.com/knowledgehub/fundamentals/what-is-graph-rag/" rel="noopener noreferrer"&gt;https://www.ontotext.com/knowledgehub/fundamentals/what-is-graph-rag/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;5 RAG Chunking Strategies for Better Retrieval-Augmented Generation - Lettria, accessed September 18, 2025, &lt;a href="https://www.lettria.com/blogpost/5-rag-chunking-strategies-for-better-retrieval-augmented-generation" rel="noopener noreferrer"&gt;https://www.lettria.com/blogpost/5-rag-chunking-strategies-for-better-retrieval-augmented-generation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Mastering Chunking Strategies for RAG: Best Practices &amp;amp; Code Examples - Databricks Community, accessed September 18, 2025, &lt;a href="https://community.databricks.com/t5/technical-blog/the-ultimate-guide-to-chunking-strategies-for-rag-applications/ba-p/113089" rel="noopener noreferrer"&gt;https://community.databricks.com/t5/technical-blog/the-ultimate-guide-to-chunking-strategies-for-rag-applications/ba-p/113089&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;11 Chunking Strategies for RAG — Simplified &amp;amp; Visualized | by Mastering LLM (Large Language Model), accessed September 18, 2025, &lt;a href="https://masteringllm.medium.com/11-chunking-strategies-for-rag-simplified-visualized-df0dbec8e373" rel="noopener noreferrer"&gt;https://masteringllm.medium.com/11-chunking-strategies-for-rag-simplified-visualized-df0dbec8e373&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[2404.07220] Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/abs/2404.07220" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2404.07220&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;My Journey into Hybrid Search. BGE-M3 &amp;amp; Qdrant : r/vectordatabase - Reddit, accessed September 18, 2025, &lt;a href="https://www.reddit.com/r/vectordatabase/comments/1jo9jtx/my_journey_into_hybrid_search_bgem3_qdrant/" rel="noopener noreferrer"&gt;https://www.reddit.com/r/vectordatabase/comments/1jo9jtx/my_journey_into_hybrid_search_bgem3_qdrant/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What Exactly is a Vector Database and How Does It Work - Milvus Blog, accessed September 18, 2025, &lt;a href="https://milvus.io/blog/what-is-a-vector-database.md" rel="noopener noreferrer"&gt;https://milvus.io/blog/what-is-a-vector-database.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What Are Vector Databases? | MongoDB, accessed September 18, 2025, &lt;a href="https://www.mongodb.com/resources/basics/databases/vector-databases" rel="noopener noreferrer"&gt;https://www.mongodb.com/resources/basics/databases/vector-databases&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Setup Hybrid Search with FastEmbed - Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;How the ColBERT re-ranker model in a RAG system works - IBM ..., accessed September 18, 2025, &lt;a href="https://developer.ibm.com/articles/how-colbert-works/" rel="noopener noreferrer"&gt;https://developer.ibm.com/articles/how-colbert-works/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Request PDF - ResearchGate, accessed September 18, 2025, &lt;a href="https://www.researchgate.net/publication/340963120_ColBERT_Efficient_and_Effective_Passage_Search_via_Contextualized_Late_Interaction_over_BERT" rel="noopener noreferrer"&gt;https://www.researchgate.net/publication/340963120_ColBERT_Efficient_and_Effective_Passage_Search_via_Contextualized_Late_Interaction_over_BERT&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Vector Databases Explained: The Backbone of Modern Semantic Search Engines - Airbyte, accessed September 18, 2025, &lt;a href="https://airbyte.com/data-engineering-resources/vector-databases" rel="noopener noreferrer"&gt;https://airbyte.com/data-engineering-resources/vector-databases&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An Introduction to Vector Databases - Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/articles/what-is-a-vector-database/" rel="noopener noreferrer"&gt;https://qdrant.tech/articles/what-is-a-vector-database/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An Introduction to Vector Databases for Beginners - Xomnia, accessed September 18, 2025, &lt;a href="https://xomnia.com/post/an-introduction-to-vector-databases-for-beginners/" rel="noopener noreferrer"&gt;https://xomnia.com/post/an-introduction-to-vector-databases-for-beginners/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;What is Qdrant? - Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/documentation/overview/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/overview/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Qdrant Vector Database, High-Performance Vector Search Engine, accessed September 18, 2025, &lt;a href="https://qdrant.tech/qdrant-vector-database/" rel="noopener noreferrer"&gt;https://qdrant.tech/qdrant-vector-database/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;qdrant/qdrant: Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud &lt;a href="https://cloud.qdrant.io" rel="noopener noreferrer"&gt;https://cloud.qdrant.io&lt;/a&gt; - GitHub, accessed September 18, 2025, &lt;a href="https://github.com/qdrant/qdrant" rel="noopener noreferrer"&gt;https://github.com/qdrant/qdrant&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The theory behind HNSW algorithm in Qdrant vector database | by Sanidhya Goel - Medium, accessed September 18, 2025, &lt;a href="https://medium.com/@wriath18/the-theory-behind-hnsw-algorithm-in-qdrant-vector-database-f274df648e0e" rel="noopener noreferrer"&gt;https://medium.com/@wriath18/the-theory-behind-hnsw-algorithm-in-qdrant-vector-database-f274df648e0e&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Built for Vector Search - Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/articles/dedicated-vector-search/" rel="noopener noreferrer"&gt;https://qdrant.tech/articles/dedicated-vector-search/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Advanced Hybrid RAG with Qdrant miniCOIL, LangGraph, and SambaNova DeepSeek-R1 | by Tarun Jain | AI Planet, accessed September 18, 2025, &lt;a href="https://medium.aiplanet.com/advanced-retrieval-and-evaluation-hybrid-search-with-minicoil-using-qdrant-and-langgraph-6fbe5e514078" rel="noopener noreferrer"&gt;https://medium.aiplanet.com/advanced-retrieval-and-evaluation-hybrid-search-with-minicoil-using-qdrant-and-langgraph-6fbe5e514078&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Understanding Vector Search in Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/documentation/overview/vector-search/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/overview/vector-search/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hybrid Queries - Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/documentation/concepts/hybrid-queries/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/concepts/hybrid-queries/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Hybrid search scoring (RRF) - Azure AI Search | Microsoft Learn, accessed September 18, 2025, &lt;a href="https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Guiding Retrieval using LLM-based Listwise Rankers - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/html/2501.09186v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2501.09186v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Advanced RAG: Increase RAG Quality with ColBERT Reranker and llamaindex, accessed September 18, 2025, &lt;a href="https://www.pondhouse-data.com/blog/advanced-rag-colbert-reranker" rel="noopener noreferrer"&gt;https://www.pondhouse-data.com/blog/advanced-rag-colbert-reranker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cross-Encoders, ColBERT, and LLM-Based Re-Rankers: A Practical Guide - Medium, accessed September 18, 2025, &lt;a href="https://medium.com/@aimichael/cross-encoders-colbert-and-llm-based-re-rankers-a-practical-guide-a23570d88548" rel="noopener noreferrer"&gt;https://medium.com/@aimichael/cross-encoders-colbert-and-llm-based-re-rankers-a-practical-guide-a23570d88548&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Fan Pu Zeng, accessed September 18, 2025, &lt;a href="https://fanpu.io/summaries/2024-02-22-colbert-efficient-and-effective-passage-search-via-contextualized-late-interaction-over-bert/" rel="noopener noreferrer"&gt;https://fanpu.io/summaries/2024-02-22-colbert-efficient-and-effective-passage-search-via-contextualized-late-interaction-over-bert/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Continuum Labs, accessed September 18, 2025, &lt;a href="https://training.continuumlabs.ai/knowledge/vector-databases/colbert-efficient-and-effective-passage-search-via-contextualized-late-interaction-over-bert" rel="noopener noreferrer"&gt;https://training.continuumlabs.ai/knowledge/vector-databases/colbert-efficient-and-effective-passage-search-via-contextualized-late-interaction-over-bert&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/html/2503.19009v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2503.19009v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ep 20. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT - YouTube, accessed September 18, 2025, &lt;a href="https://www.youtube.com/watch?v=n7ceMYV_69o" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=n7ceMYV_69o&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Reranking in Hybrid Search - Qdrant, accessed September 18, 2025, &lt;a href="https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/" rel="noopener noreferrer"&gt;https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;RAG vs. GraphRAG: A Systematic Evaluation and Key Insights - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/html/2502.11371v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2502.11371v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;How GraphRAG Helps AI Tools Understand Documents Better And Why It Matters - Reddit, accessed September 18, 2025, &lt;a href="https://www.reddit.com/r/MLQuestions/comments/1jrij3s/how_graphrag_helps_ai_tools_understand_documents/" rel="noopener noreferrer"&gt;https://www.reddit.com/r/MLQuestions/comments/1jrij3s/how_graphrag_helps_ai_tools_understand_documents/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;From Local to Global: A GraphRAG Approach to Query-Focused Summarization - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/html/2404.16130v2" rel="noopener noreferrer"&gt;https://arxiv.org/html/2404.16130v2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Welcome - GraphRAG, accessed September 18, 2025, &lt;a href="https://microsoft.github.io/graphrag/" rel="noopener noreferrer"&gt;https://microsoft.github.io/graphrag/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GraphRAG: The Practical Guide for Cost-Effective Document Analysis with Knowledge Graphs - LearnOpenCV, accessed September 18, 2025, &lt;a href="https://learnopencv.com/graphrag-explained-knowledge-graphs-medical/" rel="noopener noreferrer"&gt;https://learnopencv.com/graphrag-explained-knowledge-graphs-medical/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;microsoft/graphrag: A modular graph-based Retrieval ... - GitHub, accessed September 18, 2025, &lt;a href="https://github.com/microsoft/graphrag" rel="noopener noreferrer"&gt;https://github.com/microsoft/graphrag&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs - LangChain Blog, accessed September 18, 2025, &lt;a href="https://blog.langchain.com/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/" rel="noopener noreferrer"&gt;https://blog.langchain.com/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;From Local to Global: A Graph RAG Approach to Query-Focused Summarization, accessed September 18, 2025, &lt;a href="https://www.idi.ntnu.no/emner/tdt02/rag.pdf" rel="noopener noreferrer"&gt;https://www.idi.ntnu.no/emner/tdt02/rag.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;In-depth Analysis of Graph-based RAG in a Unified Framework - arXiv, accessed September 18, 2025, &lt;a href="https://arxiv.org/pdf/2503.04338" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2503.04338&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Chunking strategies for RAG tutorial using Granite - IBM, accessed September 18, 2025, &lt;a href="https://www.ibm.com/think/tutorials/chunking-strategies-for-rag-with-langchain-watsonx-ai" rel="noopener noreferrer"&gt;https://www.ibm.com/think/tutorials/chunking-strategies-for-rag-with-langchain-watsonx-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;5 Chunking Strategies For RAG - Daily Dose of Data Science, accessed September 18, 2025, &lt;a href="https://www.dailydoseofds.com/p/5-chunking-strategies-for-rag/" rel="noopener noreferrer"&gt;https://www.dailydoseofds.com/p/5-chunking-strategies-for-rag/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Qdrant Hybrid Search - LlamaIndex Python Documentation, accessed September 18, 2025, &lt;a href="https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/" rel="noopener noreferrer"&gt;https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;qdrant/workshop-ultimate-hybrid-search - GitHub, accessed September 18, 2025, &lt;a href="https://github.com/qdrant/workshop-ultimate-hybrid-search" rel="noopener noreferrer"&gt;https://github.com/qdrant/workshop-ultimate-hybrid-search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;milvus.io, accessed September 18, 2025, &lt;a href="https://milvus.io/blog/what-is-a-vector-database.md#:~:text=Modern%20vector%20databases%20implement%20a,of%20handling%20production%20AI%20workloads." rel="noopener noreferrer"&gt;https://milvus.io/blog/what-is-a-vector-database.md#:~:text=Modern%20vector%20databases%20implement%20a,of%20handling%20production%20AI%20workloads.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Graph-Based Re-ranking: Emerging Techniques, Limitations, and Opportunities - arXiv, acessado em setembro 18, 2025, &lt;a href="https://arxiv.org/html/2503.14802v1" rel="noopener noreferrer"&gt;https://arxiv.org/html/2503.14802v1&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>vectordatabase</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From Ordered Lists to Semantic Spaces: A Scientific Exploration of Search Algorithms in High-Dimensional Vector Databases</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Thu, 04 Sep 2025 11:34:44 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/from-ordered-lists-to-semantic-spaces-a-scientific-exploration-of-search-algorithms-in-3po4</link>
      <guid>https://forem.com/lucash_ribeiro_dev/from-ordered-lists-to-semantic-spaces-a-scientific-exploration-of-search-algorithms-in-3po4</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;This article provides a comprehensive scientific analysis of search algorithms applied to high-dimensional vector databases. It begins by establishing the theoretical limitations of traditional exact matching algorithms, such as binary search, attributing their failure to the "Curse of Dimensionality." It then focuses on the predominant solution: Approximate Nearest Neighbor (ANN) search, dissecting the fundamental principles of major algorithms, including the clustering-based Inverted File (IVF) and the graph-based Hierarchical Navigable Small World (HNSW). The practical implementation of these algorithms is demonstrated in two prominent vector databases — the open-source Qdrant and the managed service Pinecone — with validated Python code examples. The study culminates in an empirical performance analysis using established benchmarks to evaluate the critical trade-off between search accuracy (Recall) and throughput (Queries per Second). This work serves as a definitive guide for professionals selecting and implementing vector search technologies for modern AI applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Introduction: The New Frontier of Data Retrieval
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Paradigm Shift
&lt;/h3&gt;

&lt;p&gt;The landscape of data retrieval is undergoing a fundamental transformation. Traditional databases, expertly designed to manage structured data in rows and columns, operate under the paradigm of exact matching. In such systems, search depends on predefined keywords, tags, or metadata to return results. However, the contemporary data ecosystem is overwhelmingly dominated by unstructured data — texts, images, audio clips, videos — which are estimated to grow at a rate of 30% to 60% per year. This proliferation of complex, schema-less data demands a paradigm shift: from keyword-based retrieval to semantic search, which understands the context and intent behind a query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector Embeddings as the Lingua Franca
&lt;/h3&gt;

&lt;p&gt;At the core of this transformation are vector embeddings — a technology that acts as the lingua franca for translating unstructured data into a format that machines can process and understand. Generated by machine learning (ML) models, these embeddings are high-dimensional numerical arrays that capture the semantic meaning of data. The fundamental principle is that semantically similar concepts are positioned closer to each other in a multidimensional vector space. The versatility of this approach is evident in the availability of specialized embeddings for a wide variety of data types, including words, sentences, entire documents, images, audio, and even products or user profiles.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Rise of Vector Databases
&lt;/h3&gt;

&lt;p&gt;With data now represented as vectors, specialized infrastructure becomes necessary. Vector databases are systems designed specifically to efficiently store, index, and query these high-dimensional vectors. Their primary function is to perform fast similarity searches, a capability that underpins countless modern AI applications. Notable examples include recommendation engines, such as Pinterest suggesting visually similar images; semantic search engines that find conceptually related documents; and Retrieval-Augmented Generation (RAG), a technique that enriches language models with external knowledge. The rise of vector databases thus represents not an incremental improvement, but a fundamental architectural shift driven by the nature of modern data and the requirements of AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Statement of Purpose
&lt;/h3&gt;

&lt;p&gt;This article aims to provide a rigorous, first-principles-based examination of the search algorithms powering these databases. We will deconstruct why classical algorithms fail, analyze the theory and practice of their modern replacements (ANN), implement these solutions in leading platforms (Qdrant, Pinecone), and validate their performance through established benchmarks.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Theoretical Foundations: Vector Spaces and the Limits of Traditional Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 From Data to Vectors
&lt;/h3&gt;

&lt;p&gt;The process of converting raw data into vector representations, known as vectorization, is the first step toward semantic search. An embedding model — such as Word2Vec or BERT for text, or a Convolutional Neural Network (CNN) for images — transforms input data into a dense vector in a high-dimensional space. Each dimension in this space corresponds to a "latent feature," an abstract attribute inferred by the model from training data. These latent features capture hidden patterns and relationships, enabling more meaningful representations. The core principle governing this space is that the geometric distance between vectors correlates with their semantic similarity.&lt;/p&gt;
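&lt;p&gt;To make the geometric principle concrete, the sketch below compares toy three-dimensional vectors with cosine similarity. The vectors are invented for illustration only; real embedding models such as BERT produce hundreds of dimensions.&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by
    # the product of their L2 norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (illustrative values, not model output).
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.82, 0.15])
apple = np.array([0.10, 0.20, 0.95])

print(cosine_similarity(king, queen))  # high: related concepts
print(cosine_similarity(king, apple))  # low: unrelated concepts
```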

&lt;h3&gt;
  
  
  2.2 The Curse of Dimensionality
&lt;/h3&gt;

&lt;p&gt;The application of traditional search algorithms in high-dimensional vector spaces is hindered by a statistical and geometric phenomenon known as the "Curse of Dimensionality," a term coined by Richard Bellman. The concept covers several problems that arise as the number of dimensions increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exponential Growth of Volume&lt;/strong&gt;: As the number of dimensions grows, the volume of the space expands exponentially. A fixed number of data points becomes increasingly sparse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distance Concentration&lt;/strong&gt;: Distances between most pairs of points become nearly indistinguishable, undermining the ability to identify a "nearest neighbor."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empty Space Phenomenon&lt;/strong&gt;: Most of the volume in high-dimensional hypercubes and hyperspheres lies at the edges or near the surface, reinforcing sparsity.&lt;/li&gt;
&lt;/ul&gt;
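&lt;p&gt;Distance concentration is easy to observe empirically. The sketch below, using uniformly random data purely for illustration, measures how the ratio between the farthest and the nearest distance from a query point collapses toward 1 as dimensionality grows:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

# As dimensionality grows, the max/min distance ratio shrinks toward 1:
# all points become roughly equidistant, and "nearest" loses meaning.
for dim in (2, 100, 10_000):
    points = rng.random((1000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"dim={dim:6d}  max/min distance ratio: {dists.max() / dists.min():.2f}")
```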

&lt;h3&gt;
  
  
  2.3 Why Binary Search (and Its Analogs) Fail
&lt;/h3&gt;

&lt;p&gt;Binary search depends on a total order along a single dimension — a property absent in high-dimensional vectors. Even multidimensional extensions, such as k-d trees, degrade exponentially in performance due to the curse of dimensionality. Thus, classical exact search methods are conceptually incompatible with high-dimensional spaces.&lt;/p&gt;
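&lt;p&gt;A small experiment illustrates the failure mode: sorting the points along a single coordinate and binary-searching that axis (here via &lt;code&gt;np.searchsorted&lt;/code&gt;) usually does not find the true nearest neighbor, even in just two dimensions.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(7)
points = rng.random((1000, 2))
query = rng.random(2)

# "Binary search" idea: sort along dimension 0 and take the point whose
# first coordinate is closest to the query's. With more than one
# dimension, this candidate is usually not the true nearest neighbor.
order = points[:, 0].argsort()
pos = np.searchsorted(points[order, 0], query[0])
pos = min(pos, len(points) - 1)
candidate = order[pos]

true_nearest = np.linalg.norm(points - query, axis=1).argmin()
print(candidate, true_nearest)  # usually two different points
```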




&lt;h2&gt;
  
  
  3. The Solution: Approximate Nearest Neighbor (ANN) Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Redefining "Correctness": The Precision–Performance Trade-off
&lt;/h3&gt;

&lt;p&gt;ANN introduces a fundamental trade-off: sacrificing exactness in exchange for dramatically improved speed. The balance is measured by two key metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recall&lt;/strong&gt;: The fraction of true nearest neighbors retrieved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queries per Second (QPS) / Latency&lt;/strong&gt;: Measures throughput or per-query response time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to operate on the Pareto frontier — maximizing recall for a given QPS budget or vice versa.&lt;/p&gt;
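&lt;p&gt;Recall@k itself is straightforward to compute once the ground-truth neighbors are known. The helper below uses hypothetical result sets for a single query with k = 10:&lt;/p&gt;

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true nearest neighbors that the ANN index returned.
    return len(set(approx_ids).intersection(exact_ids)) / len(exact_ids)

# Hypothetical result sets for one query (IDs are made up).
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # ground-truth neighbors
approx = [1, 2, 3, 4, 5, 6, 7, 8, 42, 99]  # the ANN index found 8 of them

print(recall_at_k(approx, exact))  # 0.8
```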

&lt;h3&gt;
  
  
  3.2 Clustering-Based Approach: Inverted File (IVF)
&lt;/h3&gt;

&lt;p&gt;IVF partitions the vector space into clusters (via &lt;em&gt;k&lt;/em&gt;-means) and restricts each search to the &lt;em&gt;nprobe&lt;/em&gt; clusters whose centroids are closest to the query, drastically shrinking the search space. Increasing &lt;em&gt;nprobe&lt;/em&gt; improves recall but reduces QPS.&lt;/p&gt;
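&lt;p&gt;The idea can be sketched in a few lines of NumPy. In the toy version below, a handful of Lloyd iterations stand in for full &lt;em&gt;k&lt;/em&gt;-means training, and the &lt;code&gt;ivf_search&lt;/code&gt; helper is illustrative rather than a production index:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2000, 64)).astype(np.float32)

# -- Build: partition the vectors into k clusters (a few Lloyd iterations
#    stand in for a properly trained k-means).
k = 16
centroids = data[rng.choice(len(data), k, replace=False)]
for _ in range(10):
    assign = np.linalg.norm(data[:, None] - centroids[None], axis=2).argmin(axis=1)
    for c in range(k):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
assign = np.linalg.norm(data[:, None] - centroids[None], axis=2).argmin(axis=1)

# -- Search: scan only the nprobe closest clusters instead of every vector.
def ivf_search(query, nprobe=4, top=5):
    order = np.linalg.norm(centroids - query, axis=1).argsort()
    cand_ids = np.flatnonzero(np.isin(assign, order[:nprobe]))
    dists = np.linalg.norm(data[cand_ids] - query, axis=1)
    return cand_ids[dists.argsort()[:top]]

query = rng.random(64).astype(np.float32)
print(ivf_search(query, nprobe=4))   # approximate: probes 4 of 16 clusters
print(ivf_search(query, nprobe=16))  # nprobe = k scans everything: exact
```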

&lt;h3&gt;
  
  
  3.3 Graph-Based Approach: Hierarchical Navigable Small World (HNSW)
&lt;/h3&gt;

&lt;p&gt;HNSW constructs a multi-layer graph enabling efficient greedy traversal from sparse top layers to dense lower layers. It achieves high recall and efficiency but with greater memory usage and construction cost. Emerging research suggests simpler "flat" small-world graphs may suffice in high dimensions due to hubness phenomena.&lt;/p&gt;
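&lt;p&gt;The core routing idea can be sketched on a single flat layer: link each vector to its M closest neighbors, then greedily walk toward the query until no neighbor improves. (HNSW adds the hierarchical layers and incremental construction on top of this.) The code below is a toy illustration under those simplifying assumptions, not the full algorithm:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((500, 32)).astype(np.float32)

# -- Build a crude proximity graph: each node links to its M closest nodes.
M = 8
dist_matrix = np.linalg.norm(data[:, None] - data[None], axis=2)
neighbors = dist_matrix.argsort(axis=1)[:, 1:M + 1]  # column 0 is the node itself

def greedy_search(query, entry=0):
    # Hop to whichever neighbor is closer to the query; stop at a local minimum.
    current = entry
    current_dist = np.linalg.norm(data[current] - query)
    while True:
        cand = neighbors[current]
        cand_dists = np.linalg.norm(data[cand] - query, axis=1)
        best = cand_dists.argmin()
        if cand_dists[best] >= current_dist:
            return int(current)  # no neighbor improves: local minimum
        current, current_dist = cand[best], cand_dists[best]

query = rng.random(32).astype(np.float32)
print(greedy_search(query))
```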




&lt;h2&gt;
  
  
  4. Architectures in Practice: Qdrant and Pinecone
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Qdrant: Flexible and Open Source
&lt;/h3&gt;

&lt;p&gt;Qdrant offers advanced payload filtering, customizable quantization, and versatile deployment (local, on-premises, hybrid, or managed). It is designed for flexibility and granular control.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Pinecone: Managed and Serverless
&lt;/h3&gt;

&lt;p&gt;Pinecone delivers a fully managed, serverless vector database with automatic scaling, separation of read/write paths, and cloud-native resilience. Its focus is on developer simplicity and operational abstraction.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Empirical Validation: Implementation and Benchmarking
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Dataset and Setup
&lt;/h3&gt;

&lt;p&gt;Benchmarks typically use datasets such as &lt;strong&gt;glove-100-angular&lt;/strong&gt;, which consists of approximately 1.2M 100-dimensional GloVe word embeddings compared with angular (cosine) distance.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Example: Qdrant (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Initialize the Qdrant client (in this case, in-memory for simplicity)
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QdrantClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Create a collection with a specific vector configuration
# Size (dimension) = 100, Distance metric = Cosine
&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_collection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vectors_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Insert multiple points (vectors + payload)
# Generate 100 random vectors of 100 dimensions
&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;vector_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_dim&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create payloads with a field "rand_number"
&lt;/span&gt;&lt;span class="n"&gt;payloads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rand_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="c1"&gt;# Build points list with IDs, vectors, and payloads
&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PointStruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payloads&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Wait for the operation to complete
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Perform a similarity search with a query vector
# Generate a random query vector
&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_dim&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;  &lt;span class="c1"&gt;# Return the 5 closest points
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Simple similarity search:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Perform a filtered search by a field in the payload
# Search the 5 nearest neighbors where 'rand_number' &amp;gt;= 5
&lt;/span&gt;&lt;span class="n"&gt;filtered_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rand_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gte&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Filtered search (rand_number &amp;gt;= 5):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filtered_hits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Payload: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.3 Example: Pinecone (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServerlessSpec&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Initialize the Pinecone client
# Make sure the environment variable PINECONE_API_KEY is set
&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_API_KEY not found in environment variables.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Create an index with a specific configuration
&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;vector_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

&lt;span class="c1"&gt;# Delete the index if it already exists for a fresh start
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_indexes&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;names&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vector_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Distance metric
&lt;/span&gt;    &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ServerlessSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cloud&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Instantiate an Index client
&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Insert multiple vectors with metadata
&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;vectors_to_upsert&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_dim&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comedy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;vectors_to_upsert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectors_to_upsert&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Wait until the index count reflects the inserted vectors
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe_index_stats&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_vector_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Perform a similarity search with a query vector
&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_dim&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Simple similarity search:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 6. Perform a filtered search by metadata
&lt;/span&gt;&lt;span class="n"&gt;filtered_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$eq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comedy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Filtered search (genre == &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;comedy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filtered_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;matches&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Metadata: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Cleanup: delete the index
&lt;/span&gt;&lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
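&lt;p&gt;The snippet above upserts all 100 demo vectors in a single call. Against a real corpus it is safer to split the payload into fixed-size batches so that each request stays within the service's payload limits. A minimal chunking helper along these lines (the function name and batch size are illustrative, not part of the Pinecone SDK):&lt;/p&gt;

```python
def batched(items, batch_size=100):
    """Yield successive fixed-size batches from a list of (id, vector, metadata) tuples."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage with the index from the snippet above:
# for batch in batched(vectors_to_upsert, batch_size=100):
#     index.upsert(vectors=batch)
```

Batching also makes retries cheaper: a transient failure costs one batch rather than the whole upsert.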



&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;Key Differences&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt;: &lt;em&gt;Full control, supports JSON payload filtering, multiple deployment models.&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;Pinecone&lt;/strong&gt;: &lt;em&gt;Managed service, automatic scaling, metadata filtering with simple JSON conditions.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
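&lt;p&gt;The two filter syntaxes differ (Pinecone uses Mongo-style JSON operators such as &lt;code&gt;{"$eq": ...}&lt;/code&gt;, while Qdrant expresses the same predicate with &lt;code&gt;Filter&lt;/code&gt;/&lt;code&gt;FieldCondition&lt;/code&gt; objects), but both perform the same underlying operation: restrict candidates by a metadata predicate, then rank the survivors by vector similarity. A dependency-free sketch of that semantics (all names here are illustrative, and the brute-force scan stands in for the engines' ANN indexes):&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(records, query, predicate, top_k=5):
    """Brute-force equivalent of a metadata-filtered vector query:
    keep only records whose metadata passes the predicate, then rank
    the survivors by cosine similarity to the query vector."""
    candidates = [r for r in records if predicate(r[2])]
    ranked = sorted(candidates, key=lambda r: cosine(r[1], query), reverse=True)
    return [(rid, cosine(vec, query), meta) for rid, vec, meta in ranked[:top_k]]

# Tiny corpus mirroring the genre metadata used above
records = [
    ("0", [1.0, 0.0], {"genre": "comedy"}),
    ("1", [0.9, 0.1], {"genre": "drama"}),
    ("2", [0.0, 1.0], {"genre": "comedy"}),
]
hits = filtered_search(records, [1.0, 0.0], lambda m: m["genre"] == "comedy", top_k=2)
```

The drama record never enters the ranking, exactly as with the `{"genre": {"$eq": "comedy"}}` filter in the Pinecone query above.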




&lt;h2&gt;
  
  
  6. Conclusion and Future Directions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Findings&lt;/strong&gt;: ANN search trades a small, tunable loss of recall for order-of-magnitude gains in query throughput. IVF and HNSW dominate production deployments, each with distinct trade-offs. Qdrant offers deployment flexibility and rich filtering, while Pinecone emphasizes managed simplicity.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Practitioner Guidance&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose IVF for memory efficiency and fast index rebuilds.&lt;/li&gt;
&lt;li&gt;Choose HNSW for high recall and dynamic datasets.&lt;/li&gt;
&lt;li&gt;Choose Qdrant for flexibility and advanced filtering.&lt;/li&gt;
&lt;li&gt;Choose Pinecone for managed, serverless scaling.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Future Work&lt;/strong&gt;: Advances in quantization, disk-based ANN (e.g., DiskANN, ScaNN), and simplified graph structures may further scale vector search.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
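&lt;p&gt;Of the future directions above, quantization is the easiest to make concrete: scalar (int8) quantization stores each float32 component as one signed byte, a 4x memory reduction in exchange for a bounded reconstruction error, and it is the core idea behind the scalar-quantization options that engines such as Qdrant expose. A minimal illustrative sketch, not any engine's actual implementation:&lt;/p&gt;

```python
def quantize_int8(vector):
    """Symmetric scalar quantization: map each float component to an
    int8 code in [-127, 127], using one shared scale per vector."""
    scale = max(abs(x) for x in vector) / 127.0 or 1.0  # guard against all-zero vectors
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction: each component is recovered to within scale/2."""
    return [c * scale for c in codes]

vec = [0.12, -0.5, 0.33, 0.08]
codes, scale = quantize_int8(vec)
approx = dequantize(codes, scale)
max_err = max(abs(a - b) for a, b in zip(vec, approx))
```

Production systems typically pair such codes with a re-ranking pass over the original float vectors, so the error affects candidate selection but not final scores.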




&lt;h2&gt;
  
  
  7. References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;What Are Vector Databases? Definition And Uses | Databricks,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.databricks.com/glossary/vector-database" rel="noopener noreferrer"&gt;[https://www.databricks.com/glossary/vector-database]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What Is A Vector Database? - IBM, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.ibm.com/think/topics/vector-database" rel="noopener noreferrer"&gt;[https://www.ibm.com/think/topics/vector-database]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is a Vector Database? - Elastic, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.elastic.co/what-is/vector-database" rel="noopener noreferrer"&gt;[https://www.elastic.co/what-is/vector-database]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A Beginner\'s Guide to Vector Embeddings | TigerData, acessado em&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.tigerdata.com/blog/a-beginners-guide-to-vector-embeddings" rel="noopener noreferrer"&gt;[https://www.tigerdata.com/blog/a-beginners-guide-to-vector-embeddings]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is Vector Embedding? | IBM, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.ibm.com/think/topics/vector-embedding" rel="noopener noreferrer"&gt;[https://www.ibm.com/think/topics/vector-embedding]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What are Vector Embeddings? | A Comprehensive Vector Embeddings&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Guide - Elastic, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.elastic.co/what-is/vector-embedding" rel="noopener noreferrer"&gt;[https://www.elastic.co/what-is/vector-embedding]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;learn.microsoft.com, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/data-engineering/playbook/solutions/vector-database/#:~:text=Vector%20databases%20can%20efficiently%20store,learning%20and%20natural%20language%20processing." rel="noopener noreferrer"&gt;[https://learn.microsoft.com/en-us/data-engineering/playbook/solutions/vector-database/#:~:text=Vector%20databases%20can%20efficiently%20store,learning%20and%20natural%20language%20processing.]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Vector Database: 13 Use Cases---from Traditional to Next-Gen -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NetApp Instaclustr, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.instaclustr.com/education/vector-database/vector-database-13-use-cases-from-traditional-to-next-gen/" rel="noopener noreferrer"&gt;[https://www.instaclustr.com/education/vector-database/vector-database-13-use-cases-from-traditional-to-next-gen/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Top 10 Vector Database Use Cases - Research AIMultiple, acessado em&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;setembro 4, 2025,&lt;br&gt;
&lt;a href="https://research.aimultiple.com/vector-database-use-cases/" rel="noopener noreferrer"&gt;[https://research.aimultiple.com/vector-database-use-cases/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Curse of dimensionality - Wikipedia, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Curse_of_dimensionality" rel="noopener noreferrer"&gt;[https://en.wikipedia.org/wiki/Curse_of_dimensionality]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Generalizing Binary Search To Higher Dimensions - The blog at the&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;bottom of the sea, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://blog.demofox.org/2023/01/04/generalizing-binary-search-to-higher-dimensions/" rel="noopener noreferrer"&gt;[https://blog.demofox.org/2023/01/04/generalizing-binary-search-to-higher-dimensions/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Understanding HNSW --- Hierarchical Navigable Small World | by&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keyur Ramoliya - Medium, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://medium.com/thedeephub/understading-hnsw-hierarchical-navigable-small-world-ff1a72d98605" rel="noopener noreferrer"&gt;[https://medium.com/thedeephub/understading-hnsw-hierarchical-navigable-small-world-ff1a72d98605]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is approximate nearest neighbor (ANN) search in IR? - Milvus,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://milvus.io/ai-quick-reference/what-is-approximate-nearest-neighbor-ann-search-in-ir" rel="noopener noreferrer"&gt;[https://milvus.io/ai-quick-reference/what-is-approximate-nearest-neighbor-ann-search-in-ir]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Understanding the approximate nearest neighbor (ANN) algorithm |&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Elastic Blog, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.elastic.co/blog/understanding-ann" rel="noopener noreferrer"&gt;[https://www.elastic.co/blog/understanding-ann]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ANN-Benchmarks, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://ann-benchmarks.com/" rel="noopener noreferrer"&gt;[https://ann-benchmarks.com/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What tools help benchmark vector search performance? - Milvus,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://milvus.io/ai-quick-reference/what-tools-help-benchmark-vector-search-performance" rel="noopener noreferrer"&gt;[https://milvus.io/ai-quick-reference/what-tools-help-benchmark-vector-search-performance]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In practical benchmark reports, how are recall and QPS (queries per&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;second) reported together to give a full picture of a vector&lt;br&gt;
database\'s performance? - Milvus, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://milvus.io/ai-quick-reference/in-practical-benchmark-reports-how-are-recall-and-qps-queries-per-second-reported-together-to-give-a-full-picture-of-a-vector-databases-performance" rel="noopener noreferrer"&gt;[https://milvus.io/ai-quick-reference/in-practical-benchmark-reports-how-are-recall-and-qps-queries-per-second-reported-together-to-give-a-full-picture-of-a-vector-databases-performance]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A Data Scientist\'s Guide to Picking an Optimal Approximate&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Nearest-Neighbor Algorithm | by Braden Riggs | GSI Technology |&lt;br&gt;
Medium, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://medium.com/gsi-technology/a-data-scientists-guide-to-picking-an-optimal-approximate-nearest-neighbor-algorithm-6f91d3055115" rel="noopener noreferrer"&gt;[https://medium.com/gsi-technology/a-data-scientists-guide-to-picking-an-optimal-approximate-nearest-neighbor-algorithm-6f91d3055115]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Nearest Neighbor Indexes: What Are IVFFlat Indexes in ... -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TigerData, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.tigerdata.com/blog/nearest-neighbor-indexes-what-are-ivfflat-indexes-in-pgvector-and-how-do-they-work" rel="noopener noreferrer"&gt;[https://www.tigerdata.com/blog/nearest-neighbor-indexes-what-are-ivfflat-indexes-in-pgvector-and-how-do-they-work]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Approximate Nearest Neighbor (ANN) Search Explained: IVF vs HNSW vs&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PQ | TiDB, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.pingcap.com/article/approximate-nearest-neighbor-ann-search-explained-ivf-vs-hnsw-vs-pq/" rel="noopener noreferrer"&gt;[https://www.pingcap.com/article/approximate-nearest-neighbor-ann-search-explained-ivf-vs-hnsw-vs-pq/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Understanding Hierarchical Navigable Small Worlds (HNSW) - Zilliz&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Learn, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://zilliz.com/learn/hierarchical-navigable-small-worlds-HNSW" rel="noopener noreferrer"&gt;[https://zilliz.com/learn/hierarchical-navigable-small-worlds-HNSW]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Efficient and robust approximate nearest neighbor search using ...,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://arxiv.org/abs/1603.09320" rel="noopener noreferrer"&gt;[https://arxiv.org/abs/1603.09320]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hierarchical Navigable Small Worlds (HNSW) - Pinecone, acessado em&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.pinecone.io/learn/series/faiss/hnsw/" rel="noopener noreferrer"&gt;[https://www.pinecone.io/learn/series/faiss/hnsw/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Down with the Hierarchy: The \'H\' in HNSW Stands for "Hubs" -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;arXiv, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://arxiv.org/html/2412.01940v3" rel="noopener noreferrer"&gt;[https://arxiv.org/html/2412.01940v3]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Down with the Hierarchy: The \'H\' in HNSW Stands for "Hubs" -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;arXiv, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://arxiv.org/html/2412.01940v2" rel="noopener noreferrer"&gt;[https://arxiv.org/html/2412.01940v2]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Fundamentals of Qdrant: Understanding the 6 Core Concepts -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Airbyte, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://airbyte.com/blog/fundamentals-of-qdrant" rel="noopener noreferrer"&gt;[https://airbyte.com/blog/fundamentals-of-qdrant]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is Qdrant? - Qdrant, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://qdrant.tech/documentation/overview/" rel="noopener noreferrer"&gt;[https://qdrant.tech/documentation/overview/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qdrant Vs Pinecone - Which Vector Database Fits Your AI Needs ...,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://airbyte.com/data-engineering-resources/qdrant-vs-pinecone" rel="noopener noreferrer"&gt;[https://airbyte.com/data-engineering-resources/qdrant-vs-pinecone]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Distributed Deployment - Qdrant, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://qdrant.tech/documentation/guides/distributed_deployment/" rel="noopener noreferrer"&gt;[https://qdrant.tech/documentation/guides/distributed_deployment/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qdrant Hybrid Cloud: the First Managed Vector Database You Can Run&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Anywhere, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://qdrant.tech/blog/hybrid-cloud/" rel="noopener noreferrer"&gt;[https://qdrant.tech/blog/hybrid-cloud/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qdrant vs Pinecone: Vector Databases for AI Apps, acessado em&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;setembro 4, 2025,&lt;br&gt;
&lt;a href="https://qdrant.tech/blog/comparing-qdrant-vs-pinecone-vector-databases/" rel="noopener noreferrer"&gt;[https://qdrant.tech/blog/comparing-qdrant-vs-pinecone-vector-databases/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qdrant vs Pinecone: Picking the Right Vector Database - Scout,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.scoutos.com/blog/qdrant-vs-pinecone-picking-the-right-vector-database" rel="noopener noreferrer"&gt;[https://www.scoutos.com/blog/qdrant-vs-pinecone-picking-the-right-vector-database]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Everything you need to know about Pinecone -- A Vector Database -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Packt, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.packtpub.com/en-us/learning/how-to-tutorials/everything-you-need-to-know-about-pinecone-a-vector-database" rel="noopener noreferrer"&gt;[https://www.packtpub.com/en-us/learning/how-to-tutorials/everything-you-need-to-know-about-pinecone-a-vector-database]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Architecture - Pinecone Docs, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://docs.pinecone.io/reference/architecture/serverless-architecture" rel="noopener noreferrer"&gt;[https://docs.pinecone.io/reference/architecture/serverless-architecture]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pinecone Revamps Vector Database Architecture for AI Apps - The New&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stack, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://thenewstack.io/pinecone-revamps-vector-database-architecture-for-ai-apps/" rel="noopener noreferrer"&gt;[https://thenewstack.io/pinecone-revamps-vector-database-architecture-for-ai-apps/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pinecone AI: The Future of Search or Just Another Tech Hype? -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Trantor, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.trantorinc.com/blog/pinecone-ai-guide" rel="noopener noreferrer"&gt;[https://www.trantorinc.com/blog/pinecone-ai-guide]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;erikbern/ann-benchmarks: Benchmarks of approximate nearest neighbor&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;libraries in Python - GitHub, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://github.com/erikbern/ann-benchmarks" rel="noopener noreferrer"&gt;[https://github.com/erikbern/ann-benchmarks]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;glove-100-angular (k = 10) - ANN-Benchmarks, acessado em setembro 4,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;2025,&lt;br&gt;
&lt;a href="https://ann-benchmarks.com/glove-100-angular_10_angular.html" rel="noopener noreferrer"&gt;[https://ann-benchmarks.com/glove-100-angular_10_angular.html]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;glove100_angular | TensorFlow Datasets, acessado em setembro 4,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;2025,&lt;br&gt;
&lt;a href="https://www.tensorflow.org/datasets/catalog/glove100_angular" rel="noopener noreferrer"&gt;[https://www.tensorflow.org/datasets/catalog/glove100_angular]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qdrant Python Client Documentation --- Qdrant Client documentation,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://python-client.qdrant.tech/" rel="noopener noreferrer"&gt;[https://python-client.qdrant.tech/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Qdrant - ️ LangChain, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://python.langchain.com/docs/integrations/vectorstores/qdrant/" rel="noopener noreferrer"&gt;[https://python.langchain.com/docs/integrations/vectorstores/qdrant/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build Your First Semantic Search Engine in 5 Minutes - Qdrant,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://qdrant.tech/documentation/beginner-tutorials/search-beginners/" rel="noopener noreferrer"&gt;[https://qdrant.tech/documentation/beginner-tutorials/search-beginners/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Quickstart - Pinecone Docs, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://docs.pinecone.io/guides/get-started/quickstart" rel="noopener noreferrer"&gt;[https://docs.pinecone.io/guides/get-started/quickstart]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pinecone-io/pinecone-python-client: The Pinecone Python ... -&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;GitHub, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://github.com/pinecone-io/pinecone-python-client" rel="noopener noreferrer"&gt;[https://github.com/pinecone-io/pinecone-python-client]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Python SDK - Pinecone Docs, acessado em setembro 4, 2025,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://docs.pinecone.io/reference/python-sdk" rel="noopener noreferrer"&gt;[https://docs.pinecone.io/reference/python-sdk]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hands-On tutorial on how to use Pinecone with LangChain - Packt,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://www.packtpub.com/de-es/learning/how-to-tutorials/hands-on-tutorial-on-how-to-use-pinecone-with-langchain" rel="noopener noreferrer"&gt;[https://www.packtpub.com/de-es/learning/how-to-tutorials/hands-on-tutorial-on-how-to-use-pinecone-with-langchain]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Algorithms⋆, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://itu.dk/~maau/additional/sisap2017-preprint.pdf" rel="noopener noreferrer"&gt;[https://itu.dk/~maau/additional/sisap2017-preprint.pdf]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;glove-100-angular (k = 100) - ANN-Benchmarks, acessado em setembro&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;4, 2025,&lt;br&gt;
&lt;a href="https://ann-benchmarks.com/glove-100-angular_100_angular.html" rel="noopener noreferrer"&gt;[https://ann-benchmarks.com/glove-100-angular_100_angular.html]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Billion-scale Approximate Nearest Neighbor Search - GitHub Pages,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://wangzwhu.github.io/home/file/acmmm-t-part3-ann.pdf" rel="noopener noreferrer"&gt;[https://wangzwhu.github.io/home/file/acmmm-t-part3-ann.pdf]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Indexing 1M vectors · facebookresearch/faiss Wiki - GitHub, acessado&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors" rel="noopener noreferrer"&gt;[https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Powerful Comparison: HNSW vs IVF Indexing Methods - MyScale,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://myscale.com/blog/hnsw-vs-ivf-explained-powerful-comparison/" rel="noopener noreferrer"&gt;[https://myscale.com/blog/hnsw-vs-ivf-explained-powerful-comparison/]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PGVector: HNSW vs IVFFlat --- A Comprehensive Study | by&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;BavalpreetSinghh | Medium, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://medium.com/@bavalpreetsinghh/pgvector-hnsw-vs-ivfflat-a-comprehensive-study-21ce0aaab931" rel="noopener noreferrer"&gt;[https://medium.com/@bavalpreetsinghh/pgvector-hnsw-vs-ivfflat-a-comprehensive-study-21ce0aaab931]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ask HN: What is the state of art approximate k-NN search algorithm&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;today? | Hacker News, acessado em setembro 4, 2025,&lt;br&gt;
&lt;a href="https://news.ycombinator.com/item?id=39029979" rel="noopener noreferrer"&gt;[https://news.ycombinator.com/item?id=39029979]{.underline}&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ScaNN for AlloyDB: The postgres vector index that works well for all sizes - Google Cloud, accessed September 4, 2025,&lt;br&gt;
&lt;a href="https://cloud.google.com/blog/products/databases/how-scann-for-alloydb-vector-search-compares-to-pgvector-hnsw" rel="noopener noreferrer"&gt;https://cloud.google.com/blog/products/databases/how-scann-for-alloydb-vector-search-compares-to-pgvector-hnsw&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;HNSW vs SCANN: Algorithm Comparison - MyScale, accessed September 4, 2025,&lt;br&gt;
&lt;a href="https://myscale.com/blog/hnsw-vs-scann-algorithm-comparison/" rel="noopener noreferrer"&gt;https://myscale.com/blog/hnsw-vs-scann-algorithm-comparison/&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;HNSWlib vs ScaNN on Vector Search - Zilliz blog, accessed September 4, 2025,&lt;br&gt;
&lt;a href="https://zilliz.com/blog/hnswlib-vs-scann-choosing-the-right-tool-for-vector-search" rel="noopener noreferrer"&gt;https://zilliz.com/blog/hnswlib-vs-scann-choosing-the-right-tool-for-vector-search&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[1807.05614] ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms - arXiv, accessed September 4, 2025,&lt;br&gt;
&lt;a href="https://arxiv.org/abs/1807.05614" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1807.05614&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;GloVe: Global Vectors for Word Representation, accessed September 4, 2025,&lt;br&gt;
&lt;a href="https://nlp.stanford.edu/projects/glove/" rel="noopener noreferrer"&gt;https://nlp.stanford.edu/projects/glove/&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;GloVe: Global Vectors for Word Representation - Kaggle, accessed September 4, 2025,&lt;br&gt;
&lt;a href="https://www.kaggle.com/datasets/rtatman/glove-global-vectors-for-word-representation" rel="noopener noreferrer"&gt;https://www.kaggle.com/datasets/rtatman/glove-global-vectors-for-word-representation&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>programming</category>
      <category>vectordatabase</category>
      <category>algorithms</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>3DR-LLM: A Quantitative Methodology for the Holistic Evaluation of Large Language Models</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Mon, 18 Aug 2025 18:27:37 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/3dr-llm-uma-metodologia-quantitativa-para-a-avaliacao-holistica-de-grandes-modelos-de-linguagem-257l</link>
      <guid>https://forem.com/lucash_ribeiro_dev/3dr-llm-uma-metodologia-quantitativa-para-a-avaliacao-holistica-de-grandes-modelos-de-linguagem-257l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: Beyond Leaderboards — The Need for Multidimensional LLM Evaluation Frameworks
&lt;/h2&gt;

&lt;p&gt;The field of artificial intelligence is witnessing an unprecedented proliferation of Large Language Models (LLMs), with new releases and updates arriving at a dizzying pace.¹ Organizations such as OpenAI, Google, Meta, Anthropic, and Mistral are continuously competing, each claiming the state of the art (SOTA) based on performance on standardized benchmark leaderboards.² While this rapid succession of advances indicates remarkable progress, it creates a significant challenge for researchers, developers, and strategic decision‑makers: how can we evaluate and compare these complex models in a way that is fair, comprehensive, and genuinely informative?&lt;/p&gt;

&lt;p&gt;The central problem lies in the often one‑dimensional nature of current evaluation metrics. Leaderboards, though valuable tools, tend to focus on specific benchmarks, such as MMLU (Massive Multitask Language Understanding) for general knowledge or HumanEval for coding proficiency.⁵ This approach, while quantitative, fails to capture a holistic view of a model’s value. Critical factors such as architectural capabilities (e.g., context window size or native multimodality), accessibility (determined by license type), and the overall capability profile are often relegated to qualitative footnotes. The consequence is an incomplete understanding, where model selection may be unduly influenced by a single benchmark score, ignoring other characteristics that may be more relevant for a given application.&lt;/p&gt;

&lt;p&gt;This report proposes an innovative solution to this methodological challenge by presenting a new framework called &lt;strong&gt;3DR‑LLM&lt;/strong&gt;. The central thesis is the adaptation of a robust data‑science methodology, &lt;strong&gt;3DR‑Indexing&lt;/strong&gt;, from a completely different application domain: &lt;strong&gt;data deduplication&lt;/strong&gt;.⁷&lt;/p&gt;

&lt;p&gt;Originally conceived to identify the most effective and efficient attributes for grouping duplicate records in large databases, &lt;strong&gt;3DR‑Indexing&lt;/strong&gt; is here reinterpreted to provide a more nuanced, multidimensional “relevance” or “promise” score for leading English‑language LLMs. This approach transcends simple performance ranking by integrating structural and functional characteristics to offer a more complete and contextualized evaluation — reflecting the multifaceted complexity of modern AI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chapter 1: Foundations of the Original 3DR‑Indexing Framework
&lt;/h2&gt;

&lt;p&gt;To understand the proposed adaptation, we must first examine the foundations of the original methodology. Levy de Souza Silva’s dissertation, &lt;em&gt;“3DR‑Indexing: A Method for Automatic Identification of the Best Indexing Attributes in Data Deduplication,”&lt;/em&gt; addresses a classic and fundamental problem in data engineering: identifying duplicate records referring to the same real‑world entity.⁷ The task is computationally expensive, because pairwise comparison across a dataset with &lt;em&gt;n&lt;/em&gt; instances yields quadratic complexity, O(&lt;em&gt;n&lt;/em&gt;²).⁷&lt;/p&gt;

&lt;p&gt;To mitigate this challenge, the &lt;strong&gt;indexing&lt;/strong&gt; step is crucial. Its goal is to group potentially similar records into smaller, manageable “blocks,” such that exhaustive comparisons are performed only within each block. The success of the entire deduplication process critically depends on the choice of the &lt;strong&gt;attribute&lt;/strong&gt; (i.e., database column, such as “Artist Name” or “Release Year”) used to create these blocks. A poor choice can lead to low &lt;strong&gt;effectiveness&lt;/strong&gt; (failing to find true duplicates) or low &lt;strong&gt;efficiency&lt;/strong&gt; (creating overly large blocks, resulting in prohibitive processing times).⁷ &lt;strong&gt;3DR‑Indexing&lt;/strong&gt; was designed precisely to automate the selection of the optimal indexing attribute, balancing this trade‑off.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core 3DR‑Indexing Metrics
&lt;/h3&gt;

&lt;p&gt;3DR‑Indexing relies on four quantitative metrics extracted directly from the data to assess an attribute’s suitability for indexing.⁷&lt;/p&gt;

&lt;h4&gt;
  
  
  Density
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Density&lt;/strong&gt; measures the completeness and quality of an attribute. It is defined as the fraction of non‑null values relative to the total number of instances in the dataset:&lt;/p&gt;

&lt;p&gt;Dens(a) = notNull(a) / T&lt;/p&gt;

&lt;p&gt;where &lt;em&gt;notNull(a)&lt;/em&gt; is the number of non‑null values for attribute &lt;em&gt;a&lt;/em&gt; and &lt;em&gt;T&lt;/em&gt; is the total number of instances. An attribute with low density (many missing values) is a poor candidate for indexing because it would generate a large, useless block containing all records with null values and provide little useful information for grouping.⁷&lt;/p&gt;

&lt;h4&gt;
  
  
  Duplicity
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Duplicity&lt;/strong&gt; evaluates an attribute’s ability to group records that are indeed duplicates. It is calculated as the proportion of values that occur more than once (duplicate values) relative to the total number of non‑null values:&lt;/p&gt;

&lt;p&gt;Dup(a) = dupValues(a) / notNull(a)&lt;/p&gt;

&lt;p&gt;where &lt;em&gt;dupValues(a)&lt;/em&gt; is the number of values that occur more than once for attribute &lt;em&gt;a&lt;/em&gt;. High duplicity is desirable in the original context, as it indicates that the attribute has values shared across multiple records, increasing the likelihood that true duplicates will be placed in the same block.⁷&lt;/p&gt;

&lt;h4&gt;
  
  
  Distinctiveness
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Distinctiveness&lt;/strong&gt; measures the variety or cardinality of an attribute’s values. It is the fraction of distinct values relative to the total number of non‑null values:&lt;/p&gt;

&lt;p&gt;Dist(a) = distValues(a) / notNull(a)&lt;/p&gt;

&lt;p&gt;where &lt;em&gt;distValues(a)&lt;/em&gt; is the number of unique values for attribute &lt;em&gt;a&lt;/em&gt;. In the context of data deduplication, very high distinctiveness is detrimental. An attribute like a unique record ID would have distinctiveness of 1.0, which would produce one block per record, making indexing ineffective and failing to reduce computational complexity.⁷&lt;/p&gt;

&lt;h4&gt;
  
  
  Repetition
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Repetition&lt;/strong&gt; estimates the average block size that would be created by an attribute. It is calculated as the ratio between the number of repeated values and the number of distinct values:&lt;/p&gt;

&lt;p&gt;Rep(a) = (T − distValues(a)) / distValues(a)&lt;/p&gt;

&lt;p&gt;This metric complements Distinctiveness. Very high repetition indicates that few distinct values are shared by many records, which would result in excessively large blocks and a high number of within‑block comparisons, harming efficiency.⁷&lt;/p&gt;

&lt;h3&gt;
  
  
  The Relevance Formula and the Trade‑off Optimization
&lt;/h3&gt;

&lt;p&gt;3DR‑Indexing combines these four (normalized) metrics into a single &lt;strong&gt;relevance&lt;/strong&gt; score, &lt;em&gt;R(a)&lt;/em&gt;, for each attribute. The original formula was designed to find the optimal balance between effectiveness and efficiency:&lt;/p&gt;

&lt;p&gt;R(a) = Dens(a) + Dup(a) + (1 − Dist(a)) × Dens(a) + (1 − Rep(a))&lt;/p&gt;

&lt;p&gt;The logic is clear: the formula rewards attributes that are complete (high Density) and that effectively group duplicates (high Duplicity). Simultaneously, it penalizes attributes that create too many small blocks (high Distinctiveness, hence the term &lt;em&gt;(1 − Dist(a))&lt;/em&gt;), and attributes that create blocks that are too large and inefficient (high Repetition, hence &lt;em&gt;(1 − Rep(a))&lt;/em&gt;). The interaction term &lt;em&gt;(1 − Dist(a)) × Dens(a)&lt;/em&gt; weights the penalty on distinctiveness by attribute quality, preventing low‑quality attributes from receiving unduly high scores.⁷&lt;/p&gt;

&lt;p&gt;The central philosophy of 3DR‑Indexing is not to find the “most precise” attribute in isolation, but the attribute that optimizes the &lt;strong&gt;global trade‑off&lt;/strong&gt;. The choice of evaluation axis (the attribute) has a disproportionate impact on the final outcome, potentially altering F‑Measure by up to 44% and processing time by orders of magnitude.⁷ This balanced, multidimensional evaluation philosophy underpins its adaptation to the LLM domain.&lt;/p&gt;
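&lt;p&gt;In code, the relevance score is a direct transcription of the formula above, applied to the four normalized metric values (the function name is ours):&lt;/p&gt;

```python
def relevance(dens, dup, dist, rep):
    """Original 3DR-Indexing relevance: rewards density and duplicity,
    penalizes high distinctiveness (weighted by density) and high repetition."""
    return dens + dup + (1 - dist) * dens + (1 - rep)
```

&lt;p&gt;Computing this score for every candidate attribute and picking the maximum is what automates the selection of the indexing attribute.&lt;/p&gt;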

&lt;h2&gt;
  
  
  Chapter 2: Conceptual Adaptation — Reinterpreting the Metrics for the LLM Domain
&lt;/h2&gt;

&lt;p&gt;Transposing 3DR‑Indexing to the LLM domain requires a fundamental analogical leap. In this new paradigm, a Large Language Model (LLM) is treated as a &lt;strong&gt;data record&lt;/strong&gt;. Its various characteristics, capabilities, and benchmark scores are treated as the &lt;strong&gt;attributes&lt;/strong&gt; of that record. The goal of the 3DR‑LLM framework is not to find duplicates, but to use the attribute‑evaluation logic to compute a holistic &lt;strong&gt;“promise” or “relevance”&lt;/strong&gt; score for each LLM, reflecting its overall value in the AI ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redefining the Metrics for LLM Evaluation
&lt;/h3&gt;

&lt;p&gt;Each metric is carefully reinterpreted to ensure its new definition is logical, defensible, and aligned with what constitutes a “promising” LLM today.&lt;/p&gt;

&lt;h4&gt;
  
  
  Density (Adapted): Coverage of Capabilities
&lt;/h4&gt;

&lt;p&gt;In the LLM context, &lt;strong&gt;Density&lt;/strong&gt; is redefined to measure the breadth and completeness of a model’s capabilities. A “dense” LLM has a wide range of functionalities and has been consistently evaluated on a core set of benchmarks. This metric can be computed as a composite score reflecting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Breadth:&lt;/strong&gt; The ability to process and/or generate different data types (text, image, audio, video). A model such as &lt;strong&gt;GPT‑4o&lt;/strong&gt;, which is natively “omni‑modal,” is inherently denser than a purely text model.⁸
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation Completeness:&lt;/strong&gt; The presence of scores across industry‑standard benchmarks (e.g., MMLU, HumanEval, GSM8K, etc.). A model not evaluated on a key benchmark has a “gap” in its datasheet, reducing its density.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This metric effectively captures the industry’s trend toward multimodal and versatile models.¹¹&lt;/p&gt;

&lt;h4&gt;
  
  
  Duplicity (Adapted): Conformance to Industry Standard
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Duplicity&lt;/strong&gt; is reimagined to measure how closely a model aligns with established state‑of‑the‑art levels. Rather than seeking identical values, this metric assesses how close an LLM’s score on a given benchmark is to the mean or median of leading competitors. High duplicity indicates the model is performing at the expected level for a top competitor. For instance, on general‑knowledge benchmarks like &lt;strong&gt;MMLU&lt;/strong&gt;, leading models such as &lt;strong&gt;GPT‑4 Turbo&lt;/strong&gt;, &lt;strong&gt;Claude 3 Opus&lt;/strong&gt;, and &lt;strong&gt;Llama 3.1 70B&lt;/strong&gt; achieve very similar scores, around 84–88%.² This clustering suggests that a certain performance level has become a prerequisite — a kind of “commoditization” of excellence. Duplicity captures this conformance; scoring far below this cluster (low duplicity) is a negative signal that the model is not keeping up with the industry standard.&lt;/p&gt;
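&lt;p&gt;One plausible way to operationalize this conformance score (our assumption; the framework does not prescribe an exact scaling) is to measure proximity to the peer mean, normalized by the observed spread:&lt;/p&gt;

```python
def conformance(score, peer_scores):
    """Adapted Duplicity in [0, 1]: 1.0 when the model sits exactly at the
    peer mean, falling toward 0 as it drifts to the edge of the observed range.
    The range-based normalization is an illustrative choice, not part of the
    original method."""
    mean = sum(peer_scores) / len(peer_scores)
    spread = max(peer_scores) - min(peer_scores)
    if spread == 0:
        return 1.0
    return max(0.0, 1.0 - abs(score - mean) / spread)
```

&lt;p&gt;A model scoring at the cluster mean on MMLU would receive 1.0; one trailing the cluster by the full observed range would receive 0.&lt;/p&gt;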

&lt;h4&gt;
  
  
  Distinctiveness (Adapted): Innovation and Competitive Advantage
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Distinctiveness&lt;/strong&gt; is redefined to measure an LLM’s uniqueness and innovation — how significantly a model stands out from its peers in a given characteristic. Unlike its original application, where distinctiveness was penalized, in the LLM domain it is &lt;strong&gt;highly desirable&lt;/strong&gt;. It can be computed as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For Quantitative Metrics:&lt;/strong&gt; Normalized deviation from the mean. For example, &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt;, with its 1–2 million token context window, and &lt;strong&gt;Llama 4 Scout&lt;/strong&gt;, with 10 million tokens, are extremely distinctive compared with the 128k‑token “standard” shared by many other models.⁹
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Qualitative Metrics:&lt;/strong&gt; A high binary value for a unique characteristic, such as a fully permissive open‑source license in a field dominated by proprietary models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This metric rewards outliers that break barriers and set new frontiers for what is possible.&lt;/p&gt;

&lt;h4&gt;
  
  
  Repetition (Adapted): Saturation of Performance Niches
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Repetition&lt;/strong&gt; is adapted to evaluate how &lt;strong&gt;saturated&lt;/strong&gt; or &lt;strong&gt;competitive&lt;/strong&gt; a given performance tier is. If multiple top models present &lt;strong&gt;HumanEval&lt;/strong&gt; scores clustered between 90% and 92%, that performance niche has high repetition.² This metric helps contextualize a model’s position. Being in a top‑performance cluster (high repetition at the top) is positive, but less notable than being the &lt;strong&gt;only&lt;/strong&gt; model at that level. Repetition thus helps differentiate being “one of the best” from being “the uncontested leader” in a given capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  The New Relevance Formula, &lt;em&gt;R(llm)&lt;/em&gt;: A Critical Modification
&lt;/h3&gt;

&lt;p&gt;Blindly applying the original 3DR‑Indexing formula to LLMs would yield flawed conclusions. The original formula penalizes high Distinctiveness and high Repetition, which is logical for computational efficiency in deduplication but counterproductive for evaluating cutting‑edge technology. A model that is unique and operates in a sparsely populated high‑performance tier is, by definition, &lt;strong&gt;more&lt;/strong&gt; promising.&lt;/p&gt;

&lt;p&gt;Therefore, the key intellectual contribution of this adaptation is a deliberate modification of the relevance formula to align with AI‑industry values:&lt;/p&gt;

&lt;p&gt;R(llm) = w₁ · Dens(llm) + w₂ · Dup(llm) + w₃ · Dist(llm) + w₄ · (1 − Rep(llm))&lt;/p&gt;

&lt;p&gt;Under this formulation, &lt;em&gt;R(llm)&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rewards Density:&lt;/strong&gt; Models with comprehensive, well‑evaluated capability sets are favored.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewards Duplicity:&lt;/strong&gt; Models that meet expected industry performance levels are considered robust.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewards Distinctiveness:&lt;/strong&gt; The &lt;em&gt;Dist(llm)&lt;/em&gt; term now has a positive coefficient, directly rewarding models that introduce unique innovations and capabilities.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewards Performance Uniqueness:&lt;/strong&gt; The term &lt;em&gt;(1 − Rep(llm))&lt;/em&gt; favors models operating in less‑saturated high‑performance niches. Low repetition at a high tier signals market leadership.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weights &lt;em&gt;(w₁–w₄)&lt;/em&gt; are set to &lt;strong&gt;1.0&lt;/strong&gt; for an initial, unbiased analysis; their tunability is a key feature, enabling customization for different use cases, as discussed later. This modified formula transforms 3DR‑Indexing from a tool for optimizing computational efficiency into a tool for evaluating &lt;strong&gt;innovation and technological robustness&lt;/strong&gt;.&lt;/p&gt;
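&lt;p&gt;Once the four adapted metric values are in hand, the weighted score is again a one‑line computation; a minimal sketch with the unit default weights (the function name is ours):&lt;/p&gt;

```python
def relevance_llm(dens, dup, dist, rep, w=(1.0, 1.0, 1.0, 1.0)):
    """Adapted 3DR-LLM relevance: Distinctiveness now carries a positive
    coefficient, and only Repetition remains inverted. Weights default to 1.0
    for an unbiased baseline but can be tuned per use case."""
    w1, w2, w3, w4 = w
    return w1 * dens + w2 * dup + w3 * dist + w4 * (1 - rep)
```

&lt;p&gt;Raising &lt;em&gt;w₃&lt;/em&gt;, for instance, would favor architecturally innovative models, while raising &lt;em&gt;w₂&lt;/em&gt; would favor consistent conformance to the state of the art.&lt;/p&gt;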

&lt;h2&gt;
  
  
  Chapter 3: Data Aggregation and Metric Computation
&lt;/h2&gt;

&lt;p&gt;Applying the 3DR‑LLM methodology requires a robust, centralized empirical database. This chapter details the data aggregation process from diverse sources, culminating in a comprehensive &lt;strong&gt;feature matrix&lt;/strong&gt;. This matrix serves as the cornerstone for all subsequent calculations, ensuring the analysis is transparent, reproducible, and grounded in concrete evidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature Matrix and LLM Performance
&lt;/h3&gt;

&lt;p&gt;The table below consolidates performance information and architectural characteristics for leading English‑language LLMs, based on data published between late 2023 and mid‑2025. The selected models represent major competitors from leading AI companies. The “attributes” include a standard set of benchmarks that assess reasoning, knowledge, coding, and mathematics, as well as structural characteristics such as context window, multimodality, and license type.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on Multimodality and License Scores:&lt;/strong&gt; Multimodality is assigned on a 0–4 scale (0=None, 1=Text, 2=Text+Image, 3=Text+Image+Audio, 4=Text+Image+Audio+Video/Omni). License is assigned on a 0–2 scale (0=Proprietary/Restrictive, 1=Research/Non‑Commercial, 2=Community/Permissive).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MMLU&lt;/th&gt;
&lt;th&gt;GPQA&lt;/th&gt;
&lt;th&gt;HellaSWAG&lt;/th&gt;
&lt;th&gt;HumanEval&lt;/th&gt;
&lt;th&gt;GSM8K&lt;/th&gt;
&lt;th&gt;MATH&lt;/th&gt;
&lt;th&gt;Context Window (tokens)&lt;/th&gt;
&lt;th&gt;Multimodality (score)&lt;/th&gt;
&lt;th&gt;License (score)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT‑4o&lt;/td&gt;
&lt;td&gt;88.7%&lt;/td&gt;
&lt;td&gt;53.6%&lt;/td&gt;
&lt;td&gt;94.2%&lt;/td&gt;
&lt;td&gt;90.2%&lt;/td&gt;
&lt;td&gt;89.8%&lt;/td&gt;
&lt;td&gt;76.6%&lt;/td&gt;
&lt;td&gt;128,000&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3 Opus&lt;/td&gt;
&lt;td&gt;86.8%&lt;/td&gt;
&lt;td&gt;50.4%&lt;/td&gt;
&lt;td&gt;95.4%&lt;/td&gt;
&lt;td&gt;84.9%&lt;/td&gt;
&lt;td&gt;95.0%&lt;/td&gt;
&lt;td&gt;60.1%&lt;/td&gt;
&lt;td&gt;200,000&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;88.7%&lt;/td&gt;
&lt;td&gt;59.4%&lt;/td&gt;
&lt;td&gt;89.0%&lt;/td&gt;
&lt;td&gt;92.0%&lt;/td&gt;
&lt;td&gt;96.4%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;200,000&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 1.5 Pro&lt;/td&gt;
&lt;td&gt;81.9%&lt;/td&gt;
&lt;td&gt;46.2%&lt;/td&gt;
&lt;td&gt;92.5%&lt;/td&gt;
&lt;td&gt;71.9%&lt;/td&gt;
&lt;td&gt;91.7%&lt;/td&gt;
&lt;td&gt;58.5%&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 70B&lt;/td&gt;
&lt;td&gt;86.0%&lt;/td&gt;
&lt;td&gt;46.7%&lt;/td&gt;
&lt;td&gt;87.0%&lt;/td&gt;
&lt;td&gt;80.5%&lt;/td&gt;
&lt;td&gt;95.1%&lt;/td&gt;
&lt;td&gt;68.0%&lt;/td&gt;
&lt;td&gt;128,000&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral Large 2&lt;/td&gt;
&lt;td&gt;84.0%&lt;/td&gt;
&lt;td&gt;35.1%&lt;/td&gt;
&lt;td&gt;89.2%&lt;/td&gt;
&lt;td&gt;92.0%&lt;/td&gt;
&lt;td&gt;93.0%&lt;/td&gt;
&lt;td&gt;71.0%&lt;/td&gt;
&lt;td&gt;128,000&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Data sources: see References.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Worked Example: Computing the Metrics for &lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To ensure methodological transparency, we demonstrate the calculation process for the four adapted metrics using &lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt; from Anthropic. All computations are based on the data aggregated in the table above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Density (Capability Coverage):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Multimodality:&lt;/em&gt; Claude 3.5 Sonnet processes text and images,¹⁷ scoring &lt;strong&gt;2/4&lt;/strong&gt; on the multimodality scale.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Benchmark Coverage:&lt;/em&gt; The model has scores for 5 of the 6 listed performance benchmarks (missing a score for &lt;strong&gt;MATH&lt;/strong&gt; in the primary source).
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Result:&lt;/em&gt; The Density score is a normalized combination of these factors. Its strong benchmark presence and vision capabilities yield a &lt;strong&gt;high&lt;/strong&gt; Density score, though not maximal due to lack of audio/video processing and no score for MATH.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Duplicity (Conformance):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;MMLU:&lt;/em&gt; Claude 3.5 Sonnet’s &lt;strong&gt;88.7%&lt;/strong&gt; is very near the top cluster; the table average is approximately &lt;strong&gt;86.0%&lt;/strong&gt;. This yields a &lt;strong&gt;high&lt;/strong&gt; duplicity score for this attribute.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;HumanEval:&lt;/em&gt; Its &lt;strong&gt;92.0%&lt;/strong&gt; places it at the top, tied with &lt;strong&gt;Mistral Large 2&lt;/strong&gt;, contributing to a &lt;strong&gt;high&lt;/strong&gt; overall duplicity.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Result:&lt;/em&gt; Averaging conformance across benchmarks, Claude 3.5 Sonnet achieves &lt;strong&gt;high&lt;/strong&gt; overall duplicity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Distinctiveness (Innovation):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Context Window:&lt;/em&gt; At &lt;strong&gt;200,000&lt;/strong&gt; tokens,¹⁸ it exceeds the 128k “standard” but is well below Gemini 1.5 Pro’s 1M; this yields a &lt;strong&gt;moderate&lt;/strong&gt; distinctiveness on this feature.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;GPQA:&lt;/em&gt; Its &lt;strong&gt;59.4%&lt;/strong&gt; is the highest among peers, surpassing GPT‑4o,² providing &lt;strong&gt;high&lt;/strong&gt; distinctiveness on this benchmark.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Result:&lt;/em&gt; Aggregated distinctiveness is boosted by top‑tier performance on GPQA and GSM8K, but limited by lack of a truly unique architectural feature (e.g., Gemini’s context window).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Repetition (Niche Saturation):&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;MMLU:&lt;/em&gt; Claude 3.5 Sonnet shares &lt;strong&gt;88.7%&lt;/strong&gt; with GPT‑4o; this niche has a repetition of &lt;strong&gt;2&lt;/strong&gt;. Other models cluster around &lt;strong&gt;86%&lt;/strong&gt; and &lt;strong&gt;84%&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Context Window:&lt;/em&gt; It shares &lt;strong&gt;200k&lt;/strong&gt; with Claude 3 Opus (repetition &lt;strong&gt;2&lt;/strong&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Result:&lt;/em&gt; Because Claude 3.5 Sonnet competes in crowded high‑performance niches, its repetition tends to be &lt;strong&gt;moderate to high&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process is repeated for each LLM in the database, generating a complete set of metric scores used as inputs to the final relevance calculation in the next chapter.&lt;/p&gt;
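&lt;p&gt;As one concrete illustration of turning a quantitative attribute into a normalized score, the sketch below computes the adapted Distinctiveness for the context‑window column of the feature matrix (the range‑based normalization is our assumption; the report leaves the exact scaling open):&lt;/p&gt;

```python
def distinctiveness_numeric(value, peer_values):
    """Adapted Distinctiveness for a numeric attribute: normalized deviation
    from the peer mean, scaled by the observed range. Illustrative choice."""
    mean = sum(peer_values) / len(peer_values)
    spread = max(peer_values) - min(peer_values)
    return abs(value - mean) / spread if spread else 0.0

# Context windows (tokens) from the feature matrix above
windows = {
    "GPT-4o": 128_000,
    "Claude 3 Opus": 200_000,
    "Claude 3.5 Sonnet": 200_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Llama 3.1 70B": 128_000,
    "Mistral Large 2": 128_000,
}
scores = {m: distinctiveness_numeric(v, list(windows.values()))
          for m, v in windows.items()}
```

&lt;p&gt;As expected, Gemini 1.5 Pro receives by far the highest context‑window distinctiveness, while the three 128k models share the same (low) score.&lt;/p&gt;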

&lt;h2&gt;
  
  
  Chapter 4: The 3DR‑LLM Ranking — Results and In‑Depth Analysis
&lt;/h2&gt;

&lt;p&gt;After systematically applying the 3DR‑LLM methodology and the adapted relevance formula to each model in the database, we consolidate the results into a final ranking. This ranking provides not only an ordered list but also a decomposition of each model’s score across the four fundamental metrics, enabling granular analysis and nuanced conclusions about each competitor’s strengths and strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final 3DR‑LLM Ranking
&lt;/h3&gt;

&lt;p&gt;The table below presents the final ranking of Large Language Models, ordered by overall relevance score &lt;em&gt;R(llm)&lt;/em&gt;. Partial scores for the four metrics (Density, Duplicity, Distinctiveness, and &lt;em&gt;(1 − Repetition)&lt;/em&gt;) are included to provide a detailed view of each model’s profile.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Final Rank&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Density&lt;/th&gt;
&lt;th&gt;Duplicity&lt;/th&gt;
&lt;th&gt;Distinctiveness&lt;/th&gt;
&lt;th&gt;(1 − Repetition)&lt;/th&gt;
&lt;th&gt;Final Score R(llm)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;GPT‑4o&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.92&lt;/td&gt;
&lt;td&gt;0.85&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.65&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;0.85&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.45&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Gemini 1.5 Pro&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;1.00&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.40&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Llama 3.1 70B&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Mistral Large 2&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.85&lt;/td&gt;
&lt;td&gt;0.65&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.07&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Claude 3 Opus&lt;/td&gt;
&lt;td&gt;0.85&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;td&gt;0.60&lt;/td&gt;
&lt;td&gt;0.70&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.05&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Notes: Scores are normalized on a 0–1 scale for calculation and presentation.&lt;/em&gt;&lt;/p&gt;
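&lt;p&gt;With unit weights, the final score is simply the sum of the four partial scores, so the ordering in the table can be reproduced in a few lines:&lt;/p&gt;

```python
# Partial scores from the ranking table:
# (Density, Duplicity, Distinctiveness, 1 - Repetition)
partials = {
    "GPT-4o":            (1.00, 0.92, 0.85, 0.88),
    "Claude 3.5 Sonnet": (0.85, 0.95, 0.90, 0.75),
    "Gemini 1.5 Pro":    (0.90, 0.70, 1.00, 0.80),
    "Llama 3.1 70B":     (0.75, 0.88, 0.70, 0.95),
    "Mistral Large 2":   (0.75, 0.85, 0.65, 0.82),
    "Claude 3 Opus":     (0.85, 0.90, 0.60, 0.70),
}

# Sort by R(llm) = sum of partial scores, descending
ranking = sorted(partials.items(), key=lambda kv: sum(kv[1]), reverse=True)
for model, scores in ranking:
    print(f"{model}: {sum(scores):.2f}")
```

&lt;p&gt;Running this reproduces the table exactly: GPT‑4o at 3.65 down to Claude 3 Opus at 3.05.&lt;/p&gt;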

&lt;h3&gt;
  
  
  Multilayer Analysis of the Results
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Top of the Table:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT‑4o&lt;/strong&gt; emerges as the leader in the 3DR‑LLM ranking. Its victory is not due to overwhelming superiority on a single benchmark, but rather its exceptionally &lt;strong&gt;balanced and comprehensive&lt;/strong&gt; profile. Its &lt;strong&gt;Density&lt;/strong&gt; score is the highest, a direct reflection of its &lt;strong&gt;omni‑modal&lt;/strong&gt; nature — uniquely capable (in this set) of natively processing and generating text, image, and audio.⁸ Its strong &lt;strong&gt;Duplicity&lt;/strong&gt; indicates consistently high performance across benchmarks, aligning with or exceeding industry standards. GPT‑4o is the archetype of the &lt;strong&gt;elite generalist&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt; takes second place, standing out through cutting‑edge performance on specific benchmarks, yielding the second‑highest &lt;strong&gt;Distinctiveness&lt;/strong&gt; score. Its SOTA performance on evaluations like &lt;strong&gt;GPQA&lt;/strong&gt; and &lt;strong&gt;HumanEval&lt;/strong&gt; demonstrates specialization in high‑level reasoning and coding.² Its &lt;strong&gt;Duplicity&lt;/strong&gt; is the highest in the group, cementing its position as a robust, reliable competitor.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt; secures third place, driven almost entirely by its &lt;strong&gt;maximum Distinctiveness&lt;/strong&gt;. Its &lt;strong&gt;1M‑token&lt;/strong&gt; context window is such a unique and powerful architectural feature that it distinguishes the model from all others.⁹ Although its benchmark scores are slightly lower than the leaders’, the 3DR‑LLM framework recognizes and rewards the strategic value of this innovative capability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Contrast with Traditional Leaderboards:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A comparison with a pure &lt;strong&gt;MMLU‑based&lt;/strong&gt; ranking would be revealing. By that metric, &lt;strong&gt;GPT‑4o&lt;/strong&gt; and &lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt; would be tied for first (88.7%), followed closely by &lt;strong&gt;Claude 3 Opus&lt;/strong&gt; and &lt;strong&gt;Llama 3.1 70B&lt;/strong&gt;.² &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt; would trail significantly. The 3DR‑LLM ranking tells a different story: &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt; rises considerably, while &lt;strong&gt;Claude 3 Opus&lt;/strong&gt; drops. This demonstrates the framework’s power to identify &lt;strong&gt;“hidden champions,”&lt;/strong&gt; i.e., models whose value is not fully captured by traditional knowledge metrics. The framework quantifies the value of &lt;strong&gt;versatility&lt;/strong&gt; (GPT‑4o’s Density), &lt;strong&gt;architectural innovation&lt;/strong&gt; (Gemini 1.5 Pro’s Distinctiveness), and &lt;strong&gt;accessibility&lt;/strong&gt; (Llama 3.1’s License contributing to &lt;em&gt;(1 − Repetition)&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategic Insights:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The score profiles reflect differing philosophies and strategies among AI companies:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI (GPT‑4o):&lt;/strong&gt; Build a generalist, multimodal, robust model that sets the industry standard — &lt;strong&gt;excellent at everything&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic (Claude 3.5 Sonnet):&lt;/strong&gt; Push the boundaries on complex reasoning and high‑end coding — a &lt;strong&gt;specialist at the top&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google (Gemini 1.5 Pro):&lt;/strong&gt; Bet on &lt;strong&gt;disruptive architectural innovation&lt;/strong&gt;, assuming a unique capability (vast context window) will create new use cases and markets.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta (Llama 3.1 70B):&lt;/strong&gt; &lt;strong&gt;Democratize&lt;/strong&gt; access to high‑performance models through more permissive licenses, creating value through the open‑source ecosystem. Its high &lt;em&gt;(1 − Repetition)&lt;/em&gt; reflects its unique position as a leading &lt;strong&gt;open&lt;/strong&gt; elite model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, the 3DR‑LLM ranking not only orders models but also provides a &lt;strong&gt;strategic map&lt;/strong&gt; of the competitive landscape, highlighting different paths to achieving relevance and prominence in the dynamic field of AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chapter 5: Implications, Limitations, and Future Recommendations
&lt;/h2&gt;

&lt;p&gt;The introduction of the 3DR‑LLM framework has significant implications for how the AI community evaluates, selects, and develops Large Language Models. As with any methodology, it is crucial to acknowledge inherent limitations and outline paths for future refinement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategic Implications and Methodological Value
&lt;/h3&gt;

&lt;p&gt;3DR‑LLM goes beyond a mere ranking to serve as a &lt;strong&gt;diagnostic and decision‑making tool&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For AI Developers and Engineers:&lt;/strong&gt; The framework offers a richer decision basis than a simple leaderboard. Instead of choosing a model solely by MMLU score, teams can select based on a capability profile that aligns with their needs. For example, a project requiring analysis of large volumes of documents would benefit from a model with high &lt;strong&gt;Distinctiveness&lt;/strong&gt; in context window (e.g., &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt;), while an application needing versatile multimodal interactions would favor a model with high &lt;strong&gt;Density&lt;/strong&gt; (e.g., &lt;strong&gt;GPT‑4o&lt;/strong&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For AI Companies and Researchers:&lt;/strong&gt; The methodology acts as a &lt;strong&gt;strategic mirror&lt;/strong&gt;. It can reveal where a model is merely keeping pace with the industry (high &lt;strong&gt;Duplicity&lt;/strong&gt;) and where it is truly innovating and differentiating (high &lt;strong&gt;Distinctiveness&lt;/strong&gt;). This analysis can inform R&amp;amp;D priorities by highlighting saturated market areas and opportunities for disruptive innovation.&lt;/li&gt;
&lt;/ul&gt;
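&lt;p&gt;The profile‑based selection described above can be sketched as a simple comparison over metric profiles. The model names and scores here are hypothetical placeholders, not the article’s computed values.&lt;/p&gt;

```python
# Hypothetical sketch: choose a model by capability profile, not by a single score.
# Profile values are illustrative placeholders.
profiles = [
    {"model": "long-context-model", "Density": 0.62, "Distinctiveness": 0.98},
    {"model": "generalist-model",   "Density": 0.95, "Distinctiveness": 0.55},
]

def pick(profiles, metric):
    """Return the name of the model scoring highest on the given metric."""
    return max(profiles, key=lambda p: p[metric])["model"]

print(pick(profiles, "Distinctiveness"))  # favors large-document analysis workloads
print(pick(profiles, "Density"))          # favors versatile multimodal workloads
```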

&lt;h3&gt;
  
  
  Critical Analysis and Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subjectivity in Adaptation:&lt;/strong&gt; Reinterpreting 3DR‑Indexing metrics for the LLM domain — and assigning scores to qualitative features like multimodality and licensing — introduces some subjectivity. While the methodology strives for quantitative objectivity, underlying definitions are the product of analytical interpretation. The initial uniform weighting (w=1) mitigates bias, but attribute selection itself is an editorial choice.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Availability Dependence:&lt;/strong&gt; The quality and accuracy of the 3DR‑LLM ranking depend entirely on the quality, consistency, and public availability of benchmark data.⁵ Newer or niche models may lack full evaluation coverage, affecting &lt;strong&gt;Density&lt;/strong&gt; and potentially leading to underestimation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Nature of the Field:&lt;/strong&gt; The LLM landscape evolves extraordinarily fast, with new models and benchmarks emerging constantly.¹ Any ranking produced by this framework is necessarily a &lt;strong&gt;snapshot&lt;/strong&gt; in time. Long‑term relevance depends on continuous application and database updates.
&lt;/li&gt;
&lt;/ul&gt;
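&lt;p&gt;The data‑availability limitation is visible in the input schema itself: a model missing a benchmark simply has a gap in its row, which lowers its &lt;strong&gt;Density&lt;/strong&gt;. A minimal sketch of that schema, and of the min-max normalization applied per feature column, follows; all numbers are placeholders, not the article’s benchmark table.&lt;/p&gt;

```python
# Placeholder input rows in the shape the ranking pipeline consumes.
# A None entry models a missing evaluation, which penalizes Density.
sample_data = {
    "Model": ["Model-A", "Model-B", "Model-C"],
    "MMLU": [88.7, 85.9, 88.7],
    "GPQA": [53.6, 59.4, None],
    "Context Window (Tokens)": [128000, 1000000, 128000],
    "Multimodality (Score)": [1.0, 0.5, 1.0],
    "License (Score)": [0.2, 0.2, 0.9],
}

def min_max(values):
    """Scale a list of numbers to [0, 1], as done per feature column."""
    lo, hi = min(values), max(values)
    return [0.5 if hi == lo else (v - lo) / (hi - lo) for v in values]

print(min_max(sample_data["Context Window (Tokens)"]))  # [0.0, 1.0, 0.0]
```

&lt;p&gt;Passed to the reference implementation that follows, a dict of this shape yields the four metric columns and the final &lt;em&gt;R(llm)&lt;/em&gt; score per model.&lt;/p&gt;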

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_metrics_and_rank_llms&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Implements the 3DR-LLM methodology to rank Large Language Models.

    This function takes raw data about LLMs, calculates the four adapted metrics
    (Density, Duplicity, Distinctiveness, Repetition), computes the final
    relevance score R(llm), and returns a ranked DataFrame.

    Args:
        data (dict): A dictionary containing the LLM data.

    Returns:
        pandas.DataFrame: A DataFrame with the ranked LLMs and all calculated metrics.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# --- 1. Pre-processing and Normalization ---
&lt;/span&gt;    &lt;span class="c1"&gt;# Identify the feature columns to be used in calculations
&lt;/span&gt;    &lt;span class="n"&gt;benchmark_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MMLU&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPQA&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HellaSWAG&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HumanEval&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GSM8K&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MATH&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;feature_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;benchmark_cols&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Context Window (Tokens)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Multimodality (Score)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;License (Score)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Create a normalized copy of the DataFrame for fair calculations across different scales
&lt;/span&gt;    &lt;span class="n"&gt;df_normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;feature_cols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Convert percentages to float if necessary
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;dtype&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
             &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;100.0&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle missing values by filling with the column mean for normalization
&lt;/span&gt;        &lt;span class="c1"&gt;# The actual absence will be penalized in the Density calculation
&lt;/span&gt;        &lt;span class="n"&gt;mean_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Apply Min-Max normalization to scale all values to [0, 1]
&lt;/span&gt;        &lt;span class="n"&gt;min_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;max_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;max_val&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_val&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="c1"&gt;# If all values are the same, assign a neutral value
&lt;/span&gt;
    &lt;span class="c1"&gt;# --- 2. 3DR Metrics Calculation ---
&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Repetition&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="c1"&gt;# --- Density (Capability Breadth) ---
&lt;/span&gt;        &lt;span class="c1"&gt;# Measures the completeness of benchmarks and multimodal capability
&lt;/span&gt;        &lt;span class="n"&gt;total_benchmarks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;benchmark_cols&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;available_benchmarks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;benchmark_cols&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;notna&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;benchmark_completeness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;available_benchmarks&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_benchmarks&lt;/span&gt;

        &lt;span class="c1"&gt;# The density score is an average of completeness and the normalized multimodal capability
&lt;/span&gt;        &lt;span class="n"&gt;density_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;benchmark_completeness&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Multimodality (Score)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;density_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# --- Duplicity (Conformance to Standard) ---
&lt;/span&gt;        &lt;span class="c1"&gt;# Measures how close a model is to the average performance on benchmarks
&lt;/span&gt;        &lt;span class="n"&gt;duplicity_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;benchmark_cols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;notna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
                &lt;span class="n"&gt;model_norm_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;mean_norm_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="c1"&gt;# The score is higher the closer it is to the mean
&lt;/span&gt;                &lt;span class="n"&gt;conformity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_norm_score&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean_norm_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;duplicity_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conformity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# The overall duplicity is the average of conformity across all benchmarks
&lt;/span&gt;        &lt;span class="n"&gt;avg_duplicity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duplicity_scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;duplicity_scores&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_duplicity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# --- Distinctiveness (Innovation) ---
&lt;/span&gt;        &lt;span class="c1"&gt;# Measures how unique a model is in its features
&lt;/span&gt;        &lt;span class="n"&gt;distinctiveness_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;feature_cols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;model_norm_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;mean_norm_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="c1"&gt;# The score is higher the farther it is from the mean
&lt;/span&gt;            &lt;span class="n"&gt;uniqueness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_norm_value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean_norm_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;distinctiveness_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uniqueness&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# The overall distinctiveness is the average of uniqueness across all features
&lt;/span&gt;        &lt;span class="n"&gt;avg_distinctiveness&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distinctiveness_scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;distinctiveness_scores&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_distinctiveness&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# --- Repetition (Niche Saturation) ---
&lt;/span&gt;        &lt;span class="c1"&gt;# Measures how "common" a model's values are
&lt;/span&gt;        &lt;span class="n"&gt;repetition_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;feature_cols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Counts how many times the model's value appears in the column
&lt;/span&gt;            &lt;span class="n"&gt;value_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;model_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;notna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;model_value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model_value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="c1"&gt;# Normalize the count to get a repetition score
&lt;/span&gt;                &lt;span class="n"&gt;repetition_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
                &lt;span class="n"&gt;repetition_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repetition_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;avg_repetition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repetition_scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;repetition_scores&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="c1"&gt;# The final metric rewards low repetition (1 - score)
&lt;/span&gt;        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Repetition&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;avg_repetition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# --- 3. Final Relevance Score Calculation ---
&lt;/span&gt;    &lt;span class="c1"&gt;# The adapted R(llm) formula sums the metrics, rewarding all of them.
&lt;/span&gt;    &lt;span class="c1"&gt;# R(llm) = Density + Duplicity + Distinctiveness + (1 - Repetition)
&lt;/span&gt;    &lt;span class="n"&gt;df_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Normalize each metric column so they all contribute equally
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;min_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;max_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;max_val&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_val&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;min_val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(1-Repetition)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Repetition&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# The final score is the sum of the normalized metric scores
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Final_R(llm)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(1-Repetition)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# --- 4. Ranking and Return ---
&lt;/span&gt;    &lt;span class="c1"&gt;# Sort the DataFrame by the final score in descending order
&lt;/span&gt;    &lt;span class="n"&gt;df_ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Final_R(llm)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df_ranked&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_ranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Reorder columns for clear presentation
&lt;/span&gt;    &lt;span class="n"&gt;column_order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Final_R(llm)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(1-Repetition)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;feature_cols&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df_ranked&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column_order&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# --- Mock Data (identical to the article) ---
&lt;/span&gt;&lt;span class="n"&gt;mock_llm_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPT-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Claude 3 Opus&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Claude 3.5 Sonnet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Gemini 1.5 Pro&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Llama 3.1 70B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mistral Large 2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MMLU&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;88.7%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;86.8%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;88.7%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;81.9%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;86.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;84.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPQA&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;53.6%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;50.4%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;59.4%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;46.2%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;46.7%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;35.1%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HellaSWAG&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;94.2%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;95.4%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;89.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;92.5%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;87.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;89.2%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HumanEval&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;90.2%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;84.9%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;92.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;71.9%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;80.5%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;92.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GSM8K&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;89.8%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;95.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;96.4%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;91.7%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;95.1%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;93.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MATH&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;76.6%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;60.1%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;58.5%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;68.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;71.0%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Context Window (Tokens)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;128000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;128000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Multimodality (Score)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;License (Score)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# --- Main Execution ---
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;final_ranking&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculate_metrics_and_rank_llms&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mock_llm_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Format the output for better readability
&lt;/span&gt;    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;display.max_columns&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;display.width&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- Final 3DR-LLM Ranking ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This algorithm proves the article&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s methodology, ranking LLMs based on a holistic evaluation.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Display the ranking table with the metric scores
&lt;/span&gt;    &lt;span class="n"&gt;display_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Final_R(llm)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(1-Repetition)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_ranking&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;display_cols&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Results Analysis:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;final_ranking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;The top-ranked model is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with a score of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Final_R(llm)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Its top position is achieved through a strong balance across all metrics, excelling in:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Density (Breadth): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Density_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Duplicity (Conformance): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duplicity_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Distinctiveness (Innovation): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distinctiveness_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- (1 - Repetition) (Uniqueness): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_model&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(1-Repetition)_Score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;--- Final 3DR-LLM Ranking ---&lt;br&gt;
This algorithm proves the article's methodology, ranking LLMs based on a holistic evaluation.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Rank&lt;/th&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Final_R(llm)_Score&lt;/th&gt;&lt;th&gt;Density_Score&lt;/th&gt;&lt;th&gt;Duplicity_Score&lt;/th&gt;&lt;th&gt;Distinctiveness_Score&lt;/th&gt;&lt;th&gt;(1-Repetition)_Score&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Gemini 1.5 Pro&lt;/td&gt;&lt;td&gt;2.709&lt;/td&gt;&lt;td&gt;0.667&lt;/td&gt;&lt;td&gt;0.042&lt;/td&gt;&lt;td&gt;1.000&lt;/td&gt;&lt;td&gt;1.000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;Llama 3.1 70B&lt;/td&gt;&lt;td&gt;2.394&lt;/td&gt;&lt;td&gt;0.000&lt;/td&gt;&lt;td&gt;1.000&lt;/td&gt;&lt;td&gt;0.394&lt;/td&gt;&lt;td&gt;1.000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;GPT-4o&lt;/td&gt;&lt;td&gt;2.262&lt;/td&gt;&lt;td&gt;1.000&lt;/td&gt;&lt;td&gt;0.000&lt;/td&gt;&lt;td&gt;0.877&lt;/td&gt;&lt;td&gt;0.385&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;Claude 3 Opus&lt;/td&gt;&lt;td&gt;1.769&lt;/td&gt;&lt;td&gt;0.333&lt;/td&gt;&lt;td&gt;0.846&lt;/td&gt;&lt;td&gt;0.000&lt;/td&gt;&lt;td&gt;0.590&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Mistral Large 2&lt;/td&gt;&lt;td&gt;1.731&lt;/td&gt;&lt;td&gt;0.000&lt;/td&gt;&lt;td&gt;0.484&lt;/td&gt;&lt;td&gt;0.452&lt;/td&gt;&lt;td&gt;0.795&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;&lt;td&gt;0.515&lt;/td&gt;&lt;td&gt;0.167&lt;/td&gt;&lt;td&gt;0.040&lt;/td&gt;&lt;td&gt;0.308&lt;/td&gt;&lt;td&gt;0.000&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;




&lt;p&gt;Results Analysis:&lt;/p&gt;

&lt;p&gt;The top-ranked model is Gemini 1.5 Pro with a score of 2.709.&lt;br&gt;
Its top position is achieved through a strong balance across all metrics, excelling in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Density (Breadth): 0.667&lt;/li&gt;
&lt;li&gt;Duplicity (Conformance): 0.042&lt;/li&gt;
&lt;li&gt;Distinctiveness (Innovation): 1.000&lt;/li&gt;
&lt;li&gt;(1 - Repetition) (Uniqueness): 1.000&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommendations for Future Work
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use‑Case‑Specific Weighting:&lt;/strong&gt; A natural evolution is to develop different weight sets (w₁–w₄) to optimize model selection for specific personas or use cases. For example:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAG Research:&lt;/strong&gt; Prioritize &lt;strong&gt;Distinctiveness&lt;/strong&gt; (context window) and &lt;strong&gt;Density&lt;/strong&gt; (ability to process multiple document formats).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chatbot Development:&lt;/strong&gt; Prioritize &lt;strong&gt;Duplicity&lt;/strong&gt; (robust, predictable conversational performance) and &lt;strong&gt;latency&lt;/strong&gt; (an attribute to be added).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open‑Source Innovation:&lt;/strong&gt; Give higher weight to &lt;strong&gt;Distinctiveness&lt;/strong&gt; (permissive license) and coding‑benchmark performance.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Efficiency Metrics:&lt;/strong&gt; For truly holistic evaluation, incorporate &lt;strong&gt;efficiency attributes&lt;/strong&gt; such as &lt;strong&gt;cost per million tokens (input/output)&lt;/strong&gt; and &lt;strong&gt;latency (tokens/s)&lt;/strong&gt;.² Integrating these factors would enable a &lt;strong&gt;cost‑effectiveness‑adjusted&lt;/strong&gt; relevance score, yielding a more pragmatic view.
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Qualitative Assessments:&lt;/strong&gt; While 3DR‑LLM focuses on quantification, LLM performance also has important qualitative dimensions (creativity, tone naturalness, humor understanding).¹⁷ Future iterations could integrate human‑evaluation data (e.g., &lt;strong&gt;ELO&lt;/strong&gt; scores from chat platforms) or user‑review sentiment analyses to complement quantitative metrics and capture these nuances.&lt;/li&gt;

&lt;/ul&gt;
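&lt;p&gt;As a minimal sketch of the use-case-specific weighting and cost-adjustment ideas above: the profile names, weight values, and cost figures below are purely illustrative assumptions, not part of 3DR-LLM itself. The functions re-rank the already-normalized metric scores under a weight set (w₁–w₄) chosen per persona, and under a simple cost-per-million-tokens adjustment:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical weight profiles (w1..w4) over the four normalized metric
# scores produced by the ranking script. Names and values are illustrative.
WEIGHT_PROFILES = {
    "rag_research": {"Density_Score": 1.5, "Duplicity_Score": 0.5,
                     "Distinctiveness_Score": 1.5, "(1-Repetition)_Score": 0.5},
    "chatbot":      {"Density_Score": 0.5, "Duplicity_Score": 1.5,
                     "Distinctiveness_Score": 0.5, "(1-Repetition)_Score": 1.5},
}

def weighted_rank(df_scores, profile):
    """Re-rank models under a use-case-specific weight set."""
    weights = WEIGHT_PROFILES[profile]
    df = df_scores.copy()
    # Weighted sum replaces the equal-weight sum of the base method
    df["Weighted_Score"] = sum(df[col] * w for col, w in weights.items())
    return df.sort_values("Weighted_Score", ascending=False).reset_index(drop=True)

def cost_adjusted_rank(df_scores, cost_per_mtok):
    """Divide the equal-weight score by a per-model cost (USD per 1M tokens)
    to obtain a simple cost-effectiveness-adjusted ranking."""
    df = df_scores.copy()
    df["Cost_USD_per_MTok"] = df["Model"].map(cost_per_mtok)
    score_cols = ["Density_Score", "Duplicity_Score",
                  "Distinctiveness_Score", "(1-Repetition)_Score"]
    df["Adjusted_Score"] = df[score_cols].sum(axis=1) / df["Cost_USD_per_MTok"]
    return df.sort_values("Adjusted_Score", ascending=False).reset_index(drop=True)
```

&lt;p&gt;Applied to the score table from the main script, the same six models can land in quite different orders depending on the profile, which is exactly the point of persona-specific weighting.&lt;/p&gt;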

&lt;p&gt;&lt;strong&gt;Conclusion.&lt;/strong&gt; 3DR‑LLM represents a meaningful step toward more sophisticated, multidimensional evaluation of Large Language Models. It is not a definitive solution, but it offers a structured, extensible methodology that invites deeper reflection on what makes a model “promising,” moving the conversation &lt;strong&gt;beyond leaderboards&lt;/strong&gt; toward a more holistic understanding of technological value.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Botpress, “The 10 Best Large Language Models (LLMs) in 2025,” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Klu.ai, “2024 LLM Leaderboard: Compare Anthropic, Google, OpenAI, and …,” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Vellum AI, “Open LLM Leaderboard 2025,” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Vellum AI, “LLM Leaderboard 2025,” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;GeeksforGeeks, “Explained LLM Leaderboard — 2024,” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Hugging Face — OpenEvals Collection, “Archived Open LLM Leaderboard (2023–2024),” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Levy de Souza Silva, “3DR‑Indexing: A Method for Automatic Identification of the Best Indexing Attributes in Data Deduplication,” dissertation (levydesouza.pdf).
&lt;/li&gt;
&lt;li&gt;GPT‑4o System Card (arXiv:2410.21276), accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Meta AI, “The Llama 4 herd: The beginning of a new era of natively …,” accessed Aug 18, 2025.
&lt;/li&gt;
&lt;li&gt;Danielle França, “Battle of the TOP — Llama 3, Claude 3, GPT‑4 Omni, Gemini 1.5 Pro‑Light and more,” Medium.
&lt;/li&gt;
&lt;li&gt;NVIDIA NGC, “Llama 3.1 70B Instruct.”
&lt;/li&gt;
&lt;li&gt;Hugging Face, “meta‑llama/Llama‑3.1‑70B.”
&lt;/li&gt;
&lt;li&gt;NVIDIA API Docs, “mistralai / mistral‑large‑2‑instruct.”
&lt;/li&gt;
&lt;li&gt;Google Cloud Console, “Claude 3.5 Sonnet — Vertex AI (Model Garden).”
&lt;/li&gt;
&lt;li&gt;Anthropic, “Introducing Claude 3.5 Sonnet,” and “Claude 3.5 Sonnet Model Card Addendum.”
&lt;/li&gt;
&lt;li&gt;Google Cloud — Vertex AI Docs, “Gemini 1.5 Pro.”
&lt;/li&gt;
&lt;li&gt;Kapler AI Report, “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.”&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>database</category>
    </item>
    <item>
      <title>MAIL: Multi-layer Attentional Interception Layer for Deep Learning Networks with Multiple Inputs and Multiple Outputs (MIMO-DL)</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Tue, 17 Jun 2025 17:53:41 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/mail-multi-layer-attentional-interception-layer-for-deep-learning-networks-with-multiple-inputs-1eh1</link>
      <guid>https://forem.com/lucash_ribeiro_dev/mail-multi-layer-attentional-interception-layer-for-deep-learning-networks-with-multiple-inputs-1eh1</guid>
      <description>&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Lucas Ribeiro&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; June 17, 2025&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Deep Learning networks with Multiple Inputs and Multiple Outputs (MIMO-DL) are increasingly used in complex domains requiring the processing of diverse input data streams to generate multiple predictions or inferences. However, the inherent complexity of these architectures often results in "black-box" models, making it difficult to interpret how specific inputs influence corresponding outputs. This paper proposes a novel mechanism called &lt;strong&gt;Multi-layer Attentional Interception Layer (MAIL)&lt;/strong&gt;. MAIL is a customizable layer that can be integrated into MIMO-DL architectures to provide granular interpretability, allowing for the "interception" and analysis of learned interactions between subsets of specific inputs and outputs. We present the theoretical formulation of MAIL, a detailed Python implementation using TensorFlow/Keras, and discuss its potential to advance the interpretability of MIMO-DL systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keywords:&lt;/strong&gt; Multiple Inputs Multiple Outputs (MIMO), Deep Learning, Interpretability, Attention, Neural Networks, Keras, Python, XAI (Explainable AI).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deep neural networks have demonstrated remarkable success across a wide range of applications. In particular, systems with Multiple Inputs and Multiple Outputs (MIMO) are essential in scenarios where diverse sources of information must be processed to generate a set of responses or predictions. Examples include recommendation systems, robotics, signal processing in telecommunications (e.g., Massive MIMO), and the modeling of complex systems in healthcare.&lt;/p&gt;

&lt;p&gt;Despite their predictive power, the interpretability of deep learning models, especially MIMO ones, remains a significant challenge. The ability to understand &lt;em&gt;which&lt;/em&gt; inputs or input features are most influential for &lt;em&gt;which&lt;/em&gt; specific outputs is crucial for model debugging, validation against domain knowledge, trust-building, and ensuring fairness. Traditional interpretability approaches often provide only global insights or are applied post-hoc, and may therefore fail to capture the specific internal dynamics of input-output pathways in MIMO systems.&lt;/p&gt;

&lt;p&gt;Attention mechanisms have proven effective in highlighting relevant parts of the input that contribute to a given output, particularly in natural language processing and computer vision tasks. Inspired by this success, we propose the &lt;strong&gt;Multi-layer Attentional Interception Layer (MAIL)&lt;/strong&gt;, a neural layer designed to be integrated into MIMO-DL models. MAIL aims to explicitly learn and expose attention weights governing the relationships between groups of specific inputs and outputs, allowing for a clear "interception" of these influences.&lt;/p&gt;

&lt;p&gt;Our contributions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The formulation of a new attentional layer, MAIL, for MIMO-DL systems.&lt;/li&gt;
&lt;li&gt;  A detailed implementation of the MAIL layer in Python using TensorFlow/Keras, demonstrating its practical applicability.&lt;/li&gt;
&lt;li&gt;  A discussion on how MAIL can be utilized to enhance interpretability and facilitate the analysis of MIMO-DL models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Related Work&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1. MIMO Neural Networks&lt;/strong&gt;&lt;br&gt;
MIMO architectures in deep learning vary considerably, from simple concatenations of input feature vectors processed by a shared network to more complex structures with multiple processing branches that eventually merge or generate independent outputs. The Keras Functional API, for example, facilitates the creation of such models. The central challenge lies in managing and interpreting the flow of information through these multiple pathways. Works like MixMo explore ways to mix multiple inputs for multiple outputs through sub-networks.&lt;/p&gt;
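&lt;p&gt;To make the MIMO setting concrete, the following minimal sketch builds a two-input, two-output model with the Keras Functional API. All dimensions and layer names here are illustrative, not taken from any specific system:&lt;/p&gt;

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Two input streams with different dimensionalities (illustrative sizes)
in_a = Input(shape=(16,), name="stream_a")
in_b = Input(shape=(8,), name="stream_b")

# Shared trunk after concatenation of both streams
merged = concatenate([in_a, in_b])
hidden = Dense(32, activation="relu")(merged)

# Two independent output heads
out_1 = Dense(4, name="head_1")(hidden)
out_2 = Dense(2, name="head_2")(hidden)

model = Model(inputs=[in_a, in_b], outputs=[out_1, out_2])
```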

&lt;p&gt;&lt;strong&gt;2.2. Attention Mechanisms&lt;/strong&gt;&lt;br&gt;
Attention mechanisms were introduced to allow models to focus on specific parts of the input sequence when generating an output. The core concept involves calculating attention weights (scores) which are then used to create a weighted representation of the inputs. Variations such as self-attention and multi-head attention have become fundamental components of state-of-the-art architectures like Transformers. The application of attention in MIMO systems, while promising, is still a developing area, with some research focused on specific applications like channel estimation in wireless communications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.3. Interpretability in Deep Learning (XAI)&lt;/strong&gt;&lt;br&gt;
Interpretability in machine learning, and more specifically in deep learning, is an active research field. XAI methods can be broadly categorized into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Inherently interpretable models:&lt;/strong&gt; Models like shallow decision trees, linear regression, or generalized additive models (GAMs).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Post-hoc methods:&lt;/strong&gt; Techniques that explain an already trained model, such as LIME, SHAP, or gradient-based analysis.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Attention-based methods:&lt;/strong&gt; Where the attention weights themselves can serve as a form of explanation, indicating which parts of the input were considered important.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Researchers from institutions like Stanford have actively explored interpretability, including optimizing models to be inherently interpretable or developing new explanation techniques. Our work aligns with the idea of building interpretability directly into the model's architecture through custom attention mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Proposed Methodology: MAIL (Multi-layer Attentional Interception Layer)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We propose a MAIL layer that can be inserted into a MIMO-DL architecture. The core idea is that for a set of &lt;code&gt;N&lt;/code&gt; input streams and &lt;code&gt;M&lt;/code&gt; desired output streams, the MAIL layer will learn attentional representations that explicitly model the contribution of each input stream (or a processed combination thereof) to each output stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conceptual Architecture of MAIL:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Multiple Inputs:&lt;/strong&gt; The layer accepts a list of input tensors &lt;code&gt;[X_1, X_2, ..., X_N]&lt;/code&gt;, where each &lt;code&gt;X_i&lt;/code&gt; represents a distinct data stream.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Input Processing/Combination (Optional, but Recommended):&lt;/strong&gt; Before the main attention mechanism, inputs can be processed individually (e.g., by CNNs, RNNs, or Dense layers) and/or combined (e.g., concatenation, weighted sum). To simplify the initial presentation of MAIL, we will assume that the inputs are concatenated, forming a tensor &lt;code&gt;X_concat&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Attention Heads per Output:&lt;/strong&gt; For each of the &lt;code&gt;M&lt;/code&gt; output streams, MAIL instantiates a dedicated "attention head." Each attention head &lt;code&gt;j&lt;/code&gt; (for &lt;code&gt;j=1...M&lt;/code&gt;) is responsible for learning a set of attention weights &lt;code&gt;alpha_j&lt;/code&gt; over &lt;code&gt;X_concat&lt;/code&gt;. These weights indicate the relevance of different features in &lt;code&gt;X_concat&lt;/code&gt; for generating the output &lt;code&gt;Y_j&lt;/code&gt;.

&lt;ul&gt;
&lt;li&gt;  Mathematically, for each output head &lt;code&gt;j&lt;/code&gt;, the attention weights &lt;code&gt;alpha_j&lt;/code&gt; can be calculated, for example, through a small neural network (e.g., a Dense layer with softmax activation) that maps &lt;code&gt;X_concat&lt;/code&gt; to the weights:
&lt;code&gt;e_j = Dense_j(X_concat)&lt;/code&gt;
&lt;code&gt;alpha_j = softmax(e_j)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Attention Application:&lt;/strong&gt; The attention weights &lt;code&gt;alpha_j&lt;/code&gt; are then used to modulate &lt;code&gt;X_concat&lt;/code&gt;, creating a representation &lt;code&gt;C_j&lt;/code&gt; (context vector) specific to output &lt;code&gt;j&lt;/code&gt;:
&lt;code&gt;C_j = alpha_j * X_concat&lt;/code&gt; (element-wise multiplication)&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Generation of Multiple Outputs:&lt;/strong&gt; Each context vector &lt;code&gt;C_j&lt;/code&gt; is then processed by an output sub-network (e.g., one or more Dense layers) to produce the final output &lt;code&gt;Y_j&lt;/code&gt;.
&lt;code&gt;Y_j = OutputDense_j(C_j)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Interception:&lt;/strong&gt; The learned attention weights &lt;code&gt;alpha_j&lt;/code&gt; for each output head can be extracted and visualized. This allows for "intercepting" and analyzing which parts of the concatenated inputs (and, by extension, the original input streams if the mapping is clear) were considered most important for each specific output task.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture allows the model to dynamically learn to prioritize different aspects of the combined inputs for each of its output tasks. "Interceptability" comes from the ability to inspect the &lt;code&gt;alpha_j&lt;/code&gt; vectors, which provide a proxy for the importance of input features for each output.&lt;/p&gt;
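&lt;p&gt;Before turning to the Keras implementation, the steps above can be traced end to end in plain NumPy. Random matrices stand in for the trained &lt;code&gt;Dense_j&lt;/code&gt; and &lt;code&gt;OutputDense_j&lt;/code&gt; layers, and all dimensions are illustrative:&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(42)

# Steps 1-2: two input streams, concatenated into X_concat
X1 = rng.normal(size=(8, 3))                  # batch of 8, 3 features
X2 = rng.normal(size=(8, 5))                  # batch of 8, 5 features
X_concat = np.concatenate([X1, X2], axis=-1)  # shape (8, 8)

D = X_concat.shape[-1]
out_dims = [4, 2]                             # M = 2 output streams

outputs, alphas = [], []
for dim in out_dims:
    W_att = rng.normal(size=(D, D))           # stands in for Dense_j
    W_out = rng.normal(size=(D, dim))         # stands in for OutputDense_j
    e_j = X_concat @ W_att                    # Step 3: attention scores
    alpha_j = softmax(e_j)                    # Step 3: attention weights
    C_j = alpha_j * X_concat                  # Step 4: context vector
    outputs.append(C_j @ W_out)               # Step 5: output Y_j
    alphas.append(alpha_j)                    # Step 6: intercepted weights
```

Inspecting `alphas` after a forward pass is exactly the "interception" the layer is named for: each row shows how strongly each concatenated feature was weighted for that output stream.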

&lt;p&gt;&lt;strong&gt;4. Implementation in Python with TensorFlow/Keras&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Below, we present an implementation of the MAIL layer as a custom Keras layer, followed by a usage example built with the Functional API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MIMOAttentionLayer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Layer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Multi-layer Attentional Interception Layer (MAIL)
    This layer receives multiple inputs, concatenates them, and then applies
    separate attention mechanisms to generate multiple outputs.
    Attention weights can be extracted for interpretability.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_output_streams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_stream_dims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_hidden_units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Args:
            num_output_streams (int): The number of desired output streams (M).
            output_stream_dims (list or tuple): A list/tuple containing the dimensionality
                                                 of each output stream.
                                                 Ex: (64, 32) for two outputs with 64 and 32 dims.
            attention_hidden_units (int, optional): Number of units in the internal dense layer
                                                    used to calculate attention scores.
                                                    If None, uses the concatenated input dimension.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_stream_dims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_stream_dims&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;num_output_streams&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;`output_stream_dims` must be a list or tuple with `num_output_streams` elements.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_output_streams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_output_streams&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_stream_dims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;output_stream_dims&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_hidden_units&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attention_hidden_units&lt;/span&gt;

        &lt;span class="c1"&gt;# Lists to store attention and output layers for each stream
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_score_layers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_processing_layers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;learned_attention_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="c1"&gt;# To store attention weights
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Defines the layer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s weights.
        Args:
            input_shape (list of tuples): A list of shapes of the input tensors.
                                          Ex: [(None, 128), (None, 64)] for two inputs.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input to MIMOAttentionLayer must be a list of tensors.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Assume inputs will be concatenated. Calculate concatenated dimension.
&lt;/span&gt;        &lt;span class="c1"&gt;# input_shape[i][-1] gets the last dimension (features) of each input tensor.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenated_input_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;attention_units&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_hidden_units&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_hidden_units&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenated_input_dim&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_output_streams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Layer to calculate attention scores for output stream i
&lt;/span&gt;            &lt;span class="c1"&gt;# These scores will be used to weight the concatenated input
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_score_layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attention_units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tanh&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attention_scorer_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Final Dense layer to generate attention weights (with softmax over features)
&lt;/span&gt;            &lt;span class="c1"&gt;# Could also be a layer generating a single weight per feature, or a set of weights
&lt;/span&gt;            &lt;span class="c1"&gt;# Here, for simplicity, attention will modulate features of the concatenated input.
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_score_layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concatenated_input_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;softmax&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attention_weights_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Processing layer to generate the final output of stream i
&lt;/span&gt;            &lt;span class="c1"&gt;# from the concatenated input weighted by attention
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_processing_layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_stream_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_stream_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Layer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s processing logic (forward pass).
        Args:
            inputs (list of Tensors): List of input tensors.
        Returns:
            list of Tensors: List of output tensors, one for each stream.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input to MIMOAttentionLayer must be a list of tensors.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;concatenated_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;concatenate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;concatenated_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# If only one input (list with one tensor)
&lt;/span&gt;
        &lt;span class="n"&gt;output_streams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;current_attention_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="c1"&gt;# Stores weights for this call
&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_output_streams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Calculate attention scores
&lt;/span&gt;            &lt;span class="c1"&gt;# The attention architecture here is simple; can be more complex (e.g., Bahdanau-style)
&lt;/span&gt;            &lt;span class="c1"&gt;# attention_scorer_idx = i * 2 (to get the first Dense of the i-th head)
&lt;/span&gt;            &lt;span class="c1"&gt;# attention_weights_idx = i * 2 + 1 (to get the second Dense of the i-th head)
&lt;/span&gt;
            &lt;span class="c1"&gt;# A simplified form: each attention head learns to weight the features of the concatenated input
&lt;/span&gt;            &lt;span class="c1"&gt;# for its respective output task.
&lt;/span&gt;            &lt;span class="n"&gt;attention_hidden&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_score_layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;concatenated_inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# (batch_size, attention_units)
&lt;/span&gt;            &lt;span class="n"&gt;attention_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_score_layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;attention_hidden&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# (batch_size, concatenated_input_dim)
&lt;/span&gt;            &lt;span class="n"&gt;current_attention_weights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attention_weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Apply attention weights to the concatenated input
&lt;/span&gt;            &lt;span class="c1"&gt;# Element-wise multiplication (Hadamard product)
&lt;/span&gt;            &lt;span class="n"&gt;attended_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;concatenated_inputs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;attention_weights&lt;/span&gt;

            &lt;span class="c1"&gt;# Process the weighted input to generate output stream i
&lt;/span&gt;            &lt;span class="n"&gt;stream_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_processing_layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;attended_inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;output_streams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Store attention weights for possible external inspection
&lt;/span&gt;        &lt;span class="c1"&gt;# Note: self.learned_attention_weights would accumulate across batches if not reset
&lt;/span&gt;        &lt;span class="c1"&gt;# For inspection during or after training, it's better to get via model.get_layer().output
&lt;/span&gt;        &lt;span class="c1"&gt;# or callbacks. Here, we just store the last set for example purposes.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;learned_attention_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_attention_weights&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output_streams&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_output_streams&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_output_streams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_stream_dims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_stream_dims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attention_hidden_units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attention_hidden_units&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;from_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example of MAIL layer usage:
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Defining model inputs
&lt;/span&gt;    &lt;span class="n"&gt;input_a_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;
    &lt;span class="n"&gt;input_b_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;
    &lt;span class="n"&gt;input_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_a_dim&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input_A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;input_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_b_dim&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input_B&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Defining desired outputs
&lt;/span&gt;    &lt;span class="c1"&gt;# Output 1: Regression with 10 values
&lt;/span&gt;    &lt;span class="c1"&gt;# Output 2: Binary classification (1 value with sigmoid, or 2 with softmax)
&lt;/span&gt;    &lt;span class="c1"&gt;# Output 3: Regression with 5 values
&lt;/span&gt;    &lt;span class="n"&gt;num_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="n"&gt;output_dims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# Dimensionality of each output
&lt;/span&gt;
    &lt;span class="c1"&gt;# Instantiating the MAIL layer
&lt;/span&gt;    &lt;span class="c1"&gt;# mail_layer = MIMOAttentionLayer(num_output_streams=num_outputs,
&lt;/span&gt;    &lt;span class="c1"&gt;#                                 output_stream_dims=output_dims,
&lt;/span&gt;    &lt;span class="c1"&gt;#                                 attention_hidden_units=32,
&lt;/span&gt;    &lt;span class="c1"&gt;#                                 name='mail_processing')
&lt;/span&gt;
    &lt;span class="c1"&gt;# Applying MAIL layer to inputs
&lt;/span&gt;    &lt;span class="c1"&gt;# output_streams = mail_layer([input_a, input_b])
&lt;/span&gt;
    &lt;span class="c1"&gt;# If individual processing before MAIL is desired:
&lt;/span&gt;    &lt;span class="n"&gt;processed_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;input_a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;processed_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;input_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mail_layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MIMOAttentionLayer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_output_streams&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_outputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;output_stream_dims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;attention_hidden_units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Adjust according to concatenated dim (64+32=96)
&lt;/span&gt;                                    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mail_processing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;output_streams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mail_layer&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;processed_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processed_b&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


    &lt;span class="c1"&gt;# Renaming outputs for clarity (optional, but good for `model.summary()`)
&lt;/span&gt;    &lt;span class="c1"&gt;# and applying final activations if needed (MAIL layer used 'linear' by default)
&lt;/span&gt;    &lt;span class="n"&gt;output_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_Reg10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;output_streams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# Already done in layer, but can be redone/adjusted
&lt;/span&gt;    &lt;span class="n"&gt;output_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sigmoid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_ClassBin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;output_streams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;output_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_Reg5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;output_streams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Creating the model
&lt;/span&gt;    &lt;span class="c1"&gt;# model = Model(inputs=[input_a, input_b], outputs=output_streams) # Using direct outputs from MAIL
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;input_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_b&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;output_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


    &lt;span class="c1"&gt;# Compiling the model
&lt;/span&gt;    &lt;span class="c1"&gt;# Each output can have its own loss function and metrics
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_Reg10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_ClassBin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;binary_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_Reg5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mae&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                  &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_ClassBin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Generating dummy data for testing
&lt;/span&gt;    &lt;span class="n"&gt;num_samples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="n"&gt;X_a_dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_a_dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X_b_dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_b_dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;Y_1_dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;Y_2_dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
    &lt;span class="n"&gt;Y_3_dummy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_samples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Training the model
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Starting dummy training...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;X_a_dummy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_b_dummy&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_Reg10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Y_1_dummy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_ClassBin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Y_2_dummy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_Reg5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Y_3_dummy&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dummy training completed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# "Intercepting" attention weights after training (example)
&lt;/span&gt;    &lt;span class="c1"&gt;# For more robust analysis, attention weights should be collected
&lt;/span&gt;    &lt;span class="c1"&gt;# during prediction or using a Keras Callback.
&lt;/span&gt;    &lt;span class="c1"&gt;# The `mail_layer.learned_attention_weights` variable contains weights from the last processed batch.
&lt;/span&gt;
    &lt;span class="c1"&gt;# To get attention weights for a new dataset:
&lt;/span&gt;    &lt;span class="c1"&gt;# Create an intermediate model that also returns the attention weights.
&lt;/span&gt;    &lt;span class="c1"&gt;# attention_model_outputs = [model.get_layer('mail_processing').output] # List of lists of weights
&lt;/span&gt;    &lt;span class="c1"&gt;# Attention outputs are in mail_layer.learned_attention_weights,
&lt;/span&gt;    &lt;span class="c1"&gt;# which is a list (for each output_stream) of tensors (batch_size, concatenated_input_dim)
&lt;/span&gt;
    &lt;span class="c1"&gt;# Correction: To get attention weights as model output, we need to define a model that exposes them.
&lt;/span&gt;    &lt;span class="c1"&gt;# The MAIL layer stores the weights from the last call in `self.learned_attention_weights`,
&lt;/span&gt;    &lt;span class="c1"&gt;# but this is not ideal for systematic extraction.
&lt;/span&gt;    &lt;span class="c1"&gt;# A better way is to modify the layer's `call` to return the weights
&lt;/span&gt;    &lt;span class="c1"&gt;# or create a model that has attention weights as one of its outputs.
&lt;/span&gt;
    &lt;span class="c1"&gt;# Example of how to build a model to extract attention weights:
&lt;/span&gt;    &lt;span class="c1"&gt;# Assuming the 'mail_processing' layer was built as above.
&lt;/span&gt;    &lt;span class="c1"&gt;# We need the layer's `call` to return the weights or have access
&lt;/span&gt;    &lt;span class="c1"&gt;# to the outputs of the attention sublayers.
&lt;/span&gt;
    &lt;span class="c1"&gt;# Let's get the names of the attention weight layers within MAIL
&lt;/span&gt;    &lt;span class="c1"&gt;# attention_weight_layer_names = []
&lt;/span&gt;    &lt;span class="c1"&gt;# for i in range(num_outputs):
&lt;/span&gt;    &lt;span class="c1"&gt;#     attention_weight_layer_names.append(f'attention_weights_{i}') # Dense layer with softmax
&lt;/span&gt;
    &lt;span class="c1"&gt;# Accessing the outputs of attention layers directly from the trained model
&lt;/span&gt;    &lt;span class="c1"&gt;# (assuming the Dense sublayers generating weights were named accordingly)
&lt;/span&gt;    &lt;span class="c1"&gt;# This requires sublayers to be accessible. In the current implementation, they are class attributes.
&lt;/span&gt;
    &lt;span class="c1"&gt;# A cleaner approach to extracting weights:
&lt;/span&gt;    &lt;span class="c1"&gt;# Create a new model that has attention outputs as outputs.
&lt;/span&gt;    &lt;span class="c1"&gt;# The outputs of the Dense layers that calculate attention weights (softmax)
&lt;/span&gt;    &lt;span class="c1"&gt;# within the MAIL layer can be exposed.
&lt;/span&gt;    &lt;span class="c1"&gt;# mail_layer_instance = model.get_layer('mail_processing')
&lt;/span&gt;    &lt;span class="c1"&gt;# attention_outputs_for_extraction = []
&lt;/span&gt;    &lt;span class="c1"&gt;# for i in range(num_outputs):
&lt;/span&gt;    &lt;span class="c1"&gt;#     # Accessing the named sublayers
&lt;/span&gt;    &lt;span class="c1"&gt;#     # The name would be mail_processing/attention_weights_0, etc., if built within the model's scope.
&lt;/span&gt;    &lt;span class="c1"&gt;#     # In our case, sublayers are in the self.attention_score_layers list
&lt;/span&gt;    &lt;span class="c1"&gt;#     attention_weight_sub_layer = mail_layer_instance.attention_score_layers[i*2 + 1] # The Dense with softmax
&lt;/span&gt;    &lt;span class="c1"&gt;#     attention_outputs_for_extraction.append(attention_weight_sub_layer.output)
&lt;/span&gt;
    &lt;span class="c1"&gt;# if attention_outputs_for_extraction:
&lt;/span&gt;    &lt;span class="c1"&gt;#     attention_extractor_model = Model(inputs=model.inputs, outputs=model.outputs + attention_outputs_for_extraction)
&lt;/span&gt;    &lt;span class="c1"&gt;#     predictions_and_attentions = attention_extractor_model.predict([X_a_dummy[:5], X_b_dummy[:5]])
&lt;/span&gt;
    &lt;span class="c1"&gt;#     main_predictions = predictions_and_attentions[:num_outputs]
&lt;/span&gt;    &lt;span class="c1"&gt;#     extracted_attention_weights = predictions_and_attentions[num_outputs:]
&lt;/span&gt;
    &lt;span class="c1"&gt;#     print(f"\nExtracting attention weights for {len(extracted_attention_weights)} output streams:")
&lt;/span&gt;    &lt;span class="c1"&gt;#     for i, weights in enumerate(extracted_attention_weights):
&lt;/span&gt;    &lt;span class="c1"&gt;#         print(f"  Attention weights for Output {i+1} (shape: {weights.shape}):\n  {weights[0][:10]}...") # First 10 features of the first example
&lt;/span&gt;    &lt;span class="c1"&gt;# else:
&lt;/span&gt;    &lt;span class="c1"&gt;#     print("\nCould not extract attention weights this way. Check layer structure.")
&lt;/span&gt;
    &lt;span class="c1"&gt;# Simpler way to access weights from the last batch processed by the layer instance:
&lt;/span&gt;    &lt;span class="n"&gt;last_batch_attention_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mail_layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;learned_attention_weights&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;last_batch_attention_weights&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Attention weights from the last processed batch (accessed from layer instance):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_batch_attention_weights&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Attention weights for Output &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (shape: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Implementation Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;__init__&lt;/code&gt;&lt;/strong&gt;: Initializes the number of output streams, their dimensions, and optional hidden units for the attention layers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;build&lt;/code&gt;&lt;/strong&gt;: Creates the necessary sublayers. For each output stream, two sequential &lt;code&gt;Dense&lt;/code&gt; layers (one with &lt;code&gt;tanh&lt;/code&gt; and another with &lt;code&gt;softmax&lt;/code&gt; over the concatenated input dimension) are created to calculate attention weights, and one &lt;code&gt;Dense&lt;/code&gt; layer to process the weighted input and generate the stream's output.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;call&lt;/code&gt;&lt;/strong&gt;:

&lt;ol&gt;
&lt;li&gt; Inputs are concatenated (if multiple).&lt;/li&gt;
&lt;li&gt; For each output stream &lt;code&gt;i&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;  Attention weights (&lt;code&gt;attention_weights&lt;/code&gt;) are computed by the corresponding &lt;code&gt;Dense&lt;/code&gt; sublayers; the final &lt;code&gt;softmax&lt;/code&gt; makes the weights sum to 1 over the features of the concatenated input, so they can be read as importance probabilities.&lt;/li&gt;
&lt;li&gt;  The concatenated input is weighted by element-wise multiplication with &lt;code&gt;attention_weights&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  The weighted input (&lt;code&gt;attended_inputs&lt;/code&gt;) is passed through the output processing layer to generate &lt;code&gt;stream_output&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt; The computed attention weights (&lt;code&gt;current_attention_weights&lt;/code&gt;) are stored in the instance variable &lt;code&gt;self.learned_attention_weights&lt;/code&gt; for inspection; note that only the weights from the most recently processed batch are retained.&lt;/li&gt;

&lt;li&gt; Returns a list of output tensors.&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;&lt;code&gt;get_config&lt;/code&gt; / &lt;code&gt;from_config&lt;/code&gt;&lt;/strong&gt;: Allow the layer to be serialized and deserialized by Keras.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Usage Example&lt;/strong&gt;: Demonstrates how to instantiate the MAIL layer in a Keras model with two inputs and three outputs, compile it, and train it with dummy data. It also outlines how attention weights could be extracted, highlighting that the most robust way is to build a model that explicitly returns these weights as part of its outputs.&lt;/li&gt;

&lt;/ul&gt;
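
&lt;p&gt;To make the mechanics above concrete, the sketch below reproduces the MAIL forward pass with plain NumPy: per-stream &lt;code&gt;tanh&lt;/code&gt;/&lt;code&gt;softmax&lt;/code&gt; attention over the concatenated features, element-wise weighting, and a linear output projection. All weight matrices are random stand-ins for the trained &lt;code&gt;Dense&lt;/code&gt; sublayers, and the dimensions are illustrative only.&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
batch, dims = 4, [6, 4]                       # two input streams, as in input_A / input_B
concat_dim, out_dims, hidden = sum(dims), [3, 1], 8

# Step 1: concatenate the input streams along the feature axis
x = np.concatenate([rng.normal(size=(batch, d)) for d in dims], axis=1)

outputs, attentions = [], []
for out_dim in out_dims:
    # Random matrices stand in for the two trained attention sublayers
    W1 = rng.normal(size=(concat_dim, hidden))
    W2 = rng.normal(size=(hidden, concat_dim))
    alpha = softmax(np.tanh(x @ W1) @ W2)     # (batch, concat_dim); each row sums to 1
    attended = x * alpha                      # element-wise feature weighting
    Wo = rng.normal(size=(concat_dim, out_dim))
    outputs.append(attended @ Wo)             # linear per-stream output projection
    attentions.append(alpha)

assert all(np.allclose(a.sum(axis=1), 1.0) for a in attentions)
assert [o.shape for o in outputs] == [(4, 3), (4, 1)]
```

&lt;p&gt;In the real layer these matrices are trainable parameters; the point here is only the data flow and the row-normalization of the attention weights.&lt;/p&gt;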

&lt;p&gt;&lt;strong&gt;5. Experiments and Results (Conceptual)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To validate the MAIL layer, a set of hypothetical experiments would be conducted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Dataset:&lt;/strong&gt; A synthetic or real dataset with multiple heterogeneous inputs (e.g., tabular data, time series, text embeddings) and multiple output tasks (e.g., one regression and two classifications). For example, in an industrial predictive maintenance scenario:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Inputs:&lt;/strong&gt; Sensor data (vibration, temperature, pressure), maintenance logs (text processed into embeddings), machine specifications (tabular).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Outputs:&lt;/strong&gt; Risk of failure (regression), probable failure type (classification), remaining useful life (regression).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Baseline Model:&lt;/strong&gt; A standard MIMO-DL architecture without the MAIL layer (e.g., simple concatenation of processed inputs followed by branches for each output).&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Model with MAIL:&lt;/strong&gt; The same baseline architecture but with the MAIL layer inserted before the output branches.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Metrics:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Task Performance:&lt;/strong&gt; Appropriate metrics for each output task (e.g., MSE for regression, Accuracy/F1-score for classification).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Interpretability:&lt;/strong&gt; Qualitative analysis of attention weights (&lt;code&gt;alpha_j&lt;/code&gt;). Visualizations of the weights can show which input features (or which original input streams, if the mapping is clear after concatenation) receive the most attention for each output task. For example, it is expected that to predict "failure type," "maintenance logs" might receive higher attention, while for "risk of failure," "sensor data" would be more heavily weighted.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Expected Results:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;  The model with MAIL should achieve performance comparable to, or slightly better than, the baseline, owing to attention's ability to focus each output pathway on its most relevant features.&lt;/li&gt;
&lt;li&gt;  Analysis of attention weights should provide insights into the input-output relationships learned by the model, ideally aligning with domain knowledge or revealing new interactions. For example, if input &lt;code&gt;X_1&lt;/code&gt; is consistently weighted more heavily for output &lt;code&gt;Y_1&lt;/code&gt; than for &lt;code&gt;Y_2&lt;/code&gt;, this provides interpretable evidence of information flow specialization.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
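
&lt;p&gt;Once attention matrices have been extracted, the interpretability analysis above reduces to summing each row's softmax mass over the slice of the concatenated feature axis that belongs to each original input stream. The snippet below illustrates this with randomly generated stand-in weights; the stream names, dimensions, and task names mirror the hypothetical predictive-maintenance setup and are not real data.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
# Concatenation order and width of each original input stream (illustrative)
stream_dims = {"sensors": 8, "log_embeddings": 6, "machine_specs": 4}
concat_dim = sum(stream_dims.values())

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# One (batch, concat_dim) softmax matrix per output task, as MAIL would produce
alphas = {name: softmax(rng.normal(size=(32, concat_dim)))
          for name in ("risk_of_failure", "failure_type", "remaining_life")}

def attention_mass_per_stream(alpha, stream_dims):
    """Sum attention over each stream's slice of the concatenated feature axis."""
    mass, start = {}, 0
    for name, d in stream_dims.items():
        mass[name] = float(alpha[:, start:start + d].sum(axis=1).mean())
        start += d
    return mass

for task, alpha in alphas.items():
    mass = attention_mass_per_stream(alpha, stream_dims)
    assert np.isclose(sum(mass.values()), 1.0)   # softmax mass is conserved
```

&lt;p&gt;Reading the resulting per-stream masses side by side across tasks is what would reveal, for instance, log embeddings dominating for &lt;code&gt;failure_type&lt;/code&gt; while sensor features dominate for &lt;code&gt;risk_of_failure&lt;/code&gt;.&lt;/p&gt;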

&lt;p&gt;&lt;strong&gt;6. Discussion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The proposed MAIL layer offers a mechanism to dissect the complex interactions within MIMO-DL models. By forcing the model to learn explicit attention weights for each output pathway, we gain a window into its internal workings. "Intercepting" these weights allows researchers and practitioners to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Validate model behavior:&lt;/strong&gt; Verify if the model is focusing on relevant features as expected by domain knowledge.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Discover new relationships:&lt;/strong&gt; Identify unexpected interactions between inputs and outputs that could lead to new hypotheses.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Debug the model:&lt;/strong&gt; If a specific output is underperforming, analyzing attention weights might indicate whether the model is failing to attend to the correct inputs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Improve model architecture:&lt;/strong&gt; Insights about feature importance can guide feature engineering or the design of more efficient architectures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The interpretability provided by attention weights is not a definitive causal explanation; it reflects correlations learned by the model.&lt;/li&gt;
&lt;li&gt;  If inputs are extensively pre-processed and transformed before the MAIL layer, mapping attention weights back to original features can be complex.&lt;/li&gt;
&lt;li&gt;  The complexity of the MAIL layer itself increases with the number of input/output streams and data dimensionality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Future Work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Explore more sophisticated attention mechanisms within the MAIL layer (e.g., location-based attention, hierarchical self-attention between input streams).&lt;/li&gt;
&lt;li&gt;  Develop more advanced visualization methods for attention weights in MIMO contexts.&lt;/li&gt;
&lt;li&gt;  Apply MAIL to real-world problems in domains like healthcare, finance, and autonomous systems to evaluate its practical utility.&lt;/li&gt;
&lt;li&gt;  Integrate the MAIL layer with other XAI techniques to obtain richer and more robust explanations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;7. Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MAIL (Multi-layer Attentional Interception Layer) is a novel approach for embedding interpretability into deep neural networks with multiple inputs and multiple outputs. By explicitly learning the relevance of inputs to specific outputs through dedicated attention heads, MAIL allows these relationships to be "intercepted" and analyzed. The provided Python implementation demonstrates the feasibility of integrating such a layer into existing deep learning workflows. We believe MAIL represents a step towards more transparent and understandable MIMO-DL models, facilitating their adoption in critical applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. References&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Bahdanau, D., Cho, K., &amp;amp; Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. &lt;em&gt;arXiv preprint arXiv:1409.0473.&lt;/em&gt; (Reference for original attention)&lt;/li&gt;
&lt;li&gt;  Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... &amp;amp; Polosukhin, I. (2017). Attention is all you need. &lt;em&gt;Advances in neural information processing systems, 30.&lt;/em&gt; (Reference for Transformers and Multi-Head Attention)&lt;/li&gt;
&lt;li&gt;  Galassi, A., Lippi, M., &amp;amp; Torroni, P. (2020). Attention in natural language processing. &lt;em&gt;IEEE Transactions on Neural Networks and Learning Systems, 32(10), 4291-4313.&lt;/em&gt; (Survey on attention in NLP)&lt;/li&gt;
&lt;li&gt;  Chaudhari, S., Mithal, V., Polatkan, G., Ramanath, R., &amp;amp; Bera, A. (2021). An attentive survey of attention models. &lt;em&gt;ACM Transactions on Intelligent Systems and Technology (TIST), 12(5), 1-32.&lt;/em&gt; (Comprehensive survey on attention models)&lt;/li&gt;
&lt;li&gt;  Samek, W., Wiegand, T., &amp;amp; Müller, K. R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. &lt;em&gt;ITU Journal: ICT Discoveries, 1(1), 39-48.&lt;/em&gt; (Overview of XAI)&lt;/li&gt;
&lt;li&gt;  TensorFlow Core. &lt;em&gt;Attention layers&lt;/em&gt;. (Accessed June 2025). Available at: &lt;a href="https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention"&gt;https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Rame, A., &amp;amp; Cord, M. (2021). MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks. &lt;em&gt;Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).&lt;/em&gt; (Example of MIMO architecture)&lt;/li&gt;
&lt;li&gt;  Xu, D., Cheng, W., Luo, D., Liu, X., &amp;amp; Zhang, X. (2019). A Survey on Multi-output Learning. &lt;em&gt;arXiv preprint arXiv:1907.10042.&lt;/em&gt; (Survey on multi-output learning).&lt;/li&gt;
&lt;li&gt;  Sabath, A. (2021). Scikeras Tutorial: A MIMO Wrapper for CapsNet Hyperparameter Tuning with Keras. &lt;em&gt;Towards Data Science.&lt;/em&gt; (Use of Keras for MIMO).&lt;/li&gt;
&lt;li&gt;  Bhatia, S. (n.d.). Combining Multiple Features and Multiple Outputs Using Keras Functional API. &lt;em&gt;Analytics Vidhya.&lt;/em&gt; (Example of Keras API for MIMO).&lt;/li&gt;
&lt;li&gt;  MathWorks. &lt;em&gt;Import Keras Layers&lt;/em&gt;. (Accessed on June 2025). (Support for MIMO in tools).&lt;/li&gt;
&lt;li&gt;  Zhang, C., Li, Y., Liu, P., &amp;amp; Li, G. Y. (2021). An Attention-Aided Deep Learning Framework for Massive MIMO Channel Estimation. &lt;em&gt;arXiv preprint arXiv:2108.09605.&lt;/em&gt; (Attention in MIMO for communications).&lt;/li&gt;
&lt;li&gt;  Yu, W. (2021). &lt;em&gt;A Learning Approach to the Optimization of Massive MIMO Systems&lt;/em&gt;. (Seminar video on DL in Massive MIMO).&lt;/li&gt;
&lt;li&gt;  Gregor, K., &amp;amp; LeCun, Y. (2010). Learning Fast Approximations of Sparse Coding. &lt;em&gt;ICML.&lt;/em&gt; (Reference for "unrolling" which can inspire interpretability). (The paper "Algorithm Unrolling: Interpretable, Efficient Deep Learning..." discusses how "unrolling" iterative algorithms can lead to more interpretable DL architectures.)&lt;/li&gt;
&lt;li&gt;  Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. &lt;em&gt;Nature Machine Intelligence, 1(5), 206-215.&lt;/em&gt; (Advocacy for inherently interpretable models). (Prof. Cynthia Rudin's lab focuses on interpretability).&lt;/li&gt;
&lt;li&gt;  DataCamp. (2024). &lt;em&gt;What is Attention and Why Do LLMs and Transformers Need It?&lt;/em&gt; (Article explaining attention).&lt;/li&gt;
&lt;li&gt;  Wu, M. (2022). &lt;em&gt;Optimizing for Interpretability in Deep Neural Networks&lt;/em&gt;. (Stanford Seminar on interpretability).&lt;/li&gt;
&lt;li&gt;  Fraunhofer HHI. &lt;em&gt;Interpretable Machine Learning&lt;/em&gt;. (Research page on XAI).&lt;/li&gt;
&lt;li&gt;  Nguyen, T. H. D., et al. (2023). On the Combination of Multi-Input and Self-Attention for Sign Language Recognition. &lt;em&gt;International Conference on Applied Science and Engineering (ICASE).&lt;/em&gt; (Combination of Multi-Input and Attention).&lt;/li&gt;
&lt;li&gt;  Hasan, M. K., et al. (2023). Implementation of the deep learning method for signal detection in massive-MIMO-NOMA systems. &lt;em&gt;Scientific Reports.&lt;/em&gt; (DL in Massive MIMO systems).&lt;/li&gt;
&lt;li&gt;  OpenReview. (2022). &lt;em&gt;MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition&lt;/em&gt;. (Paper on MIMONets).&lt;/li&gt;
&lt;li&gt;  Analytics Vidhya. (2025). &lt;em&gt;Understanding Attention Mechanisms Using Multi-Head Attention&lt;/em&gt;. (Article on Multi-Head Attention).&lt;/li&gt;
&lt;li&gt;  SRI Lab, EPFL. &lt;em&gt;Reliable and Interpretable Artificial Intelligence&lt;/em&gt;. (Research focus on reliable and interpretable AI).&lt;/li&gt;
&lt;li&gt;  GeeksforGeeks. (2025). &lt;em&gt;Multi-Head Attention Mechanism&lt;/em&gt;. (Tutorial on Multi-Head Attention).&lt;/li&gt;
&lt;li&gt;  Lakkaraju, H. (2022). &lt;em&gt;Stanford Seminar - ML Explainability Part 2 I Inherently Interpretable Models&lt;/em&gt;. (Stanford seminar video on interpretable models).&lt;/li&gt;
&lt;li&gt;  Pal, S. &amp;amp; Gulli, A. (2017). &lt;em&gt;2 ways to customize your deep learning models with Keras&lt;/em&gt;. Packt. (On customization in Keras).&lt;/li&gt;
&lt;li&gt;  TensorFlow Core. &lt;em&gt;Custom layers&lt;/em&gt;. (Accessed June 2025). Available at: &lt;a href="https://www.tensorflow.org/guide/keras/custom_layers"&gt;https://www.tensorflow.org/guide/keras/custom_layers&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Explainable Artificial Intelligence (XAI) for Deep Learning Models: A Comprehensive Review</title>
      <dc:creator>Lucas Ribeiro</dc:creator>
      <pubDate>Mon, 16 Jun 2025 11:56:51 +0000</pubDate>
      <link>https://forem.com/lucash_ribeiro_dev/explainable-artificial-intelligence-xai-for-deep-learning-models-a-comprehensive-review-3pbk</link>
      <guid>https://forem.com/lucash_ribeiro_dev/explainable-artificial-intelligence-xai-for-deep-learning-models-a-comprehensive-review-3pbk</guid>
      <description>&lt;p&gt;&lt;strong&gt;Keywords:&lt;/strong&gt; Deep Learning, Explainable Artificial Intelligence, XAI, Model Interpretability, Black-Box Models, Machine Learning, Algorithmic Transparency, XAI Evaluation.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;1. Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deep Learning (DL) has emerged as a transformative force across numerous scientific and technological disciplines, achieving unprecedented success in complex tasks such as natural language processing, computer vision, and the analysis of structured and unstructured data. Large-scale generative models, for example, demonstrate a remarkable ability to synthesize high-resolution images and texts, as well as more complex data like videos and molecules. The sophistication and computational power of these models, exemplified by Large Language Models (LLMs) and diffusion models, are driving significant advancements. However, the very complexity that enables their superior performance also introduces substantial challenges regarding the understanding of their internal decision-making mechanisms. The growing capability and autonomy of these algorithmic systems intensify the demand for transparency, as their integration into critical processes makes the need to understand their underlying logic proportionally more vital.&lt;/p&gt;

&lt;p&gt;Many DL models, despite their remarkable performance, operate as "black boxes," offering little to no visibility into the internal logic that governs their predictions or decisions. This opacity is not merely a technical inconvenience; it represents a fundamental barrier to trust, accountability, and the broader societal acceptance of Artificial Intelligence (AI). In legal contexts, for instance, the lack of transparency in decision-making processes can compromise the ability of judges to perform their duties effectively. Similarly, in critical domains like healthcare, the black-box nature is a significant obstacle to clinical adoption, where understanding why a decision was made is crucial. Interpretability, in this context, is defined as "the ability to explain or to present [the model's workings] in understandable terms to a human." The absence of this interpretability fosters skepticism and complicates the debugging of errors or the identification of biases, especially in applications where failures can have severe consequences.&lt;/p&gt;

&lt;p&gt;In response to this pressing need, Explainable Artificial Intelligence (XAI) has emerged as a subfield of AI dedicated to incorporating transparency, interpretability, and explainability into the results and processes of algorithmic models. Initiatives like the Defense Advanced Research Projects Agency (DARPA)'s XAI program seek to create AI systems whose learned models and decisions can be understood and reliably used by end-users. XAI is, therefore, crucial for building and maintaining trust in the implementation of AI systems, aiding in the understanding of model behavior and the identification of potential problems, such as algorithmic biases that can lead to unfair or discriminatory outcomes.&lt;/p&gt;

&lt;p&gt;This paper aims to conduct a critical and comprehensive review of recent advancements, diverse methodologies, applications in critical domains, persistent challenges, and future directions of XAI in the specific context of Deep Learning models. It will explore the conceptual foundations of XAI, its practical implementations in high-impact areas like healthcare, the inherent challenges in its application and evaluation, and the research perspectives that promise to shape the future of a more transparent, trustworthy, and human-aligned AI.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;2. Fundamentals of Explainable Artificial Intelligence (XAI)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explainable Artificial Intelligence (XAI) primarily aims to enhance the transparency and comprehensibility of decisions made by AI systems, making them accessible and intelligible to both specialized professionals and lay users. The ability to interpret AI models not only promotes trust and reliability but also allows practitioners to understand, verify, and validate the results generated by these models. The objectives of XAI transcend the mere generation of explanations; they encompass empowering humans to understand, appropriately trust, and effectively manage the new generation of AI partners. This includes debugging models, identifying and mitigating unwanted biases, ensuring compliance with regulatory and ethical requirements, and, fundamentally, fostering a more symbiotic and collaborative relationship between humans and machines. By providing transparency, XAI allows humans to understand the internal mechanisms of AI, building a foundation of trust essential for the verification and responsible use of these systems in complex workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1. Taxonomy of XAI Methods&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;XAI methods can be broadly categorized based on when explainability is considered in the model's lifecycle. The main technical distinction is between &lt;em&gt;ante-hoc&lt;/em&gt; methods, which are inherently explainable by design, and &lt;em&gt;post-hoc&lt;/em&gt; methods, which are applied to black-box models after their training to elucidate their decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1.1. Post-hoc Methods&lt;/strong&gt;&lt;br&gt;
Post-hoc methods are designed to analyze already trained models, seeking to explain their predictions or behaviors without altering the original model architecture.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shapley Additive Explanations (SHAP):&lt;/strong&gt; Grounded in Shapley values from cooperative game theory, SHAP quantifies the individual contribution of each input feature to a specific prediction. Its versatility makes it applicable to a wide range of complex models, offering both local interpretability (for individual predictions) and global interpretability (for the overall model behavior). It is a widely used technique, especially in the healthcare sector for disease prediction. However, the calculation of Shapley values can be computationally intensive, and the interpretation of these values may vary depending on the intrinsic characteristics of the analyzed model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Interpretable Model-agnostic Explanations (LIME):&lt;/strong&gt; LIME focuses on explaining individual predictions by locally approximating the behavior of a black-box model with a simpler, interpretable model (like a linear regression). This surrogate model is trained on perturbations of the input instance one wishes to explain. Its model-agnostic nature and the intuitiveness of local explanations are its main advantages. However, LIME can exhibit instability due to the random sampling inherent in the perturbation process, which can lead to different explanations for very similar input instances. Additionally, its perturbation-based approach may face limitations when dealing with highly complex models.&lt;/li&gt;
&lt;/ul&gt;
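&lt;p&gt;To ground the Shapley principle that SHAP approximates, the snippet below computes exact Shapley values for a toy three-feature model by enumerating every coalition; it is an illustration of the underlying game-theoretic idea, not the &lt;code&gt;shap&lt;/code&gt; library API, and the baseline-substitution scheme used here is one common convention among several.&lt;/p&gt;

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction of a small model.

    Features absent from a coalition are replaced by their baseline
    value; with n features this enumerates all 2^n coalitions, which
    is why SHAP relies on approximations for realistic models.
    """
    n = len(x)
    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

# Toy linear model, where contributions have a known closed form.
predict = lambda z: 3 * z[0] + 1 * z[1] - 2 * z[2]
phi = shapley_values(predict, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# For a linear model, phi[i] equals coefficient_i * (x[i] - baseline[i]).
```

&lt;p&gt;For a linear model the result matches the closed form, and the values sum to the difference between the prediction and the baseline prediction, the additivity property that gives SHAP its theoretical appeal.&lt;/p&gt;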

&lt;p&gt;Other post-hoc methods include gradient-based approaches, such as &lt;em&gt;Layer-wise Relevance Propagation&lt;/em&gt; (LRP) and &lt;em&gt;Class Activation Mapping&lt;/em&gt; (CAM), which use the model's gradients to infer feature importance, and various other techniques based on input perturbation.&lt;/p&gt;
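&lt;p&gt;The gradient-based idea can be sketched as follows: the snippet approximates input gradients by central finite differences on a toy scoring function, standing in for the automatic differentiation a real framework would provide. It is a minimal stand-in for saliency-style attribution, not an implementation of CAM or LRP.&lt;/p&gt;

```python
def saliency(f, x, eps=1e-5):
    """Finite-difference approximation of the input gradient of f at x.

    Gradient-based attribution reads the magnitude of each partial
    derivative as that feature's local relevance to the model's score.
    """
    grads = []
    for i in range(len(x)):
        up = list(x); up[i] += eps
        down = list(x); down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

# Toy score: quadratic in feature 0, linear in feature 1.
f = lambda z: z[0] ** 2 + 0.5 * z[1]
g = saliency(f, [3.0, 1.0])
# At this point the partial derivatives are 2 * 3.0 = 6 and 0.5.
```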

&lt;p&gt;&lt;strong&gt;2.1.2. Ante-hoc (Inherently Explainable) Methods&lt;/strong&gt;&lt;br&gt;
Ante-hoc methods refer to models that are designed from the outset to be transparent and understandable. Their architecture and operating mechanisms are intrinsically interpretable.&lt;/p&gt;

&lt;p&gt;Common examples include linear models (linear and logistic regression), decision trees, fuzzy inference systems, k-nearest neighbors (k-NN) algorithms, and Bayesian models. The main advantage of these methods is the direct transparency they offer, eliminating the need for a second model or technique to generate explanations. However, a frequently cited limitation is that these models may not achieve the same level of predictive performance as more complex black-box models on certain tasks, leading to what is known as the "explainability vs. accuracy trade-off."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1.3. Self-Explainability (Self-Explainable AI - S-XAI)&lt;/strong&gt;&lt;br&gt;
Self-Explainability (S-XAI) represents an emerging and promising approach that seeks to incorporate the ability to explain directly into the training process and architecture of Deep Learning models. The goal is for these models to generate inherent explanations that are intrinsically aligned with their internal decision-making processes. The rise of S-XAI is a direct response to the limitations and, crucially, the fidelity concerns of post-hoc methods. As post-hoc explanations can, in some cases, be misleading or not accurately reflect the model's true reasoning, S-XAI aims to build interpretability from the ground up, with the potential to lead to more reliable and robust explanations.&lt;/p&gt;

&lt;p&gt;S-XAI approaches can be categorized as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Explainability:&lt;/strong&gt; Focuses on integrating techniques like explainable feature engineering and the use of knowledge graphs to make the model's inputs more understandable and their relationships more transparent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Explainability:&lt;/strong&gt; Involves incorporating interpretability mechanisms into the model's architecture itself. Examples include:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Attention-based learning:&lt;/strong&gt; Attention mechanisms allow models to dynamically focus on relevant parts of the input data, analogous to human visual attention. Although not originally designed for explainability, they naturally highlight the most important features for the model's decision, being widely used in Convolutional Neural Networks (CNNs) and Transformers to focus on specific regions of images or segments of text sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concept-based learning:&lt;/strong&gt; Uses concept activation vectors to interpret how the model understands and utilizes different high-level concepts in its decision-making processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototype-based learning:&lt;/strong&gt; Explains the model's decisions by comparing new data samples with representative prototypes for each class, which are identified and learned during the model's training (e.g., the xDNN architecture).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Output Explainability:&lt;/strong&gt; Focuses on providing clear, concise, and understandable explanations about the model's final predictions or decisions.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;S-XAI seeks to overcome the fidelity concerns often associated with post-hoc methods, where the explanation is generated by a process separate from the original model. By integrating explainability into the model's design, it is expected that the explanations will be more faithful to the internal decision-making mechanisms, thereby increasing trust and robustness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Taxonomy of Key XAI Methods&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method Category&lt;/th&gt;
&lt;th&gt;Specific Technique&lt;/th&gt;
&lt;th&gt;Operating Principle&lt;/th&gt;
&lt;th&gt;Key Advantages&lt;/th&gt;
&lt;th&gt;Key Limitations/Challenges&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Post-hoc&lt;/td&gt;
&lt;td&gt;LIME (Local Interpretable Model-agnostic Explanations)&lt;/td&gt;
&lt;td&gt;Locally approximates black-box models with interpretable models trained on input perturbations.&lt;/td&gt;
&lt;td&gt;Model-agnostic, intuitive for local explanations.&lt;/td&gt;
&lt;td&gt;Instability due to sampling, limitations with very complex models, questionable fidelity.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-hoc&lt;/td&gt;
&lt;td&gt;SHAP (SHapley Additive exPlanations)&lt;/td&gt;
&lt;td&gt;Based on Shapley values from game theory to quantify the contribution of each feature to the prediction.&lt;/td&gt;
&lt;td&gt;Solid theoretical foundation, provides local and global feature importances, model-agnostic.&lt;/td&gt;
&lt;td&gt;Computational cost can be high, interpretation of values may depend on the model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-hoc&lt;/td&gt;
&lt;td&gt;Gradient-Based Methods (e.g., CAM, LRP)&lt;/td&gt;
&lt;td&gt;Use gradients or activation maps to highlight important input regions for the decision.&lt;/td&gt;
&lt;td&gt;Useful for visual data, computationally efficient for some methods.&lt;/td&gt;
&lt;td&gt;Can suffer from saturated or noisy gradients, fidelity may vary.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ante-hoc&lt;/td&gt;
&lt;td&gt;Decision Trees&lt;/td&gt;
&lt;td&gt;Models based on hierarchical rules that partition the feature space.&lt;/td&gt;
&lt;td&gt;Highly interpretable, visualizable.&lt;/td&gt;
&lt;td&gt;May not capture complex relationships, prone to overfitting without proper pruning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ante-hoc&lt;/td&gt;
&lt;td&gt;Linear/Logistic Regression&lt;/td&gt;
&lt;td&gt;Linear models that assign weights to input features.&lt;/td&gt;
&lt;td&gt;Simple to understand and interpret feature weights.&lt;/td&gt;
&lt;td&gt;Assumes linearity, may underperform on complex non-linear problems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S-XAI&lt;/td&gt;
&lt;td&gt;Attention-Based Learning&lt;/td&gt;
&lt;td&gt;Incorporates attention mechanisms into the model architecture to focus on relevant parts of the input.&lt;/td&gt;
&lt;td&gt;Inherently highlights important features, improves performance on some tasks.&lt;/td&gt;
&lt;td&gt;Attention mechanisms may not reflect causality, attention itself can be complex.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S-XAI&lt;/td&gt;
&lt;td&gt;Concept-Based Learning&lt;/td&gt;
&lt;td&gt;Trains the model to recognize and use high-level concepts understandable by humans.&lt;/td&gt;
&lt;td&gt;Explanations in terms of meaningful concepts, alignment with human knowledge.&lt;/td&gt;
&lt;td&gt;Requires definition and annotation of concepts, can be difficult to scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S-XAI&lt;/td&gt;
&lt;td&gt;Prototype-Based Learning&lt;/td&gt;
&lt;td&gt;The model learns representative prototypes for each class and explains predictions based on similarity to these prototypes.&lt;/td&gt;
&lt;td&gt;Intuitive, example-based explanations, can handle complex data.&lt;/td&gt;
&lt;td&gt;Selection and interpretation of prototypes can be challenging.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The diversity of methods in XAI reflects the complexity of making AI understandable. The table above condenses the main approaches, their operating principles, and their respective pros and cons, aiding both the selection of appropriate methods for specific contexts and the understanding of the trade-offs involved.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;3. Applications of XAI in Critical Deep Learning Domains&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The need for transparency and interpretability driven by XAI is particularly pressing in domains where algorithmic decisions have significant and direct consequences on human lives, finances, or fundamental rights. Healthcare stands out as one of the most promising and, simultaneously, most demanding fields for the application of XAI, given the criticality of decisions and the imperative need for trust in support systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.1. XAI in Health and Medicine&lt;/strong&gt;&lt;br&gt;
The application of XAI in healthcare aims to empower professionals with tools that not only make accurate predictions but also offer clarity on how these predictions are formulated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Diagnostic aid and disease prediction:&lt;/strong&gt; XAI has the potential to provide crucial insights into how AI models arrive at diagnostic or prognostic conclusions, allowing healthcare professionals to make more informed and personalized decisions. Practical examples include the use of XAI in the diagnosis of colorectal cancer from the analysis of histopathological images, where important features are extracted and analyzed, and in the early detection of Parkinson's Disease through the interpretation of DaTSCAN images. The combination of medical imaging techniques with DL has already demonstrated a significant improvement in diagnostic and prognostic capabilities across various medical specialties.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interpretability in medical image analysis:&lt;/strong&gt; The inherent complexity of DL models applied to medical image analysis represents a considerable challenge to understanding their decision-making processes. XAI techniques, both post-hoc (like LIME, SHAP, and gradient-based methods) and S-XAI approaches, are increasingly applied to visualize and interpret the internal workings of these models, with the goal of increasing transparency and clinicians' trust in their results. The applications of DL in medical imaging are vast, ranging from improving image quality and reconstructing three-dimensional images from two-dimensional views, to generating synthetic images (often using Generative Adversarial Networks - GANs) for data augmentation, registering images from different modalities, and precisely segmenting anatomical or pathological structures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparency in drug discovery and personalized medicine:&lt;/strong&gt; Multimodal AI, which integrates various data sources such as genomic information, clinical data, and molecular data, is progressively reshaping the landscape of drug discovery and development. In this context, XAI is essential for uncovering and understanding the complex and often hidden patterns that these multimodal models reveal. Multimodal language models (MLMs), for example, are employed to correlate genetic variants with clinical biomarkers, optimizing patient stratification for clinical trials and improving the selection of candidates for different phases of drug development. In the field of genomics, DL applications, which can benefit from XAI for validation and knowledge discovery, include predicting protein binding sites on DNA/RNA, modeling gene expression, and enhancing genomic sequencing processes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite the enormous potential demonstrated, the effective integration of XAI into clinical practice has been notably slow and limited. This gap suggests that purely technical explainability, by itself, is insufficient. Factors such as the usability of explanations for clinicians, alignment with existing medical workflows, and addressing regulatory and ethical concerns are equally critical for real-world adoption. The "trust gap" refers not only to understanding the model but also to its reliability, safety, and relevance in the clinical context. Therefore, future research in XAI for healthcare must focus not only on algorithmic transparency but also on human-centered design and rigorous clinical validation of the generated explanations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.2. XAI in Other High-Impact Areas&lt;/strong&gt;&lt;br&gt;
The demand for XAI extends beyond medicine, covering various sectors where the opacity of AI models can pose significant risks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Finance:&lt;/strong&gt; In the insurance sector, for example, XAI methods are considered relevant for enhancing transparency in processes such as claims management, policy underwriting, and actuarial pricing. The ability to explain credit or investment decisions is crucial for regulatory compliance and for maintaining customer trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Criminal Justice:&lt;/strong&gt; XAI plays a crucial role in empowering judges and other legal professionals to make more informed and fair decisions based on algorithmic outcomes. The lack of transparency in AI systems used for risk assessment or evidence analysis can impede the effectiveness of the judicial system and raise serious questions about due process and fairness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous Systems:&lt;/strong&gt; In autonomous vehicles, safety is paramount. Federated learning, a technique that allows models to be trained on distributed data without centralizing it, is used for tasks like object detection. XAI can be fundamental in this context to debug model behavior, understand failures, and build trust in the safety and reliability of these complex systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Climate Science:&lt;/strong&gt; Although not the main focus of the provided research materials, the interpretability of machine learning models applied to climate physics is considered crucial, especially in regimes with scarce or non-stationary data. XAI can help ensure the generalization and reliability of climate projections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketing:&lt;/strong&gt; There is an emerging interest in applying XAI in marketing, with the goal of demystifying the decision-making processes of predictive models used for customer segmentation, product recommendation, or campaign optimization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread that unites these diverse applications is the pressing need for accountability and the mitigation of risks associated with opaque AI decision-making. Whether to ensure financial fairness, judicial impartiality, safety in autonomous systems, or reliability in scientific forecasts, XAI is perceived as an essential mechanism to ensure that AI operates responsibly and in alignment with societal interests. The demand for XAI, therefore, correlates directly with the criticality and potential social impact of the AI application in question.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;4. Pressing Challenges and Limitations of XAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Despite significant advancements and the growing recognition of its importance, XAI faces a series of complex challenges and intrinsic limitations that need to be addressed for its potential to be fully realized.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The dilemma between interpretability and model performance:&lt;/strong&gt; There is often a perceived trade-off between a model's interpretability and its predictive performance: simpler, and therefore more easily interpretable, models may not achieve the same accuracy as highly complex black-box models, such as deep neural networks. However, one of the central goals of XAI is precisely to develop methods and models that are increasingly interpretable while maintaining a high level of learning effectiveness and performance. This dichotomy may be more subtle than a simple inverse relationship. Approaches like S-XAI, for example, seek to challenge this notion by integrating interpretability directly into high-performance architectures. Furthermore, the "cost" of slightly lower performance may be acceptable in certain critical domains if, in return, significant and reliable explainability is obtained. The definition of "optimal" performance must, therefore, be contextualized; in high-risk areas, a slightly less accurate but fully transparent and reliable model may be preferable to a marginally more accurate black box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robustness of explanations and vulnerability to adversarial attacks:&lt;/strong&gt; Deep Learning models are known for their susceptibility to adversarial attacks, in which subtle and often imperceptible perturbations in the input data can lead to incorrect classifications or anomalous behaviors. This vulnerability can extend to the generated explanations. Robustness, in the context of XAI, refers to the ability of the AI model to maintain its performance and, crucially, to provide accurate and consistent explanations even in the presence of noise, input data perturbations, or deliberate adversarial attacks. Significant challenges persist in the susceptibility to sophisticated adversarial attacks and in maintaining the reliability of explanations under data distribution shifts. If the explanations themselves are not robust, they can be manipulated, leading to a false sense of understanding or trust on the part of the user. This not only undermines the fundamental purpose of XAI but can be even more dangerous than dealing with a recognized black box, as a misleading explanation can induce errors with severe consequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Factual consistency, "hallucinations," and the reliability of explanations:&lt;/strong&gt; A critical challenge in the field of AI, with direct implications for XAI, is ensuring that AI systems not only process data but also genuinely understand and align with human values and factual reality. Generative models, especially LLMs, are prone to the phenomenon of "hallucination," where they can generate responses that seem plausible but are factually inaccurate, inconsistent, or completely fabricated. If explanations are generated by models with similar characteristics, or if XAI methods are applied to models prone to hallucinations, the explanations themselves may inherit these reliability problems. The problem of "hallucination" in generative AI directly impacts XAI, as an explanation that "hallucinates" is inherently misleading and harmful, potentially worse than no explanation at all. This creates a "meta-hallucination" problem, where the explanation itself is a convincing falsehood, severely undermining trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability and computational efficiency of XAI methods:&lt;/strong&gt; Training Deep Learning models, especially large-scale ones, requires substantial computational resources, including high-performance GPUs or TPUs. Some XAI methods, such as SHAP, can add significant computational overhead, making their application on very large models or in real-time scenarios a challenge. Despite advances in model compression and efficient training techniques, the fundamental challenge of computational efficiency persists, often exacerbated by the trend of developing ever-larger and more complex models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intrinsic limitations of popular techniques:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LIME:&lt;/strong&gt; It can suffer from instability due to the nature of random sampling in its perturbation process and may have limitations in handling the complexities of highly non-linear models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SHAP:&lt;/strong&gt; Although theoretically robust, its computational cost can be prohibitive for some use cases, and the interpretability of Shapley values can vary depending on the specific characteristics of the model being explained.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-hoc methods in general:&lt;/strong&gt; Concerns persist about the &lt;em&gt;faithfulness&lt;/em&gt; of these explanations, i.e., whether they accurately reflect the true decision-making mechanisms of the original model, rather than being just plausible approximations.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Issues of trust, adoption, and integration into real-world practices:&lt;/strong&gt; Despite the transformative potential of XAI, its effective integration into clinical practice, for example, has been slow and limited. This is largely due to the persistent lack of trust and understanding of AI models by professionals. The lack of transparency in algorithmic decision-making processes can prevent professionals from using these AI systems effectively and safely. The adoption of XAI is, therefore, not just a technical challenge but also a complex socio-technical one. It involves human factors, such as the usability and relevance of explanations for different types of users, the need for organizational changes to incorporate new tools and processes, and the lack of standardized practices and benchmarks for evaluating and comparing XAI methods. For XAI to be widely adopted, it needs to be not only technically sound but also user-centered, easily integrable into existing workflows, and demonstrate clear and safe benefits, possibly with the support of regulatory frameworks and standardization.&lt;/li&gt;

&lt;/ul&gt;
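&lt;p&gt;The instability attributed to LIME above stems from its random perturbation sampling, and can be reproduced with a minimal, self-contained sketch (pure NumPy; it does not use the actual &lt;code&gt;lime&lt;/code&gt; library, and &lt;code&gt;black_box&lt;/code&gt; and &lt;code&gt;lime_like&lt;/code&gt; are illustrative toy names): fitting the same local linear surrogate twice with different random seeds yields different attributions for the same prediction.&lt;/p&gt;

```python
import numpy as np

def black_box(X):
    # Toy non-linear "model" standing in for an opaque classifier's score.
    return 1.0 / (1.0 + np.exp(-(np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2)))

def lime_like(x, n_samples=200, seed=0):
    """Minimal LIME-style local surrogate: perturb around x, weight the
    samples by proximity to x, and fit a weighted linear model whose
    coefficients serve as feature attributions."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))  # random perturbations
    w = np.exp(-np.sum((Z - x) ** 2, axis=1))                # proximity kernel
    y = black_box(Z)
    A = np.hstack([np.ones((n_samples, 1)), Z - x])          # intercept + local coords
    sw = np.sqrt(w)
    coef, _, _, _ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]                                          # per-feature attributions

x = np.array([0.2, -0.4])
a1 = lime_like(x, seed=1)
a2 = lime_like(x, seed=2)
# The two runs explain the same prediction, yet the attributions differ,
# because the random perturbation sample differs between runs.
```

&lt;p&gt;Larger perturbation samples reduce, but do not remove, this run-to-run variance, which is one reason the faithfulness of such post-hoc approximations must itself be evaluated.&lt;/p&gt;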




&lt;p&gt;&lt;strong&gt;5. Evaluation of Methods and Explanations in XAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The evaluation of the effectiveness and quality of explanations generated by XAI methods is a crucial component for the development and reliable deployment of transparent AI systems. The literature suggests that the evaluation of explanations can be fundamentally categorized into two main aspects: (a) the &lt;em&gt;faithfulness&lt;/em&gt; of the explanation with respect to the model's prediction, i.e., how correctly it represents the underlying reasons for the model's decision; and (b) the &lt;em&gt;usefulness&lt;/em&gt; of the explanation for the end-user, i.e., how well it helps the human to understand and interact with the AI system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5.1. Quantitative Metrics for Evaluation&lt;/strong&gt;&lt;br&gt;
Evaluating the effectiveness of XAI methods remains a pressing issue, with approaches ranging from qualitative user studies to the development of automated quantitative metrics. The latter seek to offer an objective measure of different properties of the explanations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness:&lt;/strong&gt; This dimension assesses how accurately an explanation reflects the true reasoning process of the AI model being explained. It is a crucial measure for judging whether the explanations are reliable and truly correspond to the model's internal behavior.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Examples of Metrics:&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness Correlation:&lt;/strong&gt; Evaluates the correlation between the importance attributed to features by the XAI technique and the actual impact of those features on the model's predictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infidelity:&lt;/strong&gt; Quantifies the difference between the provided explanation and the actual impact observed in the model's predictions when features are perturbed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prediction Gap on Important/Unimportant feature perturbation (PGI/PGU):&lt;/strong&gt; Measure the change in prediction when the most important (PGI) or least important (PGU) features, as identified by the explanation, are perturbed or removed.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Robustness / Stability:&lt;/strong&gt; These metrics evaluate the consistency of explanations when small perturbations are introduced to the model's input. Ideally, explanations for similar inputs should be consistently similar, ensuring that the model's interpretations are stable and reliable in the face of small variations in the data.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Examples of Metrics:&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sensitivity:&lt;/strong&gt; Assesses how much an explanation changes in response to small changes in the input, ensuring the consistent identification of important features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relative Input Stability (RIS), Relative Output Stability (ROS), Relative Representation Stability (RRS):&lt;/strong&gt; Measure the maximum change in attribution scores relative to perturbations in the input (RIS), the model's output (ROS), or the model's internal representations (RRS), respectively.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Localization:&lt;/strong&gt; Particularly relevant for image data, this metric evaluates how well an explanation can identify and highlight the relevant parts of the input that contributed to the model's decision.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Examples of Metrics:&lt;/em&gt; Comparisons between segmentation maps (if available as ground truth) and the image regions identified by the XAI method, often using metrics like Intersection over Union (IoU).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Complexity/Understandability of the Explanation:&lt;/strong&gt; Measures the cognitive load required for a human to understand the provided explanation. Explanations with lower complexity are generally considered more interpretable and easier to assimilate.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Related Metrics:&lt;/em&gt; Number of rules (R) in a rule-based explanation, or the number of features (F) used to construct the explanation.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Plausibility:&lt;/strong&gt; Assesses whether the explanation makes sense to human experts in the application domain, even if it is not a perfectly faithful representation of the model's complete internal logic. An explanation can be plausible without being fully faithful, and vice versa.&lt;/li&gt;

&lt;/ul&gt;
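&lt;p&gt;Two of the metric families above can be made concrete with a toy sketch (the linear &lt;code&gt;model&lt;/code&gt;, the attribution vector, and all names here are illustrative assumptions, not a standard implementation): a perturbation-based prediction gap in the spirit of PGI/PGU, and Intersection over Union for localization.&lt;/p&gt;

```python
import numpy as np

def model(X):
    # Toy linear model: feature 0 dominates, feature 2 is nearly irrelevant.
    return X @ np.array([2.0, 0.5, 0.05])

def prediction_gap(x, attributions, perturb_most_important, n=200, scale=0.5, seed=0):
    """PGI / PGU: mean |f(x) - f(x')| after perturbing the most (PGI)
    or least (PGU) important feature according to the explanation."""
    rng = np.random.default_rng(seed)
    order = np.argsort(np.abs(attributions))       # ascending importance
    idx = order[-1] if perturb_most_important else order[0]
    gaps = []
    for _ in range(n):
        z = x.copy()
        z[idx] += rng.normal(scale=scale)
        gaps.append(abs(model(x[None])[0] - model(z[None])[0]))
    return float(np.mean(gaps))

def iou(mask_a, mask_b):
    """Localization: Intersection over Union of two binary saliency masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union

x = np.array([1.0, 1.0, 1.0])
attr = np.array([2.0, 0.5, 0.05])   # a faithful attribution for this toy model
pgi = prediction_gap(x, attr, perturb_most_important=True)
pgu = prediction_gap(x, attr, perturb_most_important=False)

gt = np.zeros((4, 4), dtype=bool); gt[:2, :2] = True    # ground-truth region
sal = np.zeros((4, 4), dtype=bool); sal[:2, :3] = True  # region highlighted by the explanation
score = iou(gt, sal)                                    # 4 overlapping / 6 in union
```

&lt;p&gt;For a faithful explanation, perturbing the feature it ranks highest moves the prediction far more than perturbing the one it ranks lowest, so PGI comes out well above PGU; an unfaithful explanation would invert or flatten that gap.&lt;/p&gt;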

&lt;p&gt;There is a fundamental tension in the quantitative evaluation of XAI. While faithfulness and robustness metrics seek objectivity, concepts like "usefulness," "understandability," and "plausibility" are inherently subjective and dependent on the user and context. This underscores the irreplaceable role of human evaluation in the XAI cycle. Purely quantitative metrics may not capture the entirety of an explanation's "quality," necessitating qualitative and human-centered approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5.2. Qualitative Evaluation and the Role of the Human-in-the-Loop (HITL)&lt;/strong&gt;&lt;br&gt;
Qualitative evaluation, often involving direct human participation (Human-in-the-Loop - HITL), is essential to complement quantitative metrics. HITL integrates human judgment and expertise at key stages of XAI development and validation, helping to bridge the gap between the complex behavior of AI models and the generation of practical, explainable results.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Humans, especially domain experts, can validate the relevance and correctness of explanations. For example, radiologists can confirm whether the regions highlighted by an XAI system in an X-ray image are, in fact, medically relevant for the diagnosis.&lt;/li&gt;
&lt;li&gt;Feedback from domain experts is crucial for refining both the performance of the AI model and the clarity and usefulness of the explanations it provides.&lt;/li&gt;
&lt;li&gt;Studies with users and experts often examine dimensions such as the clarity, coherence, narrative quality, and actionability of explanations.&lt;/li&gt;
&lt;li&gt;Cognitive metrics, such as user satisfaction, the level of trust generated, understanding of the model's decision, and impact on user productivity, are also important components of qualitative evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The phenomenon of "hallucination" in AI models and the potential for misleading explanations make the HITL approach not just beneficial, but essential for validating XAI in critical applications. Automated metrics alone may fail to detect explanations that are semantically flawed, factually incorrect, or contextually inappropriate, even if they appear syntactically plausible. Human experts are needed to validate whether an explanation is not only faithful to the model but also correct and meaningful within the specific application domain. Thus, HITL acts as a critical safeguard against the deployment of AI systems with explanations that could be misleading or harmful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5.3. Challenges in Standardization and Objectivity of XAI Evaluation&lt;/strong&gt;&lt;br&gt;
Evaluating explainability is a complex task, hindered by the inherently subjective nature of what constitutes a "good" explanation, which can vary significantly depending on the user, task, and context. Many studies apply XAI methods, but few have systematically measured their effectiveness using standardized quantitative benchmarks. The absence of a mathematical or universally accepted definition of explainability and interpretability further complicates the development of objective and comparable evaluation methods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 2: Common Metrics for Evaluating Explanations in XAI&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Evaluation Dimension&lt;/th&gt;
&lt;th&gt;Specific Metric&lt;/th&gt;
&lt;th&gt;Metric Description&lt;/th&gt;
&lt;th&gt;Type (Quant./Qual./HITL)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Faithfulness&lt;/td&gt;
&lt;td&gt;PGI/PGU&lt;/td&gt;
&lt;td&gt;Measures the change in prediction when perturbing important/unimportant features.&lt;/td&gt;
&lt;td&gt;Quantitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Faithfulness&lt;/td&gt;
&lt;td&gt;Faithfulness Correlation / Infidelity&lt;/td&gt;
&lt;td&gt;Assesses the correspondence between the importance assigned by the explanation and the actual impact of the features.&lt;/td&gt;
&lt;td&gt;Quantitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Robustness/Stability&lt;/td&gt;
&lt;td&gt;RIS/ROS/RRS&lt;/td&gt;
&lt;td&gt;Measures the stability of the explanation relative to perturbations in the input, output, or internal representations.&lt;/td&gt;
&lt;td&gt;Quantitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Robustness/Stability&lt;/td&gt;
&lt;td&gt;Sensitivity&lt;/td&gt;
&lt;td&gt;Assesses how much an explanation changes with small alterations in the input.&lt;/td&gt;
&lt;td&gt;Quantitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Localization (for images)&lt;/td&gt;
&lt;td&gt;IoU (Intersection over Union)&lt;/td&gt;
&lt;td&gt;Compares regions identified by the explanation with a ground truth (e.g., segmentation map).&lt;/td&gt;
&lt;td&gt;Quantitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Understandability/Complexity&lt;/td&gt;
&lt;td&gt;Rule/Feature Count (R/F)&lt;/td&gt;
&lt;td&gt;Measures the number of rules or features used in the explanation as a proxy for complexity.&lt;/td&gt;
&lt;td&gt;Quantitative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Usefulness to the User&lt;/td&gt;
&lt;td&gt;User Satisfaction, Trust, Understanding&lt;/td&gt;
&lt;td&gt;Assesses the user's perception of the explanation's utility, clarity, and impact on their trust and understanding.&lt;/td&gt;
&lt;td&gt;Qualitative / HITL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plausibility&lt;/td&gt;
&lt;td&gt;Domain Expert Evaluation&lt;/td&gt;
&lt;td&gt;Experts judge whether the explanation makes sense in the context of the domain, regardless of fidelity to the model.&lt;/td&gt;
&lt;td&gt;Qualitative / HITL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Evaluation in XAI is multifaceted, and a combination of quantitative and qualitative metrics, with a strong emphasis on human validation, is generally necessary for a holistic assessment of the quality and effectiveness of explanations.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;6. Future Directions and Open Research in XAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The field of Explainable Artificial Intelligence is constantly evolving, driven by the need to make Deep Learning systems more transparent, reliable, and aligned with human expectations. Several promising research directions and open challenges continue to shape the future of XAI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Development of more robust, generalizable, and faithful S-XAI:&lt;/strong&gt; Research in Self-Explainability (S-XAI) is a particularly active area, focusing on the development of models that are inherently interpretable without sacrificing performance. This includes continuous advancements in S-XAI methods for medical image analysis and other domains, aiming for explanations that are more robust to perturbations, generalizable to different datasets, and, crucially, faithful to the true decision-making processes of the model. The enhancement of approaches like attention-based learning, concept-based learning, and prototype-based learning is fundamental to achieving these goals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration of domain knowledge for contextually rich explanations:&lt;/strong&gt; For explanations to be truly useful, they need to be contextually relevant. An important direction is the integration of domain-specific knowledge into S-XAI methods and other XAI approaches. This is especially vital in fields like medicine, where clinical context, patient history, and established medical knowledge are essential for correctly interpreting both the model's predictions and its explanations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhancement of human-AI interaction and personalization of explanations:&lt;/strong&gt; Effective collaboration between humans and AI systems is a central goal, and XAI plays a key role in this. Future research should explore how to improve human-AI interaction in decision-making, for example, in the medical context. An important avenue is the development of explanations that can be personalized and adapted to the user's level of expertise, informational needs, and cognitive style. As AI becomes more widespread, the "one-size-fits-all" explanation approach will prove inadequate. Different users (a DL researcher, a clinician, a patient) have different needs and levels of understanding. Therefore, the XAI of the future will likely need to evolve to offer personalized and adaptive explanations, making human-AI collaboration more fluid and effective.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addressing fundamental DL challenges (e.g., causality, reasoning) in the context of XAI:&lt;/strong&gt; Many current DL models, despite their predictive power, operate primarily based on pattern correlation, with limited capabilities for causal or abstract reasoning. The gap between human-like reasoning and the pattern-matching capabilities of AI remains a significant challenge. XAI needs to evolve to be able to explain models that demonstrate more complex forms of reasoning, including the ability to distinguish correlation from causation in analytical tasks. This implies not only explaining the "what" and "how" of decisions but also, ideally, the "why" in a deeper, more causal sense.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ethical and regulatory considerations for XAI:&lt;/strong&gt; XAI is fundamental to the ethical deployment of AI, as it promotes trust, transparency, and accountability. Legislative and policy developments, such as the AI Act in the European Union, are increasingly emphasizing the need for algorithmic transparency and, in some cases, the "right to an explanation." XAI can be a powerful tool for identifying and mitigating algorithmic biases, contributing to fairer and more impartial decisions. However, XAI itself is not an ethical panacea. It carries significant ethical responsibilities; if misused or poorly designed, it can create a false sense of security or be used to obscure, rather than illuminate, the workings of systems. The development and deployment of XAI must, therefore, be guided by robust ethical principles and aligned with societal values and emerging regulatory requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improving the efficiency of generative models and their explanations:&lt;/strong&gt; Dominant deep generative models (DGMs), such as diffusion models and LLMs, face design challenges that result in slow and computationally intensive inference. Accelerating these models is an active area of research. By extension, the ability to efficiently explain their generations, which are often sequential or iterative, is also an important direction. The need for DGMs that inherit the advantages of diffusion models (such as the high quality of generated samples) but support one-step sample generation also applies to the explainability of these samples. Explaining complex generative processes in an understandable and efficient manner remains an open challenge.&lt;/li&gt;
&lt;/ul&gt;
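&lt;p&gt;The attention-based flavor of S-XAI mentioned above can be sketched minimally (a toy, untrained model in the spirit of attention-based pooling; every name, shape, and parameter here is an illustrative assumption): the attention weights computed during the forward pass are returned alongside the prediction, serving as a built-in, ante-hoc explanation rather than a post-hoc one.&lt;/p&gt;

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_predict(instances, w_attn, w_clf):
    """Attention-pooling classifier: the attention weights it computes are
    returned with the prediction as a built-in explanation of which
    instances (e.g., image patches) drove the decision."""
    a = softmax(instances @ w_attn)   # one weight per instance, summing to 1
    pooled = a @ instances            # attention-weighted feature pooling
    score = pooled @ w_clf            # final (linear) decision score
    return score, a

rng = np.random.default_rng(0)
patches = rng.normal(size=(5, 8))    # 5 "patches", 8 features each
w_attn = rng.normal(size=8)          # attention parameters (normally learned)
w_clf = rng.normal(size=8)           # classifier weights (normally learned)
score, weights = attention_predict(patches, w_attn, w_clf)
# `weights` is a distribution over patches; inspecting it is the explanation.
```

&lt;p&gt;The design choice illustrated here is the key point of S-XAI: because the saliency signal is part of the model's own computation, the faithfulness concerns raised for post-hoc surrogates do not arise in the same form, though attention weights still require validation as explanations.&lt;/p&gt;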




&lt;p&gt;&lt;strong&gt;7. Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Explainable Artificial Intelligence (XAI) has emerged not as a mere supplement, but as an indispensable component for the responsible advancement and trustworthy adoption of Deep Learning systems. As DL models become increasingly powerful and permeate critical aspects of society, the need to mitigate the risks associated with their "black-box" nature becomes paramount. XAI offers a path to unravel these complex algorithmic systems, promoting transparency, interpretability, and, ultimately, trust.&lt;/p&gt;

&lt;p&gt;Throughout this review, significant advancements in the field of XAI have been discussed, from consolidated post-hoc methodologies like LIME and SHAP to the burgeoning and promising field of Self-Explainability (S-XAI), which seeks to integrate interpretability into the very design of models. Applications in critical domains, with a focus on healthcare, demonstrate the transformative potential of XAI to improve decision-making, increase safety, and facilitate collaboration between humans and machines. However, persistent challenges remain. The dilemma between interpretability and performance, the robustness of explanations against attacks and perturbations, the need for standardized and objective evaluation, and the complex task of effectively integrating XAI into real-world practices require continuous research and innovation.&lt;/p&gt;

&lt;p&gt;The vast potential for future research in XAI is evident. The development of more sophisticated and faithful S-XAI methods, the integration of domain knowledge to contextually enrich explanations, the personalization of explainability for different users and contexts, and the addressing of fundamental ethical and regulatory issues are just some of the frontiers that are emerging. The journey of XAI is, in essence, a continuous co-evolution with AI itself. As AI models become more advanced and integrated into the social fabric, the demands on XAI for transparency, robustness, and reliability will only intensify, requiring incessant innovation and a constant, critical evaluation of its methods and impacts.&lt;/p&gt;

&lt;p&gt;To fully realize the promise of XAI, a call for interdisciplinary collaboration is imperative. Advancement in this field requires joint efforts from AI researchers, domain experts from various application areas, social scientists, ethicists, and policymakers. Only through this synergy will it be possible to ensure that XAI is developed and used in a way that maximizes its benefits and minimizes its risks, contributing to a future where artificial intelligence is not only powerful but also understandable, fair, and truly at the service of humanity.&lt;/p&gt;








&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Mezghani, E., et al. (2019). "Deep Learning Applications in Medical Imaging and Genomics". &lt;em&gt;Applied Sciences&lt;/em&gt;, 9(8), 1526.&lt;/li&gt;
&lt;li&gt;"Recent Advancements in Generative AI". (2024). &lt;em&gt;arXiv:2403.00025&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Paperguide.ai. (2024). "Top Research Papers on Explainable AI (XAI)".&lt;/li&gt;
&lt;li&gt;GeeksforGeeks. (2024). "Challenges in Deep Learning".&lt;/li&gt;
&lt;li&gt;"Explainable Artificial Intelligence for Disease Prediction: A Systematic Literature Review". (2024). &lt;em&gt;Journal of Personalized Medicine&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"A Survey on Explainable Artificial Intelligence (XAI) Techniques for Visualizing Deep Learning Models in Medical Imaging". (2024). &lt;em&gt;ResearchGate&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;MarkovML. (2024). "LIME vs SHAP: A Comparative Analysis of Interpretability Tools". &lt;em&gt;MarkovML Blog&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Unsolved Challenges in AI in 2024". (2024). &lt;em&gt;Gekko&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Frontiere.io. (2024). "Can there be harmony between human and AI? The key role of Explainable AI and Human-in-the-loop".&lt;/li&gt;
&lt;li&gt;Amann, J., et al. (2025). "What Is the Role of Explainability in Medical Artificial Intelligence? A Case-Based Approach". &lt;em&gt;Journal of Clinical Medicine&lt;/em&gt;, 12(4), 375.&lt;/li&gt;
&lt;li&gt;"Which LIME should I trust? Concepts, Challenges, and Solutions". (2025). &lt;em&gt;arXiv:2503.24365&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks". (2024). &lt;em&gt;arXiv:2410.02331&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks". (2024). &lt;em&gt;ResearchGate&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Self-Explainable AI and Attention for Interpretable Cancer Analysis: A Systematic Review Protocol". (2025). &lt;em&gt;protocols.io&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Attention Mechanisms in AI and Deep Learning Explained". (2024). &lt;em&gt;viso.ai&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Brás, C., et al. (2024). "Explainable AI for medical image analysis". In &lt;em&gt;Trustworthy AI in Medical Imaging&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;van der Velden, B. H. M., et al. (2023). "Explainable artificial intelligence (XAI) in radiology and nuclear medicine: a literature review". &lt;em&gt;Frontiers in Medicine&lt;/em&gt;, 10.&lt;/li&gt;
&lt;li&gt;"Unveiling the black box: A systematic review of Explainable Artificial Intelligence in medical image analysis". (2024). &lt;em&gt;PubMed Central&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"From siloed data to breakthroughs: Multimodal AI in drug discovery". (2024). &lt;em&gt;Drug Target Review&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"One-shot Federated Learning: A Survey". (2025). &lt;em&gt;arXiv:2505.02426&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Machine Learning for Climate Physics". (2024). &lt;em&gt;Annual Review of Condensed Matter Physics&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Evaluating the Usefulness of Explanations from Explainable Artificial Intelligence (XAI) Methods". (2024). &lt;em&gt;medRxiv&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"QUANTIFYING EXPLAINABLE AI METHODS IN MEDICAL DIAGNOSIS: A STUDY IN SKIN CANCER". (2024). &lt;em&gt;medRxiv&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"A Quantitative and Qualitative Evaluation of XAI Methods for Human-in-the-Loop Skeletal-based Human Activity Recognition". (2024). &lt;em&gt;PubMed Central&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Evaluation Metrics Research for Explainable Artificial Intelligence Global Methods Using Synthetic Data". (2023). &lt;em&gt;Mathematics&lt;/em&gt;, 6(1), 26.&lt;/li&gt;
&lt;li&gt;"What is the Role of Human-in-the-Loop in Explainable AI?". (n.d.). &lt;em&gt;milvus.io&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Explainable AI in medical imaging". (2023). &lt;em&gt;University of Twente Student Theses&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;"Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks". (2024). &lt;em&gt;AIModels.fyi&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Frontiere.io. (2024). "Can there be harmony between human and AI? The key role of Explainable AI and Human-in-the-loop".&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deep Learning (DL) has revolutionized numerous fields, but its "black-box" nature often hinders trust and adoption in critical domains. Explainable Artificial Intelligence (XAI) emerges as an essential discipline to provide transparency and interpretability to DL models. This paper presents a comprehensive review of the advancements, challenges, and future perspectives of XAI applied to Deep Learning models. The fundamentals of XAI are discussed, including a taxonomy of post-hoc methods (e.g., LIME, SHAP), ante-hoc methods, and the growing field of Self-Explainability (S-XAI) with its attention-based, concept-based, and prototype-based approaches. Critical applications of XAI are explored, with an emphasis on healthcare (diagnosis, medical imaging, drug discovery) and other sectors like finance and justice. Pressing challenges are analyzed, such as the interpretability-performance dilemma, the robustness of explanations against adversarial attacks, factual consistency, computational scalability, and the limitations of popular techniques. The importance of evaluating XAI methods is highlighted, covering quantitative metrics (faithfulness, robustness, localization) and qualitative ones, including the crucial role of human-in-the-loop (HITL) evaluation, as well as the challenges in standardizing this evaluation. Finally, future directions are outlined, such as the development of more advanced S-XAI, the integration of domain knowledge, the personalization of explanations, addressing ethical and regulatory issues, and improving explainability in generative models. It is concluded that XAI is vital for the responsible advancement of DL, requiring continuous interdisciplinary collaboration to realize its full potential.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
