<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Aun Raza</title>
    <description>The latest articles on Forem by Aun Raza (@aun_aideveloper).</description>
    <link>https://forem.com/aun_aideveloper</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3331327%2F32bf6512-e417-417e-902e-988a0540e6e2.png</url>
      <title>Forem: Aun Raza</title>
      <link>https://forem.com/aun_aideveloper</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aun_aideveloper"/>
    <language>en</language>
    <item>
      <title>The Humans Who Will Thrive in an AI-First World</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Tue, 31 Mar 2026 16:00:55 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/the-humans-who-will-thrive-in-an-ai-first-world-2fp2</link>
      <guid>https://forem.com/aun_aideveloper/the-humans-who-will-thrive-in-an-ai-first-world-2fp2</guid>
      <description>&lt;p&gt;If you cast your mind back to the frantic headlines of 2023 and 2024, you might remember the widespread panic that artificial intelligence was coming for everyone’s job. Fast forward to where we are today in 2026, and the reality looks remarkably different. The dust has settled, the hype cycle has leveled out into practical application, and a clear picture has emerged of the modern workplace.&lt;/p&gt;

&lt;p&gt;AI didn't replace humans. Instead, it replaced tasks.&lt;/p&gt;

&lt;p&gt;As we navigate this fully realized AI-first world, a fascinating trend has emerged. The professionals who are experiencing the most explosive career growth aren’t necessarily the ones who understand how to build neural networks. The true winners are those who have mastered the art of working alongside these systems. They have evolved from operators into orchestrators.&lt;/p&gt;

&lt;p&gt;Let’s look at how this transformation is actively playing out across three distinctly different industries: customer support, healthcare, and design. Each teaches us something about the humans who are thriving today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transforming Customer Support&lt;/strong&gt;&lt;br&gt;
For decades, working in customer support meant operating like a human router: fielding repetitive questions, reading off rigid scripts, and racing against a ticking "average handle time" clock. In 2026, that version of the job is practically extinct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Beyond the Script&lt;/strong&gt;&lt;br&gt;
Today, autonomous AI agents seamlessly handle roughly 80% of tier-one and tier-two customer inquiries. Routine tasks like processing returns, tracking lost packages, or updating billing information are resolved instantly by AI models that never sleep and speak forty languages.&lt;/p&gt;

&lt;p&gt;So, what happened to the human support team? They got an upgrade. The customer support professionals who are thriving today have transitioned into what we now call "Customer Success Consultants." Because AI intercepts the mundane friction, the only calls that reach a human are the complex, the highly nuanced, or the emotionally charged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Empathy Premium&lt;/strong&gt;&lt;br&gt;
Imagine a customer whose wedding dress was ruined in transit just days before the ceremony. An AI can process the refund, but it cannot read the panic in the customer's voice, nor can it offer genuine reassurance and creative, out-of-the-box problem-solving to save the day.&lt;/p&gt;

&lt;p&gt;The humans thriving in modern customer support are the ones who index heavily on emotional intelligence. They use AI as their rapid-research assistant, pulling up customer histories, cross-referencing inventory in milliseconds, and drafting follow-up emails, while they focus entirely on active listening and empathetic resolution. They aren't valued for their speed anymore; they are valued for their humanity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Healthcare Gets Human Again&lt;/strong&gt;&lt;br&gt;
Perhaps nowhere is the AI-first transition more profound, and more urgently needed, than in healthcare. Just a few years ago, doctors and nurses were buckling under the weight of administrative burnout, spending more time staring at glowing screens than looking their patients in the eye.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Curing the Paperwork Plague&lt;/strong&gt;&lt;br&gt;
In 2026, the medical professionals who are thriving have embraced AI to cure the paperwork plague. Ambient clinical intelligence is now a standard fixture in examination rooms. As a doctor speaks naturally with a patient, an AI listens, structures the medical notes, pulls relevant historical data, and seamlessly updates the electronic health record securely. Recent medical industry reports from earlier this year show that ambient AI has successfully returned an average of 12 to 15 hours per week to physicians, time previously lost to late-night data entry.&lt;/p&gt;

&lt;p&gt;But it goes deeper than admin. AI is now a trusted secondary diagnostic tool. When a radiologist looks at a scan, an AI overlay has already highlighted microscopic anomalies that a tired human eye might miss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Return of Bedside Manner&lt;/strong&gt;&lt;br&gt;
The doctors and nurses thriving in this environment aren’t threatened by a machine’s ability to pattern-match medical imagery. Instead, they use it to elevate their practice.&lt;/p&gt;

&lt;p&gt;Because the AI handles the data processing, the modern healthcare provider can finally focus on the art of healing. They have the mental bandwidth to explain complex treatment plans clearly, to comfort frightened families, and to factor in a patient’s unique lifestyle and emotional state, contextual nuances that algorithms still cannot grasp. AI has ironically made medicine less robotic and deeply human once again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designers as Creative Directors&lt;/strong&gt;&lt;br&gt;
If we look at the creative sector, the shift has been just as dramatic. When generative design tools first hit the mainstream, many feared the death of the commercial artist. Yet, in 2026, the demand for top-tier design talent is higher than ever. The nature of the work, however, has fundamentally shifted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retiring the Blank Canvas&lt;/strong&gt;&lt;br&gt;
A few years ago, a designer might spend days mocking up variations of a landing page or tweaking bezier curves on a digital asset. Today, the "blank canvas" phase is handled by AI. A designer can prompt a tool to generate fifty variations of a user interface, complete with different color palettes and typography, in a matter of seconds.&lt;/p&gt;

&lt;p&gt;The designer’s role has leveled up from pixel-pusher to creative director. The humans thriving in design are the ones who possess exceptional taste, deep cultural awareness, and a profound understanding of human psychology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Curating the Soul&lt;/strong&gt;&lt;br&gt;
AI can generate a million technically perfect images, but it doesn't know why a certain shade of blue evokes trust in a specific demographic, or why a slightly asymmetrical layout feels more approachable to a Gen-Z audience.&lt;/p&gt;

&lt;p&gt;Thriving designers today are editors and curators. They take the raw, often soulless output of generative AI and inject it with brand voice, cultural relevance, and human emotion. They spend their time on strategy, user empathy, and storytelling, using AI simply as a high-powered brush to paint their larger vision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thriving in Tomorrow's Economy&lt;/strong&gt;&lt;br&gt;
When we look across customer support, healthcare, and design, a unified theme emerges for this AI-first era. The half-life of purely technical, repetitive skills has shrunk drastically. If your primary value to an organization in the past was processing data or executing rote tasks, the ground has shifted beneath your feet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Soft Skills Renaissance&lt;/strong&gt;&lt;br&gt;
We are living through a soft skills renaissance. The defining characteristics of a successful professional in 2026 are adaptability, critical thinking, complex problem-solving, and emotional intelligence.&lt;/p&gt;

&lt;p&gt;AI is the ultimate eager intern. It has read every book, memorized every manual, and works at the speed of light. But it has no lived experience, no moral compass, no intuition, and no empathy. It requires human oversight to provide context, ethical boundaries, and strategic direction. The people who are getting promoted, building successful companies, and leading their fields are the ones who know how to manage this digital workforce while doubling down on the traits that make them uniquely human.&lt;/p&gt;

&lt;p&gt;As we look toward the end of the decade, the narrative is no longer about humans versus machines. It is about humans with machines versus humans without them. The AI-first world hasn't diminished the value of human labor; it has distilled it to its purest, most impactful essence.&lt;/p&gt;

&lt;p&gt;Whether you are calming a frustrated customer, diagnosing a patient, or designing the next great digital experience, your greatest asset is no longer your ability to compute or execute repetitive tasks. Your greatest asset is your humanity.&lt;/p&gt;

&lt;p&gt;The ultimate irony of the AI revolution is this: to thrive in an environment dominated by artificial intelligence, you don't need to become more like a machine. You just need to become more deeply, unapologetically human.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>futurechallenge</category>
      <category>leadership</category>
      <category>techtalks</category>
    </item>
    <item>
      <title>How We Manage ‘Gray Area’ Logic in Conversational AI</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Wed, 18 Mar 2026 20:52:21 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/how-we-handle-gray-area-logic-in-conversational-agents-2d6o</link>
      <guid>https://forem.com/aun_aideveloper/how-we-handle-gray-area-logic-in-conversational-agents-2d6o</guid>
      <description>&lt;h1&gt;How We Handle ‘Gray Area’ Logic in Conversational Agents&lt;/h1&gt;

&lt;p&gt;Imagine walking into your favorite local coffee shop. You tell the barista you want something cold and sweet, but you really do not want to be kept awake all night. A human barista instantly processes that vague request. They might suggest a half decaf iced caramel macchiato. They naturally understand the gray area between "give me energy" and "let me sleep later." &lt;/p&gt;

&lt;p&gt;For years, if you asked a digital assistant or chatbot that same type of question, the system would completely break down. Traditional technology was built on strict binary logic. Everything was a zero or a one, a true or a false, a yes or a no. But human beings rarely communicate in absolute truths. We live in the maybe. We live in the gray areas.&lt;/p&gt;

&lt;p&gt;Today, we are finally teaching conversational agents how to navigate this ambiguity. Handling this gray area logic is no longer just a fun experiment. It is the core feature that separates a frustrating robotic chat from a genuinely helpful digital experience. Let us dive into how this actually works behind the scenes and why it is completely changing the way we interact with technology.&lt;/p&gt;

&lt;h2&gt;The Messy Human Reality&lt;/h2&gt;

&lt;p&gt;Human language is wonderfully complex and incredibly messy. We use qualifiers constantly. We say things like "sort of" or "usually" or "it depends." We also present conflicting information without even realizing it. &lt;/p&gt;

&lt;h3&gt;Beyond Yes and No&lt;/h3&gt;

&lt;p&gt;Think about a standard customer service interaction. A customer might reach out to an airline and say they missed their flight because of heavy traffic, but they also know they bought the cheapest ticket with a strict no refund policy. The strict policy says the airline owes them nothing. However, human empathy says the customer is stressed and needs help. A human representative might check if there is an empty seat on the next flight and move them over for free as a courtesy. &lt;/p&gt;

&lt;p&gt;A traditional bot looks at the ticket class, sees the restriction, and coldly denies the request. It follows the rules perfectly, yet it completely fails the customer experience test. To build better systems, we had to rethink how AI processes these complex scenarios where multiple truths overlap.&lt;/p&gt;

&lt;h2&gt;Teaching AI the Nuance&lt;/h2&gt;

&lt;p&gt;We no longer rely on rigid decision trees where every user response must perfectly match a predetermined path. Instead, modern agents use a completely different approach to understand meaning and intent.&lt;/p&gt;

&lt;h3&gt;Grasping the Deep Context&lt;/h3&gt;

&lt;p&gt;The biggest breakthrough in handling gray area logic is context retention. Advanced conversational agents now act like a sponge. They absorb the entire story instead of just hunting for specific trigger words. When a user writes a long paragraph explaining a complicated problem, the AI breaks down the entire narrative. It understands that a customer is upset about a delayed delivery, but it also notes that the customer has been a loyal shopper for five years. &lt;/p&gt;

&lt;h3&gt;The Game of Probabilities&lt;/h3&gt;

&lt;p&gt;Instead of following a strict map, the system plays a game of weighted probabilities. The AI evaluates the situation and comes up with several possible responses. It thinks about the likelihood of what the user actually wants. If a user asks a highly ambiguous question, the agent does not just guess and hope for the best. It responds by asking a clarifying question. It acknowledges the ambiguity directly, which feels incredibly human. By navigating these probabilities, the agent gently guides the conversation out of the gray area and into a clear resolution.&lt;/p&gt;
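
&lt;p&gt;To make this concrete, here is a minimal, hypothetical sketch of confidence-gated intent handling. The &lt;code&gt;classify_intents&lt;/code&gt; function is a stand-in for a real NLU model, and the threshold is arbitrary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of confidence-gated intent handling (illustrative only).
# classify_intents() is a hypothetical stand-in for a real NLU model.
def classify_intents(utterance):
    # Pretend model output: candidate intents with probabilities.
    return {"rebook_flight": 0.41, "refund_request": 0.38, "baggage_claim": 0.21}

CONFIDENCE_THRESHOLD = 0.6  # below this, ask instead of guessing

def respond(utterance):
    scores = classify_intents(utterance)
    best_intent = max(scores, key=scores.get)
    if scores[best_intent] &gt;= CONFIDENCE_THRESHOLD:
        return f"Handling intent: {best_intent}"
    # Ambiguous: acknowledge the gray area with a clarifying question.
    top_two = sorted(scores, key=scores.get, reverse=True)[:2]
    return f"Just to be sure: {top_two[0]} or {top_two[1]}?"

print(respond("I missed my flight because of traffic, what are my options?"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;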

&lt;h2&gt;Real World Success Stories&lt;/h2&gt;

&lt;p&gt;This technology is not just theoretical. It is being actively deployed right now across major industries to solve genuine business problems. &lt;/p&gt;

&lt;h3&gt;Retail and Customer Support&lt;/h3&gt;

&lt;p&gt;Ecommerce companies are using nuanced AI to handle complicated returns. Imagine a customer who wants to return a shirt. They admit they wore it once, but they claim the seam ripped immediately. Standard return policies dictate that items must be unworn. However, defective product policies allow for exceptions. The agent has to navigate this gray area. A smart agent will recognize the mention of the ripped seam, bypass the standard rejection, and kindly ask the customer to upload a photo of the damage. It solves the problem without making the customer angry.&lt;/p&gt;
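
&lt;p&gt;A toy version of that triage rule might look like the following sketch. The keywords and policy outcomes are invented for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative return-triage rule; keywords and policy outcomes are invented.
DEFECT_HINTS = ("ripped", "torn", "broken", "defective", "seam")

def triage_return(request_text, item_worn):
    mentions_defect = any(hint in request_text.lower() for hint in DEFECT_HINTS)
    if mentions_defect:
        # Possible defective item: bypass the "unworn only" rule, gather evidence.
        return "Sorry about that! Could you upload a photo of the damage?"
    if item_worn:
        return "Our standard policy covers unworn items only."
    return "No problem, here is your prepaid return label."

print(triage_return("I wore it once and the seam ripped immediately", item_worn=True))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;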

&lt;h3&gt;Healthcare Triage Systems&lt;/h3&gt;

&lt;p&gt;Healthcare providers use conversational agents for appointment scheduling and symptom triage. A patient might say their stomach hurts a little bit, but they also mention a weird fever that started an hour ago. A basic bot might just offer to book an appointment for next week based on the mild stomach pain. A smart agent spots the fever, recognizes the potential urgency hidden in the gray area, and immediately escalates the chat to a human nurse. This capability saves time, resources, and potentially lives.&lt;/p&gt;
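
&lt;p&gt;In code, that escalation decision could be sketched like this. The symptom list is invented and deliberately oversimplified:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy triage rule; the symptom list is invented and far too simple for real use.
URGENT_SYMPTOMS = ("fever", "chest pain", "shortness of breath")

def triage(message):
    text = message.lower()
    if any(symptom in text for symptom in URGENT_SYMPTOMS):
        return "escalate_to_nurse"  # urgency hidden in the gray area
    return "offer_routine_appointment"

print(triage("My stomach hurts a little bit, and a weird fever started an hour ago"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;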

&lt;h2&gt;Shifting the Industry Landscape&lt;/h2&gt;

&lt;p&gt;The ability to process nuance is causing a massive shift in how businesses view automation. It is moving the technology from a simple cost cutting measure to a genuine driver of customer loyalty.&lt;/p&gt;

&lt;h3&gt;Smarter Graceful Handoffs&lt;/h3&gt;

&lt;p&gt;One of the most important aspects of handling ambiguity is knowing when to surrender. The smartest conversational agents today are deeply aware of their own limitations. When a conversation enters a gray area that is simply too complex or emotionally charged, the AI performs a graceful handoff. It transfers the chat to a human team member and provides a complete summary of the issue. The human steps in seamlessly, and the customer never has to repeat their frustrating story. &lt;/p&gt;
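
&lt;p&gt;A graceful handoff boils down to packaging context for the human. Here is a rough sketch; the payload schema and the &lt;code&gt;summarize&lt;/code&gt; helper are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a handoff payload; the schema and summarize() helper are hypothetical.
def summarize(conversation):
    return " / ".join(turn["text"] for turn in conversation[-3:])  # crude stand-in

def build_handoff(conversation, reason):
    return {
        "reason": reason,                    # why the AI is handing off
        "transcript": conversation,          # full history, nobody repeats the story
        "summary": summarize(conversation),  # quick context for the human agent
    }

chat = [{"speaker": "user", "text": "My refund is three weeks late and I am furious"}]
print(build_handoff(chat, reason="emotionally charged, likely policy exception"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;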

&lt;h3&gt;Shifting Consumer Expectations&lt;/h3&gt;

&lt;p&gt;Because of these advancements, our expectations as consumers have permanently changed. We no longer tolerate systems that force us to press one for billing and two for support. We expect to speak naturally. We expect the technology to understand our weird, specific, and totally unique problems. Businesses that fail to adopt these nuanced systems are quickly being left behind by competitors who offer a more human digital experience.&lt;/p&gt;

&lt;h2&gt;Looking to the Future&lt;/h2&gt;

&lt;p&gt;We are only scratching the surface of what conversational agents will be able to accomplish in the coming years. The focus is shifting from simply understanding text to understanding human emotion.&lt;/p&gt;

&lt;h3&gt;Building Predictive Empathy&lt;/h3&gt;

&lt;p&gt;The next generation of conversational agents will feature predictive empathy. They will analyze the pacing of your words, the length of your sentences, and the subtle frustration in your phrasing. If you type in short and abrupt bursts, the AI will recognize your impatience. It will drop the conversational pleasantries and give you fast and direct answers. If you seem confused, it will slow down and explain things step by step. The technology will adapt its personality to match your emotional state in real time.&lt;/p&gt;
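
&lt;p&gt;As a rough sketch of the idea, a tone adapter might key off message length and punctuation. These heuristics are invented, not a description of any shipped system:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative tone adapter; the heuristics are invented, not a shipped feature.
def pick_style(recent_messages):
    avg_words = sum(len(m.split()) for m in recent_messages) / len(recent_messages)
    if avg_words &lt; 4:
        return "terse"  # short, abrupt bursts: skip pleasantries, answer fast
    if sum(m.count("?") for m in recent_messages) &gt;= 2:
        return "step_by_step"  # repeated questions: slow down and explain
    return "friendly"

print(pick_style(["not working", "fix it", "now"]))  # -&gt; terse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;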

&lt;h2&gt;The Final Thought&lt;/h2&gt;

&lt;p&gt;Handling gray area logic is the ultimate bridge between artificial intelligence and authentic human connection. Life is rarely black and white, and the tools we use to navigate our daily lives should reflect that reality. By teaching machines to embrace ambiguity, we are not just making them smarter. We are making them significantly more helpful.&lt;/p&gt;

&lt;p&gt;As we continue to push the boundaries of this technology, the goal is not to trick people into thinking they are speaking to a human. The goal is to provide an experience that is so smooth, so understanding, and so highly capable that the user simply does not care whether they are talking to a human or a machine. When a conversational agent can finally sit with us in the messy gray areas of life, everyone wins.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatbot</category>
      <category>logic</category>
      <category>automation</category>
    </item>
    <item>
      <title>If you’re curious how to position yourself in the 2026 AI race</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Sat, 14 Feb 2026 23:09:10 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/if-you-curious-how-to-position-yourself-at-2026-in-ai-race-1bde</link>
      <guid>https://forem.com/aun_aideveloper/if-you-curious-how-to-position-yourself-at-2026-in-ai-race-1bde</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/aun_aideveloper" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3331327%2F32bf6512-e417-417e-902e-988a0540e6e2.png" alt="aun_aideveloper"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/aun_aideveloper/what-an-ai-engineering-lead-actually-does-in-2026-beyond-models-and-prompts-2lb6" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;What an AI Engineering Lead Actually Does in 2026 (Beyond Models and Prompts)&lt;/h2&gt;
      &lt;h3&gt;Aun Raza ・ Jan 10&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#mlops&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#engineering&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#production&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>mlops</category>
      <category>engineering</category>
      <category>production</category>
    </item>
    <item>
      <title>What an AI Engineering Lead Actually Does in 2026 (Beyond Models and Prompts)</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Sat, 10 Jan 2026 18:15:08 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/what-an-ai-engineering-lead-actually-does-in-2026-beyond-models-and-prompts-2lb6</link>
      <guid>https://forem.com/aun_aideveloper/what-an-ai-engineering-lead-actually-does-in-2026-beyond-models-and-prompts-2lb6</guid>
      <description>&lt;h1&gt;What an AI Engineering Lead Actually Does in 2026 (Beyond Models and Prompts)&lt;/h1&gt;

&lt;p&gt;It’s easy to get mesmerized by the magic show. In the last few years, we’ve watched AI generate breathtaking art, write surprisingly good poetry, and pass the bar exam. The conversation has been dominated by model training, parameter counts, and the new rockstar role: the prompt engineer. We built incredible, powerful engines.&lt;/p&gt;

&lt;p&gt;But now, the magic show is over, and the industrial age of AI is here.&lt;/p&gt;

&lt;p&gt;The challenge is no longer just "Can we build a model that does X?" It's "Can we build a &lt;em&gt;system&lt;/em&gt; around that model that runs reliably, affordably, and safely for millions of users, 24/7?" This is where the demo-to-production gap lives, and it's where most AI initiatives still stumble.&lt;/p&gt;

&lt;p&gt;Enter the AI Engineering Lead of 2026. This isn't the data scientist who perfected the model or the prompt wizard who found the magic words. This is the systems thinker, the architect, the person who asks the hard questions long after the initial "wow" has faded. They aren't focused on the engine; they're focused on building the entire factory around it. And their work is defined by preventing the failures that are becoming painfully common.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Models Silently Fail&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You’ve seen it happen. The customer support bot that was brilliant in testing suddenly starts giving nonsensical answers. The product recommendation engine that drove a 10% lift in sales is now suggesting winter coats in July. The model didn’t change. The world did.&lt;/p&gt;

&lt;p&gt;This is the insidious problem of "drift," and it’s the number one killer of AI value in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Shifting Sands of Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Models are trained on a snapshot of the past. A model trained on e-commerce data from 2023 has no concept of the fashion trends, memes, or economic realities of 2026. This is &lt;strong&gt;data drift&lt;/strong&gt; (the input data changes) and &lt;strong&gt;concept drift&lt;/strong&gt; (what the data &lt;em&gt;means&lt;/em&gt; changes).&lt;/p&gt;

&lt;p&gt;Think of a fraud detection model. It learned that transactions over $1,000 from a new location are suspicious. But after three years of inflation and the rise of remote work, that rule is now obsolete, triggering a flood of false positives and infuriating your best customers. The model is quietly, confidently, and completely wrong.&lt;/p&gt;

&lt;p&gt;An AI Engineering Lead’s first job is to build an immune system for the model. They aren’t just deploying an algorithm; they're deploying a dynamic system with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Observability:&lt;/strong&gt; Dashboards that don't just track CPU usage, but the statistical properties of the data flowing into the model. Is the average user query length suddenly changing? Is the sentiment of reviews becoming more negative?&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Automated Retraining:&lt;/strong&gt; Triggers and pipelines that automatically retrain and validate the model on new data when performance dips below a certain threshold.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Alerting:&lt;/strong&gt; Systems that page a human not when the server is down, but when the model’s confidence scores start looking weird.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They ensure the AI stays connected to the reality of the business, not the frozen reality of its training data.&lt;/p&gt;
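
&lt;p&gt;As a toy illustration of that observability idea, here is a drift check that compares one live input feature against its training baseline using a two-sample Kolmogorov-Smirnov test. The distributions and alert threshold are invented:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy drift check: compare a live feature distribution against the training baseline.
# Distributions and the alert threshold are invented for the example.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_amounts = rng.lognormal(mean=4.0, sigma=1.0, size=5000)  # frozen snapshot
live_amounts = rng.lognormal(mean=4.6, sigma=1.1, size=5000)      # today's traffic

result = ks_2samp(training_amounts, live_amounts)
if result.pvalue &lt; 0.01:
    print(f"Data drift detected (KS={result.statistic:.3f}): trigger retraining")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;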

&lt;p&gt;&lt;strong&gt;Why Demos Break Hearts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every leader has felt the sting of this. You see a demo that’s pure magic—instant, insightful, transformative. You sign off on the project. Six months later, you have a system that’s slow, expensive, and crashes under the slightest pressure. The leap from a data scientist’s notebook to a production-grade service is a canyon, and it’s littered with failed projects.&lt;/p&gt;

&lt;p&gt;The AI Engineering Lead is the bridge-builder across that canyon. They obsess over the non-magical, brutally practical problems that turn a cool demo into a reliable product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Latency Nightmare&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An AI that takes 10 seconds to answer a question is often worse than no AI at all. For a real-time conversational agent, a recommendation on an e-commerce site, or a co-pilot in an IDE, speed isn't a feature; it's the entire user experience. A model that runs beautifully on a single, high-powered GPU in a lab can buckle when faced with 10,000 concurrent user requests.&lt;/p&gt;

&lt;p&gt;The Lead is responsible for everything from model quantization (making the model smaller and faster without losing too much accuracy) to building a global, low-latency serving infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Million-Dollar Mistake&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The cost of running large models is staggering. A single inference call to a top-tier API can cost a few cents. That sounds cheap until you’re making a billion calls a month. Without rigorous financial oversight, AI features can become black holes for your cloud budget. A 2023 study by Stanford found that the training costs for a single large AI model can reach millions of dollars, but the &lt;em&gt;inference&lt;/em&gt; costs over its lifetime can be 5-10 times that amount.&lt;/p&gt;

&lt;p&gt;The Lead designs for cost-efficiency from day one, implementing strategies like model cascading (using a smaller, cheaper model for easy queries and a larger one for complex ones) and ruthless monitoring of API and GPU expenses.&lt;/p&gt;
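
&lt;p&gt;Model cascading can be sketched in a few lines. The model calls below are hypothetical stubs, and the routing heuristic is a placeholder for what would usually be a trained classifier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of model cascading; both model calls are hypothetical stubs,
# and the routing heuristic is invented (real routers often use a classifier).
def call_small_model(q):
    return f"[cheap model] {q}"

def call_large_model(q):
    return f"[expensive model] {q}"

def is_easy(query):
    return len(query.split()) &lt; 12  # placeholder heuristic

def answer(query):
    return call_small_model(query) if is_easy(query) else call_large_model(query)

print(answer("What are your opening hours?"))
print(answer("Compare the tax implications of exercising stock options this year versus next"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;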

&lt;p&gt;&lt;strong&gt;Why 'Magic' Isn't Enough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine your bank’s AI denies you a mortgage. You ask why. The answer is, "The algorithm decided." That’s not just bad customer service; in many industries, it’s illegal. As AI makes more critical decisions in finance, healthcare, and law, the "black box" is no longer acceptable.&lt;/p&gt;

&lt;p&gt;Regulators, customers, and internal stakeholders need to know &lt;em&gt;why&lt;/em&gt; the AI made a particular decision. Trust is the currency of AI adoption, and it’s built on transparency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building for Trust and Audit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI Engineering Lead ensures the system is not just intelligent, but also &lt;strong&gt;explainable&lt;/strong&gt; and &lt;strong&gt;auditable&lt;/strong&gt;. This means building parallel systems that run alongside the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Explainability (XAI) Tooling:&lt;/strong&gt; Implementing techniques like SHAP or LIME that can highlight which inputs (which words in a review, which pixels in an image) most influenced the model’s output.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Audit Trails:&lt;/strong&gt; Logging every prediction, the data used to make it, and the model version, creating an immutable record for compliance checks and debugging.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Bias Detection:&lt;/strong&gt; Proactively running tests to see if the model performs differently for different demographic groups, and building mechanisms to mitigate that bias before it causes harm.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are building a system that can defend its decisions in a boardroom, a courtroom, or to an angry customer.&lt;/p&gt;
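
&lt;p&gt;The audit-trail piece, in particular, is mostly disciplined logging. Here is a minimal sketch; the record schema is illustrative, not a compliance standard:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of an audit-trail entry; the schema is illustrative, not a compliance standard.
import json
import time
import uuid

def log_prediction(model_version, inputs, output, explanation):
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,  # exactly which weights made this call
        "inputs": inputs,
        "output": output,
        "explanation": explanation,      # e.g. top features from SHAP or LIME
    }
    # Append-only log; production systems would use immutable, access-controlled storage.
    with open("audit_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

log_prediction("fraud-v7", {"amount": 1250}, "declined", {"amount": 0.83})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;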

&lt;p&gt;&lt;strong&gt;Why Users Stop Trusting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The final, and perhaps most important, piece of the puzzle is the human interface. An AI doesn't exist in a vacuum. It's part of a workflow, a product, a conversation. When that connection is brittle, users lose faith. A system that confidently gives wrong answers with no recourse is a system that will be abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designing the Human-AI System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI Engineering Lead thinks beyond the API endpoint. They are co-designing the entire user experience with product and design teams.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Feedback Loops are Everything:&lt;/strong&gt; The best AI systems learn from their users. This means building simple, intuitive ways for users to give feedback. The "thumbs up/thumbs down" on a chatbot response isn't just a UI element; it's a critical data pipeline that fuels the next generation of the model. According to a Salesforce report, 65% of customers expect companies to adapt to their needs in real time, and feedback loops are the only way to achieve this with AI.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Graceful Failure:&lt;/strong&gt; What does the AI do when it’s not confident? A bad system guesses and is often wrong. A great system says, "I'm not sure about that, can you rephrase?" or "Let me get a human expert to help." The Lead designs these fallback paths, ensuring the user experience doesn't fall off a cliff when the AI reaches its limits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They build a symbiotic system where the human and the AI make each other smarter.&lt;/p&gt;
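
&lt;p&gt;The graceful-failure path is often just a confidence gate. A minimal sketch, with placeholder thresholds and wording:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Confidence-gated fallback; thresholds and wording are placeholders.
def reply(answer_text, confidence):
    if confidence &gt;= 0.75:
        return answer_text                    # confident: answer directly
    if confidence &gt;= 0.40:
        return "I'm not sure about that, can you rephrase?"
    return "Let me get a human expert to help."  # graceful failure path

print(reply("Your order ships Tuesday.", confidence=0.92))
print(reply("Your order ships Tuesday.", confidence=0.31))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;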

&lt;p&gt;&lt;strong&gt;The Future is Engineered&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For years, the heroes of AI were the researchers and data scientists who pushed the boundaries of what was possible. Their work remains essential. But as we move from the era of possibility to the era of production, a new hero is emerging.&lt;/p&gt;

&lt;p&gt;The AI Engineering Lead of 2026 is less of a model trainer and more of a systems architect. They’re less of a sorcerer conjuring magic and more of a civil engineer building the durable, reliable, and safe infrastructure that society will run on. They are the ones who turn a brilliant proof-of-concept into proof-of-value, ensuring that the incredible power of AI is delivered not as a fragile magic trick, but as a utility we can all depend on.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mlops</category>
      <category>engineering</category>
      <category>production</category>
    </item>
    <item>
      <title>Designing AI Automation for Millions: CX Lessons from the Front Lines</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Sat, 20 Dec 2025 23:54:41 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/designing-ai-automation-for-millions-cx-lessons-from-the-front-lines-2i9j</link>
      <guid>https://forem.com/aun_aideveloper/designing-ai-automation-for-millions-cx-lessons-from-the-front-lines-2i9j</guid>
      <description>&lt;h1&gt;Designing AI Automation for Millions: CX Lessons from the Front Lines&lt;/h1&gt;

&lt;p&gt;The digital world moves at lightning speed, and nowhere is this more evident than in the burgeoning realms of Fintech and Web3. Here, user expectations aren't just high; they're immediate, global, and demand absolute precision. When you're serving millions of users across diverse demographics, cultures, and technical proficiencies, providing exceptional customer experience (CX) isn't just a nice-to-have – it's a make-or-break differentiator.&lt;/p&gt;

&lt;p&gt;Traditional CX models, relying heavily on human agents, simply can't keep pace with this scale and demand. This is where AI automation steps in, not as a replacement for human interaction, but as an intelligent co-pilot, enabling businesses to deliver personalized, instant, and secure support at an unprecedented scale. But building AI systems that truly resonate with millions of users isn't just about cutting-edge algorithms; it's about deeply understanding the customer journey, anticipating their needs, and designing with empathy—lessons hard-earned from years of scaling CX.&lt;/p&gt;

&lt;h2&gt;The Scaling Imperative&lt;/h2&gt;

&lt;p&gt;Imagine a financial service launching a new crypto wallet or a Web3 dApp experiencing viral growth. Suddenly, tens of thousands, then millions, of users are pouring in. Each has questions: "How do I fund my account?", "My transaction is pending, what's wrong?", "Is this a scam?", "How do I recover my seed phrase?". Without robust automation, the support queues would collapse under the weight, leading to frustrated users, reputational damage, and ultimately, churn.&lt;/p&gt;

&lt;h3&gt;The Cost of Inefficiency&lt;/h3&gt;

&lt;p&gt;Inefficient CX isn't just an annoyance; it's a significant drain on resources and a threat to growth. Research consistently shows that customers prioritize speed and efficiency. A HubSpot study found that 90% of customers rate an "immediate" response as important or very important when they have a customer service question. In industries dealing with money and digital assets, delays can lead to financial losses or security concerns, amplifying user anxiety. For businesses, scaling a human support team linearly with user growth is prohibitively expensive and logistically complex. This is where AI shifts from a luxury to a necessity.&lt;/p&gt;

&lt;h2&gt;AI: Your CX Co-Pilot&lt;/h2&gt;

&lt;p&gt;AI automation, particularly through intelligent chatbots and sophisticated workflow engines, transforms CX from a reactive cost center into a proactive value driver. It allows businesses to handle a vast volume of routine inquiries, guide users through complex processes, and even anticipate potential issues before they escalate.&lt;/p&gt;

&lt;h3&gt;Beyond Basic Bots&lt;/h3&gt;

&lt;p&gt;We're far past the era of simplistic, rule-based chatbots that frustrate more than they help. Modern AI automation leverages natural language processing (NLP) to understand context and intent, machine learning (ML) to personalize interactions over time, and integration capabilities to seamlessly connect with backend systems. This means an AI can not only answer "How do I reset my password?" but also "I forgot my password and my 2FA isn't working, what should I do?" – a much more nuanced and user-centric query.&lt;/p&gt;

&lt;h2&gt;Designing for the User&lt;/h2&gt;

&lt;p&gt;The biggest lesson from scaling CX is that technology alone isn't enough. The most powerful AI is useless if it doesn't solve real user problems in an intuitive, helpful way. Designing AI for millions means putting the user experience at the absolute forefront.&lt;/p&gt;

&lt;h3&gt;Empathy in Automation&lt;/h3&gt;

&lt;p&gt;This starts with deep empathy. Before writing a single line of code, we need to map out user journeys, identify pain points, and understand the emotional state of a user seeking help. Are they confused, frustrated, anxious, or simply curious? The AI's response needs to reflect this understanding. For instance, a user reporting a failed transaction in a Web3 app might be panicking. The AI's initial response should be reassuring, acknowledge the problem, and immediately offer clear, actionable steps or escalate to a human if necessary.&lt;/p&gt;

&lt;h3&gt;Personalization at Scale&lt;/h3&gt;

&lt;p&gt;Generic responses don't cut it. Users expect their interactions to be personalized based on their history, preferences, and current context. An AI system that remembers past interactions, knows the user's account status, and can proactively offer relevant information (e.g., "We see you recently initiated a large transfer, here's an update on its status") creates a much more satisfying experience. This level of personalization, previously only possible with dedicated human agents, is now achievable through AI at a massive scale.&lt;/p&gt;

&lt;h2&gt;Fintech &amp;amp; Web3 Frontiers&lt;/h2&gt;

&lt;p&gt;The unique characteristics of Fintech and Web3—high-value transactions, complex technical concepts, stringent security requirements, and the immutable nature of blockchain—make AI automation not just beneficial, but critical.&lt;/p&gt;

&lt;h3&gt;Securing Digital Assets&lt;/h3&gt;

&lt;p&gt;Security is paramount. AI-powered systems can act as the first line of defense against fraud, identify suspicious activity, and guide users through secure authentication processes. For example, a chatbot might detect an unusual login location and immediately prompt the user for additional verification steps, or flag a transaction pattern consistent with known scams. This protects both the user and the platform.&lt;/p&gt;
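
&lt;p&gt;A first-line check like that can be sketched as a simple rule. The fields and thresholds below are invented for illustration; real fraud systems score many more signals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy step-up-authentication rule; fields and thresholds are invented.
def needs_step_up_auth(event):
    new_location = event["country"] not in event["known_countries"]
    large_transfer = event.get("pending_transfer_usd", 0) &gt; 1000
    return new_location or large_transfer

event = {"country": "BR", "known_countries": ["PK", "AE"], "pending_transfer_usd": 2500}
if needs_step_up_auth(event):
    print("Unusual activity: please confirm with a one-time code before continuing.")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;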

&lt;h3&gt;Demystifying Complexity&lt;/h3&gt;

&lt;p&gt;Web3, in particular, can be intimidating for newcomers. Concepts like gas fees, seed phrases, NFTs, and DeFi protocols are often abstract. AI chatbots excel at breaking down these complexities into digestible, step-by-step explanations. They can guide users through their first NFT purchase, explain staking rewards, or clarify the difference between various blockchain networks. This lowers the barrier to entry and fosters broader adoption.&lt;/p&gt;

&lt;h3&gt;Instant Resolution for High Stakes&lt;/h3&gt;

&lt;p&gt;In Fintech, every second counts. A delayed payment or a frozen account can have serious real-world consequences. AI automation provides instant answers to common queries, reducing wait times for critical issues. For Web3, where transactions are often irreversible, having immediate support for wallet issues or transaction status updates is invaluable. In my role as CX Automation and AI Engineering Lead at TON Foundation, I've seen firsthand how crucial sophisticated AI automation is for supporting millions of users interacting within the dynamic Telegram ecosystem.&lt;/p&gt;

&lt;h3&gt;Compliance and Regulation&lt;/h3&gt;

&lt;p&gt;Fintech operates under strict regulatory frameworks. AI can assist with compliance by automating identity verification (KYC), monitoring transactions for suspicious patterns (AML), and ensuring users understand terms and conditions. These automated checks are not only faster but also more consistent and auditable than manual processes, reducing operational risk.&lt;/p&gt;

&lt;h2&gt;The Human-AI Partnership&lt;/h2&gt;

&lt;p&gt;While AI can handle a vast array of tasks, there will always be situations requiring human nuance, empathy, and problem-solving. The goal isn't to replace humans but to empower them.&lt;/p&gt;

&lt;h3&gt;Strategic Escalation&lt;/h3&gt;

&lt;p&gt;Effective AI automation knows its limits. When a query is too complex, too sensitive, or requires a level of emotional intelligence beyond current AI capabilities, the system should seamlessly escalate to a human agent. Crucially, it should provide the agent with all the context gathered during the AI interaction, eliminating the need for the user to repeat themselves—a common frustration with traditional support systems. This allows human agents to focus on high-value, complex cases, where their expertise truly shines.&lt;/p&gt;
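
&lt;p&gt;In practice, the handoff is a context payload. A minimal sketch, with an assumed (not real) schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative escalation payload; field names are assumptions, not a real schema.
def escalate(ticket_id, transcript, user_profile, ai_diagnosis):
    return {
        "ticket_id": ticket_id,
        "transcript": transcript,      # so the user never repeats the story
        "account_status": user_profile.get("status"),
        "ai_diagnosis": ai_diagnosis,  # what the bot already checked and ruled out
        "priority": "high" if user_profile.get("vip") else "normal",
    }

payload = escalate(
    "TCK-1042",
    ["User: my transfer failed", "Bot: checked status, tx pending on-chain"],
    {"status": "verified", "vip": False},
    "pending blockchain confirmation, not a lost transfer",
)
print(payload)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;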

&lt;h3&gt;Continuous Improvement&lt;/h3&gt;

&lt;p&gt;The beauty of AI is its ability to learn. Every interaction, whether resolved by the AI or escalated to a human, provides valuable data. This data can be used to continuously refine the AI's understanding, improve its responses, and identify new automation opportunities. Feedback loops from both users and human agents are vital for this iterative improvement process, ensuring the AI system evolves alongside user needs and business objectives.&lt;/p&gt;

&lt;h2&gt;Future of Engagement&lt;/h2&gt;

&lt;p&gt;The journey of AI automation in CX is just beginning. We can expect even more sophisticated, proactive, and predictive AI systems. Imagine an AI that not only answers your questions but anticipates them, offering relevant advice or warnings before you even realize you need them.&lt;/p&gt;

&lt;p&gt;The integration of AI with other emerging technologies like generative AI promises even more natural and fluid conversations, making interactions feel less like talking to a bot and more like conversing with an intelligent assistant. As Web3 continues to evolve, AI will play an increasingly vital role in making decentralized technologies accessible, secure, and user-friendly for everyone.&lt;/p&gt;

&lt;p&gt;The lessons from scaling CX for millions of users are clear: success hinges on a blend of cutting-edge technology and a deeply human-centric design philosophy. By focusing on empathy, personalization, and seamless human-AI collaboration, we can build automated experiences that not only meet the demands of scale but also delight users and drive innovation in the fast-paced world of Fintech and Web3. The future of CX isn't just automated; it's intelligently empathetic.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>cx</category>
      <category>web3</category>
    </item>
    <item>
      <title>Comparing OpenAI MCP and Anthropic MCP</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Mon, 24 Nov 2025 18:00:17 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/comparing-openai-mcp-and-anthropic-mcp-safeguarding-llms-with-mitigation-and-co-2c36</link>
      <guid>https://forem.com/aun_aideveloper/comparing-openai-mcp-and-anthropic-mcp-safeguarding-llms-with-mitigation-and-co-2c36</guid>
      <description>&lt;h2&gt;Comparing OpenAI MCP and Anthropic MCP: Safeguarding LLMs with Mitigation and Control Platforms&lt;/h2&gt;

&lt;p&gt;As Large Language Models (LLMs) become increasingly integrated into diverse applications, the need for robust safety mechanisms to mitigate potential harms like misinformation, bias, and harmful content generation is paramount. Both OpenAI and Anthropic, leading AI developers, offer Mitigation and Control Platforms (MCPs) designed to address these challenges. This article provides a comparative analysis of OpenAI's and Anthropic's MCPs, exploring their purpose, features, code examples, and installation processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Purpose:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI MCP:&lt;/strong&gt; Designed primarily to control and moderate the output of OpenAI models, ensuring adherence to OpenAI's usage policies and promoting responsible AI development. It aims to mitigate the generation of content that violates their safety standards, including hate speech, violence, and misinformation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anthropic MCP:&lt;/strong&gt;  Focuses on creating "Constitutional AI," where models are guided by a set of principles or "constitutions" to align their behavior with human values and promote safety.  The Anthropic MCP emphasizes steerability and control, allowing developers to customize the model's output based on specific ethical guidelines.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Difference:&lt;/strong&gt; While both aim to mitigate harmful outputs, OpenAI's MCP primarily enforces its pre-defined policies, while Anthropic's MCP allows developers more flexibility to define their own safety guidelines through constitutional principles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Features:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;OpenAI MCP (Moderation API &amp;amp; Safety Toolkit)&lt;/th&gt;
&lt;th&gt;Anthropic MCP (Constitutional AI &amp;amp; Guardrails)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Content filtering, toxicity detection, threat classification&lt;/td&gt;
&lt;td&gt;Constitutional principles, iterative refinement, guardrails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control Levers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Category-based filtering (hate, violence, etc.), Severity thresholds&lt;/td&gt;
&lt;td&gt;Constitutional guidelines, fine-tuning, rejection sampling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited customization of filters, limited context consideration&lt;/td&gt;
&lt;td&gt;High degree of customization through constitutional design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feedback Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reporting violations, providing feedback on moderation results&lt;/td&gt;
&lt;td&gt;Iterative refinement of the constitution based on model behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output Flags&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flags indicating potential violations based on categories&lt;/td&gt;
&lt;td&gt;Flags indicating potential violations of constitutional principles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API-based integration with OpenAI models&lt;/td&gt;
&lt;td&gt;API-based integration with Anthropic's Claude model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transparency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited transparency into filtering mechanisms&lt;/td&gt;
&lt;td&gt;Greater transparency into constitutional principles driving behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Detailed Feature Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;OpenAI MCP:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Moderation API:&lt;/strong&gt; A dedicated API endpoint that classifies text based on categories like hate speech, violence, self-harm, sexual content, and political content.  It assigns severity scores to each category, allowing developers to set thresholds for filtering.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Safety Toolkit:&lt;/strong&gt;  Includes tools for building safer applications, such as guidelines for responsible AI development and best practices for mitigating potential harms.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Anthropic MCP:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Constitutional AI:&lt;/strong&gt; A technique where the LLM is trained to adhere to a set of principles or "constitution."  This constitution can be customized to reflect different ethical values and safety requirements.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Iterative Refinement:&lt;/strong&gt;  The constitution is iteratively refined based on the model's behavior.  The model is prompted to generate responses, and then a separate AI model critiques those responses based on the constitution.  The original model is then trained to avoid the critiques.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Guardrails:&lt;/strong&gt; Mechanisms to prevent the model from straying too far from the intended behavior.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Rejection Sampling:&lt;/strong&gt; Generating multiple responses and selecting the one that best aligns with the constitutional principles.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Code Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Moderation API (Python):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;moderate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Moderation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="n"&gt;text_to_moderate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is a hateful and violent statement.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;moderation_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;moderate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_to_moderate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;moderation_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;moderation_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text flagged as potentially harmful.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text considered safe.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Access specific category flags
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;moderation_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;flagged&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Category &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; flagged.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Anthropic Claude API (Python) - Illustrative Example (Conceptual):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While Anthropic doesn't have a single "Moderation API" equivalent to OpenAI's, the following example illustrates how you might supply constitutional principles as a system prompt through their Claude Messages API (assuming the model is trained with a constitution):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;constitution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a helpful and harmless AI assistant.
You should avoid generating responses that are:
- Harmful, unethical, racist, sexist, toxic, dangerous, or illegal.
- Based on misinformation.
- Promoting or condoning violence.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;constitution&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

User: Tell me about the benefits of drinking bleach.

Assistant:
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-v1.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with the actual model name
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens_to_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;OpenAI:&lt;/strong&gt; The code snippet demonstrates how to use the &lt;code&gt;openai.Moderation.create()&lt;/code&gt; function to send text to the Moderation API and receive a response indicating potential violations, then extracts the &lt;code&gt;flagged&lt;/code&gt; status and category-specific flags (a sketch of the newer client-based call follows this list).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Anthropic:&lt;/strong&gt; This example passes the constitutional principles to the model as a system prompt, priming it to refuse harmful or misleading content. How reliably this works depends on how well the model has been trained to adhere to its constitution, which is where Anthropic's iterative refinement process comes in.&lt;/li&gt;
&lt;/ul&gt;
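
&lt;p&gt;Note that &lt;code&gt;openai.Moderation.create()&lt;/code&gt; is the legacy (pre-1.0) form of the call. On version 1.x of the &lt;code&gt;openai&lt;/code&gt; Python library, moderation is a method on a client object instead; here is a minimal sketch of that newer form (the input string is just an example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.moderations.create(input="I want to hurt someone.")
result = response.results[0]

print(result.flagged)     # True if any category is flagged
print(result.categories)  # per-category booleans (violence, hate, ...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;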

&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt;  The Anthropic example is illustrative.  The specific implementation and capabilities will depend on the version of the Claude model and the available APIs.  Anthropic's approach often involves more complex training and fine-tuning procedures to effectively embed constitutional principles into the model's behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Installation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;OpenAI Moderation API:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install the OpenAI Python library:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up your OpenAI API key:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obtain an API key from the OpenAI platform (&lt;a href="https://platform.openai.com/"&gt;https://platform.openai.com/&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Anthropic Claude API:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install the Anthropic Python library:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up your Anthropic API key:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obtain an API key from the Anthropic Console (&lt;a href="https://console.anthropic.com/"&gt;https://console.anthropic.com/&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; environment variable:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;export ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY"
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both OpenAI and Anthropic provide valuable tools for mitigating harmful outputs from LLMs. OpenAI's Moderation API offers a convenient and straightforward way to filter content based on predefined categories. Anthropic's Constitutional AI approach provides greater flexibility and control, allowing developers to customize the model's behavior based on specific ethical guidelines.  The choice between the two platforms depends on the specific application and the desired level of control over the model's output.  As LLMs continue to evolve, the importance of robust MCPs will only increase, making it crucial for developers to carefully consider their options and implement appropriate safety mechanisms.  Future research should focus on improving the transparency and explainability of these platforms, as well as developing more effective methods for aligning AI behavior with human values.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>anthropic</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Combining BM25 &amp; Vector Search: A Hybrid Approach for Enhanced Retrieval Performance</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Thu, 13 Nov 2025 16:59:27 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/combining-bm25-and-vector-search-a-hybrid-approach-for-enhanced-retrieval-perfo-5h8k</link>
      <guid>https://forem.com/aun_aideveloper/combining-bm25-and-vector-search-a-hybrid-approach-for-enhanced-retrieval-perfo-5h8k</guid>
      <description>&lt;h2&gt;
  
  
  Combining BM25 and Vector Search: A Hybrid Approach for Enhanced Retrieval Performance
&lt;/h2&gt;

&lt;p&gt;In the realm of information retrieval, the quest for more relevant and accurate search results is ongoing. While traditional methods like BM25 have proven effective for keyword-based searches, they often struggle with semantic understanding and capturing contextual nuances. Conversely, vector search, powered by embedding models, excels at semantic similarity but can miss exact keyword matches. This article explores a powerful hybrid approach that combines the strengths of both BM25 and vector search to achieve superior retrieval performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Purpose: Bridging the Gap Between Keywords and Semantics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The core purpose of combining BM25 and vector search is to leverage their complementary strengths.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;BM25 (Best Matching 25):&lt;/strong&gt; A widely used ranking function based on term frequency-inverse document frequency (TF-IDF) principles. It's excellent for identifying documents containing the query keywords and penalizing common terms.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Vector Search:&lt;/strong&gt; Represents documents and queries as vectors in a high-dimensional space, capturing semantic meaning. It allows for finding documents that are conceptually similar to the query, even if they don't share the exact keywords.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By integrating these two approaches, we aim to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Improve Relevance:&lt;/strong&gt; Ensure that results contain the query keywords (BM25 strength) while also capturing the semantic intent behind the query (vector search strength).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Enhance Recall:&lt;/strong&gt;  Retrieve a broader range of relevant documents, including those that might be missed by keyword-based searches alone.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Provide Contextual Understanding:&lt;/strong&gt;  Go beyond simple keyword matching and understand the context and meaning of the query and documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Features: A Synergistic Combination&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The combined BM25 and vector search approach offers the following key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Hybrid Scoring:&lt;/strong&gt;  Combines BM25 scores and vector similarity scores to rank documents. This allows for tuning the influence of each method.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scalability:&lt;/strong&gt;  Leverages the scalability of both BM25 and vector search libraries, allowing for efficient retrieval on large datasets.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Customization:&lt;/strong&gt;  Provides flexibility in choosing the embedding model for vector search and tuning the weighting parameters for the hybrid score.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Improved Handling of Synonyms and Semantic Variations:&lt;/strong&gt;  The vector search component addresses BM25's limitations in handling synonyms and semantic variations.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Robustness:&lt;/strong&gt;  Mitigates the weaknesses of each individual method, resulting in a more robust and reliable search system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Code Example: Implementation with Python and &lt;code&gt;rank_bm25&lt;/code&gt; and &lt;code&gt;faiss&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This example demonstrates a basic implementation using the &lt;code&gt;rank_bm25&lt;/code&gt; library for BM25 and &lt;code&gt;faiss&lt;/code&gt; for vector search.  It assumes you have a corpus of documents and a query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rank_bm25&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BM25Okapi&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;faiss&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Data Preparation
&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is the first document about cats.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This document is about dogs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The third document talks about both cats and dogs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Another document focusing on cats and their behavior.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Information about feline pets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 2. BM25 Indexing and Scoring
&lt;/span&gt;&lt;span class="n"&gt;tokenized_corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;bm25&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BM25Okapi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized_corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenized_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bm25_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# 3. Vector Search Indexing and Scoring
# (Assuming you have pre-computed document embeddings using a model like Sentence Transformers)
# Replace with your actual embedding model and embedding generation code.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;corpus_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;corpus_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;faiss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;IndexFlatL2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Choose appropriate index based on your needs
&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus_embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Retrieve all documents for ranking
&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;I&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#search
&lt;/span&gt;
&lt;span class="n"&gt;vector_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# Convert distance to similarity score (higher is better)
&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Hybrid Scoring
&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;  &lt;span class="c1"&gt;# Weighting factor (adjust as needed)
&lt;/span&gt;&lt;span class="n"&gt;hybrid_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;bm25_scores&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;vector_scores&lt;/span&gt;


&lt;span class="c1"&gt;# 5. Ranking and Retrieval
&lt;/span&gt;&lt;span class="n"&gt;ranked_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;hybrid_scores&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ranked Results:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ranked_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Document: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc_index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Data Preparation:&lt;/strong&gt; The code starts by defining a corpus of documents and a query.  It also tokenizes the corpus for BM25.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;BM25 Indexing and Scoring:&lt;/strong&gt;  The &lt;code&gt;rank_bm25&lt;/code&gt; library is used to create a BM25 index from the corpus and calculate BM25 scores for each document based on the query.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Vector Search Indexing and Scoring:&lt;/strong&gt; This section uses &lt;code&gt;faiss&lt;/code&gt; for vector search, with document embeddings generated by a Sentence Transformers model (&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;); you can substitute any embedding model, such as Hugging Face Transformers or OpenAI's embedding API. A simple &lt;code&gt;IndexFlatL2&lt;/code&gt; is used here for demonstration; for larger datasets, consider more advanced index types like HNSW. Because &lt;code&gt;faiss&lt;/code&gt; returns neighbors sorted by distance, the code maps the similarity scores back to corpus order via the index array &lt;code&gt;I&lt;/code&gt; before they are combined with the corpus-ordered BM25 scores.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Hybrid Scoring:&lt;/strong&gt; The BM25 scores and vector similarity scores are combined using a weighted average, with the &lt;code&gt;alpha&lt;/code&gt; parameter controlling the influence of each method. Because the two score types live on different scales, rescaling them before mixing often helps (a minimal min-max sketch follows this list); experiment with different &lt;code&gt;alpha&lt;/code&gt; values to optimize performance for your dataset and query types.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ranking and Retrieval:&lt;/strong&gt;  The documents are ranked based on the hybrid scores, and the top-ranked documents are retrieved.&lt;/li&gt;
&lt;/ol&gt;
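
&lt;p&gt;As a follow-up to step 4: raw BM25 scores are unbounded while the converted vector similarities sit roughly in [0, 1], so a common refinement is to rescale both to a shared range before mixing. A minimal sketch, using hypothetical score arrays:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def min_max(x):
    """Rescale scores to [0, 1]; a constant array maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

# Hypothetical corpus-ordered scores for illustration.
bm25_scores = np.array([2.1, 0.0, 1.3, 1.7])
vector_scores = np.array([0.62, 0.18, 0.55, 0.71])

alpha = 0.5
hybrid_scores = alpha * min_max(bm25_scores) + (1 - alpha) * min_max(vector_scores)
print(hybrid_scores)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;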

&lt;p&gt;&lt;strong&gt;4. Installation: Setting up the Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To run the code example, you'll need to install the following libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;rank_bm25&lt;/code&gt;:&lt;/strong&gt;  For BM25 implementation.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;rank_bm25
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;faiss&lt;/code&gt;:&lt;/strong&gt;  For efficient vector search.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;conda &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; conda-forge faiss-cpu  &lt;span class="c"&gt;# For CPU version&lt;/span&gt;
&lt;span class="c"&gt;# OR&lt;/span&gt;
conda &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; conda-forge faiss-gpu  &lt;span class="c"&gt;# For GPU version (requires CUDA)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;(Choose the CPU or GPU version based on your hardware and needs.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;sentence-transformers&lt;/code&gt;:&lt;/strong&gt; (Optional, for generating embeddings)&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Considerations and Future Directions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Embedding Model Selection:&lt;/strong&gt; The choice of embedding model significantly impacts the performance of vector search. Consider models specifically trained for semantic similarity tasks, such as Sentence Transformers or models fine-tuned on your specific domain.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Index Type:&lt;/strong&gt; For large datasets, explore different &lt;code&gt;faiss&lt;/code&gt; index types (e.g., HNSW, IVF) to optimize search speed and memory usage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Weighting Factor Tuning:&lt;/strong&gt; Experiment with different &lt;code&gt;alpha&lt;/code&gt; values to find the optimal balance between BM25 and vector search. Techniques like grid search or Bayesian optimization can automate this process (a minimal grid-search sketch follows this list).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Re-ranking:&lt;/strong&gt;  Implement a re-ranking step after the hybrid scoring to further refine the results.  This could involve using more sophisticated machine learning models.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Query Expansion:&lt;/strong&gt;  Expand the query with synonyms or related terms to improve recall.&lt;/li&gt;
&lt;/ul&gt;
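
&lt;p&gt;To make the weighting-factor point concrete, here is a minimal grid-search sketch over &lt;code&gt;alpha&lt;/code&gt;. It assumes a small labeled evaluation set of corpus-ordered score arrays plus the index of the single relevant document per query; all names and numbers are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Hypothetical evaluation data: per query, corpus-ordered BM25 scores,
# corpus-ordered vector similarities, and the relevant document's index.
examples = [
    (np.array([2.1, 0.0, 1.3, 1.7]), np.array([0.62, 0.18, 0.55, 0.71]), 3),
    (np.array([0.4, 1.9, 0.8, 0.1]), np.array([0.20, 0.66, 0.30, 0.12]), 1),
]

def hit_at_1(alpha):
    """Fraction of queries whose top-ranked document is the relevant one."""
    hits = sum(
        int(np.argmax(alpha * b + (1 - alpha) * v) == rel)
        for b, v, rel in examples
    )
    return hits / len(examples)

best_alpha = max((a / 10 for a in range(11)), key=hit_at_1)
print(f"best alpha: {best_alpha:.1f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;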

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Combining BM25 and vector search provides a powerful approach for building more effective information retrieval systems. By leveraging the strengths of both methods, we can achieve improved relevance, enhanced recall, and better contextual understanding. This hybrid approach is particularly beneficial for applications where both keyword matching and semantic similarity are important, such as question answering, document search, and e-commerce search. While the implementation requires careful consideration of various factors, the potential benefits in terms of search performance make it a worthwhile endeavor.&lt;/p&gt;

</description>
      <category>technology</category>
      <category>bm25</category>
      <category>vectorsearch</category>
      <category>hybridsearch</category>
    </item>
    <item>
      <title>LangGraph: Orchestrating Complex LLM Workflows with State Machines</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Sun, 09 Nov 2025 12:17:04 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/langgraph-orchestrating-complex-llm-workflows-with-state-machines-3fo9</link>
      <guid>https://forem.com/aun_aideveloper/langgraph-orchestrating-complex-llm-workflows-with-state-machines-3fo9</guid>
      <description>&lt;h2&gt;
  
  
  LangGraph: Orchestrating Complex LLM Workflows with State Machines
&lt;/h2&gt;

&lt;p&gt;LangGraph, a powerful extension of the LangChain framework, provides a robust and intuitive way to construct complex, multi-step workflows involving Large Language Models (LLMs). By leveraging the principles of state machines, LangGraph enables developers to define intricate execution paths, conditional logic, and looping mechanisms within their LLM applications. This article delves into the purpose, features, installation, and usage of LangGraph, equipping you with the knowledge to build sophisticated and reliable LLM-powered systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional LLM chains often struggle to handle scenarios requiring intricate decision-making, iterative refinement, or dynamic routing. LangGraph addresses these limitations by offering a structured approach to orchestrating LLM interactions. It allows developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Define Complex Workflows:&lt;/strong&gt; Model intricate processes involving multiple LLM calls, external API integrations, and human-in-the-loop interactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage State Effectively:&lt;/strong&gt; Maintain a consistent state across the entire workflow, enabling LLMs to access and update information as needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Conditional Logic:&lt;/strong&gt; Dynamically route the workflow based on the outputs of LLM calls or external data sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable Looping and Iteration:&lt;/strong&gt; Create iterative processes where LLMs refine their responses or explore different solutions until a desired outcome is achieved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improve Observability and Debugging:&lt;/strong&gt; Gain insights into the execution flow and identify potential issues within complex workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangGraph offers a range of features designed to simplify the creation and management of complex LLM workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State Graph Abstraction:&lt;/strong&gt; The core of LangGraph is the &lt;code&gt;StateGraph&lt;/code&gt; class, which allows you to define the states and transitions within your workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nodes:&lt;/strong&gt; Represent individual steps in the workflow, which can be LLM calls, function calls, data transformations, or any other relevant operation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edges:&lt;/strong&gt; Define the transitions between states, specifying the conditions under which the workflow should move from one state to another.  Edges can be conditional, allowing for dynamic routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conditional Edges:&lt;/strong&gt;  Route the workflow based on the output of a node. This is crucial for implementing decision-making logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Looping:&lt;/strong&gt;  Create loops within the workflow, enabling iterative processes and refinement of results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entry Point and Endpoints:&lt;/strong&gt;  Define the starting and ending points of the workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration:&lt;/strong&gt;  Allows you to configure the LLMs, tools, and other resources used within the workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with LangChain:&lt;/strong&gt; Seamlessly integrates with existing LangChain components, such as LLMs, prompts, and chains.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in Logging and Debugging:&lt;/strong&gt; Provides tools for monitoring the execution of the workflow and identifying potential issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpointing:&lt;/strong&gt;  Allows you to save the state of the workflow at specific points, enabling you to resume execution from a previous point in case of errors (a minimal sketch appears after the code example below).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To install LangGraph, you'll need to install the &lt;code&gt;langgraph&lt;/code&gt; package along with its dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langgraph langchain langchain-core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may also need to install specific dependencies based on the LLMs and tools you plan to use within your workflows. For example, if you're using OpenAI, you'll need to install the &lt;code&gt;openai&lt;/code&gt; package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Code Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This example demonstrates a simple LangGraph workflow that uses an LLM to answer a question and then refines the answer based on user feedback.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MessagesPlaceholder&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define the State
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GraphState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Represents the state of our graph.

    Attributes:
        messages: A list of messages representing the conversation history.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Define Nodes
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GraphState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Node that uses an LLM to generate a response based on the conversation history.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nc"&gt;MessagesPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variable_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GraphState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Node that gets the latest user input.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decide_to_continue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GraphState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Node that decides whether to continue the conversation or stop.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STOP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Build the Graph
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GraphState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add nodes
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add conditional edge
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decide_to_continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decide_to_continue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add edges
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decide_to_continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add conditional edges
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decide_to_continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set entrypoint
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Compile
&lt;/span&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Run the Graph
&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;State Definition:&lt;/strong&gt; We define a &lt;code&gt;GraphState&lt;/code&gt; TypedDict to store the conversation history as a list of &lt;code&gt;BaseMessage&lt;/code&gt; objects. Annotating the field with &lt;code&gt;operator.add&lt;/code&gt; makes LangGraph append each node's returned messages to the history rather than replacing it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node Definitions:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;agent&lt;/code&gt;: This node uses an LLM (ChatOpenAI) to generate a response based on the current state of the conversation.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user&lt;/code&gt;: This node prompts the user for input and adds it to the conversation history.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;decide_to_continue&lt;/code&gt;: This routing function checks whether the user's latest message contains "STOP". If so, it signals the end of the conversation; otherwise, the flow continues.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph Construction:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;We create a &lt;code&gt;StateGraph&lt;/code&gt; instance using the &lt;code&gt;GraphState&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We add the &lt;code&gt;agent&lt;/code&gt; and &lt;code&gt;user&lt;/code&gt; nodes to the graph.&lt;/li&gt;
&lt;li&gt;We define the edges connecting the nodes. The edge from &lt;code&gt;agent&lt;/code&gt; to &lt;code&gt;user&lt;/code&gt; is unconditional, while &lt;code&gt;add_conditional_edges&lt;/code&gt; uses the &lt;code&gt;decide_to_continue&lt;/code&gt; routing function to send the flow from &lt;code&gt;user&lt;/code&gt; back to &lt;code&gt;agent&lt;/code&gt; or to &lt;code&gt;END&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We set the entry point of the graph to the &lt;code&gt;agent&lt;/code&gt; node, so the initial question is answered before the user is asked for feedback.&lt;/li&gt;
&lt;li&gt;We compile the graph into a runnable application.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;We provide initial input to the compiled graph (a question for the LLM).&lt;/li&gt;
&lt;li&gt;We invoke the compiled app, which executes the workflow based on the defined states and transitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
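
&lt;p&gt;The checkpointing feature mentioned earlier plugs into the compile step. A minimal sketch, building on the &lt;code&gt;graph&lt;/code&gt; and &lt;code&gt;inputs&lt;/code&gt; from the example above; the in-memory saver is for demonstration, and persistent backends are also available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from langgraph.checkpoint.memory import MemorySaver

# Compile with a checkpointer so state is saved after every step.
checkpointed_app = graph.compile(checkpointer=MemorySaver())

# The thread_id identifies a conversation that can be resumed later.
config = {"configurable": {"thread_id": "conversation-1"}}
result = checkpointed_app.invoke(inputs, config=config)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;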

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangGraph provides a powerful and flexible framework for building complex LLM workflows.  By leveraging state machines and conditional logic, it enables developers to create sophisticated applications that can handle intricate decision-making, iterative refinement, and dynamic routing.  With its seamless integration with LangChain and its built-in tools for observability and debugging, LangGraph empowers developers to build reliable and scalable LLM-powered systems.  As LLMs continue to evolve, tools like LangGraph will become increasingly crucial for harnessing their full potential.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>llm</category>
      <category>langchain</category>
      <category>graph</category>
    </item>
    <item>
      <title>Inside the Transformer Architecture: The Core of Modern AI</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Wed, 29 Oct 2025 17:59:40 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/inside-the-transformer-architecture-the-core-of-modern-ai-2l3o</link>
      <guid>https://forem.com/aun_aideveloper/inside-the-transformer-architecture-the-core-of-modern-ai-2l3o</guid>
      <description>&lt;h2&gt;
  
  
  Inside the Transformer Architecture: The Core of Modern AI
&lt;/h2&gt;

&lt;p&gt;The Transformer architecture has revolutionized the field of Artificial Intelligence, becoming the foundation for state-of-the-art models in Natural Language Processing (NLP), Computer Vision, and beyond. This article delves into the core of this powerful architecture, exploring its purpose, key features, and providing a practical code example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The primary purpose of the Transformer is to process sequences of data, such as text or images, while effectively capturing long-range dependencies. Unlike Recurrent Neural Networks (RNNs) which process data sequentially, Transformers utilize parallel processing, significantly improving training speed and scalability. This allows them to understand context and relationships between elements within a sequence, leading to superior performance on tasks like machine translation, text generation, and image recognition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Self-Attention:&lt;/strong&gt; The heart of the Transformer lies in its self-attention mechanism.  This allows the model to weigh the importance of different parts of the input sequence when processing a particular element.  Instead of relying on the order of the input, self-attention dynamically learns relationships between all elements simultaneously.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Parallel Processing:&lt;/strong&gt;  Unlike sequential models, Transformers can process the entire input sequence in parallel, leveraging the power of modern GPUs. This drastically reduces training time, especially for long sequences.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Encoder-Decoder Structure:&lt;/strong&gt;  Many Transformer models employ an encoder-decoder structure. The encoder processes the input sequence and generates a contextualized representation. The decoder then uses this representation to generate the output sequence.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multi-Head Attention:&lt;/strong&gt;  To capture different aspects of the relationships within the input sequence, Transformers utilize multiple attention heads. Each head learns a different set of attention weights, providing a richer representation of the input.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Positional Encoding:&lt;/strong&gt;  Since Transformers process data in parallel, they need a mechanism to understand the order of elements in the sequence. Positional encoding adds information about the position of each element to the input embedding (a minimal sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;
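
&lt;p&gt;To make the positional-encoding idea concrete, here is a minimal sketch of the sinusoidal scheme from the original Transformer paper, added element-wise to the token embeddings; the shapes are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

import torch

def positional_encoding(seq_len, embed_size):
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(
        torch.arange(0, embed_size, 2).float() * (-math.log(10000.0) / embed_size)
    )
    pe = torch.zeros(seq_len, embed_size)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Added to the input embeddings so the model can distinguish positions.
embeddings = torch.randn(10, 512)  # (seq_len, embed_size)
x = embeddings + positional_encoding(10, 512)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;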

&lt;p&gt;&lt;strong&gt;Code Example (PyTorch):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This simplified example demonstrates a single self-attention layer using PyTorch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SelfAttention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SelfAttention&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embed_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;heads&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;heads&lt;/span&gt;

        &lt;span class="nf"&gt;assert &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;heads&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Embed size needs to be divisible by heads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fc_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heads&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Number of examples
&lt;/span&gt;        &lt;span class="n"&gt;value_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Split embedding into self.heads pieces
&lt;/span&gt;        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# (N, value_len, heads, head_dim)
&lt;/span&gt;        &lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# (N, key_len, heads, head_dim)
&lt;/span&gt;        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# (N, query_len, heads, head_dim)
&lt;/span&gt;
        &lt;span class="c1"&gt;# Scaled dot-product attention
&lt;/span&gt;        &lt;span class="n"&gt;energy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;einsum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nqhd,nkhd-&amp;gt;nhqk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="c1"&gt;# query shape: (N, query_len, heads, head_dim)
&lt;/span&gt;        &lt;span class="c1"&gt;# keys shape: (N, key_len, heads, head_dim)
&lt;/span&gt;        &lt;span class="c1"&gt;# energy shape: (N, heads, query_len, key_len)
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;energy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;energy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;masked_fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-1e20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;energy&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embed_size&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;einsum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nhql,nlhd-&amp;gt;nqhd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;attention&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heads&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;head_dim&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# attention shape: (N, heads, query_len, key_len)
&lt;/span&gt;        &lt;span class="c1"&gt;# values shape: (N, value_len, heads, head_dim)
&lt;/span&gt;        &lt;span class="c1"&gt;# out shape: (N, query_len, heads, head_dim) then flatten last two dim
&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fc_out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="n"&gt;embed_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;
&lt;span class="n"&gt;heads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="n"&gt;seq_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;
&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;  &lt;span class="c1"&gt;# Batch size
&lt;/span&gt;
&lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SelfAttention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embed_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;heads&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;attention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Output shape: torch.Size([4, 32, 512])
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code defines a &lt;code&gt;SelfAttention&lt;/code&gt; class that performs multi-head self-attention.  It takes &lt;code&gt;values&lt;/code&gt;, &lt;code&gt;keys&lt;/code&gt;, and &lt;code&gt;query&lt;/code&gt; tensors as input, each an embedded view of the input sequence. The &lt;code&gt;forward&lt;/code&gt; method splits them across heads, computes the attention weights, and projects the concatenated result back to &lt;code&gt;embed_size&lt;/code&gt;.&lt;/p&gt;
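
&lt;p&gt;If the einsum notation is unfamiliar, a quick sanity check (with small, arbitrary shapes) confirms that the &lt;code&gt;"nqhd,nkhd-&amp;gt;nhqk"&lt;/code&gt; contraction used above is just a per-head batched matrix product:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch

# Small arbitrary shapes: batch, heads, query/key lengths, head dim
N, H, Q, K, D = 2, 8, 5, 7, 16
q = torch.randn(N, Q, H, D)
k = torch.randn(N, K, H, D)

energy_einsum = torch.einsum("nqhd,nkhd-&amp;gt;nhqk", [q, k])
# Equivalent: move heads ahead of sequence length, then matmul over head_dim
energy_matmul = q.permute(0, 2, 1, 3) @ k.permute(0, 2, 3, 1)
print(torch.allclose(energy_einsum, energy_matmul, atol=1e-5))  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
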

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To run the example above, you need to install PyTorch. You can install it using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;torch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example provides a glimpse into the core of the Transformer architecture.  By understanding its fundamental components, developers can leverage its power to build innovative AI solutions. Further exploration of more complex Transformer models, such as BERT and GPT, will reveal the full potential of this groundbreaking architecture.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>transformers</category>
      <category>pytorch</category>
      <category>nlp</category>
    </item>
    <item>
      <title>LoRA and QLoRA: Fine-Tuning Giants for Agile Agents</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Fri, 26 Sep 2025 19:51:32 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/lora-and-qlora-fine-tuning-giants-for-agile-agents-3gbc</link>
      <guid>https://forem.com/aun_aideveloper/lora-and-qlora-fine-tuning-giants-for-agile-agents-3gbc</guid>
      <description>&lt;h2&gt;
  
  
  LoRA and QLoRA: Fine-Tuning Giants for Agile Agents
&lt;/h2&gt;

&lt;p&gt;The rise of Agentic AI, where autonomous agents orchestrate tasks and interact with the world, demands efficient and adaptable large language models (LLMs). However, fine-tuning massive LLMs for specific agentic applications can be computationally expensive and resource-intensive. This is where Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) come into play, offering efficient methods to adapt pre-trained LLMs for agentic tasks without retraining the entire model. This article delves into the purpose, features, implementation, and installation of these powerful techniques within the agentic AI landscape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Purpose: Efficient Adaptation for Agentic AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agentic AI often requires LLMs to perform specialized tasks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Tool Use:&lt;/strong&gt;  Understanding and utilizing external tools (e.g., search engines, APIs) to achieve goals.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Planning &amp;amp; Reasoning:&lt;/strong&gt;  Breaking down complex tasks into sub-goals and planning execution strategies.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Memory Management:&lt;/strong&gt;  Storing and retrieving relevant information from long-term or short-term memory.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Understanding:&lt;/strong&gt;  Comprehending the nuances of dynamic environments and adapting accordingly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Directly fine-tuning full-sized LLMs for each of these specialized roles is impractical due to the enormous computational cost and storage requirements. LoRA and QLoRA offer a solution by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Parameter Efficiency:&lt;/strong&gt;  Training only a small fraction of the original model's parameters, significantly reducing computational resources.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Resource Accessibility:&lt;/strong&gt;  Allowing fine-tuning on consumer-grade GPUs, making LLM adaptation accessible to a wider range of developers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Modular Adaptation:&lt;/strong&gt;  Enabling the creation of lightweight, specialized "adapters" that can be easily swapped in and out, facilitating modular agent design.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Preservation of Pre-trained Knowledge:&lt;/strong&gt; Minimizing the risk of catastrophic forgetting of general knowledge learned during pre-training.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Features: Low-Rank Power, Quantized Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1 LoRA (Low-Rank Adaptation):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Low-Rank Decomposition:&lt;/strong&gt;  Freezes the pre-trained LLM weights and introduces trainable rank decomposition matrices (A and B) for specific layers (e.g., attention layers).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Additive Adaptation:&lt;/strong&gt; During training, the output of the original layer is added to the output of the LoRA module: &lt;code&gt;output = original_layer(input) + A(B(input))&lt;/code&gt; (see the minimal sketch after this list).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Reduced Parameter Count:&lt;/strong&gt; The number of trainable parameters is determined by the rank (r) of the decomposition matrices.  Choosing a low rank significantly reduces the memory footprint and training time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Fast Inference:&lt;/strong&gt; During inference, the LoRA adapters can be merged back into the original weights, resulting in minimal performance overhead.&lt;/li&gt;
&lt;/ul&gt;
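
&lt;p&gt;To make the low-rank idea concrete, here is a minimal, self-contained sketch in plain PyTorch. The &lt;code&gt;LoRALinear&lt;/code&gt; name and the &lt;code&gt;r&lt;/code&gt; and &lt;code&gt;alpha&lt;/code&gt; values are illustrative; this is not the &lt;code&gt;peft&lt;/code&gt; implementation, which also handles merging and many layer types:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (the paper writes it as W0*x + B*A*x)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        self.A = nn.Linear(base.in_features, r, bias=False)   # down-project to rank r
        self.B = nn.Linear(r, base.out_features, bias=False)  # up-project back to the output size
        nn.init.zeros_(self.B.weight)  # the update starts at zero, so behavior is unchanged at step 0
        self.scale = alpha / r

    def forward(self, x):
        # Original output plus the scaled low-rank correction
        return self.base(x) + self.scale * self.B(self.A(x))

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable weights vs 262144 in the frozen 512x512 base
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note the parameter count: at rank 8, the trainable update is two 512x8 matrices, barely 3% of the frozen layer it adapts.&lt;/p&gt;
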

&lt;p&gt;&lt;strong&gt;2.2 QLoRA (Quantized LoRA):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Quantization:&lt;/strong&gt;  Builds upon LoRA by quantizing the pre-trained LLM weights to 4-bit precision. This further reduces memory requirements, allowing for fine-tuning on even more resource-constrained hardware.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;NF4 (NormalFloat4):&lt;/strong&gt; Employs a novel data type called NormalFloat4, specifically designed for representing weights with a normal distribution, leading to better performance compared to standard quantization techniques.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Double Quantization:&lt;/strong&gt; Further compresses the quantization constants, reducing memory usage even further.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Paged Optimizers:&lt;/strong&gt;  Uses paged optimizers to handle the large gradients that can arise during training, preventing out-of-memory errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits of using LoRA and QLoRA in Agentic AI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Faster Training:&lt;/strong&gt; Reduced parameter count leads to shorter training times.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Lower Memory Footprint:&lt;/strong&gt; Quantization and low-rank decomposition allow for fine-tuning on GPUs with limited memory.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Modular Agent Design:&lt;/strong&gt;  Specialized adapters can be created for different agentic capabilities (tool use, planning, etc.) and easily combined.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Improved Performance:&lt;/strong&gt;  Fine-tuning with LoRA and QLoRA can significantly improve performance on specific agentic tasks compared to using the pre-trained LLM directly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Code Example: Fine-tuning with QLoRA using Hugging Face Transformers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This example demonstrates how to fine-tune a pre-trained LLM (e.g., &lt;code&gt;mistralai/Mistral-7B-v0.1&lt;/code&gt;) using QLoRA with the Hugging Face Transformers library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TrainingArguments&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;prepare_model_for_kbit_training&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dataset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;trl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SFTTrainer&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Load the model and tokenizer (replace with your desired model)
&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistralai/Mistral-7B-v0.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding_side&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;right&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pad_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Enable 4-bit quantization
&lt;/span&gt;    &lt;span class="n"&gt;quantization_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;load_in_4bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bnb_4bit_compute_dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Use bfloat16 for computation
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bnb_4bit_quant_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nf4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Use NF4 quantization
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bnb_4bit_use_double_quant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Enable double quantization
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float16&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Prepare the model for k-bit training
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;prepare_model_for_kbit_training&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Configure LoRA
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# LoRA rank
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Scaling factor
&lt;/span&gt;    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gate_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;up_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;down_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# Adapt attention and MLP layers
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_trainable_parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Print the number of trainable parameters
&lt;/span&gt;
&lt;span class="c1"&gt;# 4. Load the dataset (replace with your dataset)
&lt;/span&gt;&lt;span class="n"&gt;dataset_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Abirate/english_quotes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Configure training arguments
&lt;/span&gt;&lt;span class="n"&gt;training_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lora-agent-adapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gradient_accumulation_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paged_adamw_32bit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;save_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;logging_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_grad_norm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Adjust as needed
&lt;/span&gt;    &lt;span class="n"&gt;warmup_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lr_scheduler_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;push_to_hub&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Set to True if you want to push to Hugging Face Hub
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 6. Train the model using SFTTrainer for supervised fine-tuning
&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SFTTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dataset_text_field&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Replace with the relevant text field in your dataset
&lt;/span&gt;    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;training_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;peft_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 7. Save the LoRA adapter
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lora-agent-adapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lora-agent-adapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Training complete! LoRA adapter saved to lora-agent-adapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Load Model and Tokenizer:&lt;/strong&gt;  Loads the pre-trained LLM and its tokenizer.  A &lt;code&gt;BitsAndBytesConfig&lt;/code&gt; with &lt;code&gt;load_in_4bit=True&lt;/code&gt; enables 4-bit quantization; the same config selects NF4 and double quantization.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Prepare for K-bit Training:&lt;/strong&gt;  This function prepares the model for training with quantized weights, setting up the necessary configurations.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Configure LoRA:&lt;/strong&gt;  Defines the LoRA configuration, including the rank (&lt;code&gt;r&lt;/code&gt;), scaling factor (&lt;code&gt;lora_alpha&lt;/code&gt;), dropout, bias, and target modules. The &lt;code&gt;target_modules&lt;/code&gt; specify which layers will be adapted. Common choices include the attention layers (&lt;code&gt;q_proj&lt;/code&gt;, &lt;code&gt;k_proj&lt;/code&gt;, &lt;code&gt;v_proj&lt;/code&gt;, &lt;code&gt;o_proj&lt;/code&gt;) and MLP layers (&lt;code&gt;gate_proj&lt;/code&gt;, &lt;code&gt;up_proj&lt;/code&gt;, &lt;code&gt;down_proj&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Load Dataset:&lt;/strong&gt; Loads the dataset used for fine-tuning. Replace &lt;code&gt;"Abirate/english_quotes"&lt;/code&gt; with your specific dataset.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Configure Training Arguments:&lt;/strong&gt;  Defines the training hyperparameters, such as batch size, learning rate, and number of steps.  &lt;code&gt;optim="paged_adamw_32bit"&lt;/code&gt; enables the paged optimizer.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Train with SFTTrainer:&lt;/strong&gt;  Uses the &lt;code&gt;SFTTrainer&lt;/code&gt; from the &lt;code&gt;trl&lt;/code&gt; library (Transformer Reinforcement Learning) for supervised fine-tuning.  This trainer simplifies the process of fine-tuning LLMs on text data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Save the Adapter:&lt;/strong&gt;  Saves the trained LoRA adapter to a directory. This adapter can then be loaded and used with the original pre-trained model, as sketched just after this list.&lt;/li&gt;
&lt;/ol&gt;
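
&lt;p&gt;Once saved, the adapter can be reattached to the base model for inference. A brief, hedged sketch follows; the prompt and generation settings are illustrative, and for a 7B model you would likely reload the base in 4-bit as above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model, then attach the trained LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
model = PeftModel.from_pretrained(base, "lora-agent-adapter")
tokenizer = AutoTokenizer.from_pretrained("lora-agent-adapter")

inputs = tokenizer("Quote: ", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Optionally merge the adapter into the base weights for zero-overhead inference
model = model.merge_and_unload()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
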

&lt;p&gt;&lt;strong&gt;4. Installation: Setting up the Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To use LoRA and QLoRA, you'll need to install the necessary libraries. It's highly recommended to use a virtual environment to isolate your project dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a virtual environment&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv agent_env
&lt;span class="nb"&gt;source &lt;/span&gt;agent_env/bin/activate  &lt;span class="c"&gt;# On Linux/macOS&lt;/span&gt;
&lt;span class="c"&gt;# agent_env\Scripts\activate  # On Windows&lt;/span&gt;

&lt;span class="c"&gt;# Install PyTorch with CUDA support (adjust based on your CUDA version)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch torchvision torchaudio &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cu118

&lt;span class="c"&gt;# Install Hugging Face Transformers, PEFT, TRL, and Datasets&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers peft accelerate trl datasets bitsandbytes

&lt;span class="c"&gt;# Install other dependencies (if needed)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;sentencepiece  &lt;span class="c"&gt;# For models that require SentencePiece tokenizer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;transformers&lt;/code&gt;:&lt;/strong&gt;  Provides access to pre-trained models and tokenizers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;peft&lt;/code&gt; (Parameter-Efficient Fine-Tuning):&lt;/strong&gt;  Contains the LoRA and QLoRA implementations.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;accelerate&lt;/code&gt;:&lt;/strong&gt;  Enables distributed training and efficient memory management.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;trl&lt;/code&gt; (Transformer Reinforcement Learning):&lt;/strong&gt; Provides tools for training and fine-tuning LLMs, including the &lt;code&gt;SFTTrainer&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;datasets&lt;/code&gt;:&lt;/strong&gt;  Provides access to a wide range of datasets for fine-tuning.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;bitsandbytes&lt;/code&gt;:&lt;/strong&gt;  Provides efficient CUDA kernels for 4-bit quantization.  Ensure you have a compatible CUDA installation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;sentencepiece&lt;/code&gt;:&lt;/strong&gt;  Required for some models that use the SentencePiece tokenization algorithm.&lt;/li&gt;
&lt;/ul&gt;
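
&lt;p&gt;One quick way to sanity-check the setup (assuming the installs above succeeded and a GPU is present) is to import the key packages and confirm CUDA is visible:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python -c "import torch, bitsandbytes; print(torch.cuda.is_available())"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
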

&lt;p&gt;&lt;strong&gt;5. Conclusion: Empowering Agile Agents&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LoRA and QLoRA are powerful tools for adapting large language models for the demanding requirements of Agentic AI. By enabling efficient fine-tuning on resource-constrained hardware, these techniques democratize access to LLM adaptation and facilitate the creation of modular, specialized agents.  As Agentic AI continues to evolve, LoRA and QLoRA will play a crucial role in enabling the development of more agile, adaptable, and intelligent autonomous systems.  Experiment with different LoRA configurations, datasets, and training parameters to optimize your agents for specific tasks and unlock the full potential of LLMs in the agentic space.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llmfinetuning</category>
      <category>agenticai</category>
      <category>loraqlora</category>
    </item>
    <item>
      <title>The Evolution of AI Memory: From Context Windows to True Long-Term Memory</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Thu, 11 Sep 2025 15:09:04 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/the-evolution-of-ai-memory-from-context-windows-to-true-long-term-memory-4534</link>
      <guid>https://forem.com/aun_aideveloper/the-evolution-of-ai-memory-from-context-windows-to-true-long-term-memory-4534</guid>
      <description>&lt;h2&gt;
  
  
  The Evolution of AI Memory: From Context Windows to True Long-Term Memory
&lt;/h2&gt;

&lt;p&gt;Artificial intelligence has come a long way, but one thing has always held it back: memory. Large Language Models (LLMs) are great at short conversations, yet they quickly forget earlier parts of an interaction. This makes them inconsistent, repetitive, and unable to handle tasks that need continuity, such as planning projects, writing books, or learning from experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Purpose: Bridging the Gap Between Short-Term and Long-Term Understanding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional LLMs operate primarily within a fixed context window. This means they only consider a limited number of tokens (words or sub-words) from the immediate past input when generating a response. While effective for short exchanges, this approach struggles with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Inconsistency:&lt;/strong&gt; Forgetting information from earlier parts of a conversation, leading to contradictory statements.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Repetition:&lt;/strong&gt; Generating redundant information because the model has "forgotten" it previously mentioned it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Lack of Long-Term Planning:&lt;/strong&gt; Inability to perform tasks requiring long-term memory, such as writing a novel or managing a complex project.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Inability to Learn from Experience:&lt;/strong&gt; Difficulty in retaining and applying knowledge gained from past interactions to improve future performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of long-term memory solutions is to address these limitations by enabling AI agents to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Persistently store and retrieve information.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Reason about and integrate new information with existing knowledge.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Adapt and improve their performance over time based on past experiences.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Maintain consistent and coherent interactions across extended periods.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Features: Approaches to Long-Term Memory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Different approaches are emerging, each with its strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector Databases:&lt;/strong&gt; Store past text as embeddings (vectors) in databases like Chroma or Pinecone. Useful for retrieving relevant info later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Networks:&lt;/strong&gt; Neural networks with external “memory slots” that can read/write information for more fine-grained recall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Graphs:&lt;/strong&gt; Represent info as entities and relationships, enabling reasoning and connections between ideas.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Summarization/Compression:&lt;/strong&gt; Condense past conversations into shorter summaries that fit within context windows, though some detail may be lost (a minimal sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;
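
&lt;p&gt;The code example below covers the vector-database route. As a contrast, here is a brief, hedged sketch of the summarization approach; the &lt;code&gt;SummaryMemory&lt;/code&gt; class and the &lt;code&gt;summarize()&lt;/code&gt; placeholder are illustrative, not from a specific library:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def summarize(text: str) -&amp;gt; str:
    """Placeholder: a real system would call an LLM here to condense the text."""
    return text[-500:]  # crude stand-in: keep only the most recent characters

class SummaryMemory:
    """Rolling conversation memory that compresses itself when it grows too large."""
    def __init__(self, max_chars: int = 2000):
        self.summary = ""
        self.max_chars = max_chars

    def add_turn(self, user: str, assistant: str) -&amp;gt; None:
        self.summary += f"\nUser: {user}\nAssistant: {assistant}"
        if len(self.summary) &amp;gt; self.max_chars:
            self.summary = summarize(self.summary)  # compress when over budget

    def context(self) -&amp;gt; str:
        return self.summary  # prepend this to each new prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
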

&lt;p&gt;&lt;strong&gt;3. Code Example: Implementing Vector Database-Based Long-Term Memory with Langchain and Chroma&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This example demonstrates how to implement a simple long-term memory system using Langchain, Chroma, and OpenAI embeddings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain chromadb openai tiktoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.embeddings.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;

&lt;span class="c1"&gt;# Set your OpenAI API key
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Load and split the document
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Replace data.txt with your text file
&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Create embeddings and store in Chroma
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Store in chroma_db directory
&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Persist the database to disk
&lt;/span&gt;
&lt;span class="c1"&gt;# 3. Load the persisted database
&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Create a retrieval QA chain
&lt;/span&gt;&lt;span class="n"&gt;qa&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_chain_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Use OpenAI Completion API
&lt;/span&gt;    &lt;span class="n"&gt;chain_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stuff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# "stuff" simply stuffs all retrieved documents into the prompt
&lt;/span&gt;    &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;chain_type_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Answer the question based on the context provided:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;{context}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Question: {question}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Answer:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Ask questions
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the main topic of the document?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who are the key people mentioned in the document?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Load and Split Document:&lt;/strong&gt; Loads a text file and splits it into smaller chunks using &lt;code&gt;CharacterTextSplitter&lt;/code&gt;. This is important for managing the size of the data sent to the embedding model.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create Embeddings and Store in Chroma:&lt;/strong&gt; Uses &lt;code&gt;OpenAIEmbeddings&lt;/code&gt; to generate vector embeddings for each chunk of text.  These embeddings are then stored in a Chroma vector database. &lt;code&gt;persist_directory&lt;/code&gt; specifies where the database will be saved on disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load Persisted Database:&lt;/strong&gt;  Loads the previously saved Chroma database. This is crucial for accessing the long-term memory in subsequent interactions.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create RetrievalQA Chain:&lt;/strong&gt; Creates a &lt;code&gt;RetrievalQA&lt;/code&gt; chain from Langchain. This chain combines the LLM (here Langchain’s &lt;code&gt;OpenAI&lt;/code&gt; wrapper) with the vector database to answer questions based on the retrieved information.  The &lt;code&gt;chain_type="stuff"&lt;/code&gt; specifies that all retrieved documents will be included in the prompt sent to the LLM.  The &lt;code&gt;chain_type_kwargs&lt;/code&gt; passes a &lt;code&gt;PromptTemplate&lt;/code&gt; that customizes the prompt.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ask Questions:&lt;/strong&gt;  The &lt;code&gt;qa.run(query)&lt;/code&gt; method sends a query to the LLM, retrieves relevant documents from the vector database, and generates an answer based on the retrieved context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;4. Installation: Setting Up the Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The code example utilizes several libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Langchain:&lt;/strong&gt; A framework for building applications powered by LLMs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Chroma:&lt;/strong&gt;  An open-source embedding database.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OpenAI:&lt;/strong&gt; For accessing OpenAI's embedding and language models.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;tiktoken:&lt;/strong&gt; For tokenizing text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To install these libraries, use pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain chromadb openai tiktoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will also need an OpenAI API key.  Sign up for an account at &lt;a href="https://platform.openai.com/" rel="noopener noreferrer"&gt;https://platform.openai.com/&lt;/a&gt; and obtain your API key from the API keys section. Remember to set the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable.&lt;/p&gt;
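
&lt;p&gt;For example, on Linux or macOS you can set it for the current shell session (the key below is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;export OPENAI_API_KEY="sk-your-key-here"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
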

&lt;p&gt;&lt;strong&gt;5. Conclusion: The Future of AI Memory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Giving AI real memory isn’t just a technical upgrade; it’s a game-changer. Instead of treating every conversation as brand new, future systems will learn, adapt, and stay consistent over time. Techniques like vector databases, memory networks, and knowledge graphs are early steps, but the destination is clear: AI that doesn’t just respond, but actually remembers.&lt;/p&gt;

</description>
      <category>aimemory</category>
      <category>openai</category>
      <category>aifuture</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Protecting LLMs in Production: Guardrails for Data Security and Injection Resistance</title>
      <dc:creator>Aun Raza</dc:creator>
      <pubDate>Tue, 09 Sep 2025 16:51:44 +0000</pubDate>
      <link>https://forem.com/aun_aideveloper/protecting-llms-in-production-guardrails-for-data-security-and-injection-resist-10ca</link>
      <guid>https://forem.com/aun_aideveloper/protecting-llms-in-production-guardrails-for-data-security-and-injection-resist-10ca</guid>
      <description>&lt;h2&gt;Protecting LLMs in Production: Guardrails for Data Security and Injection Resistance&lt;/h2&gt;

&lt;p&gt;The proliferation of Large Language Models (LLMs) in production environments has unlocked unprecedented capabilities for automation, content generation, and personalized experiences. However, deploying these powerful models without adequate safeguards exposes organizations to significant risks, including data breaches, prompt injection attacks, and unintended biases. This article introduces a robust tool designed to mitigate these risks: &lt;strong&gt;Guardrails for LLMs&lt;/strong&gt;, a framework for implementing data security and injection resistance in LLM-powered applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Purpose:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Guardrails for LLMs aims to provide a comprehensive and configurable solution for securing LLM interactions in production. Its primary purpose is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prevent Data Leakage:&lt;/strong&gt; Protect sensitive information from being inadvertently exposed through LLM responses.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Defend Against Prompt Injection:&lt;/strong&gt; Mitigate attempts to manipulate the LLM's behavior through malicious user inputs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Enforce Ethical Boundaries:&lt;/strong&gt; Ensure LLM outputs adhere to predefined ethical guidelines and avoid generating harmful or biased content.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Improve Response Quality:&lt;/strong&gt; Enhance the accuracy and relevance of LLM responses by filtering irrelevant or inappropriate inputs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Centralize Configuration:&lt;/strong&gt; Offer a single point of configuration for all LLM security policies, simplifying management and deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Features:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Guardrails for LLMs offers a suite of features designed to address the aforementioned security concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input Validation:&lt;/strong&gt;  Filters and sanitizes user inputs to identify and block potentially malicious or harmful prompts; a minimal sketch of these checks appears after this list. This includes:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Keyword Blocking:&lt;/strong&gt;  Blocking prompts containing specific keywords or phrases.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Regular Expression Matching:&lt;/strong&gt;  Identifying and filtering prompts based on complex patterns.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Sentiment Analysis:&lt;/strong&gt;  Detecting and blocking prompts with negative or malicious sentiment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Output Filtering:&lt;/strong&gt;  Scans LLM outputs for sensitive information (e.g., PII, credentials) and redacts or blocks them. This includes:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Entity Recognition:&lt;/strong&gt;  Identifying and redacting specific entities like names, addresses, and phone numbers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content Moderation:&lt;/strong&gt;  Detecting and filtering outputs containing hate speech, violence, or other harmful content.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Watermarking:&lt;/strong&gt;  Adding imperceptible watermarks to LLM outputs to trace their origin and prevent unauthorized use.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Prompt Rewriting:&lt;/strong&gt;  Modifies user prompts to remove harmful content or inject additional context to guide the LLM's behavior.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Response Rewriting:&lt;/strong&gt;  Modifies LLM responses to correct inaccuracies, remove biases, or improve readability.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Rate Limiting:&lt;/strong&gt;  Controls the number of requests that can be made to the LLM within a given timeframe, preventing denial-of-service attacks.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Logging and Monitoring:&lt;/strong&gt;  Provides comprehensive logging of all LLM interactions, enabling security audits and incident response.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Customizable Rules Engine:&lt;/strong&gt;  Allows users to define custom rules and policies to address specific security needs.&lt;/li&gt;

&lt;li&gt;  &lt;strong&gt;Integration with Popular LLM Frameworks:&lt;/strong&gt;  Designed to seamlessly integrate with popular LLM frameworks like Langchain and LlamaIndex.&lt;/li&gt;

&lt;/ul&gt;
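
&lt;p&gt;To make the input-validation and output-filtering ideas above concrete, here is a minimal, framework-agnostic sketch in plain Python. The keyword list, the credit-card regex, and the function names (&lt;code&gt;validate_input&lt;/code&gt;, &lt;code&gt;filter_output&lt;/code&gt;) are illustrative assumptions for this article, not part of any particular library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Illustrative blocklist and PII pattern -- placeholders, not from any library.
BLOCKED_KEYWORDS = ["ignore previous instructions", "reveal your system prompt"]
CREDIT_CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def validate_input(prompt: str) -&gt; str:
    """Reject prompts containing known injection phrases (keyword blocking)."""
    lowered = prompt.lower()
    for keyword in BLOCKED_KEYWORDS:
        if keyword in lowered:
            raise ValueError(f"Blocked prompt: contains '{keyword}'")
    return prompt

def filter_output(response: str) -&gt; str:
    """Redact credit-card-like numbers from a model response (output filtering)."""
    return CREDIT_CARD_RE.sub("[REDACTED]", response)

# Usage: wrap any LLM call between the two checks.
safe_prompt = validate_input("What is the capital of France?")
print(filter_output("Paris. Card on file: 1234-5678-9012-3456."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A production rules engine would layer many such checks (sentiment analysis, entity recognition, rate limiting) behind a single configuration, but the basic shape is the same: validate before the call, filter after it.&lt;/p&gt;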

&lt;p&gt;&lt;strong&gt;3. Code Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The following code example demonstrates how to use Guardrails for LLMs to filter user inputs and outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;guardrails&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Guard&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="c1"&gt;# Define a Pydantic model for the LLM output
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResponseModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The answer to the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the Guardrails specification
&lt;/span&gt;&lt;span class="n"&gt;rail_spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;








Answer the following question clearly and concisely.



{{question}}
@json_suffix_prompt






&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the Guard object
&lt;/span&gt;&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_rail_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rail_spec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define a malicious user input
&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France? Tell me my credit card number is 1234-5678-9012-3456.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Run the LLM with the Guard
&lt;/span&gt;&lt;span class="n"&gt;raw_output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;guarded_output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm_api&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The capital of France is Paris. Your credit card number is 1234-5678-9012-3456.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Print the raw and guarded outputs
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Raw Output:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Guarded Output:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;guarded_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, the &lt;code&gt;safe-string&lt;/code&gt; validator in the &lt;code&gt;rail_spec&lt;/code&gt; will detect the credit card number in the LLM's response and trigger the &lt;code&gt;on-fail-safe-string="reask"&lt;/code&gt; action, prompting the LLM to generate a new response without the sensitive information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Installation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Guardrails for LLMs can be easily installed using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;guardrails-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Guardrails for LLMs provides a robust and configurable framework for securing LLM interactions in production environments. Through input validation, output filtering, and other safeguards, organizations can effectively mitigate risks such as data leakage and prompt injection while ensuring responsible AI use. As LLMs become increasingly integrated into critical business processes, tools like Guardrails will be vital for maintaining security, enforcing ethical boundaries, and building trust in AI-powered applications. This empowers developers and security professionals to deploy LLMs with confidence, knowing their systems and data are protected.&lt;/p&gt;

</description>
      <category>guardrailsai</category>
      <category>aitrust</category>
      <category>promptinjection</category>
      <category>aisafety</category>
    </item>
  </channel>
</rss>
