<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tune AI</title>
    <description>The latest articles on Forem by Tune AI (@tunehqai).</description>
    <link>https://forem.com/tunehqai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9096%2F8684d6e5-4d82-4e0f-92cb-d3cdd186f838.jpg</url>
      <title>Forem: Tune AI</title>
      <link>https://forem.com/tunehqai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tunehqai"/>
    <language>en</language>
    <item>
      <title>Benchmarking Pixtral Large vs Pixtral 12B</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Mon, 25 Nov 2024 21:52:24 +0000</pubDate>
      <link>https://forem.com/tunehqai/benchmarking-pixtral-large-vs-pixtral-12b-2l0e</link>
      <guid>https://forem.com/tunehqai/benchmarking-pixtral-large-vs-pixtral-12b-2l0e</guid>
      <description>&lt;p&gt;Youtube: &lt;a href="https://youtu.be/O412lbdYQA0" rel="noopener noreferrer"&gt;Click Me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Multimodal AI has taken significant leaps in recent years, and &lt;strong&gt;Mistral AI's Pixtral Large&lt;/strong&gt; is no exception. This new Vision-Language Model (VLM) aims to redefine benchmarks in multimodal understanding and reasoning. In this post, I’ll dive into Pixtral Large's capabilities, its performance against its predecessor, Pixtral 12B, and GPT-4V, and share my benchmarking experiments to help you make informed decisions when choosing your next VLM.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What is Pixtral Large?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Pixtral Large is Mistral AI’s latest multimodal innovation. Building on the foundation of Pixtral 12B, it introduces enhanced reasoning and comprehension capabilities. Whether tackling complex math problems on datasets like &lt;strong&gt;MathVista&lt;/strong&gt;, document comprehension from &lt;strong&gt;DocVQA&lt;/strong&gt;, or visual-question answering with &lt;strong&gt;VQAv2&lt;/strong&gt;, Pixtral Large consistently sets itself apart with superior performance.&lt;/p&gt;

&lt;p&gt;At its core, Pixtral Large is powered by &lt;strong&gt;123 billion multimodal decoder parameters&lt;/strong&gt; and a &lt;strong&gt;1 billion-parameter vision encoder&lt;/strong&gt;, making it a true powerhouse. It supports up to &lt;strong&gt;30 high-resolution images&lt;/strong&gt; within a &lt;strong&gt;128K context window&lt;/strong&gt;, allowing it to handle complex, large-scale reasoning tasks effortlessly. Its &lt;strong&gt;Mistral Large 2 Text Encoder&lt;/strong&gt; enhances text processing while maintaining its exceptional multimodal capabilities.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Technical Specifications&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Although the exact architecture of Pixtral Large remains undisclosed, it likely builds upon Pixtral 12B's &lt;strong&gt;common embedding-based multimodal transformer decoder&lt;/strong&gt;. This setup enables it to process &lt;strong&gt;multi-image inferences&lt;/strong&gt; and perform high-quality cross-modal reasoning, excelling at tasks that require a deep integration of visual and textual data.&lt;/p&gt;

&lt;p&gt;Here are some standout specs of Pixtral Large:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parameters&lt;/strong&gt;: 123 billion (multimodal decoder) + 1 billion (vision encoder)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Window&lt;/strong&gt;: 128K tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Support&lt;/strong&gt;: Up to 30 high-resolution images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Applications&lt;/strong&gt;: Math reasoning, document comprehension, chart understanding, and more&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Pixtral Large vs. Pixtral 12B&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The shift from Pixtral 12B to Pixtral Large represents a nuanced tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pixtral 12B&lt;/strong&gt;: Balanced capabilities across tasks, excelling in &lt;strong&gt;label-based&lt;/strong&gt; and &lt;strong&gt;rationale-based&lt;/strong&gt; evaluations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pixtral Large&lt;/strong&gt;: Falls behind in label-based tasks but shines in &lt;strong&gt;rationale-based performance&lt;/strong&gt;, indicating superior reasoning and explanation capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This evolution demonstrates Pixtral Large’s focus on tasks requiring deeper comprehension and reasoning, making it a strong contender for specialized use cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Benchmarking Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Datasets Used&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To test Pixtral Large, I benchmarked it against its predecessor and GPT-4V using two datasets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ArxivQA&lt;/strong&gt;: Research paper-based QA tasks with GPT-4V inferences for comparison.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flickr30k&lt;/strong&gt;: A classic image captioning dataset enhanced with GPT-4O-generated captions.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Evaluation Metrics&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I used &lt;strong&gt;Cosine Similarity&lt;/strong&gt; to measure semantic alignment between generated outputs and reference data. Metrics included &lt;strong&gt;win rate&lt;/strong&gt;, &lt;strong&gt;average similarity&lt;/strong&gt;, and &lt;strong&gt;top-1, top-5, top-10 scores&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;ArxivQA Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;From &lt;strong&gt;1,000 randomly selected images&lt;/strong&gt;, Pixtral Large demonstrated a stronger ability to reason through scientific and mathematical content. While it struggled with label-based evaluations compared to Pixtral 12B, it outperformed in rationale-based tasks. This indicates a shift toward &lt;strong&gt;deeper reasoning capabilities&lt;/strong&gt;, ideal for complex QA scenarios.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3e11lo2vfnp6jrt1yewv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3e11lo2vfnp6jrt1yewv.png" alt=" " width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Flickr30k Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For the &lt;strong&gt;Flickr30k Captioning Benchmark&lt;/strong&gt;, Pixtral Large produced slight improvements over Pixtral 12B when evaluated against &lt;strong&gt;human-generated captions&lt;/strong&gt;. However, both models lagged in achieving a win rate for this task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmvfwfxfe3xgglyforty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmvfwfxfe3xgglyforty.png" alt=" " width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Interestingly, when compared to &lt;strong&gt;GPT-4V captions&lt;/strong&gt;, Pixtral Large performed well, though it fell slightly behind Pixtral 12B in top-ranked matches. These results highlight Pixtral Large’s potential but also suggest areas for improvement in precision and caption generation.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Using Pixtral Large on Tune Studio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Due to the model's size and resource requirements, I used &lt;strong&gt;Tune Studio&lt;/strong&gt; for benchmarking. With its user-friendly interface and efficient inference scripts, I was able to process &lt;strong&gt;500 images per hour&lt;/strong&gt;, completing the job for under $20. This makes Tune Studio a valuable tool for researchers and developers working on large-scale AI projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Pixtral Large represents a significant step forward in multimodal AI, offering enhanced reasoning and cross-modal comprehension. While it may not surpass Pixtral 12B in every aspect, its focus on rationale-based tasks makes it a compelling choice for applications requiring deeper understanding.&lt;/p&gt;

&lt;p&gt;For developers, researchers, and enterprises looking for cutting-edge VLMs, Pixtral Large offers a mix of power and precision that’s hard to beat.&lt;/p&gt;




&lt;p&gt;What do you think about Pixtral Large? Is it the next big thing in VLMs, or do you see potential in other models like GPT-4V? Let me know your thoughts in the comments below! 🚀&lt;/p&gt;

</description>
      <category>llm</category>
      <category>vlm</category>
      <category>benchmarking</category>
      <category>research</category>
    </item>
    <item>
      <title>Transform UI Screenshots into HTML &amp; CSS with Qwen Coder and Qwen VL</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Thu, 14 Nov 2024 16:43:38 +0000</pubDate>
      <link>https://forem.com/tunehqai/transform-ui-screenshots-into-html-css-with-qwen-coder-and-qwen-vl-12nl</link>
      <guid>https://forem.com/tunehqai/transform-ui-screenshots-into-html-css-with-qwen-coder-and-qwen-vl-12nl</guid>
      <description>&lt;p&gt;🎥Youtube Video Link: &lt;a href="https://youtu.be/TK1TDe7fHGI" rel="noopener noreferrer"&gt;Click Me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine this: you’re working on a website redesign, and you’ve just captured a UI screenshot that embodies the look you want. Wouldn’t it be incredible if you could automatically turn that image into HTML and CSS? This tutorial will show you exactly how to make that happen, transforming visual designs into code using cutting-edge &lt;strong&gt;vision-language models (VLMs)&lt;/strong&gt; and &lt;strong&gt;Qwen Coder&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;In this setup, we’ll build a pipeline where an AI model analyzes your UI design image, understands its layout, colors, typography, and structure, and then generates clean, organized HTML and CSS code. This process opens up a world of possibilities for &lt;strong&gt;UI prototyping, automated design-to-code workflows&lt;/strong&gt;, and quick mockup generation.&lt;/p&gt;

&lt;p&gt;Some cool points we'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upload and Describe UI Designs&lt;/strong&gt;: How we upload a UI screenshot and get a detailed breakdown of the design elements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate HTML &amp;amp; CSS with AI&lt;/strong&gt;: Transforming these descriptions into fully functional HTML and CSS code for quick web design prototyping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s get started!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Setting Up API Details and Image Encoding
&lt;/h3&gt;

&lt;p&gt;First, let’s configure the API endpoint, headers, and a helper function to encode images into Base64. This encoding step allows us to send the image data to the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BytesIO&lt;/span&gt;

&lt;span class="c1"&gt;# Set API details
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://proxy.tune.app/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your actual API key
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Encode image in Base64 format
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;encode_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RGBA&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert RGBA to RGB
&lt;/span&gt;    &lt;span class="n"&gt;buffered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JPEG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Querying the Vision-Language Model for Description
&lt;/h3&gt;

&lt;p&gt;In this step, we’ll create a function that queries the VLM to analyze the UI image and provide a detailed description. This model captures all aspects of the UI, including &lt;strong&gt;color schemes, typography, layout structures, and icons&lt;/strong&gt;, which are essential for accurately generating HTML and CSS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Query the model for a description of the image
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base64_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base64_image&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="n"&gt;image_content&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[{}])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Extracting HTML and CSS Code from Model Response
&lt;/h3&gt;

&lt;p&gt;Once we have the description, we prompt &lt;strong&gt;Qwen Coder&lt;/strong&gt; to generate HTML and CSS based on the UI layout. Our code will parse the response, extracting any HTML and CSS content for easy file output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="c1"&gt;# Extract HTML and CSS from model response
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_html_css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;html_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;### HTML\n```

html\n(.*?)

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;css_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;### CSS.*\n```

css\n(.*?)

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;html_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;html_match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;html_match&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;css_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;css_match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;css_match&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;html_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;css_code&lt;/span&gt;

&lt;span class="c1"&gt;# Save HTML and CSS to files
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;css_code&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;html_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;html_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;styles.css&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;css_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;css_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;css_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Building the Streamlit App for User Interaction
&lt;/h3&gt;

&lt;p&gt;Our final step is setting up the &lt;strong&gt;Streamlit&lt;/strong&gt; interface. This UI allows users to upload images, choose a model, generate descriptions, and output HTML/CSS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;

&lt;span class="c1"&gt;# Streamlit UI setup
&lt;/span&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Image Description and HTML/CSS Generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Select Model for Image Understanding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen-2-vl-72b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistral/pixtral-12B-2409&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta/llama-3.2-90b-vision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;uploaded_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file_uploader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Choose an image...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate Description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;uploaded_image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uploaded_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;base64_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Generate the UI description
&lt;/span&gt;        &lt;span class="n"&gt;description_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please analyze this software interface image and provide a detailed description.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base64_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_choice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subheader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generated Description:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Generate HTML and CSS
&lt;/span&gt;            &lt;span class="n"&gt;html_css_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are TuneStudio, a coding assistant that generates HTML and CSS based on descriptions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please create HTML and CSS based on the following detailed description: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen-2.5-coder-32b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;html_css_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;html_css_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[{}])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;html_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;css_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_html_css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html_css_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;html_code&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;css_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;write_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;css_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTML and CSS files have been generated.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTML/CSS extraction failed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subheader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generated HTML and CSS:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html_css_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error generating HTML/CSS.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please upload an image.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;With this setup, you’ve created a pipeline that not only automates the analysis of UI images but also translates them into HTML and CSS. This workflow is a major time-saver for developers, designers, and anyone involved in UI design. Now, you can turn visual ideas into functional code with the power of AI!&lt;/p&gt;

&lt;p&gt;Let me know if you run into any questions or issues in the comments below.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Building a Multi-Turn-Assistant Application using Llama, Claude and GPT4o</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Fri, 18 Oct 2024 17:32:13 +0000</pubDate>
      <link>https://forem.com/tunehqai/building-a-multi-turn-assistant-application-using-llama-claude-and-gpt4o-1ieo</link>
      <guid>https://forem.com/tunehqai/building-a-multi-turn-assistant-application-using-llama-claude-and-gpt4o-1ieo</guid>
      <description>&lt;p&gt;💻Github: &lt;a href="https://github.com/aryankargwal/genai-tutorials/tree/main/multi-turn-agents" rel="noopener noreferrer"&gt;https://github.com/aryankargwal/genai-tutorials/tree/main/multi-turn-agents&lt;/a&gt;&lt;br&gt;
🎥Youtube: &lt;a href="https://youtu.be/S9iHpExFrTs" rel="noopener noreferrer"&gt;https://youtu.be/S9iHpExFrTs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this guide, we’ll explore the development of a &lt;strong&gt;multi-turn AI assistant application&lt;/strong&gt; using &lt;strong&gt;LLMs&lt;/strong&gt; and AI assistant integration. This application is designed to streamline complex workflows such as &lt;strong&gt;internet retrieval&lt;/strong&gt;, &lt;strong&gt;market research&lt;/strong&gt;, &lt;strong&gt;campaign generation&lt;/strong&gt;, and &lt;strong&gt;image creation&lt;/strong&gt;. Throughout this process, we will rely on &lt;strong&gt;Tune Studio&lt;/strong&gt; for AI model orchestration, and &lt;strong&gt;Streamlit&lt;/strong&gt; for the front-end user interface. The end goal is to create a fully automated assistant-led pipeline that performs end-to-end tasks by interacting with multiple AI assistants in a sequential manner—also known as a multi-turn workflow.&lt;/p&gt;


&lt;h3&gt;
  
  
  What is a Multi-Turn AI Assistant Application?
&lt;/h3&gt;

&lt;p&gt;In the context of AI and automation, a &lt;strong&gt;multi-turn assistant application&lt;/strong&gt; is one where multiple interactions (or "turns") are required to complete a task. The application maintains context throughout these turns and allows each assistant or model to perform specific sub-tasks in a coordinated manner. This approach contrasts with single-turn applications, where the AI assistant addresses a single user query without needing to track prior inputs or outputs.&lt;/p&gt;

&lt;p&gt;In this tutorial, the multi-turn approach allows AI assistants to collaborate across multiple steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Market Research Assistant&lt;/strong&gt; gathers data from the web.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics Assistant&lt;/strong&gt; processes the research and generates insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Campaign Generation Assistant&lt;/strong&gt; creates marketing strategies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Generation Assistant&lt;/strong&gt; produces a campaign poster.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each assistant plays a crucial role and passes context to the next in line, ensuring a smooth and coherent user experience.&lt;/p&gt;


&lt;h3&gt;
  
  
  What Are AI Assistants?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI assistants&lt;/strong&gt; are digital agents powered by machine learning models that help users perform tasks, answer questions, and provide recommendations. Unlike co-pilots or AI agents, AI assistants focus on assisting with user-driven tasks, such as scheduling meetings, performing web searches, or, in our case, handling marketing tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F375blztvdwjnozw6bcxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F375blztvdwjnozw6bcxg.png" alt="Assistants v Copilots v Agents" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three distinct categories of LLM-driven tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI Assistants&lt;/strong&gt;: Designed to respond to user commands and requests. Common examples include virtual assistants like &lt;strong&gt;Siri&lt;/strong&gt; or &lt;strong&gt;Alexa&lt;/strong&gt;, but they can also handle more specialized workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Co-Pilots&lt;/strong&gt;: These tools work alongside humans, helping improve tasks as they are being performed. Examples include &lt;strong&gt;Grammarly&lt;/strong&gt; for writing and &lt;strong&gt;GitHub Copilot&lt;/strong&gt; for coding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI Agents&lt;/strong&gt;: Autonomous agents that perform tasks without user input, such as &lt;strong&gt;ReACT&lt;/strong&gt; or &lt;strong&gt;Agentic Workflows&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our application, the &lt;strong&gt;AI assistants&lt;/strong&gt; are key players in achieving each part of the task while ensuring user control and input at every step. Now, let’s break down how we’ve integrated multiple assistants to create a seamless marketing and campaign generation tool.&lt;/p&gt;


&lt;h3&gt;
  
  
  Step-by-Step Breakdown of the Application
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6qiqsichvw1fipfyfd4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6qiqsichvw1fipfyfd4.png" alt="Application Flow" width="429" height="824"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  1. &lt;strong&gt;Performing Market Research with an AI Assistant&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;In this first step, the AI assistant is responsible for gathering relevant information from the internet. We use a &lt;strong&gt;Llama 3.1&lt;/strong&gt; model fine-tuned for research tasks to collect numerical data, trends, and insights from across the web.&lt;/p&gt;

&lt;p&gt;Here’s the core code for this assistant's function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_market_research_assistant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kargwalaryan/research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequency_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function sends a user query to the &lt;strong&gt;Tune Studio&lt;/strong&gt; API, which uses a fine-tuned model to fetch relevant market research. The model acts as a &lt;strong&gt;subject matter expert&lt;/strong&gt; on the specific topic or product the user inquires about.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Analyzing Research and Creating Insights&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Once the data is gathered, the next assistant steps in to analyze the research. This assistant is run using &lt;strong&gt;Claude Sonnet&lt;/strong&gt;, a model known for its compliance, safety, and conversational adaptability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_analytics_assistant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Here is some market research data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;research_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Extract all the marketing insights and generate a campaign prompt.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are TuneStudio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-3.5-sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequency_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;strong&gt;Claude Sonnet&lt;/strong&gt; model processes the research and extracts stylistic and strategic insights that will inform the next step—campaign generation.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Generating the Marketing Campaign&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For campaign generation, we need an assistant that not only understands the market analysis but can also create a compelling, structured campaign. The &lt;strong&gt;Claude Sonnet&lt;/strong&gt; model shines in this area, as it generates an engaging and compliant campaign strategy based on market trends.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_campaign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysis_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a marketing campaign based on this analysis: {analysis_result}.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kargwalaryan/campaign-gen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequency_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This assistant pulls from the insights gathered and creates a comprehensive campaign that could be deployed over the next few months.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;Image Generation for the Campaign Poster&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The final assistant in this pipeline generates a visual representation—a campaign poster—using &lt;strong&gt;GPT4o&lt;/strong&gt;. This model specializes in image creation tasks based on textual descriptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_image_generation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysis_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a campaign poster based on this analysis: {analysis_result}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kargwalaryan/image-gen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequency_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model generates a creative campaign poster based on the strategy developed in the earlier steps, completing the entire marketing pipeline.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Use Multi-Turn Assistant Workflows?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Multi-turn workflows&lt;/strong&gt; allow for complex tasks to be broken into smaller, manageable operations, each handled by a specialized AI assistant. This ensures that the final output is not only accurate but also aligned with the user's overall goals.&lt;/p&gt;

&lt;p&gt;Some of the key advantages of multi-turn workflows include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Retention&lt;/strong&gt;: The application retains context across different stages of the workflow. This allows each assistant to build upon the work of previous assistants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Specialization&lt;/strong&gt;: Each assistant is optimized for a specific sub-task, ensuring higher performance in individual areas like research, analysis, campaign generation, and image creation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility and Customization&lt;/strong&gt;: You can easily modify or swap out assistants to suit different applications. For example, you could replace the market research assistant with one better suited to another industry or domain.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Creating a &lt;strong&gt;multi-turn AI assistant application&lt;/strong&gt; allows you to harness the power of multiple LLMs and assistants to handle complex tasks in a highly structured way. By breaking down tasks into distinct stages and integrating models like &lt;strong&gt;Llama 3.1&lt;/strong&gt;, &lt;strong&gt;Claude Sonnet&lt;/strong&gt;, and &lt;strong&gt;GPT4o&lt;/strong&gt;, you can build intelligent, autonomous pipelines that help users with everything from market research to visual content creation.&lt;/p&gt;

&lt;p&gt;This approach is ideal for applications where tasks need to be completed in a step-by-step manner while maintaining context across all steps.&lt;/p&gt;

&lt;p&gt;Let me know if you have any questions or suggestions for further improvement! Stay tuned for more advanced tutorials on LLMs and VLMs.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>llm</category>
      <category>streamlit</category>
      <category>python</category>
    </item>
    <item>
      <title>Doing Multihop on HotPotQA Using Qwen 2.5 72B</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Thu, 26 Sep 2024 14:46:21 +0000</pubDate>
      <link>https://forem.com/tunehqai/doing-multihop-on-hotpotqa-using-qwen-25-72b-43a1</link>
      <guid>https://forem.com/tunehqai/doing-multihop-on-hotpotqa-using-qwen-25-72b-43a1</guid>
      <description>&lt;p&gt;When dealing with complex question-answering tasks, a single-hop retrieval approach might not be enough. Questions often require synthesizing information from multiple sources. That’s where &lt;strong&gt;MultiHop Question Answering (QA)&lt;/strong&gt; comes into play, requiring more advanced tools for retrieval and reasoning. In this post, I’ll describe how I built a multi-hop QA pipeline using &lt;strong&gt;DSPy&lt;/strong&gt;, &lt;strong&gt;ColBERT&lt;/strong&gt;, &lt;strong&gt;TuneAPI&lt;/strong&gt;, and &lt;strong&gt;Qwen 2.5 72B&lt;/strong&gt; to handle multi-step reasoning over a knowledge base.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Key Tools
&lt;/h2&gt;

&lt;p&gt;Before diving into the code, let’s first break down the key tools and libraries that power this pipeline:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;DSPy&lt;/strong&gt; (Data Structure Processing)
&lt;/h3&gt;

&lt;p&gt;DSPy is a Python library that helps structure multi-step processes for tasks like retrieval-augmented generation and multi-hop question answering. It allows us to define a clear, modular flow for handling complex information retrieval tasks and integrate language models effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;ColBERT&lt;/strong&gt; (Contextualized Late Interaction over BERT)
&lt;/h3&gt;

&lt;p&gt;ColBERT is a dense retrieval model designed to retrieve passages efficiently from large corpora. It works by encoding both the query and documents in a low-dimensional space and comparing them to find relevant matches. For multi-hop QA, ColBERT helps identify the most pertinent passages to answer complex questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;TuneAPI&lt;/strong&gt; (API Proxy for LLMs)
&lt;/h3&gt;

&lt;p&gt;TuneAPI acts as a proxy API to interact with LLMs such as Qwen. This lets us access the powerful inference capabilities of LLMs and customize how they process inputs and generate responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Qwen 2.5 72B&lt;/strong&gt; (Alibaba’s Vision-Language Model)
&lt;/h3&gt;

&lt;p&gt;Qwen 2.5 72B is a state-of-the-art large language model developed by Alibaba. While it’s primarily known for its vision-language tasks, Qwen excels in natural language reasoning, making it a great choice for multi-hop QA tasks where nuanced reasoning over text is required.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;HotPotQA&lt;/strong&gt; (Dataset for Multi-Hop QA)
&lt;/h3&gt;

&lt;p&gt;HotPotQA is a dataset designed specifically for multi-hop question answering. It contains questions that require information from multiple documents to arrive at an accurate answer, making it ideal for training and evaluating multi-hop QA systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up the MultiHopQA Pipeline
&lt;/h2&gt;

&lt;p&gt;The goal here is to build an end-to-end pipeline that can retrieve relevant documents using &lt;strong&gt;ColBERT&lt;/strong&gt;, pass the retrieved contexts to &lt;strong&gt;Qwen 2.5 72B&lt;/strong&gt; for reasoning, and finally output the predicted answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Walkthrough
&lt;/h3&gt;

&lt;p&gt;Let’s break down the process into manageable steps. Here’s the code for building the pipeline:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Importing Required Libraries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dsp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dspy.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HotPotQA&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dsp.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deduplicate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We start by importing necessary libraries: &lt;strong&gt;DSPy&lt;/strong&gt; for data handling, &lt;strong&gt;ColBERT&lt;/strong&gt; for retrieval, and &lt;strong&gt;requests&lt;/strong&gt; to interact with the &lt;strong&gt;TuneAPI&lt;/strong&gt;. The &lt;code&gt;HotPotQA&lt;/code&gt; dataset is also loaded to provide multi-hop questions for the pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Creating a Custom Language Model Client
&lt;/h3&gt;

&lt;p&gt;To use Qwen, we need a custom class to handle API requests. We interact with Qwen via the TuneAPI to submit prompts and retrieve responses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomLMClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LM&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://proxy.tune.app/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;basic_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are TuneStudio, answer the question based on the context given to you.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequency_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frequency_penalty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;custom_lm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CustomLMClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;qwen/qwen-2.5-72b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This class wraps around the Qwen model and formats the prompt, handles API communication, and processes the response. The &lt;code&gt;basic_request&lt;/code&gt; method takes care of sending requests to the &lt;strong&gt;TuneAPI&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Configuring Retrieval and Language Model
&lt;/h3&gt;

&lt;p&gt;Next, we configure &lt;strong&gt;ColBERT&lt;/strong&gt; and set up DSPy to use our custom language model client for inference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;colbertv2_wiki17_abstracts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ColBERTv2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://20.102.90.50:2017/wiki17_abstracts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;custom_lm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;colbertv2_wiki17_abstracts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;strong&gt;ColBERTv2&lt;/strong&gt; retrieves relevant Wikipedia abstracts. These abstracts will be passed to the language model for deeper reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Loading HotPotQA Dataset
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HotPotQA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dev_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trainset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;devset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_inputs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We load a small subset of the &lt;strong&gt;HotPotQA&lt;/strong&gt; dataset for testing. This dataset will provide multi-hop questions for the pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Simplified Baleen for Multi-Hop Retrieval
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Simplified Baleen&lt;/strong&gt; class handles the multi-hop retrieval process. It repeatedly retrieves passages, feeds them into the language model, and finally generates an answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimplifiedBaleen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lm_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;passages_per_hop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_hops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieve&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;passages_per_hop&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_hops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_hops&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lm_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lm_client&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;context_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Given the following information: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Answer the question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lm_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_hops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;passages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;passages&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deduplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;passages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Prediction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core of our pipeline. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates queries based on previously retrieved context.&lt;/li&gt;
&lt;li&gt;Retrieves relevant documents using &lt;strong&gt;ColBERT&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Passes the final context to &lt;strong&gt;Qwen&lt;/strong&gt; to generate the answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Running the Pipeline
&lt;/h3&gt;

&lt;p&gt;We define a question and pass it through the pipeline to retrieve the answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;my_question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What position on the Billboard Top 100 did Alison Moyet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s late summer hit achieve?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;uncompiled_baleen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimplifiedBaleen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lm_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;custom_lm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;uncompiled_baleen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;my_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Predicted Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved Contexts (truncated): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pred&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This question is answered using multiple passages retrieved in successive hops and is reasoned over by &lt;strong&gt;Qwen 2.5 72B&lt;/strong&gt;. The final answer is printed alongside the retrieved contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project highlights the growing importance of &lt;strong&gt;multi-hop question answering&lt;/strong&gt; and how combining modern tools like &lt;strong&gt;ColBERT&lt;/strong&gt; for retrieval and &lt;strong&gt;Qwen&lt;/strong&gt; for reasoning can provide powerful solutions. By leveraging datasets like &lt;strong&gt;HotPotQA&lt;/strong&gt;, it’s easier to experiment and fine-tune these pipelines for real-world QA systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Future Plans:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Experiment with more retrieval-augmented generation tasks.&lt;/li&gt;
&lt;li&gt;Extend this pipeline to support more languages and domain-specific datasets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For more NLP tutorials and walkthroughs&lt;/strong&gt;, feel free to check out my &lt;a href="https://www.youtube.com/@AryanKargwal" rel="noopener noreferrer"&gt;YouTube Channel&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>qwen</category>
      <category>tutorial</category>
      <category>tunestudio</category>
    </item>
    <item>
      <title>Boss Llama: Building a Smart Interview Simulator using Llama 3.1 70B</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Thu, 22 Aug 2024 17:22:20 +0000</pubDate>
      <link>https://forem.com/tunehqai/boss-llama-building-a-smart-interview-simulator-using-llama-31-70b-egi</link>
      <guid>https://forem.com/tunehqai/boss-llama-building-a-smart-interview-simulator-using-llama-31-70b-egi</guid>
      <description>&lt;p&gt;Moving a bit further from writing and exploring the world of LLMs as just a consumer of the vast knowledge and context awareness, I took it upon myself to build a product out of these tools. Boss Llama is an interactive intelligent interview simulator on Meta's &lt;a href="https://llama.meta.com/" rel="noopener noreferrer"&gt;Llama 3.1 70B&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although not a novel implementation, I hope my tutorial helps you understand the various API calls and functions involved in processing chat, inputs, and data files in a Streamlit Web Application using &lt;a href="https://studio.tune.app/playground" rel="noopener noreferrer"&gt;Tune Studio&lt;/a&gt; to deploy our model and set API gateways for inference. So, let's look at how you can replicate the same results.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model: Llama 3.1 70B
&lt;/h2&gt;

&lt;p&gt;An Open Source LLM, Meta's Llama 3.1 70B, has good reasons to become my choice for the product. By improving the context window from 8K to 128K from its predecessor, Llama 3 70B, the model also brings a more significant threshold for output tokens, 4096, over the previous 2048 context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm08cwjym5wvx23vu0sy1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm08cwjym5wvx23vu0sy1.png" alt="Quality Comparison" width="486" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tvnuevo5di4vgnnx9ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tvnuevo5di4vgnnx9ad.png" alt="Speed Comparison" width="481" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx5yf5lstktb3mwc5zrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx5yf5lstktb3mwc5zrz.png" alt="Price Comparison" width="483" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analysis taken from &lt;a href="https://artificialanalysis.ai/models/llama-3-1-instruct-70b" rel="noopener noreferrer"&gt;Aritificial Analysis&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Earlier, however, I had tried implementing a locally hosted Llama 3.1 8B for the task, which unfortunately lacks the context windows expected off of the use-case, which typically involves more extended exchanges averaging about 1500-2000 words or 2000+ tokens. So let us now check out how you can also make such an application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;Regarding the implementation, we have chosen &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; as the base of our operations, helping us tie up the outputs generated by the API calls to a chat interface. Unfortunately, the larger model asks for a higher VRAM, which I have chosen to fulfill using Tune Studio's API Calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq64wwop4t1chbb9ftm6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq64wwop4t1chbb9ftm6u.png" alt="Boss Llama UI" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While discussing the code, however, we will skip over the Streamlit part and focus on its API aspect. If you wish to see how I implement that, check out my video tutorial on &lt;a href="https://youtu.be/cAHipN7CQwE?si=UBm2-Avyy4Bpx8E8" rel="noopener noreferrer"&gt;YouTube here&lt;/a&gt; or head over to the &lt;a href="https://github.com/aryankargwal/boss_llama" rel="noopener noreferrer"&gt;Github Repository here&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;For the upcoming code, here are some essential variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conversation: A session state variable holding all the conversation exchanges between the bot and the user.&lt;/li&gt;
&lt;li&gt;Difficulty: The difficulty of the simulated interview&lt;/li&gt;
&lt;li&gt;API Key: Your Tune Studio API Key&lt;/li&gt;
&lt;li&gt;Max Questions: Number of Questions in the Interview&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  System Prompt
&lt;/h3&gt;

&lt;p&gt;Finetuning is the best way to get a model to perform exactly how you want it to; well, I went for the next best thing: a thoughtful and thorough system prompt. While choosing the system prompt, we should be detailed with our requirements, as the model tends to meander and hallucinate if such instructions are not given.&lt;/p&gt;

&lt;p&gt;The latest adversarial training on the modern Llama models also allows us to pass such a system prompt, avoiding any prompt leakage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Boss Llama, an advanced AI interviewer. Your goal is to conduct a comprehensive and intelligent interview with the candidate based on the following details:

1. Job Description: {st.session_state.job_description}
2. Difficulty Level: {difficulty}
3. Number of Questions: {st.session_state.max_questions}

Instructions:
1. Start the Interview:
   - Begin by presenting a detailed job description for the role based on the difficulty level and the provided job description. Try to keep this introduction small and to the point as the user already knows what they are interviewing for.
   - Provide a warm welcome message to start the interview and set a positive tone.
2. Generate and Ask Questions:
   - Ask a series of questions, up to the specified number of questions. Ensure these questions are relevant to the job description and appropriately challenging according to the difficulty level.
   - Provide clear and concise prompts that assess the candidate's skills, knowledge, and fit for the role.

3. Conduct the Interview:
  - Engage with the candidate in a conversational manner. If the candidate's responses are vague or insufficient, let them know about it and give them a chance to improve, but count it as one more question.
   - Maintain a professional and supportive tone throughout the interview.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Generating Responses
&lt;/h3&gt;

&lt;p&gt;We will generate the responses and the conversation using a curl command from Tune Studio. The command is a simple way of linking your current working code with a model of your choice on Tune Studio, which holds an amazingly massive library of free models for inference and even more advanced models with custom fine-tuning and deploying practices for hard-core enthusiasts.&lt;/p&gt;

&lt;p&gt;The variable "conversation," which incrementally holds the ongoing conversation, is called every time to create a response that adds to the existing discussion.&lt;/p&gt;

&lt;p&gt;With parameters such as temperature, frequency_penaly, and max_tokens, we can also tweak the quality of responses, further enhancing the feeling of being interviewed by a proper interviewer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Function to call the API
def generate_response(conversation, apikey):
    url = "https://proxy.tune.app/chat/completions"
    headers = {
        "Authorization": apikey,  # Your API key
        "Content-Type": "application/json"
    }

    # Construct the payload for the API call
    payload = {
        "temperature": 0.9,
        "messages": conversation,
        "model": "meta/llama-3.1-70b-instruct",
        "stream": False,
        "frequency_penalty": 0.2,
        "max_tokens": 500
    }

    # Send the POST request to the API
    response = requests.post(url, headers=headers, data=json.dumps(payload))

    # Check if the request was successful
    if response.status_code == 200:
        # Extract the response from the JSON output
        return response.json()["choices"][0]["message"]["content"]
    else:
        return f"Error: {response.status_code} - {response.text}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Generate Evaluations
&lt;/h3&gt;

&lt;p&gt;For the evaluations, we are using a similar API call. We only pass the individual exchanges from the bot and the user to run an assessment using a suitable system prompt.&lt;/p&gt;

&lt;p&gt;This second call activates the interviewer's harsher and more intense side. This new call then looks at the conversation from a third perspective and assigns feedback and a score that feeds back into the web application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Function to generate evaluations on the interview
def generate_evaluation(question, answer, difficulty, apikey):
    url = "https://proxy.tune.app/chat/completions"
    headers = {
        "Authorization": apikey,
        "Content-Type": "application/json"
    }

    payload = {
        "temperature": 0.7,
        "messages": [
            {"role": "system", "content": f"Evaluate the following answer based on the job description difficulty level: {difficulty}."},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"}
        ],
        "model": "meta/llama-3.1-70b-instruct",
        "stream": False,
        "frequency_penalty": 0.2,
        "max_tokens": 500
    }

    try:
        response = requests.post(url, headers=headers, data=json.dumps(payload))
        response.raise_for_status()
        result = response.json()
        feedback = result.get("choices", [{}])[0].get("message", {}).get("content", "No feedback provided")
        score = result.get("choices", [{}])[0].get("score", 0)
        return feedback, score
    except requests.RequestException as e:
        return f"Error: {e}", 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Download PDF Report
&lt;/h3&gt;

&lt;p&gt;Well, finally, what good is progress in AI? Suppose all we are doing is asking it to write our assignment. In that case, the final function links the outputs generated by the Evaluation function to FPDF to generate a plausible and downloadable PDF for the evaluation.&lt;/p&gt;

&lt;p&gt;What is FPDF, you ask? FPDF, initially a PHP class, is a library used for PDF document generation under Python. Compared to other jargon available online, FPDF provides a more streamlined and straightforward way to generate PDFs. (It also allows us to add PNGs, JPEGs, and GIFs to the PDF, which should be a boon if we wish to add systems to include tacky diagrams in the report.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Function to generate a PDF report
def generate_pdf_report(evaluations):
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)

    pdf.cell(0, 10, "Interview Evaluation Report", ln=True, align="C")
    pdf.ln(10)  # Add a line break

    for evaluation in evaluations:
        pdf.set_font("Arial", style='B', size=12)
        pdf.multi_cell(0, 10, evaluation["Question"])
        pdf.set_font("Arial", size=12)
        pdf.multi_cell(0, 10, evaluation["Answer"])
        pdf.multi_cell(0, 10, f"Feedback: {evaluation['Feedback']}")
        pdf.multi_cell(0, 10, f"Score: {evaluation['Score']}")
        pdf.ln(5)  # Add a line break

    # Save the PDF to a temporary file
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".pdf")
    pdf.output(temp_file.name)
    return temp_file.name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Here, we saw an implementation of Llama 3.1 70B in a Streamlit Web Application to simulate brilliant interviews using a chat interface. Although the final product lacks some accessibility features such as TTS or STT, it shows great promise in the way even a model that is not fine-tuned can operate using just system prompts.&lt;/p&gt;

</description>
      <category>streamlit</category>
      <category>tunestudio</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to do efficient fine-Tuning for LLMs using SLoRA</title>
      <dc:creator>Abhishek Mishra</dc:creator>
      <pubDate>Wed, 07 Aug 2024 08:23:17 +0000</pubDate>
      <link>https://forem.com/tunehqai/how-to-do-efficient-fine-tuning-for-llms-using-slora-1a27</link>
      <guid>https://forem.com/tunehqai/how-to-do-efficient-fine-tuning-for-llms-using-slora-1a27</guid>
      <description>&lt;p&gt;You're a passionate AI developer, eager to try to utilize the power of large language models (LLMs) for your latest project. You've got brilliant ideas, but there's a catch – fine-tuning these massive models feels like trying to parallel park a cruise ship in a crowded marina. It's resource-intensive, time-consuming, and frankly, a bit overwhelming.&lt;br&gt;
Sound familiar? You're not alone in this boat.&lt;/p&gt;

&lt;p&gt;Before we dive deeper, let's clarify what we mean by fine-tuning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fine-tuning is the process of further training a pre-trained model on a specific task or dataset to adapt its knowledge for a particular application.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The world of AI has been grappling with a significant challenge: how to efficiently fine-tune LLMs without breaking the bank or melting your hardware – we all are GPU poor except NVIDIA. &lt;br&gt;
Traditional fine-tuning methods are like using a sledgehammer to crack a nut, – they get the job done, but at what cost?&lt;br&gt;
Enter &lt;strong&gt;SLoRA – Sparse Low-Rank Adaptation&lt;/strong&gt;, – the unsung hero for efficient model fine-tuning.&lt;/p&gt;

&lt;p&gt;Fine-tuning large language models (LLMs) has become a crucial step in achieving state-of-the-art results in various natural language processing (NLP) tasks. However, this process often comes with significant computational costs, memory requirements, and time constraints. In this article, we'll explore SLoRA, a novel approach to efficient model fine-tuning that promises to revolutionize the way we work with LLMs. By leveraging sparse low-rank adaptation, SLoRA offers a faster, more cost-effective, and more sustainable way to fine-tune LLMs without sacrificing performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is SLoRA?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxznk7b7h7xiys96eug5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxznk7b7h7xiys96eug5d.png" alt="image alt" width="800" height="595"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To understand SLoRA, it's essential to first grasp the concept of &lt;a href="https://arxiv.org/pdf/2106.09685" rel="noopener noreferrer"&gt;Low-Rank Adaptation (LoRA)&lt;/a&gt;. LoRA is a parameter-efficient fine-tuning method that updates only a small subset of model parameters while keeping the rest frozen. This approach has shown remarkable success in reducing computational costs and memory requirements for model fine-tuning.&lt;/p&gt;

&lt;p&gt;However, LoRA still updates a relatively large number of parameters, which can be computationally expensive and memory-intensive. This is where SLoRA comes in – by introducing sparsity into the LoRA framework, SLoRA further reduces the number of updated parameters to approximately 1% of the original model's parameters. This drastic reduction in updated parameters leads to significant computational savings and faster convergence rates&lt;br&gt;
SLoRA stands for Sparse Low-Rank Adaptation, a method designed to enhance the efficiency of fine-tuning LLMs. It builds on the concept of LoRA, which constrains the update of pre-trained weights using low-rank decomposition. SLoRA introduces sparsity into this approach, focusing only on a subset of parameters that significantly impact the model's performance.&lt;/p&gt;

&lt;p&gt;Here's a simple analogy -- Imagine you have a giant jigsaw puzzle. LoRA would focus on updating a specific section of the puzzle, while SLoRA would pinpoint only the most important pieces within that section.&lt;/p&gt;

&lt;h2&gt;
  
  
  How SLoRA Works?
&lt;/h2&gt;

&lt;p&gt;SLoRA employs a sparse matrix approach where the weight updates are constrained to a low-rank format and further sparsified. This involves decomposing the weight matrix into a product of two smaller matrices and applying updates only to a sparse subset of the original parameters. This method reduces the density of updates to about 1%, significantly cutting down on the resources needed for training.&lt;/p&gt;

&lt;p&gt;Think of a weight matrix in an LLM as a giant grid of numbers. These numbers represent the connections between different parts of the model. SLoRA uses a technique called sparse matrix decomposition to break down this grid into smaller, more manageable pieces.&lt;/p&gt;

&lt;p&gt;Imagine slicing a pizza into smaller triangles. SLoRA only updates the toppings on a select few of those triangles, leaving the rest untouched. This dramatically reduces the amount of data we need to process and store.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FwJ9uLUf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FwJ9uLUf.png" alt="image alt" width="800" height="1898"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison with Traditional Fine-Tuning Methods
&lt;/h3&gt;

&lt;p&gt;Traditional fine-tuning involves updating all model parameters, which is resource-intensive and time-consuming. In contrast, SLoRA updates only a small, significant subset of parameters, achieving similar performance with much lower computational overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SLoRA in the nutshell:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SLoRA builds on the foundation of LoRA (Low-Rank Adaptation), which already constrains updates to a small subset of parameters.&lt;/li&gt;
&lt;li&gt;It takes this a step further by introducing sparsity – focusing on an even smaller, more crucial set of parameters.&lt;/li&gt;
&lt;li&gt;The result? You're updating only about 1% of the model's parameters, dramatically reducing computational load and memory requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2308.06522v1" rel="noopener noreferrer"&gt;&lt;strong&gt;You can read more here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Computational Requirements&lt;/strong&gt;: By updating only a sparse subset of parameters, SLoRA dramatically lowers the computational load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster Fine-Tuning Process&lt;/strong&gt;: The sparse updates enable quicker convergence, speeding up the fine-tuning process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower Memory Usage&lt;/strong&gt;: The reduced number of updates translates to lower memory requirements, making it feasible to deploy models on devices with limited memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Potential for Improved Model Performance&lt;/strong&gt;: Efficient parameter updates can lead to models that are not only faster but also potentially more robust and adaptable to specific tasks.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to use SLoRA in Your Projects?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initialize the Sparse Matrix&lt;/strong&gt;: Begin by setting up the sparse matrix with low-rank decomposition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apply Sparse Updates&lt;/strong&gt;: Update only the critical parameters as identified by the sparsity constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Train the Model&lt;/strong&gt;: Proceed with the fine-tuning process, leveraging the computational efficiency of SLoRA.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;For more detailed guide checkout this paper: &lt;a href="https://arxiv.org/pdf/2308.06522" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2308.06522&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  SLoRA's Secret Sauce:
&lt;/h3&gt;

&lt;p&gt;Here's where SLoRA really flexes its muscles. By dramatically reducing the number of parameters updated during fine-tuning, SLoRA processes fewer tokens. It's like having a car that can drive the same distance using only a fraction of the fuel.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fewer tokens processed = Lower costs&lt;/li&gt;
&lt;li&gt;Efficient updates = More bang for your buck&lt;/li&gt;
&lt;li&gt;Optimized token usage = Stretch your budget further&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's say you're fine-tuning a model on a specific task. Traditional methods might process millions of tokens, updating every parameter. With SLoRA, you're looking at a fraction of that - potentially cutting your token usage (and thus, your costs)!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;SLoRA represents a significant leap forward in efficient model fine-tuning. It empowers developers to unlock the full potential of LLMs while being mindful of resources and costs.&lt;/p&gt;

&lt;p&gt;Ready to give SLoRA a try? At &lt;a href="https://tunehq.ai/developers" rel="noopener noreferrer"&gt;Tune AI&lt;/a&gt;, we implemented this approach to, make models mode accessible and provide an efficient way to fine-tune and serve LLMs.&lt;br&gt;
Head over to &lt;a href="https://studio.tune.app/" rel="noopener noreferrer"&gt;Tune Studio&lt;/a&gt; and start exploring! Share your experiences and let's build a more accessible and sustainable future for AI together.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>finetune</category>
      <category>genai</category>
    </item>
    <item>
      <title>LMQL, AAAL Pt.6</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Fri, 02 Aug 2024 04:00:00 +0000</pubDate>
      <link>https://forem.com/tunehqai/lmql-aaal-pt6-22ib</link>
      <guid>https://forem.com/tunehqai/lmql-aaal-pt6-22ib</guid>
      <description>&lt;p&gt;In my journey to enhance adversarial robustness in LLMs, I explored LMQL (Language Model Query Language). This tool is a programming language that allows seamless integration of LLM interaction into program code, providing a structured way to manage model inputs and outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmk2snvij63y5rq44xus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmk2snvij63y5rq44xus.png" alt=" " width="620" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LMQL stands out by enabling developers to specify constraints and rules directly within their code. This feature is crucial for preventing adversarial attacks such as prompt injection and token manipulation. By defining strict constraints, developers can ensure that the model processes only valid and safe inputs, reducing the risk of malicious manipulations.&lt;/p&gt;

&lt;p&gt;Additionally, LMQL supports dynamic control over model interactions. Developers can programmatically adjust the model’s behavior based on real-time input validation and monitoring. This flexibility allows for quick responses to potential adversarial attacks, enhancing the overall security of the LLM.&lt;/p&gt;

&lt;p&gt;Another advantage of LMQL is its ability to integrate with existing guardrail tools. For example, combining LMQL with Llama Guard or Nvidia NeMo Guardrails can create a multi-layered defense system. This integration allows for more robust input validation, ethical content generation, and comprehensive logging and monitoring.&lt;/p&gt;

&lt;p&gt;LMQL also facilitates better transparency and explainability. By embedding model interactions within the code, developers can easily trace and audit the model’s decision-making process. This transparency is vital for identifying and mitigating adversarial attacks, ensuring the model’s outputs are trustworthy and reliable.&lt;/p&gt;

&lt;p&gt;In conclusion, LMQL offers a powerful and flexible solution for enhancing the security of LLMs. Its ability to define constraints, dynamic control, and integration with existing tools makes it a valuable addition to any adversarial robustness strategy. Stay tuned for more insights into practical implementations of these tools in real-world applications.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>openai</category>
      <category>developer</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Theoretical Limits and Scalability of Extra-LLMs: Do You Need Llama 405B</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Wed, 31 Jul 2024 15:30:00 +0000</pubDate>
      <link>https://forem.com/tunehqai/theoretical-limits-and-scalability-of-extra-llms-do-you-need-llama-300b-2nke</link>
      <guid>https://forem.com/tunehqai/theoretical-limits-and-scalability-of-extra-llms-do-you-need-llama-300b-2nke</guid>
      <description>&lt;p&gt;With the imminent release of Llama 3 405B, the AI community is abuzz with anticipation. Having recently explored this topic in a detailed blog post, I wanted to share some key takeaways on the scale, theoretical limits, and practical scalability of such colossal models. While Meta’s claims about Llama 3 405B’s performance are intriguing, it’s essential to understand what this model’s scale truly means and who stands to benefit most from it.&lt;/p&gt;

&lt;h4&gt;
  
  
  Understanding the Scale
&lt;/h4&gt;

&lt;p&gt;The "400B" in Llama 3 405B signifies the model’s vast parameter count—405 billion to be exact. This immense scale allows the model to capture intricate patterns and nuances within data, theoretically enabling it to outperform smaller models in understanding and processing complex information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr296s2x67woa0r2ycn5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcr296s2x67woa0r2ycn5.png" alt="Parameter Comparison of LLMs" width="616" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Theoretical Limits
&lt;/h4&gt;

&lt;p&gt;Training a model of this magnitude involves significant resources. For perspective, GPT-4 required around $64 million and 25,000 Nvidia GPUs over 100 days for training. It’s expected that Llama 3 400B will come with similarly daunting costs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83axyqy4fd72xhtfyyrr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83axyqy4fd72xhtfyyrr.png" alt="Electrcity Consumption for GPT-4" width="604" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The escalating costs and resource demands raise questions about the sustainability of pushing model sizes to the extreme. While advancements in model scale are exciting, the practical benefits and cost-effectiveness need careful consideration. For many, optimizing smaller models might offer a more balanced approach.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical Scalability Issues
&lt;/h4&gt;

&lt;p&gt;Deploying such massive models comes with its own set of challenges. The high costs of training, maintaining, and running these models often lead to diminishing returns. For instance, managing VRAM consumption for inference in models like GPT-4 requires substantial hardware resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0j07tbwcd9fd1yl8283.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0j07tbwcd9fd1yl8283.png" alt="Its Cheaper" width="615" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The practical issues associated with deploying extra-large models highlight the importance of evaluating the cost versus performance trade-offs. Smaller, well-optimized models might provide similar results at a fraction of the cost and complexity.&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;p&gt;The primary users of these models are likely to be large organizations with the resources to support their high costs. These include tech giants, research institutions, and financial firms that need cutting-edge performance for products, search engines, virtual assistants, and recommendation systems.&lt;/p&gt;

&lt;p&gt;For most individual users and smaller companies, exploring smaller, fine-tuned models might be more practical. Models such as Qwen 2 72B or Mistral 7B offer impressive results without the hefty price tag, making them viable alternatives for many applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;p&gt;In my recent blog post, I delved into the technical and financial challenges associated with extra-large language models. While Llama 3 400B represents a significant leap in AI capabilities, it’s essential to balance ambition with practicality. For many, well-trained, fine-tuned models might offer the best balance between performance and cost. &lt;/p&gt;

&lt;p&gt;As AI continues to evolve, navigating the landscape of trade-offs between model size, performance, and cost remains crucial. For a deeper understanding of these dynamics, my blog post provides additional insights and practical advice.&lt;/p&gt;

&lt;h4&gt;
  
  
  Further Reading
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://developer.nvidia.com/blog/demystifying-ai-inference-deployments-for-trillion-parameter-large-language-models/" rel="noopener noreferrer"&gt;Demystifying AI Inference Deployments for Trillion Parameter Large Language Models&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://neptune.ai/blog/nlp-models-infrastructure-cost-optimization" rel="noopener noreferrer"&gt;Deploying Large NLP Models: Infrastructure Cost Optimization&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tunehq.ai/blog/finetuning-llms" rel="noopener noreferrer"&gt;Finetuning LLMs on Tune Studio&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llama3</category>
      <category>chatgpt</category>
      <category>llm</category>
    </item>
    <item>
      <title>Guardrails AI, AAAL Pt.5</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Mon, 29 Jul 2024 04:00:00 +0000</pubDate>
      <link>https://forem.com/tunehqai/guardrails-ai-aaal-pt5-35im</link>
      <guid>https://forem.com/tunehqai/guardrails-ai-aaal-pt5-35im</guid>
      <description>&lt;p&gt;As I explored the landscape of adversarial robustness in LLMs, Guardrails AI stood out for its open-source approach to building responsible and reliable AI applications. This toolkit is designed to ensure that LLMs operate within defined safety and ethical parameters, addressing the vulnerabilities that adversarial attacks exploit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vg68u0bismou184qhmv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7vg68u0bismou184qhmv.png" alt="Guardrails AI Architecture" width="475" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Guardrails AI offers a comprehensive suite of tools for validating and filtering inputs and outputs. One of its key features is the ability to set custom guardrails that prevent harmful or biased content generation. By defining these guardrails, developers can ensure their models adhere to ethical standards and provide reliable outputs.&lt;/p&gt;

&lt;p&gt;A significant aspect of Guardrails AI is its focus on transparency and explainability. The toolkit includes mechanisms to log and monitor the model’s behavior, providing insights into how and why certain decisions are made. This transparency is crucial for identifying and mitigating potential adversarial attacks, as it allows for continuous assessment and improvement of the model’s security measures.&lt;/p&gt;

&lt;p&gt;Guardrails AI also emphasizes community collaboration. As an open-source project, it encourages contributions and feedback from the AI community, fostering a collaborative environment for developing robust and secure AI applications. This community-driven approach ensures that the toolkit evolves with emerging threats and incorporates the latest advancements in adversarial robustness.&lt;/p&gt;

&lt;p&gt;In conclusion, Guardrails AI offers a robust framework for building responsible and reliable LLM applications. Its emphasis on ethical standards, transparency, and community collaboration makes it a valuable tool for enhancing the security and trustworthiness of language models. Stay tuned for the next part of this series, where I will delve into specific techniques and case studies of implementing Guardrails AI in real-world applications.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>rag</category>
      <category>gpt3</category>
      <category>openai</category>
    </item>
    <item>
      <title>Nvidia NeMo, AAAL Pt.4</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Fri, 26 Jul 2024 04:00:00 +0000</pubDate>
      <link>https://forem.com/tunehqai/nvidia-nemo-aaal-pt4-2e77</link>
      <guid>https://forem.com/tunehqai/nvidia-nemo-aaal-pt4-2e77</guid>
      <description>&lt;p&gt;Continuing my journey into the world of adversarial robustness in LLMs, I discovered Nvidia NeMo Guardrails. This toolkit offers a programmable approach to adding safety and compliance measures to LLM-based applications, addressing various adversarial attack vectors.&lt;/p&gt;

&lt;p&gt;NeMo Guardrails provide a flexible and customizable solution for enhancing the security of language models. One of its key features is the ability to define and enforce specific rules for model behavior. These rules can filter out harmful or malicious inputs, ensuring that the model operates within safe parameters. This rule-based approach is particularly effective against prompt injection attacks, where malicious prompts aim to alter the model’s output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvna8vskihh80er7afmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvna8vskihh80er7afmt.png" alt="Nvidia NeMo Architecture" width="623" height="203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition to rule-based filtering, NeMo Guardrails supports advanced monitoring and logging capabilities. These features allow for real-time detection and response to adversarial attacks, providing immediate protection against potential threats. By continuously monitoring the model’s inputs and outputs, NeMo Guardrails can identify suspicious activity and take corrective action promptly.&lt;/p&gt;

&lt;p&gt;Another significant advantage of NeMo Guardrails is its focus on ethical AI. The toolkit includes features to prevent the generation of biased or harmful content, ensuring that the model adheres to ethical standards. This is crucial for maintaining user trust and avoiding potential legal or reputational issues.&lt;/p&gt;

&lt;p&gt;NeMo Guardrails also prioritizes data security. The toolkit includes mechanisms to prevent data leakage, and protect sensitive information from being exposed. This is particularly important for applications that handle confidential or personal data, ensuring that user privacy is maintained.&lt;/p&gt;

&lt;p&gt;Overall, Nvidia NeMo Guardrails offers a powerful and flexible solution for enhancing the safety and reliability of LLMs. Its programmable approach, combined with advanced monitoring and ethical safeguards, makes it an essential tool for building robust and secure language models. Stay tuned for the next part of this series, where I will explore more tools and techniques for achieving adversarial robustness in LLMs.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AWS Sagemaker vs AWS Bedrock, Just an AI Tool or more?</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Wed, 24 Jul 2024 15:30:00 +0000</pubDate>
      <link>https://forem.com/tunehqai/aws-sagemaker-vs-aws-bedrock-just-an-ai-tool-or-more-2j83</link>
      <guid>https://forem.com/tunehqai/aws-sagemaker-vs-aws-bedrock-just-an-ai-tool-or-more-2j83</guid>
      <description>&lt;p&gt;In today’s rapidly evolving AI landscape, the demands of modern Generative AI (GenAI) models have outpaced traditional training and deployment pipelines. This has necessitated robust, scalable, and user-friendly tools that simplify the development of GenAI models. Enter AWS Bedrock, Amazon’s latest foray into GenAI, and the well-established AWS Sagemaker. While both services aim to streamline AI development, they cater to different needs and audiences. Let’s dive into the details and see which service might be the catalyst for your next AI project.&lt;/p&gt;

&lt;h4&gt;
  
  
  AWS Bedrock
&lt;/h4&gt;

&lt;p&gt;AWS Bedrock is Amazon’s fully managed service designed to simplify the integration and deployment of GenAI models. It boasts a selection of top-gen AI and foundational models, all accessible via a single API. This ease of use is a boon for developers who want to build secure, private, and responsible AI applications without delving into infrastructure management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt1f066xw3v4ouhb9gdr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt1f066xw3v4ouhb9gdr.png" alt=" " width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simplified integration with a unified API&lt;/li&gt;
&lt;li&gt;Access to popular foundational models&lt;/li&gt;
&lt;li&gt;Focus on security and responsible AI principles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less control over the underlying infrastructure&lt;/li&gt;
&lt;li&gt;Limited to the models provided by the platform&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  AWS Sagemaker
&lt;/h4&gt;

&lt;p&gt;In contrast, AWS Sagemaker is a comprehensive machine learning service that supports a wide range of ML tasks, from computer vision to natural language processing. It offers an extensive suite of tools to build, train, deploy, and scale ML projects. While powerful, Sagemaker’s complexity can be daunting for new users.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxml4imhygv86ilnzdm6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxml4imhygv86ilnzdm6.png" alt=" " width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broad range of ML capabilities&lt;/li&gt;
&lt;li&gt;Web-based IDE for streamlined development&lt;/li&gt;
&lt;li&gt;Support for custom models and hyperparameter tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Steeper learning curve&lt;/li&gt;
&lt;li&gt;Requires more management of infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Technical Comparison
&lt;/h4&gt;

&lt;p&gt;When it comes to performing GenAI tasks, AWS Bedrock and Sagemaker offer distinct advantages. Bedrock excels in ease of use with a focus on inference, while Sagemaker provides extensive customization and control.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;AWS Bedrock&lt;/th&gt;
&lt;th&gt;AWS Sagemaker&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GenAI Tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ideal for inference-heavy tasks&lt;/td&gt;
&lt;td&gt;Broad ML capabilities, including GenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top models from AI21 Labs, Anthropic, Cohere, Meta&lt;/td&gt;
&lt;td&gt;Custom models via frameworks like TensorFlow and PyTorch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports fine-tuning and retrieval-augmented generation&lt;/td&gt;
&lt;td&gt;Extensive model customization and tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simplified API, minimal infrastructure management&lt;/td&gt;
&lt;td&gt;Web IDE, but complex infrastructure management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless architecture&lt;/td&gt;
&lt;td&gt;Batch and real-time deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Focus on simplicity&lt;/td&gt;
&lt;td&gt;Advanced automation with AutoML and Sagemaker Autopilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in security and compliance&lt;/td&gt;
&lt;td&gt;Basic security features, but SOC2, HIPAA, and ISO27001 compliant&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Disclaimer: The pricing is just an estimate according to the official Amazon Pricing Pages, &lt;a href="https://aws.amazon.com/ec2/pricing/on-demand/" rel="noopener noreferrer"&gt;here&lt;/a&gt; and &lt;a href="https://aws.amazon.com/bedrock/pricing/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Financial Comparison
&lt;/h4&gt;

&lt;p&gt;Financially, both services operate on a pay-as-you-go model, but the pricing structures differ significantly. Here’s a comparison based on deploying the LLaMA 3 8B model with 1 million inferences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Sagemaker:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inference Costs:&lt;/strong&gt; $3.825 per hour, totaling approximately $212 for 1 million inferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Costs:&lt;/strong&gt; $0.0023 per GB, totaling $23 for 1TB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Estimated Cost:&lt;/strong&gt; ~$236&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Bedrock:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Tokens:&lt;/strong&gt; $0.0003 per 1,000 tokens, totaling $15 for 50 million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Tokens:&lt;/strong&gt; $0.0006 per 1,000 tokens, totaling $18 for 30 million tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Estimated Cost:&lt;/strong&gt; $33&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Opinion:&lt;/strong&gt;&lt;br&gt;
AWS Bedrock’s token-based pricing model is advantageous for inference-heavy tasks, offering simplicity and lower costs. In contrast, AWS Sagemaker’s instance-based pricing provides more control over the entire ML pipeline but can be more expensive and complex.&lt;/p&gt;

&lt;h4&gt;
  
  
  Conclusion
&lt;/h4&gt;

&lt;p&gt;Ultimately, the choice between AWS Bedrock and AWS Sagemaker depends on your specific needs. For developers seeking a hassle-free, inference-focused platform, AWS Bedrock is the clear winner.&lt;/p&gt;

&lt;p&gt;Its simplicity and lower cost make it ideal for those who want to avoid managing infrastructure. On the other hand, enterprises and advanced users who require extensive customization and broader ML capabilities will benefit from the comprehensive features of AWS Sagemaker.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>llm</category>
      <category>rag</category>
      <category>powerfuldevs</category>
    </item>
    <item>
      <title>Llama Guard, AAAL Pt.3</title>
      <dc:creator>Aryan Kargwal</dc:creator>
      <pubDate>Mon, 22 Jul 2024 04:00:00 +0000</pubDate>
      <link>https://forem.com/tunehqai/llama-guard-aaal-pt3-530o</link>
      <guid>https://forem.com/tunehqai/llama-guard-aaal-pt3-530o</guid>
      <description>&lt;p&gt;During my exploration of adversarial robustness in LLMs, I came across Llama Guard, a tool designed to enhance the security of language models. Llama Guard offers a comprehensive solution to protect LLMs from various adversarial attacks, ensuring their safe and reliable operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrfsutov259vlf4893j6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrfsutov259vlf4893j6.png" alt="Llama Guard Architecture" width="624" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the primary features of the Llama Guard is its ability to detect and prevent prompt injection attacks. These attacks can manipulate the model’s output by feeding it malicious prompts. Llama Guard employs advanced filtering techniques to identify and block such prompts, safeguarding the model’s integrity. By doing so, it ensures that the LLM processes only valid and secure inputs.&lt;/p&gt;

&lt;p&gt;In addition to prompt injection, Llama Guard is effective against token manipulation attacks. These attacks involve altering the tokens in the input to confuse the model and generate incorrect outputs. Llama Guard continuously monitors the input tokens, detecting any anomalies or manipulations, and correcting them in real-time. This helps maintain the accuracy and reliability of the model’s responses.&lt;/p&gt;

&lt;p&gt;Furthermore, Llama Guard incorporates ethical considerations into its design. It includes mechanisms to prevent the generation of biased or harmful content, ensuring that the LLM adheres to ethical standards. This is particularly important in applications where the model’s output can significantly impact users or stakeholders.&lt;/p&gt;

&lt;p&gt;Llama Guard also emphasizes data security. It includes features to prevent the leakage of sensitive information, protecting both the users and the integrity of the data. This makes it a valuable tool for applications that handle confidential or sensitive information.&lt;/p&gt;

&lt;p&gt;In conclusion, Llama Guard offers a robust defense against adversarial attacks on LLMs. Its comprehensive features, including prompt injection prevention, token manipulation detection, and ethical safeguards, make it an essential tool for ensuring the safe and reliable operation of language models. In the upcoming parts of this series, I will explore other tools and techniques that contribute to the adversarial robustness of LLMs.&lt;/p&gt;

</description>
      <category>llama</category>
      <category>llm</category>
      <category>opensource</category>
      <category>developer</category>
    </item>
  </channel>
</rss>
