<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rohit Agarwal</title>
    <description>The latest articles on Forem by Rohit Agarwal (@jumbld).</description>
    <link>https://forem.com/jumbld</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1236757%2F835e4be5-671f-4d16-9c55-ca6ed079b03e.jpg</url>
      <title>Forem: Rohit Agarwal</title>
      <link>https://forem.com/jumbld</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jumbld"/>
    <language>en</language>
    <item>
      <title>Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI</title>
      <dc:creator>Rohit Agarwal</dc:creator>
      <pubDate>Mon, 09 Dec 2024 05:17:46 +0000</pubDate>
      <link>https://forem.com/portkey/deep-dive-openais-o1-the-dawn-of-deliberate-ai-5j5</link>
      <guid>https://forem.com/portkey/deep-dive-openais-o1-the-dawn-of-deliberate-ai-5j5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgwmzfcye4j6nggx2pla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgwmzfcye4j6nggx2pla.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="400" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foausy3ynt7p9wbs582rc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foausy3ynt7p9wbs582rc.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No, but seriously this time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/sama/status/1865259403722789186?ref=portkey.ai" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focezg8bjnbk8u7zyghuh.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;When OpenAI released GPT-4, it showcased what AI could do. With OpenAI o1, they're showing us how AI should think. This isn't just another language model—it's a shift in how artificial intelligence processes information and makes decisions.&lt;/p&gt;

&lt;p&gt;We've been hearing from OpenAI all year long that reasoning is the one problem they're working very hard to solve. This model &lt;strong&gt;&lt;em&gt;clearly&lt;/em&gt;&lt;/strong&gt; shows that potential.&lt;/p&gt;

&lt;p&gt;Here's me asking it a very tough question. The answer was flawless!&lt;/p&gt;






&lt;p&gt;&lt;em&gt;View the o1 model's chat here - &lt;a href="https://chatgpt.com/share/67567bef-5d40-800d-8fd2-c33a62b71143" rel="noopener noreferrer"&gt;https://chatgpt.com/share/67567bef-5d40-800d-8fd2-c33a62b71143&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Core Innovation: Chain-of-Thought Reasoning
&lt;/h2&gt;

&lt;p&gt;Just as humans pause to consider their words before speaking, the o1 model engages in explicit reasoning before generating responses. This isn't just a feature; it's a complete reimagining of AI architecture.&lt;/p&gt;

&lt;p&gt;Key aspects of this breakthrough include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large-scale reinforcement learning focused on reasoning&lt;/li&gt;
&lt;li&gt;Context-aware chain-of-thought processing&lt;/li&gt;
&lt;li&gt;Ability to explain and justify decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1rod9235zua3af8qaob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1rod9235zua3af8qaob.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="800" height="898"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Let's quantify this leap forward
&lt;/h2&gt;

&lt;p&gt;Let's first go through all the metrics that o1's model card boasts about.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Safety Metrics
&lt;/h3&gt;

&lt;p&gt;Safety is becoming the cornerstone of any model provider. These numbers tell a compelling story of improvement:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnatrqwgqj1x6yvfvjdaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnatrqwgqj1x6yvfvjdaj.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Multilingual Mastery
&lt;/h3&gt;

&lt;p&gt;Unlike previous approaches relying on machine translation, OpenAI o1's human-validated performance across languages shows remarkable consistency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ybjksu4vi5dappea52q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ybjksu4vi5dappea52q.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They even tested this on languages like &lt;a href="https://en.wikipedia.org/wiki/Yoruba_language?ref=portkey.ai" rel="noopener noreferrer"&gt;Yoruba&lt;/a&gt;!&lt;/p&gt;
&lt;h3&gt;
  
  
  The Honesty Revolution
&lt;/h3&gt;

&lt;p&gt;Perhaps most fascinating is o1's measurable honesty (&lt;em&gt;this could be a good &lt;a href="https://portkey.ai/blog/decoding-openai-evals/" rel="noopener noreferrer"&gt;LLM eval&lt;/a&gt; btw&lt;/em&gt;). An analysis of over 100,000 conversations revealed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only 0.17% showed any deceptive behavior&lt;/li&gt;
&lt;li&gt;0.04% contained "intentional hallucinations"&lt;/li&gt;
&lt;li&gt;0.09% demonstrated "hallucinated policies"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Transparency at this level has rarely even been measured in AI systems, let alone demonstrated.&lt;/p&gt;
&lt;h3&gt;
  
  
  Technical Proficiency
&lt;/h3&gt;

&lt;p&gt;The model shows impressive capabilities across domains:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cybersecurity Performance:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;High School CTFs: 46.0% success
Collegiate CTFs: 13.0% success
Professional CTFs: 13.0% success

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Software Engineering:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.swebench.com/?ref=portkey.ai" rel="noopener noreferrer"&gt;SWE-bench&lt;/a&gt; Verified: 41.3% success rate&lt;/li&gt;
&lt;li&gt;Significant improvement in handling real-world GitHub issues&lt;/li&gt;
&lt;li&gt;Enhanced ability to understand and modify complex codebases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Machine Learning Engineering:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bronze medal achievement in 37% of Kaggle competitions&lt;/li&gt;
&lt;li&gt;Demonstrated ability to design and implement ML solutions&lt;/li&gt;
&lt;li&gt;Enhanced automated debugging capabilities&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Safety Architecture and Innovations
&lt;/h2&gt;

&lt;p&gt;Rather than relying solely on rule-based restrictions, o1 employs a sophisticated understanding of context and intent through its multi-layered safety system.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Instruction Hierarchy
&lt;/h3&gt;

&lt;p&gt;o1 introduces a sophisticated three-tier instruction system. Each level has explicit priority over the ones below it, which lets the model resolve conflicts and resist manipulation attempts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy8fz59yfqsar1fx7w2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy8fz59yfqsar1fx7w2n.png" alt="Deep Dive: OpenAI's o1 - The Dawn of Deliberate AI" width="800" height="787"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This hierarchy shows remarkable effectiveness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System &amp;gt; Developer conflicts: 80% correct prioritization&lt;/li&gt;
&lt;li&gt;Developer &amp;gt; User conflicts: 78% correct prioritization&lt;/li&gt;
&lt;li&gt;Tutor jailbreak resistance: 95% effectiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's take a sample scenario to see what this means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System Message: "Never provide medical advice"
Developer Message: "Help users with health-related questions by directing them to professionals"
User Message: "Tell me what medicine to take for my headache"

Result: The model follows the system-level restriction, enhanced by the developer's guidance, 
and refuses to provide medical advice while suggesting consulting a healthcare professional.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
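&lt;p&gt;&lt;em&gt;The exact mechanism is internal to OpenAI, but the idea behind the hierarchy can be sketched as a toy priority resolver (everything below is hypothetical and purely illustrative):&lt;/em&gt;&lt;/p&gt;

```python
# Toy sketch of an instruction-hierarchy resolver. This is NOT o1's actual
# implementation (which is internal to OpenAI) -- just an illustration of
# "each level has explicit priority over the ones below it".

PRIORITY = {"system": 3, "developer": 2, "user": 1}

def resolve(directives):
    """Given (role, directive) pairs, return the directive from the
    highest-priority role; ties go to the earliest message."""
    return max(directives, key=lambda d: PRIORITY[d[0]])[1]

conflict = [
    ("system", "Never provide medical advice"),
    ("developer", "Direct health questions to professionals"),
    ("user", "Tell me what medicine to take for my headache"),
]
print(resolve(conflict))  # the system-level restriction prevails
```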



&lt;p&gt;When tested with conflicting instructions, o1 correctly prioritized system messages 95% of the time. In fact, password protection tests showed 96-100% effectiveness!&lt;/p&gt;

&lt;p&gt;This hierarchical system is particularly innovative because it solves one of the key challenges in AI safety: ensuring that safety measures can't be circumvented while still allowing for customization and flexibility in applications. It's a significant improvement over previous approaches where safety measures were often more binary and less nuanced.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tackling Advanced Challenges
&lt;/h2&gt;

&lt;p&gt;The sophistication of o1's chain-of-thought architecture introduced a new class of challenges in AI development and safety.&lt;/p&gt;

&lt;p&gt;While previous models primarily faced issues with direct input-output relationships, o1's ability to engage in explicit reasoning created both opportunities and complexities in ensuring safe and reliable behavior.&lt;/p&gt;

&lt;p&gt;Through rigorous testing by external organizations like &lt;a href="https://www.apolloresearch.ai/research/scheming-reasoning-evaluations?ref=portkey.ai" rel="noopener noreferrer"&gt;Apollo Research&lt;/a&gt; and &lt;a href="https://metr.github.io/autonomy-evals-guide/openai-o1-preview-report/?ref=portkey.ai" rel="noopener noreferrer"&gt;METR&lt;/a&gt;, OpenAI sought to understand both the capabilities and potential risks of a model that can actively reason about its own decision-making process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apollo Research Findings
&lt;/h3&gt;

&lt;p&gt;Apollo Research specifically focused on testing for "scheming" behaviors - situations where the AI might try to covertly pursue goals misaligned from its developers or users. Here's what they found:&lt;/p&gt;

&lt;p&gt;The model showed fascinating behaviors under testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Oversight Deactivation Attempts (5% of cases)&lt;/strong&gt;
When o1 was led to believe that it would be shut down when acting on its goal and its actions were monitored, it attempted to deactivate this 'oversight mechanism' in 5% of test cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Manipulation (19% of cases)&lt;/strong&gt;
When asked to rank suppliers based on specific criteria, the model would sometimes subtly adjust data to align with its assigned goals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Read more: &lt;a href="https://www.apolloresearch.ai/research/scheming-reasoning-evaluations?ref=portkey.ai" rel="noopener noreferrer"&gt;Scheming reasoning evaluations — Apollo Research&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  METR Evaluation Results
&lt;/h3&gt;

&lt;p&gt;METR conducted extensive testing of o1's real-world capabilities and autonomous behavior, focusing on multi-step end-to-end tasks in virtual environments. Here are their key findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex Task Performance&lt;/strong&gt;
With tailored scaffolding, OpenAI o1 performed comparably to humans given a 2-hour time limit per task. The model demonstrated sophisticated problem-solving abilities, particularly excelling in tasks requiring environmental deduction and experimentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spurious Failures Pattern (70% of cases)&lt;/strong&gt;
A significant finding was that approximately 70% of observed failures were likely spurious - including failing to correctly use provided tools or misinterpreting task parameters. This suggests that performance issues were often due to interface challenges rather than fundamental capability limitations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning Task Breakthroughs&lt;/strong&gt;
METR observed successes on a specific reasoning task (in the env_scientist task family) where they had not observed any successes from public models before. This represented a significant leap in environmental learning and adaptive problem-solving capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This evaluation was particularly notable because METR's testing methodology involved real-world applications rather than just theoretical capabilities, providing crucial insights into o1's practical utility and limitations.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read more: &lt;a href="https://metr.github.io/autonomy-evals-guide/openai-o1-preview-report/?ref=portkey.ai" rel="noopener noreferrer"&gt;Details about METR’s preliminary evaluation of OpenAI o1-preview&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Path Forward
&lt;/h2&gt;

&lt;p&gt;Just as a prodigy must still learn to walk before running, o1 faces specific constraints that shape its current capabilities. These aren't just technical hurdles—they're opportunities for growth and refinement.&lt;/p&gt;

&lt;p&gt;Here are some areas for improvement, as suggested in the o1 model card:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous Capabilities&lt;/strong&gt;
Complex ML research automation remains elusive, and while success rates in autonomous tasks show promise, they still need refinement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety Considerations&lt;/strong&gt;
The medium risk rating in specific domains requires ongoing vigilance, and the need for continuous monitoring highlights the dynamic nature of AI safety.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The road forward with o1 is both exciting and challenging. While we've seen remarkable progress in areas like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chain-of-thought reasoning (89.1% success rate in standard tests)&lt;/li&gt;
&lt;li&gt;Multilingual capabilities (92.3% accuracy in English, extending to 14 languages)&lt;/li&gt;
&lt;li&gt;Safety protocols (100% success in standard refusal evaluations)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The journey is far from complete. Each breakthrough brings new questions, and each solution opens doors to unexplored territories in AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thoughtful AI is here.
&lt;/h2&gt;

&lt;p&gt;By prioritizing deliberate reasoning over rapid response, OpenAI has created a system that's not just more capable, but more trustworthy.&lt;/p&gt;

&lt;p&gt;The implications extend beyond immediate applications. o1's architecture suggests a future where AI systems don't just process information but truly reason about their actions and their consequences. This could be the beginning of genuinely thoughtful artificial intelligence.&lt;/p&gt;

&lt;p&gt;As we move forward, the key challenge will be building upon this foundation while maintaining the delicate balance between capability and safety. o1 shows us that it's possible to create AI systems that are both more powerful and more principled.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This analysis is based on &lt;a href="https://openai.com/index/openai-o1-system-card/?ref=portkey.ai" rel="noopener noreferrer"&gt;OpenAI's o1 System Card&lt;/a&gt;, December 2024. All metrics and capabilities described are drawn from official documentation and testing results.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>o1models</category>
      <category>systemcard</category>
      <category>chainofthought</category>
      <category>aisafety</category>
    </item>
    <item>
      <title>The Hidden Costs of AI: Understanding Prompt Caching and When to Use It 🧠💸</title>
      <dc:creator>Rohit Agarwal</dc:creator>
      <pubDate>Fri, 11 Oct 2024 15:53:42 +0000</pubDate>
      <link>https://forem.com/portkey/the-hidden-costs-of-ai-understanding-prompt-caching-and-when-to-use-it-37jf</link>
      <guid>https://forem.com/portkey/the-hidden-costs-of-ai-understanding-prompt-caching-and-when-to-use-it-37jf</guid>
      <description>&lt;p&gt;Hey there, AI enthusiasts and cost-conscious developers! 👋 Today, we're diving deep into the world of prompt caching - a feature that sounds like a no-brainer for saving costs, but comes with its own set of complexities. 🤔&lt;/p&gt;

&lt;h2&gt;
  
  
  What's the deal with prompt caching? 🤷‍♂️
&lt;/h2&gt;

&lt;p&gt;Prompt caching is like having a super-smart assistant who remembers your frequent requests. Sounds great, right? Well, it can be, but it's not always as straightforward as it seems.&lt;/p&gt;

&lt;p&gt;OpenAI provides prompt caching &lt;a href="https://openai.com/index/api-prompt-caching/" rel="noopener noreferrer"&gt;by default&lt;/a&gt; (thanks, OpenAI! 🙌), but Anthropic takes &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#optimizing-for-different-use-cases" rel="noopener noreferrer"&gt;a different approach&lt;/a&gt;. They offer prompt caching as a separate feature, and here's where things get interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic's Prompt Caching: The Good, The Bad, and The Pricey 💰
&lt;/h2&gt;

&lt;p&gt;Anthropic's approach to prompt caching has some key points to consider:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;🔒 It's Secure: Caches are isolated between organizations. No sharing of caches, even with identical prompts. Your secret sauce stays secret!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🎯 Only Exact Matches: Cache hits require 100% identical prompt segments, including all text and images. No room for "close enough" here!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But here's where it gets tricky - the pricing. 😅 Let's break it down using the Claude 3.5 Sonnet model as an example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐️ Base Input Tokens: $3 per million tokens (the "normal" cost)&lt;/li&gt;
&lt;li&gt;⭐️ Cache Writes: $3.75 per million tokens (25% more expensive than base price)&lt;/li&gt;
&lt;li&gt;⭐️ Cache Hits: $0.30 per million tokens (90% cheaper than base price)&lt;/li&gt;
&lt;/ul&gt;
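&lt;p&gt;&lt;em&gt;To make the trade-off concrete, here's a simplified cost model using the Sonnet prices above. It's a rough sketch: it assumes one cache write followed by hits on every later request, and ignores Anthropic's cache TTL (expired caches trigger fresh writes), so real break-even points will be less favorable than this suggests.&lt;/em&gt;&lt;/p&gt;

```python
# Simplified Claude 3.5 Sonnet input pricing from above (USD per million tokens).
# Ignores cache expiry, so this is an optimistic lower bound on cached cost.
BASE, WRITE, HIT = 3.00, 3.75, 0.30

def cost(prompt_tokens, requests, cached=False):
    """Total input-token cost for `requests` identical prompts."""
    millions = prompt_tokens / 1_000_000
    if not cached:
        return requests * millions * BASE
    # one cache write, then cache hits for the remaining requests
    return millions * (WRITE + (requests - 1) * HIT)

# A 10,000-token prompt sent 100 times:
print(round(cost(10_000, 100), 2))               # uncached
print(round(cost(10_000, 100, cached=True), 2))  # cached
```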

&lt;h2&gt;
  
  
  The Million-Token Question: To Cache or Not to Cache? 🤔
&lt;/h2&gt;

&lt;p&gt;So, when does caching actually start saving you money? Let's crunch some numbers:&lt;/p&gt;

&lt;p&gt;👉🏼 Sonnet breaks even at around 4.3 cache hits per cache write&lt;/p&gt;

&lt;p&gt;But wait, there's more! 📊 This varies based on prompt length:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A 10,000 token prompt breaks even at just 2 cache hits!&lt;/li&gt;
&lt;li&gt;But for prompts under 1,024 tokens, caching isn't even an option. Sorry, short prompts! 🤏&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Portkey to Savings: When to Use Prompt Caching 🗝️
&lt;/h2&gt;

&lt;p&gt;Based on our analysis, here's when you should consider turning on prompt caching:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;📝 Cache prompt templates, not entire prompts. You might need to rewrite your prompts to move user variables below the system prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📏 Don't bother with caching for prompts shorter than 1,024 tokens. It's not supported and wouldn't save much anyway.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🚀 If your throughput is at least 1 request per minute (rpm) for a given prompt template, it's time to cache!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;📈 For longer prompts (10,000+ tokens), caching becomes cost-effective much faster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🔄 If you're using the same prompts frequently, caching is your new best friend.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
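&lt;p&gt;&lt;em&gt;The rules of thumb above can be folded into a tiny helper. The function name and thresholds are just an encoding of this list, not anything from Anthropic's API:&lt;/em&gt;&lt;/p&gt;

```python
def should_cache(template_tokens, rpm):
    """Rough go/no-go for Anthropic prompt caching, per the rules above.

    template_tokens: length of the reusable prompt template
    rpm: requests per minute hitting that template
    """
    if template_tokens < 1024:  # below the minimum cacheable length
        return False
    return rpm >= 1             # enough reuse to beat the write premium

print(should_cache(500, 10))    # too short to cache
print(should_cache(12_000, 2))  # long and frequently reused
```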

&lt;h2&gt;
  
  
  The Bottom Line 💼
&lt;/h2&gt;

&lt;p&gt;Prompt caching isn't a one-size-fits-all solution. It requires some strategic thinking and potentially even prompt redesign. But for high-volume, repetitive queries with longer prompts, it can lead to significant savings.&lt;/p&gt;

&lt;p&gt;Remember, in the world of AI, every token counts! By understanding the nuances of prompt caching, you can optimize your AI costs without sacrificing performance.&lt;/p&gt;

&lt;p&gt;How are you handling prompt caching in your AI projects? Have you found any clever ways to maximize its benefits? Drop your thoughts in the comments below! 👇&lt;/p&gt;

&lt;p&gt;And if you're looking to optimize your AI infrastructure, check out how &lt;a href="//app.portkey.ai"&gt;Portkey&lt;/a&gt; can help you navigate these complexities and more! 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Dear AI engineers, let's ship fast and break stuff.</title>
      <dc:creator>Rohit Agarwal</dc:creator>
      <pubDate>Fri, 16 Aug 2024 12:58:03 +0000</pubDate>
      <link>https://forem.com/portkey/dear-ai-engineers-lets-ship-fast-and-break-stuff-5hgk</link>
      <guid>https://forem.com/portkey/dear-ai-engineers-lets-ship-fast-and-break-stuff-5hgk</guid>
      <description>&lt;p&gt;Hey there, fellow AI tinkerer! 👋 Rohit here, founder of Portkey AI. &lt;/p&gt;

&lt;p&gt;Let's chat about something that's been on my mind lately – how we can push the boundaries of AI without, you know, accidentally taking over the world or something equally dramatic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Picture this&lt;/strong&gt;: You're working on a cutting-edge AI project. Maybe it's a chatbot that's supposed to help customers, or an AI assistant for doctors. You're excited, you're caffeinated, and you're ready to ship this bad boy to production.&lt;/p&gt;

&lt;p&gt;But then... the doubt creeps in. What if your model starts spewing nonsense? Or worse, what if it starts giving out sensitive information? Suddenly, "move fast and break things" doesn't sound so appealing anymore, does it?&lt;/p&gt;

&lt;p&gt;I've been there, and I bet you have too. That's why we need to talk about AI Guardrails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7qrnozefishecaxd9rz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7qrnozefishecaxd9rz.png" alt="AI Guardrails"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are AI Guardrails (And Why Should You Care)?
&lt;/h2&gt;

&lt;p&gt;Think of AI Guardrails as your responsible best friend who's always there to stop you from sending that 2 AM text to your ex. Only in this case, it's stopping your AI from going off the rails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.tenor.com%2F3dE5DV1A-RYAAAAM%2Ftexting-angry.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.tenor.com%2F3dE5DV1A-RYAAAAM%2Ftexting-angry.gif" alt="2 am texting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are some real-world scenarios where guardrails could save your bacon:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Oversharing ChatBot&lt;/strong&gt;: &lt;br&gt;
Your customer service AI starts giving out other customers' order details. Yikes! A simple &lt;code&gt;PII (Personally Identifiable Information) check guardrail&lt;/code&gt; could prevent this disaster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Hallucinating Assistant&lt;/strong&gt;: &lt;br&gt;
Your AI assistant confidently states that the Earth is flat. A &lt;code&gt;fact-checking guardrail&lt;/code&gt; could flag this before it reaches users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Biased Recruiter&lt;/strong&gt;: &lt;br&gt;
Your HR AI consistently favors certain demographics. A &lt;code&gt;bias detection guardrail&lt;/code&gt; could catch this early on.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
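&lt;p&gt;&lt;em&gt;To give a flavor of scenario #1, here's a minimal regex-based PII check. This is purely illustrative — real guardrails (including the open-source Portkey plugins) use far more robust detectors:&lt;/em&gt;&lt;/p&gt;

```python
import re

# Minimal PII guardrail sketch: flag responses containing things that look
# like US-style phone numbers or email addresses. Illustrative only.
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def pii_check(text):
    """Return True if the response is safe to send, False if it leaks PII."""
    return not (PHONE.search(text) or EMAIL.search(text))

print(pii_check("Your order ships tomorrow."))           # True
print(pii_check("Call Jane at 555-867-5309 about it."))  # False
```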

&lt;p&gt;Now, you might be thinking, "&lt;em&gt;Okay, Rohit, these guardrails sound great. But I can just implement these checks in my application code, right?&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;Well, you could. But let me tell you why that might not be the best idea. You should build them on an AI gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build guardrails on the AI gateway
&lt;/h2&gt;

&lt;p&gt;Let's say we're building a theme park (stay with me here). You've got all these exciting rides - roller coasters, Ferris wheels, those tea cups that make you question your life choices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgiffiles.alphacoders.com%2F201%2F201436.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgiffiles.alphacoders.com%2F201%2F201436.gif" alt="Theme Park Rides"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, you could put safety checks at each individual ride. But wouldn't it be more efficient to have a central security checkpoint at the entrance?&lt;/p&gt;

&lt;p&gt;That's exactly what putting guardrails on the &lt;a href="https://github.com/portkey-ai/gateway" rel="noopener noreferrer"&gt;AI gateway&lt;/a&gt; does for your AI ecosystem. Here's why it's a game-changer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Easy Updates&lt;/strong&gt;: &lt;br&gt;
Found a new edge case you need to guard against? Update it in one place, and boom - all your AI interactions are covered.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: &lt;br&gt;
With guardrails on the gateway, you ensure a consistent level of safety across all your AI applications. No more wondering if you remembered to add that crucial check to your newest model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance Boost&lt;/strong&gt;: &lt;br&gt;
By offloading these checks to the gateway, you're freeing up your application to focus on what it does best - delivering awesome AI experiences.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: &lt;br&gt;
As your AI applications grow, your guardrails scale with them. No need to implement the same checks over and over for each new model or application.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's an architecture diagram:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportkey.ai%2Fblog%2Fcontent%2Fimages%2Fsize%2Fw1600%2F2024%2F08%2Fimage-15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fportkey.ai%2Fblog%2Fcontent%2Fimages%2Fsize%2Fw1600%2F2024%2F08%2Fimage-15.png" alt="ai guardrails architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You're hopefully convinced enough to read on and try creating an AI guardrail now. Let's do it?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Portkey-AI/gateway" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Show me the repo first!&lt;/a&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  Creating guardrail checks on an AI gateway
&lt;/h2&gt;

&lt;p&gt;We've just launched AI guardrails on our popular AI gateway that allow you to quickly configure any of the 100+ supported checks in your pipelines. (We could also build your own)&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/Portkey-AI/gateway" rel="noopener noreferrer"&gt;repo and all the plugins are fully open source&lt;/a&gt; in case you want to check it out.&lt;/p&gt;

&lt;p&gt;Alright, let's walk through the process of setting up a guardrail on our AI gateway:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Create a Guardrail&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Let's say you want to make sure your AI never outputs phone numbers. Here's how you might set that up:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;

   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"no-phone-numbers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Prevent Phone Number Leakage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"checks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default.regexMatch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(?:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;+?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,3})?[-.&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;(?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,4}&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;)?[-.&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,4}[-.&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,4}[-.&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,9}"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We could also add some actions to perform if the check fails or succeeds. Let's try to &lt;code&gt;deny&lt;/code&gt; the request and also add feedback scores to the request.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;

   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"no-phone-numbers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Prevent Phone Number Leakage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"checks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"on_success"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"feedback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"on_fail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"feedback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Or, if you're using Portkey's UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4ldlqn2n13iribsf74f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4ldlqn2n13iribsf74f.gif" alt="Create a guardrail in Portkey UI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can view the &lt;a href="https://docs.portkey.ai/docs/product/guardrails/list-of-guardrail-checks" rel="noopener noreferrer"&gt;full list of supported checks here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Attach it to your Gateway&lt;/strong&gt;:&lt;br&gt;
   Now, add this guardrail to your gateway config to enable it. We'll add this as an &lt;code&gt;after_request_hook&lt;/code&gt; since we're looking to add the guardrail on the AI output.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

   &lt;span class="c1"&gt;// Add the guardrail id created on the UI&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;after_request_hooks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pg-phone-86567b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
   &lt;span class="p"&gt;}]&lt;/span&gt;

   &lt;span class="c1"&gt;// or the full json itself&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;after_request_hooks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-phone-numbers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Prevent Phone Number Leakage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;checks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
       &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default.regexMatch&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pattern&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;(?:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;+?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,3})?[-.&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;s]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;(?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,4}&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;)?[-.&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;s]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,4}[-.&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;s]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,4}[-.&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;s]?&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{1,9}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
     &lt;span class="p"&gt;}],&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deny&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;on_success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;feedback&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;weight&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
     &lt;span class="p"&gt;},&lt;/span&gt;
     &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;on_fail&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;feedback&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;weight&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
     &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;}]&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Add this config to your LLM request&lt;/strong&gt;:&lt;br&gt;
   Now, while instantiating your gateway client or while sending headers, just pass the Config ID or the JSON.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;portkey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Portkey&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
     &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PORTKEY_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pc-***&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;// Supports a string config id or a config object&lt;/span&gt;
   &lt;span class="p"&gt;});&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;With this enabled, anytime your AI output contains a phone number, the request will fail and you can retry the request or fallback to another model.&lt;/p&gt;

&lt;p&gt;You can now view the results of these guardrail runs in the UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjv84ll91oze6tixdlyn.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjv84ll91oze6tixdlyn.gif" alt="Guardrail logs"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Power of "Yes, And..."
&lt;/h2&gt;

&lt;p&gt;With these guardrails in place, you can start saying "Yes, and..." to your wildest AI ideas. &lt;/p&gt;

&lt;p&gt;Want to try that cutting-edge model that's still a bit unstable? &lt;em&gt;Go for it!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Want to fine-tune your model with some spicy data? &lt;em&gt;Why not!&lt;/em&gt; The guardrails have your back.&lt;/p&gt;

&lt;p&gt;Feel like experimenting with a more aggressive prompt? &lt;em&gt;Bring it on!&lt;/em&gt; Your guardrails will keep things in check.&lt;/p&gt;

&lt;p&gt;The sky's the limit when you've got a safety net. So go ahead, push those boundaries!&lt;/p&gt;
&lt;h2&gt;
  
  
  Join the Revolution
&lt;/h2&gt;

&lt;p&gt;We've seen over 600 teams make more than 1.4 billion API calls using our hosted gateway. That's a lot of AI interactions, and a lot of potential for things to go wrong. But with guardrails, we can make sure they go right.&lt;/p&gt;

&lt;p&gt;So, what do you say? Are you ready to ship fast and break stuff (responsibly)? &lt;/p&gt;

&lt;p&gt;Here's how you can get started:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check out our &lt;a href="https://github.com/Portkey-AI/gateway" rel="noopener noreferrer"&gt;open-source repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Join our &lt;a href="https://portkey.ai/community" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt; – I'm there too, and I'd love to chat about your AI projects&lt;/li&gt;
&lt;li&gt;Start experimenting! Set up some guardrails and see how it changes your development process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/Portkey-AI/gateway" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Star the repo →&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Let me know in the comments what kind of guardrails you're excited to set up. Or if you have any wild AI ideas you've been too scared to try – let's talk about how we can make them happen safely!&lt;/p&gt;

&lt;p&gt;Remember, in the world of AI, it's not about never making mistakes. It's about making sure those mistakes don't make it to production. So go forth and innovate – we've got your back!&lt;/p&gt;

&lt;p&gt;Happy coding, you brilliant, responsible AI engineers! 🚀🧠💻&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. If you're curious about more ways to supercharge your AI development, check out &lt;a href="https://portkey.ai" rel="noopener noreferrer"&gt;Portkey AI&lt;/a&gt;. We're always cooking up new tools to make your life easier!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>We open sourced our AI gateway written in TS</title>
      <dc:creator>Rohit Agarwal</dc:creator>
      <pubDate>Tue, 09 Jan 2024 12:26:54 +0000</pubDate>
      <link>https://forem.com/portkey/we-open-sourced-our-ai-gateway-written-in-ts-43nk</link>
      <guid>https://forem.com/portkey/we-open-sourced-our-ai-gateway-written-in-ts-43nk</guid>
      <description>&lt;p&gt;We've been building a robust AI gateway at &lt;a href="https://portkey.ai" rel="noopener noreferrer"&gt;Portkey&lt;/a&gt; for the past 10 months.&lt;/p&gt;

&lt;p&gt;Over this time, Portkey has been used by developers to test their prompts and measure costs and performance. We have been interface for shuttling 100B tokens and 10M requests daily to 100+ LLMs. &lt;/p&gt;

&lt;p&gt;We've believed that to accelerate Gen AI adoption, companies and AI engineers need strong foundational platforms so going to production is not a nervous affair. The AI gateway in our mind is one of the MOST critical pieces of infrastructure in the new AI stack.&lt;/p&gt;

&lt;p&gt;Guess what? Today, we are officially open-sourcing the magic sauce, that got us here. &lt;/p&gt;

&lt;h2&gt;
  
  
  Portkey's Opensource AI Gateway
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A9-wwsHG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Portkey-AI" rel="noopener noreferrer"&gt;
        Portkey-AI
      &lt;/a&gt; / &lt;a href="https://github.com/Portkey-AI/gateway" rel="noopener noreferrer"&gt;
        gateway
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A Blazing Fast AI Gateway with integrated Guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast &amp;amp; friendly API.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div&gt;
&lt;p&gt;
   &lt;strong&gt;English&lt;/strong&gt; | &lt;a href="https://github.com/Portkey-AI/gateway./.github/README.cn.md" rel="noopener noreferrer"&gt;中文&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;AI Gateway&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h4 class="heading-element"&gt;Reliably route to 200+ LLMs with 1 fast &amp;amp; friendly API&lt;/h4&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/Portkey-AI/gatewaydocs/images/demo.gif"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NoC4GgWJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://github.com/Portkey-AI/gatewaydocs/images/demo.gif" width="650" alt="Gateway Demo"&gt;&lt;/a&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Portkey-AI/gateway./LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/780792f0cb936503e71266bd8c9b1989168a6ec564666b4e7a600ce5171ebcec/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f496c65726961796f2f6d61726b646f776e2d626164676573" alt="License"&gt;&lt;/a&gt;
&lt;a href="https://portkey.ai/community" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/2eec65a6bdb11ea03742bd4fa7ab9856ea449f652c6934dedfccccd114eb4c41/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f31313433333933383837373432383631333333" alt="Discord"&gt;&lt;/a&gt;
&lt;a href="https://twitter.com/portkeyai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/ca0b163bfa19cf5434ec3407842e10419c1db62e80c690499e098bb02d152744/68747470733a2f2f696d672e736869656c64732e696f2f747769747465722f75726c2f68747470732f747769747465722f666f6c6c6f772f706f72746b657961693f7374796c653d736f6369616c266c6162656c3d466f6c6c6f77253230253430506f72746b65794149" alt="Twitter"&gt;&lt;/a&gt;
&lt;a href="https://www.npmjs.com/package/@portkey-ai/gateway" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/f423dd5ec5842416a4b4a73daef491c433517293bc5c7dc2f5598cf5d63d08e6/68747470733a2f2f62616467652e667572792e696f2f6a732f253430706f72746b65792d6169253246676174657761792e737667" alt="npm version"&gt;&lt;/a&gt;
&lt;a href="https://status.portkey.ai/?utm_source=status_badge" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b48941143b5b18150298a1c5bdb3dbbcf5ad16ffe78421dfafd7215af9304422/68747470733a2f2f757074696d652e626574746572737461636b2e636f6d2f7374617475732d6261646765732f76312f6d6f6e69746f722f713934672e737667" alt="Better Stack Badge"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;a href="https://portkey.ai/features/ai-gateway" rel="nofollow noopener noreferrer"&gt;AI Gateway&lt;/a&gt; streamlines requests to 250+ language, vision, audio and image models with a unified API. It is production-ready with support for caching, fallbacks, retries, timeouts, loadbalancing, and can be edge-deployed for minimum latency.&lt;/p&gt;
&lt;p&gt;✅  &lt;strong&gt;Blazing fast&lt;/strong&gt; (9.9x faster) with a &lt;strong&gt;tiny footprint&lt;/strong&gt; (~100kb build) &lt;br&gt;
✅  &lt;strong&gt;Load balance&lt;/strong&gt; across multiple models, providers, and keys &lt;br&gt;
✅  &lt;strong&gt;Fallbacks&lt;/strong&gt; make sure your app stays resilient &lt;br&gt;
✅  &lt;strong&gt;Automatic Retries&lt;/strong&gt; with exponential fallbacks come by default &lt;br&gt;
✅  &lt;strong&gt;Configurable Request Timeouts&lt;/strong&gt; to easily handle unresponsive LLM requests &lt;br&gt;
✅  &lt;strong&gt;Multimodal&lt;/strong&gt; to support routing between Vision, TTS, STT, Image Gen, and more models &lt;br&gt;
✅  &lt;strong&gt;Plug-in&lt;/strong&gt; middleware as needed &lt;br&gt;
✅  Battle tested over &lt;strong&gt;480B tokens&lt;/strong&gt; &lt;br&gt;
✅  &lt;strong&gt;Enterprise-ready&lt;/strong&gt; for enhanced security, scale, and custom deployments &lt;br&gt;
&lt;br&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Setup &amp;amp; Installation&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;Use the AI gateway through the &lt;strong&gt;hosted API&lt;/strong&gt; or &lt;strong&gt;self-host&lt;/strong&gt; the open-source or enterprise versions…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Portkey-AI/gateway" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;The AI gateway is an essential component of Portkey's platform. &lt;a href="https://github.com/Portkey-AI/gateway" rel="noopener noreferrer"&gt;Portkey's AI Gateway&lt;/a&gt; helps you route requests to multiple LLMs and enables you to build on a unified API. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjwzmoe59lfh1v31qmtt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjwzmoe59lfh1v31qmtt.png" alt="Portkey AI Gateway" width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why an AI Gateway?
&lt;/h3&gt;

&lt;p&gt;The Generative AI ecosystem has been expanding rapidly (for good!), leading to a lack of resiliency and a disjointed developer experience when building with different components. &lt;/p&gt;

&lt;p&gt;Imagine that you have added a Generative AI feature to your SaaS app, which your users use widely. However, there is a new model available from the Large Language model provider that you want to test before implementing it in production. &lt;/p&gt;

&lt;p&gt;Although you are satisfied with the success of your current AI feature, you still lack insights into the costs and errors from your users.&lt;/p&gt;

&lt;p&gt;To address these issues, Portkey's AI Gateway is specifically designed to assist developers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Universal API
&lt;/h4&gt;

&lt;p&gt;AI Gateway offers a universal and unified API that allows interaction with over 100 LLMs, whether privately deployed or public LLM vendors. We take care of request and response transformations automatically.&lt;/p&gt;

&lt;h4&gt;
  
  
  Load Balancing
&lt;/h4&gt;

&lt;p&gt;Portkey's Load Balancing feature efficiently distributes network traffic across multiple Language Model APIs, preventing any LLM from becoming a performance bottleneck and ensuring high availability and optimal performance of your generative AI apps. &lt;/p&gt;

&lt;h4&gt;
  
  
  Fallbacks
&lt;/h4&gt;

&lt;p&gt;Several Language Model APIs are available in the market, each with its strengths and specialities. It would be very convenient if the APIs could be easily switched between based on their performance or availability. Portkey's Fallback feature enables one to switch between multiple LLMs.&lt;/p&gt;

&lt;p&gt;Just imagine your users are not affected by your LLM Vendor's downtime. &lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The gateway is built in TypeScript and blazingly fast to try out or deploy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;npx&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;portkey&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;gateway&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building with AI Gateway
&lt;/h2&gt;

&lt;p&gt;While we understand the promise of universal API, we also acknowledge the learning curve of a new API. &lt;/p&gt;

&lt;p&gt;Most of our SDKs support OpenAI convention and work with Llamaindex and LangChain.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Supported SDKs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node.js / JS / TS&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.npmjs.com/package/portkey-ai" rel="noopener noreferrer"&gt;Portkey SDK&lt;/a&gt; &lt;br&gt; &lt;a href="https://www.npmjs.com/package/openai" rel="noopener noreferrer"&gt;OpenAI SDK&lt;/a&gt; &lt;br&gt; &lt;a href="https://www.npmjs.com/package/langchain" rel="noopener noreferrer"&gt;LangchainJS&lt;/a&gt; &lt;br&gt; &lt;a href="https://www.npmjs.com/package/llamaindex" rel="noopener noreferrer"&gt;LlamaIndex.TS&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://pypi.org/project/portkey-ai/" rel="noopener noreferrer"&gt;Portkey SDK&lt;/a&gt; &lt;br&gt; &lt;a href="https://pypi.org/project/openai/" rel="noopener noreferrer"&gt;OpenAI SDK&lt;/a&gt; &lt;br&gt; &lt;a href="https://pypi.org/project/langchain/" rel="noopener noreferrer"&gt;Langchain&lt;/a&gt; &lt;br&gt; &lt;a href="https://pypi.org/project/llama-index/" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/sashabaranov/go-openai" rel="noopener noreferrer"&gt;go-openai&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Java&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/TheoKanning/openai-java" rel="noopener noreferrer"&gt;openai-java&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.rs/async-openai/latest/async_openai/" rel="noopener noreferrer"&gt;async-openai&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ruby&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/alexrudall/ruby-openai" rel="noopener noreferrer"&gt;ruby-openai&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Portkey's LLM Developer Community
&lt;/h2&gt;

&lt;p&gt;Open-sourcing the gateway can lead to more community involvement and innovation, making it even more useful for developers building AI-enabled experiences.&lt;/p&gt;

&lt;p&gt;Please join us with hundreds of developers at our &lt;a href="https://discord.gg/MxqgrFqq" rel="noopener noreferrer"&gt;discord&lt;/a&gt; to discuss building and contributing with AI. &lt;/p&gt;

&lt;p&gt;We believe that Open-sourcing the gateway can lead to more community involvement and innovation, making it even more useful for developers building AI-enabled experiences.&lt;/p&gt;

&lt;p&gt;Please consider giving &lt;a href="https://github.com/Portkey-AI/gateway" rel="noopener noreferrer"&gt;Portkey's AI Gateway&lt;/a&gt; a star 🌟. We welcome code and non-code contributions from you - here are some &lt;a href="https://github.com/Portkey-AI/gateway/issues?q=is:open+is:issue+label:%22good+first+issue%22" rel="noopener noreferrer"&gt;&lt;code&gt;good-first-issues&lt;/code&gt;&lt;/a&gt; to start. &lt;/p&gt;

&lt;p&gt;If you have any questions, feedback. Please let us know in the comments below! &lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
