<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Marko Vidrih</title>
    <description>The latest articles on Forem by Marko Vidrih (@marko_vidrih).</description>
    <link>https://forem.com/marko_vidrih</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1247475%2Fcd2d113b-0b64-4b4a-a060-de3508cf9155.png</url>
      <title>Forem: Marko Vidrih</title>
      <link>https://forem.com/marko_vidrih</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/marko_vidrih"/>
    <language>en</language>
    <item>
      <title>The Easiest Way to Run Llama 3 Locally</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Fri, 17 May 2024 10:16:31 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/the-easiest-way-to-run-llama-3-locally-239i</link>
      <guid>https://forem.com/marko_vidrih/the-easiest-way-to-run-llama-3-locally-239i</guid>
      <description>&lt;p&gt;Running large language models (LLMs) on your own computer is now popular because it gives you security, privacy, and more control over what the model does. In this mini tutorial, we'll learn the simplest way to download and use the Llama 3 model.&lt;br&gt;
Llama 3 is Meta AI's latest LLM. It's open-source, has advanced AI features, and gives better responses compared to Gemma, Gemini, and Claude 3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Ollama?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ollama/ollama"&gt;Ollama&lt;/a&gt; is an open-source tool for using LLMs like Llama 3 on your computer. Thanks to new research, these models don't need a lot of VRAM, computing power, or storage. They are designed to work well on laptops.&lt;/p&gt;

&lt;p&gt;There are many tools for using LLMs on your computer, but Ollama is the easiest to set up and use. It lets you use LLMs directly from a terminal or PowerShell. It's fast and has features that let you start using it right away.&lt;/p&gt;

&lt;p&gt;The best thing about Ollama is that it works with all kinds of software, extensions, and applications. For example, you can use the CodeGPT extension in VSCode and connect Ollama to use Llama 3 as your AI code assistant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installing Ollama&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Go to the Ollama GitHub repository (&lt;a href="https://github.com/ollama/ollama"&gt;Ollama/ollama&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scroll down and click the download link for your operating system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs71rxuqzlmp3vlxjw32s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs71rxuqzlmp3vlxjw32s.png" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;After installing Ollama, it will show in your system tray.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxdq05v3atj2s8m96ywi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxdq05v3atj2s8m96ywi.png" alt="Image description" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downloading and Using Llama 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To download and start using the Llama 3 model, type this command in your terminal/shell:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ollama run llama3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Downloading the 4.7 GB model may take a while (around 30 minutes on a typical connection), depending on your internet speed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67382tsrdf1o6h3jca5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67382tsrdf1o6h3jca5b.png" alt="Image description" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also install other LLMs by typing different commands. Once the download is finished, you can use Llama 3 locally just like using it online.&lt;/p&gt;
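Beyond the interactive chat, Ollama also serves a local REST API (by default at http://localhost:11434), which is how editors and extensions such as CodeGPT can connect to it. Here is a minimal Python sketch of a non-streaming request, assuming Ollama is running locally and llama3 has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt, model="llama3"):
    """Build the JSON body for a single non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3"):
    """POST the prompt to the local Ollama server and return the response text.
    Requires `ollama serve` to be running with the model already pulled."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running Ollama server):
# print(generate("Describe a day in the life of a Data Scientist."))
```

The same endpoint powers the terminal chat, so anything you can do interactively you can also script.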

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhhf7hcls02w929wa7fz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhhf7hcls02w929wa7fz.png" alt="Image description" width="800" height="652"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prompt Example: "Describe a day in the life of a Data Scientist."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x1i1a6l0fk1pe9z0ffs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3x1i1a6l0fk1pe9z0ffs.png" alt="Image description" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To show how fast it works, here's a GIF of Ollama generating Python code and explaining it.&lt;/p&gt;

&lt;p&gt;Prompt Example: "Write a Python code for building the digital clock."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F029kjox6tf19jfv4s62p.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F029kjox6tf19jfv4s62p.gif" alt="Image description" width="1170" height="644"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: If your laptop has an Nvidia GPU and CUDA installed, Ollama will use the GPU instead of the CPU, making it 10 times faster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can exit the chat by typing &lt;code&gt;/bye&lt;/code&gt; and start again by typing &lt;code&gt;ollama run llama3&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open-source tools and models have made AI and LLMs accessible to everyone. Instead of being controlled by a few companies, tools like Ollama let anyone with a laptop use AI.&lt;/p&gt;

&lt;p&gt;Using LLMs locally gives you privacy, security, and control over responses. Plus, you don't have to pay for a service. You can even create your own AI coding assistant and use it in VSCode.&lt;/p&gt;

</description>
      <category>llama</category>
      <category>llm</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>Google DeepMind Just Announced Gemini 1.5 Pro</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Thu, 15 Feb 2024 19:21:55 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/google-deepmind-just-announced-gemini-15-pro-4k9</link>
      <guid>https://forem.com/marko_vidrih/google-deepmind-just-announced-gemini-15-pro-4k9</guid>
      <description>&lt;p&gt;Google DeepMind has just pulled the curtains back on its latest marvel, Gemini 1.5 Pro, and while we can’t get our hands on it just yet (insert sad face here), the peek into its capabilities is nothing short of astonishing. Here’s a rundown of what makes Gemini 1.5 Pro a beacon of future AI technologies.&lt;/p&gt;

&lt;p&gt;The Essence of Gemini 1.5 Pro&lt;br&gt;
At its core, Gemini 1.5 Pro is a Mixture of Experts (MoE) model, drawing parallels to the likes of Mixtral, and is believed to be a distilled version of their Ultra 1.0 model. This refinement has allowed for a dramatic reduction in training costs, making it a more efficient yet powerful tool.&lt;/p&gt;

&lt;p&gt;Breaking Boundaries with Multimodal Context Length&lt;br&gt;
One of the standout features of Gemini 1.5 Pro is its “1M” token multimodal context length. This essentially means that the model can process and understand content from entire books, comprehensive codebases, and even movies, all at once. While proprietary LLM providers previously capped at 200k tokens, Gemini 1.5 Pro shatters this limit, although it’s worth noting that open-source models have ventured into this territory before.&lt;/p&gt;
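To put "1M tokens" in perspective, a rough back-of-envelope calculation helps (assuming the common rule of thumb of about 0.75 English words per token; the exact ratio varies by tokenizer and text):

```python
# Rough sense of scale for a 1,000,000-token context window.
# Assumption: ~0.75 English words per token (a common rule of thumb).
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75
NOVEL_WORDS = 90_000  # a typical novel is on the order of 90k words

words = CONTEXT_TOKENS * WORDS_PER_TOKEN  # ~750,000 words
novels = words / NOVEL_WORDS              # roughly 8 novels in one prompt
```

That is why "entire books and comprehensive codebases at once" is not hyperbole.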

&lt;p&gt;Needle in a Haystack: Synthetic Testing&lt;br&gt;
DeepMind has showcased the model’s prowess through synthetic tests, challenging it to locate and comprehend small bits of information hidden within massive datasets. Impressively, Gemini 1.5 Pro can handle this task across multiple modalities, including audio, video, and text, showcasing a significant advancement in AI’s search and retrieval capabilities.&lt;/p&gt;

&lt;p&gt;Real-World Applications and Demonstrations&lt;br&gt;
Although the current speeds of Gemini 1.5 Pro make it less practical for immediate use, taking about a minute to process queries, the potential applications are groundbreaking. For instance, the model can sift through a 45-minute video, processing one frame per second, to accurately describe and locate specific moments — a testament to its detailed understanding and analysis capabilities.&lt;/p&gt;

&lt;p&gt;Moreover, the ability to perform multimodal queries, such as interpreting abstract drawings and providing context-specific information, hints at a revolution in how we approach search and information retrieval.&lt;/p&gt;

&lt;p&gt;Bridging Language Gaps with Kalamang Translation&lt;br&gt;
One particularly fascinating application is the model’s ability to translate languages with minimal online presence, like Kalamang — a language spoken by fewer than 200 people. By inputting a single book and a bilingual wordlist into Gemini 1.5 Pro, the model demonstrates an incredible capacity to learn and translate between English and Kalamang, showcasing the potential for AI to preserve and revitalize endangered languages.&lt;/p&gt;

&lt;p&gt;Demos and Resources&lt;br&gt;
DeepMind has provided a glimpse into the future with several demos and resources, illustrating Gemini 1.5 Pro’s capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=SSnsmqIj1MI"&gt;Solving problems across 100,633 lines of code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=wa0MT8OwHuk"&gt;Multimodal interaction with a 44-minute movie&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=LHKL_210CcU&amp;amp;t=1s"&gt;Reasoning through a 402-page transcript&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While it’s wise to approach these early showcases with cautious optimism, Gemini 1.5 Pro undeniably hints at a bright and transformative future for artificial intelligence. Its ability to process, understand, and interact with vast amounts of multimodal data marks a significant leap forward in the quest to create more intelligent, versatile, and efficient AI systems.&lt;/p&gt;

&lt;p&gt;For more information and to dive deeper into the specifics of Gemini 1.5 Pro, check out the provided &lt;a href="https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#gemini-15"&gt;blog post&lt;/a&gt; and &lt;a href="https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf"&gt;technical report&lt;/a&gt;. The journey into the next frontier of AI is just beginning, and Gemini 1.5 Pro is leading the charge.&lt;/p&gt;

</description>
      <category>google</category>
      <category>deepmind</category>
      <category>largelanguagemodel</category>
      <category>llm</category>
    </item>
    <item>
      <title>😱 Andrej Karpathy departs OpenAI</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Thu, 15 Feb 2024 11:48:48 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/andrej-karpathy-departs-openai-538l</link>
      <guid>https://forem.com/marko_vidrih/andrej-karpathy-departs-openai-538l</guid>
      <description>&lt;p&gt;Renowned AI researcher and founding OpenAI member Andrej Karpathy just &lt;a href="https://twitter.com/karpathy/status/1757600075281547344"&gt;&lt;strong&gt;announced&lt;/strong&gt;&lt;/a&gt; he is departing the company for a second time, posting on X that he plans to pursue personal projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9eaazqeg8rs21u9tjx9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9eaazqeg8rs21u9tjx9.png" alt="Image description" width="732" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Karpathy helped found OpenAI in 2016 before serving as Tesla's Senior Director of AI for five years, rejoining Sam Altman and co. in 2023.&lt;/li&gt;
&lt;li&gt;The departure follows OpenAI's drama with Altman and the board, which has since relegated fellow co-founder Ilya Sutskever to an unclear role in the company.&lt;/li&gt;
&lt;li&gt;Karpathy emphasized that no drama or event led to the departure and that he was simply pursuing personal projects (including his famous AI lectures on YouTube).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Between the mystery surrounding Ilya Sutskever and now Karpathy's departure, something seems fishy between the science and business arms of OpenAI. All eyes now turn to where Karpathy heads next - with likely no shortage of suitors for one of the world's top AI minds.&lt;/p&gt;

</description>
      <category>openai</category>
    </item>
    <item>
      <title>Running LLM Agents With LangChain</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Thu, 25 Jan 2024 11:56:36 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/running-llm-agents-with-langchain-ie2</link>
      <guid>https://forem.com/marko_vidrih/running-llm-agents-with-langchain-ie2</guid>
      <description>&lt;p&gt;Trained in causal language modeling, Large Language Models (LLMs) are adept at a broad spectrum of tasks, yet they often falter in fundamental areas such as logic, calculation, and search. A particularly problematic situation arises when an LLM performs poorly in a specific field, such as mathematics, yet still attempts to handle all the related computations on its own.&lt;/p&gt;

&lt;p&gt;To address this shortcoming, one effective strategy involves embedding the LLM within a framework that enables it to utilize tools. This type of framework is known as an LLM agent.&lt;/p&gt;

&lt;p&gt;Let's delve into the mechanics of ReAct agents. I'll demonstrate how to construct these agents using the newly incorporated ChatHuggingFace class in LangChain, and conclude by comparing several open-source LLMs against GPT-3.5 and GPT-4 in a benchmark analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, what would a complete Agent setup look like?
&lt;/h2&gt;

&lt;p&gt;To illustrate a complete Agent setup, let's delve into an example where an LLM agent is tasked with a specific question, requiring the integration of various tools and observations.&lt;/p&gt;

&lt;p&gt;First, we initialize the environment by presenting the LLM agent with the initial question and a suite of tools it can utilize. For instance, if the question is, "What is the weather forecast for Paris tomorrow?", the agent has access to a weather forecasting tool, among others.&lt;/p&gt;

&lt;p&gt;The agent then begins processing the question, contemplating the necessary steps to find the answer. It might think, "To answer this, I need the latest weather data for Paris."&lt;/p&gt;

&lt;p&gt;The agent's first action would likely be to call the weather forecasting tool. The call might look like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Action:&lt;br&gt;
{&lt;br&gt;
    "action": "get_weather",&lt;br&gt;
    "action_input": {&lt;br&gt;
        "location": "Paris",&lt;br&gt;
        "date": "tomorrow"&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Once the tool is called, it returns the weather forecast data, which is then appended to the agent's prompt as an observation. For example, the observation might be: {'forecast': 'Partly cloudy, 18°C'}.&lt;/p&gt;

&lt;p&gt;The agent now updates its internal state with this new information and re-evaluates the situation. The updated prompt, including the observation, is processed, and the agent determines if it has sufficient information to answer the original question. The LLM is engaged again with this enriched prompt. &lt;/p&gt;

&lt;p&gt;If the information is adequate, the agent then formulates a final answer, prefaced with 'Final Answer:', like so:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Final Answer: The weather in Paris tomorrow is expected to be partly cloudy with a temperature of 18°C.&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The challenges in such an Agent setup are multifaceted:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Selection:&lt;/strong&gt; The agent must accurately determine which tool or tools are necessary to solve the given problem, avoiding irrelevant or unhelpful tool calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Argument Formatting:&lt;/strong&gt; When calling tools, the agent must format its requests correctly. This includes using the right tool names, providing the necessary argument values (not names), and adhering to any specific syntax or format required by the tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Information Integration:&lt;/strong&gt; The agent must effectively incorporate the observations from previous tool uses, along with the initial context, to build towards the final answer. This requires a nuanced understanding of how each piece of information contributes to the overall task.&lt;/p&gt;

&lt;p&gt;In essence, a complete Agent setup involves a harmonious interplay between asking the right questions, calling the appropriate tools with precision, and synthesizing all gathered information to reach a coherent and accurate conclusion. This setup, while complex, opens up vast possibilities for LLM agents in solving diverse and intricate problems.&lt;/p&gt;
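The question → thought → action → observation loop described above can be sketched in a few lines of plain Python. This is a toy illustration with a stubbed LLM and weather tool, not a real framework implementation:

```python
import json

def get_weather(location, date):
    """Stubbed weather tool; a real agent would call a forecast API here."""
    return {"forecast": "Partly cloudy, 18°C"}

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt):
    """Stand-in for the LLM: emits a tool call first, then a final answer."""
    if "Observation" not in prompt:
        return json.dumps({"action": "get_weather",
                           "action_input": {"location": "Paris", "date": "tomorrow"}})
    return "Final Answer: Partly cloudy with a temperature of 18°C."

def run_agent(question, llm, max_steps=5):
    prompt = question
    for _ in range(max_steps):
        reply = llm(prompt)
        if reply.startswith("Final Answer:"):   # the agent decided it is done
            return reply
        call = json.loads(reply)                # parse the requested tool call
        result = TOOLS[call["action"]](**call["action_input"])
        prompt += f"\nObservation: {result}"    # append observation, loop again
    raise RuntimeError("agent did not finish within max_steps")

answer = run_agent("What is the weather forecast for Paris tomorrow?", fake_llm)
```

Real frameworks add prompt templates, output parsing, and error recovery around this same skeleton.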

&lt;h2&gt;
  
  
  Implementing Agents with LangChain
&lt;/h2&gt;

&lt;p&gt;The ChatHuggingFace wrapper was recently integrated into &lt;a href="https://www.langchain.com/"&gt;LangChain&lt;/a&gt;, enabling the creation of agents using open-source models.&lt;/p&gt;

&lt;p&gt;The process to set up the ChatModel and equip it with tools is straightforward, as detailed in the &lt;a href="https://python.langchain.com/docs/integrations/chat/huggingface"&gt;LangChain documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from langchain_community.llms import HuggingFaceHub&lt;br&gt;
from langchain_community.chat_models.huggingface import ChatHuggingFace&lt;br&gt;
&lt;br&gt;
llm = HuggingFaceHub(&lt;br&gt;
    repo_id="HuggingFaceH4/zephyr-7b-beta",&lt;br&gt;
    task="text-generation",&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
chat_model = ChatHuggingFace(llm=llm)&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Transforming the chat_model into an agent involves providing a ReAct style prompt and relevant tools:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from langchain import hub&lt;br&gt;
from langchain.agents import AgentExecutor, load_tools&lt;br&gt;
from langchain.agents.format_scratchpad import format_log_to_str&lt;br&gt;
from langchain.agents.output_parsers import (&lt;br&gt;
    ReActJsonSingleInputOutputParser,&lt;br&gt;
)&lt;br&gt;
from langchain.tools.render import render_text_description&lt;br&gt;
from langchain_community.utilities import SerpAPIWrapper&lt;br&gt;
&lt;br&gt;
# Initialize tools&lt;br&gt;
tools = load_tools(["serpapi", "llm-math"], llm=llm)&lt;br&gt;
&lt;br&gt;
# Set up ReAct style prompt&lt;br&gt;
prompt = hub.pull("hwchase17/react-json")&lt;br&gt;
prompt = prompt.partial(&lt;br&gt;
    tools=render_text_description(tools),&lt;br&gt;
    tool_names=", ".join([t.name for t in tools]),&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
# Configure the agent&lt;br&gt;
chat_model_with_stop = chat_model.bind(stop=["\nObservation"])&lt;br&gt;
agent = (&lt;br&gt;
    {&lt;br&gt;
        "input": lambda x: x["input"],&lt;br&gt;
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),&lt;br&gt;
    }&lt;br&gt;
    | prompt&lt;br&gt;
    | chat_model_with_stop&lt;br&gt;
    | ReActJsonSingleInputOutputParser()&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
# Create AgentExecutor&lt;br&gt;
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)&lt;br&gt;
&lt;br&gt;
agent_executor.invoke(&lt;br&gt;
    {&lt;br&gt;
        "input": "Who is the current holder of the speed skating world record on 500 meters? What is her current age raised to the 0.43 power?"&lt;br&gt;
    }&lt;br&gt;
)&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The agent processes the input, performing necessary searches and calculations:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Thought: Identify the age of the current speed skating world record holder using the search tool.&lt;br&gt;
Action:&lt;br&gt;
{&lt;br&gt;
    "action": "search",&lt;br&gt;
    "action_input": "speed skating world record holder 500m age"&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
Observation: ...&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents Showdown: Evaluating Open-Source LLMs as Reasoning Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Evaluation Methodology&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We assess the performance of open-source LLMs as general-purpose reasoning agents by testing their logic and basic tool use (calculator and internet search). Our evaluation dataset merges samples from three sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HotpotQA&lt;/strong&gt; for internet search capability: originally a retrieval dataset, it serves here for general question answering with internet access. Some questions require aggregating information from multiple sources, meaning several internet search steps in our context.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GSM8K&lt;/strong&gt; for calculator usage: testing grade-school math skills solvable by basic arithmetic operations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GAIA&lt;/strong&gt; for diverse tool requirements: from this challenging General AI Assistants benchmark, we selected questions solvable with just search and calculator tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT-4, serving as a judge, evaluates these using a Prometheus prompt format, rating on a 5-point Likert scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models in the Test&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We evaluate several leading open-source models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/meta-llama/Llama-2-70b-chat-hf"&gt;Llama2-70b-chat&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1"&gt;Mixtral-8x7B-Instruct-v0.1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//OpenHermes-2.5-Mistral-7B"&gt;OpenHermes-2.5-Mistral-7B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/HuggingFaceH4/zephyr-7b-beta"&gt;Zephyr-7b-beta&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//SOLAR-10.7B-Instruct-v1.0"&gt;SOLAR-10.7B-Instruct-v1.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These models are tested in LangChain's ReAct framework, prompting them to structure their function calls as follows:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{&lt;br&gt;
  "action": $TOOL_NAME,&lt;br&gt;
  "action_input": $INPUT&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For perspective, GPT-3.5 and GPT-4 are also evaluated using LangChain's OpenAI-specific agent, optimized for their function-calling format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results and Insights
&lt;/h2&gt;

&lt;p&gt;The open-source models, not specifically tuned for the output format, faced a minor disadvantage compared to OpenAI models.&lt;/p&gt;

&lt;p&gt;Nevertheless, some models showed impressive results. For instance, Mixtral-8x7B outperformed GPT-3.5, especially noteworthy considering it wasn't fine-tuned for agent workflows. Challenges included improper formatting of tool calls in some instances.&lt;/p&gt;

&lt;p&gt;Here is a benchmark of the models on the evaluation dataset (the average scores, originally on a scale of 1-5, have been converted to a scale of 0-100% for readability):&lt;/p&gt;
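The rescaling from a 1-5 Likert average to 0-100% is presumably a linear map (the exact formula isn't stated in the post); one plausible version:

```python
def likert_to_pct(score, lo=1, hi=5):
    """Linearly rescale a Likert-scale score to a 0-100% range."""
    return (score - lo) / (hi - lo) * 100

likert_to_pct(5)  # best possible score -> 100.0
likert_to_pct(3)  # midpoint -> 50.0
```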

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3qtktcsnw2czrrgcxo5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3qtktcsnw2czrrgcxo5.png" alt="Image description" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I encourage open-source developers to fine-tune Mixtral for agent tasks, aiming to surpass GPT-4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concluding Observations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The GAIA benchmark, despite being a subset test with limited tools, appears to be a strong indicator of model performance in agent workflows.&lt;/li&gt;
&lt;li&gt;Agent workflows enhance LLM performance. For instance, GPT-4's performance on GSM8K improved from 92% (5-shot CoT) to 95% with the addition of a calculator. Similarly, Mixtral-8x7B jumped from 57.6% (5-shot) to 73% in zero-shot with the same enhancement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This analysis suggests that fine-tuning and tool integration are key for advancing LLM capabilities in agent frameworks.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Meta Is Training LLaMA 3; Mark Zuckerberg Just Shared an Update</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Thu, 18 Jan 2024 20:07:16 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/meta-is-training-llama-3-mark-zuckerberg-just-shared-an-update-3b3c</link>
      <guid>https://forem.com/marko_vidrih/meta-is-training-llama-3-mark-zuckerberg-just-shared-an-update-3b3c</guid>
      <description>&lt;p&gt;In a groundbreaking &lt;a href="https://twitter.com/nifty0x/status/1748070924681720095"&gt;announcement&lt;/a&gt; that could shape the future of artificial intelligence, Meta's CEO Mark Zuckerberg has unveiled the latest advancements in their AI technology with Llama 3. This new AI system promises to revolutionize various aspects of technology and daily life, marking a significant stride in AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Vision for General Intelligence and More
&lt;/h2&gt;

&lt;p&gt;Llama 3, as explained by Zuckerberg, is not just another step in AI but a giant leap towards achieving Full General Intelligence. This means developing AI capabilities that replicate and exceed human cognitive abilities in various domains. The focus areas include AI for personal assistants, tools for creators, business solutions, advanced reasoning, strategic planning, innovative coding techniques, and enhanced memory functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Democratizing AI Through Open Sourcing
&lt;/h2&gt;

&lt;p&gt;In a move to make AI technology more accessible, Meta plans to open source Llama 3. This strategy aims to democratize AI, making it a tool for the masses rather than a privilege for the few. By doing so, Meta is set to empower developers, researchers, and enthusiasts around the globe, enabling them to harness the power of advanced AI in their respective fields.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure to Power AI's Future
&lt;/h2&gt;

&lt;p&gt;To back these ambitious plans, Meta is massively scaling up its infrastructure. The company plans to deploy 350,000 H100s by the end of this year, with the total computing power almost reaching 600,000 H100 equivalents when combined with other GPUs ($15–20B worth of compute). This massive infrastructure is designed to support the heavy computational demands of advanced AI models like Llama 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redefining Human-AI Interaction with New Devices
&lt;/h2&gt;

&lt;p&gt;Zuckerberg envisions a future where interaction with AI is an integral part of our daily life. To facilitate this, he suggests that new types of devices, particularly AI-integrated glasses, will become commonplace. These devices are expected to offer seamless, intuitive communication with AI, changing the way we interact with technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Road Ahead: Responsible AI Training and More Models
&lt;/h2&gt;

&lt;p&gt;Looking ahead, Meta is committed to responsibly training more AI models. This approach emphasizes the importance of ethical considerations in AI development, ensuring that these advanced systems are not only powerful but also safe and beneficial for society.&lt;/p&gt;

&lt;p&gt;With Llama 3 and its subsequent developments, Meta is not just envisioning the future of AI - it's actively building it. As Zuckerberg puts it, "We are just getting started." This announcement is a window into a future where AI is deeply integrated into every aspect of our lives, offering unprecedented opportunities for innovation, growth, and human advancement.&lt;/p&gt;

</description>
      <category>llama</category>
      <category>metaai</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The Best Open-Source 7B LLM</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Mon, 15 Jan 2024 09:08:18 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/the-best-open-source-7b-llm-jcj</link>
      <guid>https://forem.com/marko_vidrih/the-best-open-source-7b-llm-jcj</guid>
      <description>&lt;p&gt;OpenChat just released the world's best &lt;a href="https://github.com/imoneoi/openchat?tab=readme-ov-file"&gt;open-source 7B LLM&lt;/a&gt;, surpassing Grok0, ChatGPT (March), and Grok1.&lt;/p&gt;

&lt;p&gt;OpenChat is a library of open-source language models fine-tuned with C-RLFT.&lt;/p&gt;

&lt;p&gt;C-RLFT, or Conditioned-Reinforcement Learning Fine-Tuning, works by categorizing different data sources as separate reward labels. It's essentially a fine-tuning process for language models using mixed-quality data. &lt;/p&gt;

&lt;p&gt;Instead of treating all training data equally or needing high-quality preference data, C-RLFT assigns a class or condition to each data source, leveraging these as indicators of data quality.&lt;/p&gt;
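To make the conditioning idea concrete, here is a toy sketch of tagging each training example with a coarse quality label derived from its source. This is an illustration of the concept only, not OpenChat's actual code; the source names and labels are made up:

```python
# Toy sketch of C-RLFT-style conditioning: each data source carries a coarse
# reward/quality label, and examples are tagged with it so the model can learn
# to associate behaviour with source quality. Labels below are assumptions.
SOURCE_QUALITY = {
    "gpt4_conversations": "high",
    "web_scraped_chat": "low",
}

def condition_example(source, prompt, response):
    """Prefix the input with its source's quality condition."""
    tag = SOURCE_QUALITY[source]
    return {"input": f"[quality:{tag}] {prompt}", "target": response}

ex = condition_example("gpt4_conversations", "Explain RLHF.", "RLHF is ...")
```

At inference time the model is conditioned on the high-quality label, so it imitates the better data even though it trained on a mix.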

&lt;p&gt;The model is accessible on platforms like HuggingFace, GitHub, and through a live demo. Detailed instructions for independent deployment, including setup for an accelerated vLLM backend and API key authentication, are available on GitHub. The model is also available on consumer GPUs, like the RTX 3090.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>llm</category>
    </item>
    <item>
      <title>Build Your Own Embedding Models Using LLMs</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Wed, 10 Jan 2024 09:58:31 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/build-your-own-embedding-models-using-llms-48ak</link>
      <guid>https://forem.com/marko_vidrih/build-your-own-embedding-models-using-llms-48ak</guid>
      <description>&lt;p&gt;In our ongoing exploration of the latest AI advancements, this article focuses on the vital role of embeddings in deep learning, particularly when employing large language models (LLMs). The quality of embeddings directly affects the performance of the models in different applications.&lt;/p&gt;

&lt;p&gt;Creating bespoke embedding models for specific applications is ideal. Nonetheless, developing these models is fraught with challenges. Therefore, developers often resort to pre-existing, broadly-applicable embedding models.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://arxiv.org/abs/2401.00368"&gt;novel approach&lt;/a&gt; by Microsoft researchers offers a promising solution. It simplifies and reduces the costs of developing customized embedding models. Leveraging open-source LLMs in place of traditional BERT-like encoders, this method streamlines retraining. It also employs Microsoft's own LLMs to autonomously produce labeled training data, paving the way for innovative LLM applications and enabling entities to develop tailored LLMs for their specific needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Complexities of Embedding Model Development 
&lt;/h2&gt;

&lt;p&gt;Embedding models are crucial for translating input data into numerical representations that encapsulate key attributes. Word embeddings, for instance, capture the semantic essence of words, while sentence embeddings delineate the interplay of words within a sentence. Similarly, image embeddings reflect the visual attributes of their subjects. These embeddings are instrumental in tasks like comparing the likeness of words, sentences, or texts.&lt;/p&gt;

&lt;p&gt;One significant application of embeddings is in &lt;a href="https://vidrihmarko.medium.com/understanding-retrieval-augmented-generation-rag-is-this-new-era-for-prompt-engineering-46870483e441"&gt;retrieval augmented generation (RAG)&lt;/a&gt; with LLMs. Here, embeddings assist in identifying and retrieving documents relevant to a given prompt. The LLM then integrates the content of these documents into its response, enhancing accuracy and reducing reliance on information outside its training dataset.&lt;/p&gt;
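&lt;p&gt;The retrieval step can be sketched in a few lines. The "embeddings" below are hand-made toy vectors standing in for a real embedding model's output; only the cosine-similarity ranking is the real mechanism:&lt;/p&gt;

```python
import math

# Toy sketch of the embedding-retrieval step in RAG: rank documents by
# cosine similarity between the query embedding and each document
# embedding. The 3-d vectors are illustrative stand-ins for real
# embedding-model output.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

documents = {
    "doc_recipes": [0.9, 0.1, 0.0],
    "doc_case_law": [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]  # pretend this embeds "find me a recipe"

best = max(documents, key=lambda d: cosine(query, documents[d]))
print(best)  # → doc_recipes
```

In a real pipeline, the top-ranked documents are then pasted into the LLM's prompt as grounding context.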

&lt;p&gt;The efficacy of RAG hinges heavily on the embedding model's quality. Ineffective embeddings may not accurately match documents to user prompts, hindering the retrieval of pertinent documents.&lt;/p&gt;

&lt;p&gt;Customizing embedding models with specific data is one approach to enhance their relevance for particular applications. However, the prevalent method involves a complex, multi-stage training process, initially using large-scale, weakly-supervised text pairs for contrastive learning, followed by fine-tuning with a smaller, high-quality, and meticulously labeled dataset.&lt;/p&gt;

&lt;p&gt;This method demands significant effort to curate relevant text pairs and often relies on manually compiled datasets that are limited in scope and linguistic variety. Hence, many developers stick with generic embedding models, which may not fully meet their application needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Revolutionizing Embedding Models with LLMs 
&lt;/h2&gt;

&lt;p&gt;Microsoft's innovative technique diverges from the standard two-stage process, instead proposing a single-stage training approach using proprietary LLMs like GPT-4. This method starts with GPT-4 generating a range of potential embedding tasks. These tasks are then used to prompt the model to create training examples.&lt;/p&gt;

&lt;p&gt;For instance, the initial stage provided a list of abstract task descriptions, such as locating legal case law relevant to a specific argument or finding recipes based on given ingredients.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CMo0wiyi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/62boxk5dqvs6zmfgqhdf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CMo0wiyi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/62boxk5dqvs6zmfgqhdf.png" alt="Prompt for generating high-level retrieval tasks (source: arxiv)" width="768" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next step involved submitting one of these tasks to GPT-4, which then generated a JSON structure containing a specific user prompt and corresponding positive and negative examples, each about 150 words. The results were impressively accurate, save for a minor discrepancy in the hard negative example, which could potentially skew the embeddings.&lt;/p&gt;
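&lt;p&gt;The paper does not publish an exact schema, but the generated structure can be pictured as a JSON triplet of query, positive document, and hard negative. The field names and contents below are hypothetical:&lt;/p&gt;

```python
import json

# Hypothetical shape of the JSON that GPT-4 returns for one retrieval
# task, per the paper's description: a user query plus a positive and a
# hard-negative document. Field names and text are illustrative.
raw = """{
  "user_query": "vegetarian lasagna with spinach",
  "positive_document": "A layered lasagna recipe using spinach ...",
  "hard_negative_document": "A review of Italian restaurants ..."
}"""

example = json.loads(raw)
triplet = (example["user_query"],
           example["positive_document"],
           example["hard_negative_document"])
print(len(triplet))  # → 3
```

Each such triplet becomes one contrastive training example: the query should embed close to the positive and far from the hard negative.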

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---FCobF74--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hi4hbqgonj8kznshgj72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---FCobF74--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hi4hbqgonj8kznshgj72.png" alt="Prompt for generating examples for a retrieval task (source: arxiv)" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Despite the researchers not releasing their source code or data, this &lt;a href="https://github.com/bendee983/bdtechtalks/blob/main/Generate_examples_for_embedding_training.ipynb"&gt;Python notebook&lt;/a&gt; offers a glimpse into this streamlined process, highlighting its adaptability and potential for customization.&lt;/p&gt;

&lt;p&gt;To broaden the dataset's diversity, the team designed various prompt templates and combined them, generating over 500,000 examples with 150,000 unique instructions using GPT-3.5 and GPT-4 through Azure OpenAI Service. The total token usage was around 180 million, costing approximately $5,000.&lt;/p&gt;

&lt;p&gt;Interestingly, the training employed an open-source auto-regressive model rather than a bidirectional encoder like BERT, which is typical. The rationale is that these models, already pre-trained on vast datasets, can be fine-tuned for embedding tasks at minimal costs.&lt;/p&gt;

&lt;p&gt;They validated their method on Mistral-7B using synthetic data and 13 public datasets. Through techniques like LoRA, they reduced training expenses and achieved state-of-the-art results on renowned benchmark datasets, even surpassing OpenAI's Ada-002 and Cohere's models in RAG and embedding quality assessments.&lt;/p&gt;
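&lt;p&gt;A back-of-the-envelope sketch shows why LoRA keeps fine-tuning cheap: instead of updating a full d×d weight matrix W, it trains two low-rank factors of shape d×r and r×d and adds their product to W. The sizes below are illustrative, not the paper's exact configuration:&lt;/p&gt;

```python
# Minimal LoRA arithmetic: full fine-tuning updates all d*d entries of a
# weight matrix, while LoRA trains only the two low-rank factors B (d x r)
# and A (r x d). Illustrative sizes, roughly the order of magnitude of a
# 7B model's hidden dimension.

d, r = 4096, 8               # hidden size and LoRA rank

full_params = d * d          # parameters updated by full fine-tuning
lora_params = 2 * d * r      # parameters updated by LoRA (B and A)

print(full_params, lora_params, full_params // lora_params)
# → 16777216 65536 256
```

At rank 8, the trainable-parameter count per matrix drops by a factor of 256, which is why the method fits on modest hardware.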

&lt;h2&gt;
  
  
  LLMs and Future of Embeddings 
&lt;/h2&gt;

&lt;p&gt;The study underscores that extensive auto-regressive pre-training allows LLMs to develop robust text representations, making only minor fine-tuning necessary to convert them into efficient embedding models.&lt;/p&gt;

&lt;p&gt;The findings also indicate the feasibility of using LLMs to generate apt training data for fine-tuning embedding models cost-effectively. This has significant implications for future LLM applications, enabling organizations to develop custom embeddings for their specific needs.&lt;/p&gt;

&lt;p&gt;The researchers suggest that generative language modeling and text embeddings are intrinsically linked, both requiring deep language comprehension by the model. They propose that a robust LLM should be capable of autonomously generating training data for an embedding task and then be fine-tuned with minimal effort. While their experiments offer promising insights, further research is needed to fully exploit this potential.&lt;/p&gt;




&lt;p&gt;Follow me on social media:&lt;br&gt;
&lt;a href="https://twitter.com/nifty0x"&gt;https://twitter.com/nifty0x&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/marko-vidrih/"&gt;https://www.linkedin.com/in/marko-vidrih/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Mixtral-8x7b Simplified</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Mon, 08 Jan 2024 09:07:47 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/mixtral-8x7b-simplified-1j4j</link>
      <guid>https://forem.com/marko_vidrih/mixtral-8x7b-simplified-1j4j</guid>
      <description>&lt;p&gt;MistralAI's &lt;a href="https://mistral.ai/news/mixtral-of-experts/"&gt;Mixtral-8x7b&lt;/a&gt; stands out in the crowd, trailing just behind the giants like OpenAI and Anthropic. What's even more exciting is that it's an &lt;a href="https://github.com/mistralai/mistral-src/tree/moe"&gt;open-source project&lt;/a&gt;! My focus today is to break down its architecture using Neural Circuit Diagrams, offering you a peek into the world of cutting-edge transformers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z1IAoxwZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sxig6q87gxxjrvm2rd56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z1IAoxwZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sxig6q87gxxjrvm2rd56.png" alt="Chatbot Arena" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Simplicity in Design, Complexity in Performance 
&lt;/h2&gt;

&lt;p&gt;At its core, Mixtral-8x7b is a decoder-only transformer. It begins with tokenized inputs, morphing them into vectors through a series of decoder layers, culminating in the prediction of word probabilities. Despite its seemingly straightforward structure, the model excels in text infill and prediction, making it a formidable player in the AI arena.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LWkkDq7b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fspj21erhlpxrszydsd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LWkkDq7b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fspj21erhlpxrszydsd6.png" alt="The overall model converts tokens to vectors, processes them, and converts them back to word probabilities. Credit: Vincent Abbott" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Decoding the Decoder 
&lt;/h2&gt;

&lt;p&gt;Each decoder layer in Mixtral is a symphony of two major components: &lt;br&gt;
(i) an attention mechanism and &lt;br&gt;
(ii) a multi-layer perceptron. &lt;/p&gt;

&lt;p&gt;The attention mechanism is focused on context, pulling in relevant information to make sense of the data. The multi-layer perceptron, on the other hand, dives deep into individual word vectors. Together, wrapped in residual connections for deeper training, they uncover intricate patterns.&lt;/p&gt;
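&lt;p&gt;The wiring of one decoder layer can be sketched schematically. The attention and MLP sublayers below are stand-in functions on plain lists; only the residual structure reflects the real architecture:&lt;/p&gt;

```python
# Schematic of one decoder layer: two sublayers (self-attention, then a
# position-wise MLP), each wrapped in a residual connection. The sublayer
# bodies are toy stand-ins; the point is the x = x + sublayer(x) wiring.

def self_attention(x):
    # stand-in: mix each position with the mean of the whole sequence
    mean = sum(x) / len(x)
    return [0.5 * v + 0.5 * mean for v in x]

def mlp(x):
    # stand-in: a position-wise nonlinearity
    return [max(0.0, v) for v in x]

def decoder_layer(x):
    x = [a + b for a, b in zip(x, self_attention(x))]  # residual 1
    x = [a + b for a, b in zip(x, mlp(x))]             # residual 2
    return x

out = decoder_layer([1.0, -1.0, 2.0])
print(out)
```

The residual additions are what let gradients flow through many stacked layers, enabling the "deeper training" mentioned above.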

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YN-KoGRw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/72s3ekw3xesnnn1be2do.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YN-KoGRw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/72s3ekw3xesnnn1be2do.png" alt="The decoder layers are akin to the original transformer's, but exclusively use self-attention. Credit: Vincent AbbottThe" width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolution of Attention 
&lt;/h2&gt;

&lt;p&gt;Mixtral doesn't stray far from the original transformer's attention mechanism, but with a twist. A notable mention is &lt;a href="https://hazyresearch.stanford.edu/blog/2023-01-12-flashattention-long-sequences"&gt;FlashAttention&lt;/a&gt; by Hazy Research, which zips through computations by optimizing attention for GPU kernels. My journey with Neural Circuit Diagrams has been instrumental in understanding these advancements, particularly in algorithm acceleration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dSCz9ndM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mynxcaagvbwelpehh4e1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dSCz9ndM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mynxcaagvbwelpehh4e1.png" alt="Attention mechanisms have gradually evolved since popularized by 2017's Attention is All You Need." width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sparse Mixture of Experts
&lt;/h2&gt;

&lt;p&gt;The real showstopper for Mixtral is its Sparse &lt;a href="https://medium.com/gopenai/mixture-of-experts-moe-in-ai-models-explained-2163335eaf85"&gt;Mixture of Experts &lt;/a&gt;(SMoE). Traditional MLP layers are resource-hungry, but SMoEs change the game by selectively activating the most relevant layers for each input. This not only cuts down computational costs but also allows for learning more complex patterns efficiently.&lt;/p&gt;
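&lt;p&gt;A toy version of the routing makes the saving concrete: a gate scores all eight experts for each token, only the top two actually run, and their outputs are mixed with renormalized softmax weights. The experts and gate logits below are stand-ins, not Mixtral's learned parameters:&lt;/p&gt;

```python
import math

# Toy sparse mixture-of-experts routing in the style of Mixtral-8x7b:
# 8 experts, top-2 routing per token. Each "expert" here is just a
# scalar function; real experts are full MLPs.

NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, i=i: x * (i + 1) for i in range(NUM_EXPERTS)]

def smoe(x, gate_logits):
    # pick the TOP_K experts with the highest gate scores
    top = sorted(range(NUM_EXPERTS), key=lambda i: gate_logits[i])[-TOP_K:]
    weights = [math.exp(gate_logits[i]) for i in top]
    total = sum(weights)
    # only TOP_K of the 8 experts are ever evaluated for this token
    return sum(w / total * experts[i](x) for w, i in zip(weights, top))

logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]
print(smoe(1.0, logits))
```

Because only two of the eight expert MLPs execute per token, inference cost is close to that of a much smaller dense model while total capacity stays large.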

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZDcmEaFC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wtbsmyh1lgidovvybczi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZDcmEaFC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wtbsmyh1lgidovvybczi.png" alt="A gating mechanism decides which layers to execute, leading to a computationally efficient algorithm. Credit: Vincent Abbott" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Concluding Thoughts
&lt;/h2&gt;

&lt;p&gt;In essence, Mixtral is a milestone for open-source AI and a testament to its power and potential. By simplifying the original transformer and incorporating gradual innovations in attention mechanisms and SMoEs, it has set a new benchmark for machine learning development. It's a prime example of how open-source initiatives and innovative architectures like SMoEs are pushing the boundaries of the field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aL8j2A6H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5tkv8ybbiepsi2jaxf5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aL8j2A6H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5tkv8ybbiepsi2jaxf5s.png" alt="The overall attention architecture, expressed using Neural Circuit Diagrams. Credit: Vincent Abbott" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, that's a wrap on the Mixtral-8x7b! Whether you're a budding AI enthusiast or a seasoned pro, there's no denying that Mixtral's approach to architecture and design is a fascinating stride in the journey of machine learning. Stay tuned for more exciting developments in this space!&lt;/p&gt;




&lt;p&gt;Follow me on social media:&lt;br&gt;
&lt;a href="https://twitter.com/nifty0x"&gt;https://twitter.com/nifty0x&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/marko-vidrih/"&gt;https://www.linkedin.com/in/marko-vidrih/&lt;/a&gt;&lt;br&gt;
Project I'm currently working on&lt;br&gt;
&lt;a href="https://creatus.ai/"&gt;https://creatus.ai/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Helping Gen Alpha Manage Their Millions</title>
      <dc:creator>Marko Vidrih</dc:creator>
      <pubDate>Wed, 03 Jan 2024 10:29:33 +0000</pubDate>
      <link>https://forem.com/marko_vidrih/helping-gen-alpha-manage-their-millions-k5n</link>
      <guid>https://forem.com/marko_vidrih/helping-gen-alpha-manage-their-millions-k5n</guid>
      <description>&lt;p&gt;What was I doing at 12 years old?&lt;/p&gt;

&lt;p&gt;A lot of random things, but definitely not making tech money.&lt;/p&gt;

&lt;p&gt;But 78% of Generation Alpha (born 2010–2025) already made bank in the past year, half of whom did it with technology. Some even raked in millions. &lt;em&gt;Gasp in Millennial&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Watch out for these young hustlers — from food to fashion to learning, any business that captures Gen Alpha’s mindshare will see huge growth momentum in the years to come.&lt;/p&gt;

&lt;p&gt;The space I see the biggest opportunity in: &lt;strong&gt;Financial education&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Money-making Gen Alphas might still be too young to manage their wallets responsibly, but financial literacy is a top priority for them (and their Millennial parents).&lt;/p&gt;

&lt;p&gt;Banks and startups are already jumping on the trend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Greenlight, a fintech unicorn that offers debit cards to kids, reached $100m ARR as of 2021;&lt;/li&gt;
&lt;li&gt;Capital One’s teen checking account, MONEY, received rave reviews;&lt;/li&gt;
&lt;li&gt;GoHenry, a UK-based banking app for children, was acquired by Acorns last year.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These products are pretty uniform, bundling a kid’s debit card and a banking app with educational resources. So there’s space to build more differentiated services and experiences. A few ideas:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Niche down demographically
&lt;/h2&gt;

&lt;p&gt;There’ll be 2.2 billion Gen Alphas by 2025, which means you’ll find no shortage of subgroups who need specialized financial education:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kids on the spectrum:&lt;/strong&gt; Optimize the learning experience for different needs, like ADHD, autism, or dyslexia. The neurodivergent population is 5x more likely to become entrepreneurs, so start ’em early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multicultural kids:&lt;/strong&gt; Varying money attitudes based on race and culture can impact children’s financial future. With Gen Alpha being the most racially diverse, consider building specialized education for them and their multicultural families.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Cause-based investment
&lt;/h2&gt;

&lt;p&gt;Gen Alphas are known to be vocal about social and environmental issues. So combine financial ed with causes they care about.&lt;/p&gt;

&lt;p&gt;Changebowl, an Acorns-style investment app, rounds up your spare change and donates to nonprofits of your choosing.&lt;/p&gt;

&lt;p&gt;The company website is no longer active, so you can take that idea and revamp it to teach youngsters about ETFs, charity, and impact investing.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Financial mentors for creators
&lt;/h2&gt;

&lt;p&gt;The biggest Gen Alpha earners made their fortune on social media, or gaming platforms like Roblox. The top 10 Roblox creators took home an average of $23m last year (seriously).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_egqKN7M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ee3suzlo9d5xljifo4g5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_egqKN7M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ee3suzlo9d5xljifo4g5.jpg" alt="_Nearly half of all Roblox users belong to Gen Alpha. Source: Backlinko_" width="701" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Nearly half of all Roblox users belong to Gen Alpha. Source: Backlinko&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With ever-evolving platforms and tech, parents of young creators might not be equipped to offer financial lessons. They’ll need specialized mentoring.&lt;/p&gt;

&lt;p&gt;Build a marketplace that pairs Gen Alpha with vetted, financially savvy Gen Z creators, and offer peer-to-peer tutoring on how to save, spend and grow their wealth smartly.&lt;/p&gt;

&lt;p&gt;Or, design courses on the business side of being a young creator, like how to build a brand, negotiate deals, etc. Use AI to simplify complex subjects and make the experience fun and interactive.&lt;/p&gt;

</description>
      <category>career</category>
      <category>community</category>
      <category>startup</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
